Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: [PATCH 1/1] Crypto: [xp ]cbc: use 64bit regs on 64bit machines (rev. 2)
From: Herbert Xu @ 2007-06-22 23:40 UTC (permalink / raw)
  To: linux-crypto
In-Reply-To: <20070622224406.GA29941@Chamillionaire.breakpoint.cc>

On Sat, Jun 23, 2007 at 12:44:06AM +0200, Sebastian Siewior wrote:
>
> >OK this makes sense.  However you need to make sure that alignmask
> >is set appropriately (i.e., at least 8/16).
> 
> I don't thing I understand. Why do I have to change the alignmask for
> the xor operation? I guess you are talking about crypto_cbc_alloc()
> 
> |if (!(alg->cra_blocksize % 4))
> |        inst->alg.cra_alignmask |= 3;
> 
> don't you?

Yes.

> Since this (also) changes the alignment of in+out data I would prefer not
> to. The speed up you gain from less xors is probably less than what you
> spent on additional kmap()/memcpy()/kmalloc() in case the data is %4 but
> not %5.

Without it the data is not guaranteed to be aligned to 64 bits and
you'll get alignment traps on non-x86 architectures.

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] Check files' signatures before doing suid/sgid [2/4]
From: Alexander Wuerstlein @ 2007-06-24 22:58 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Alexander Wuerstlein, linux-kernel, Johannes Schlumberger,
	linux-crypto
In-Reply-To: <a781481a0706221236t3b399afdkf2e07cb109fba226@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 3266 bytes --]

On 070622 21:40, Satyam Sharma <satyam.sharma@gmail.com> wrote:
> Hi Alexander, Johannes,
>
> But first: Have you checked the digsig project? It's been doing
> (for some time) what your current patchset proposes -- and
> it uses public key cryptosystems for the key management,
> which is decidedly better than using secret-keyed hashes
> (HMAC, XCBC). Also, digsig aims to protect executable
> binaries in general, and not just suid / sgid ones.

We have not heard about digsig before, thanks for pointing it out. After a
short look over the source (correct me if I'm wrong): The most important
difference between our project and digsig is that digsig relies on storing
signatures inside the ELF file structure. Therefore a handmade binary-loader or
just COFF binaries could be used to circumvent digsig. We decided against
altering the file itself for that and some other reasons.

The limitation to suid/sgid was only due to a limited amount of time we had for
implementing our patch. For the future we are planning further uses like
setting capabilities only for signed binaries.

> Second: Can we have some discussion on the security model /
> threat model / trust model / cryptographic key management
> scheme of your signing mechanism? [I had read through the
> [0/4] mail you had sent yesterday, but found no relevant
> discussion on these aspects there.]

Our scenario was as follows: Usually system administrators rely on cronjobs
checking their binaries for unwanted suid-bits. Because of the obvious problems
with this (time between cronjobs, performance) we wrote our patch to replace it.

An admin would verify the to-be-installed binaries (e.g. by reading the source,
checking the distribution's package signatures), sign them in a central
location.  He then distributes those signatures along with the installation
packages onto his computers. There should only be one key in use at a site the
public part of which is compiled into the kernel. Any kind of chain-of-trust
should be handled in userspace by signing or not signing with the site-wide
key depending on the earlier signatures in the chain.

So far for the initial idea. Perhaps it would be useful to have more than one
key or some more complex scheme for obtaining the keys and checking their
validity.  But as all of this would need to be part of the kernel we decided to
rather keep it as simple as possible, anything complex is better and more
flexibly done in userspace.

> From the patchset, it appears you use a *common* secret key
> for _all_ signed binaries, and it is set at kernel build-time itself:
> [...]
> Anyway, this is *totally* insecure and broken. Do you realize anybody
> who lays hands on the kernel image can now _trivially_ extract the
> should-have-been-a-secret key from it and use it to sign his own
> binaries?

We do realize that this is really really ugly, broken and nasty and nobody
would or should ever want to use it for anything but playing around as it is
atm. ;)

We only used HMAC because it was already available inside the kernel, for
implementing real asymetric cryptography there was simply no time. Of course
our next objective is to implement that. 

Ciao,

Alexander Wuerstlein.

[-- Attachment #2: Type: application/pgp-signature, Size: 185 bytes --]

^ permalink raw reply

* combined mode algorithms
From: Joy Latten @ 2007-06-25 22:13 UTC (permalink / raw)
  To: linux-crypto; +Cc: herbert

I have been reading IP Encapsulating Payload-(ESP) RFC4303 where use of
combined mode algorithms are mentioned and accommodated for. 
In trying to determine how I should handle this, I examined the
crypto code and could not readily recognize any combined mode
algorithms. Are there any current plans to implement combined mode
algorithms?  

Thanks!

Regards,
Joy

^ permalink raw reply

* Re: [PATCH] Check files' signatures before doing suid/sgid [2/4]
From: Satyam Sharma @ 2007-06-25 23:53 UTC (permalink / raw)
  To: Alexander Wuerstlein
  Cc: Alexander Wuerstlein, linux-kernel, Johannes Schlumberger,
	linux-crypto
In-Reply-To: <20070624225809.GI9741@cip.informatik.uni-erlangen.de>

On 6/25/07, Alexander Wuerstlein
<snalwuer@cip.informatik.uni-erlangen.de> wrote:
> On 070622 21:40, Satyam Sharma <satyam.sharma@gmail.com> wrote:
> > [...]
> > But first: Have you checked the digsig project? It's been doing
> > (for some time) what your current patchset proposes -- and
> > it uses public key cryptosystems for the key management,
> > which is decidedly better than using secret-keyed hashes
> > (HMAC, XCBC). Also, digsig aims to protect executable
> > binaries in general, and not just suid / sgid ones.
>
> We have not heard about digsig before, thanks for pointing it out. After a
> short look over the source (correct me if I'm wrong): The most important
> difference between our project and digsig is that digsig relies on storing
> signatures inside the ELF file structure. Therefore a handmade binary-loader or
> just COFF binaries could be used to circumvent digsig.

Yes, that's correct.

> We decided against
> altering the file itself for that and some other reasons.
> The limitation to suid/sgid was only due to a limited amount of time we had for
> implementing our patch. For the future we are planning further uses like
> setting capabilities only for signed binaries.

Ok, effectively what you have there is a signature on an entire file stored in
one of its extended attributes, so I suspect you could think of few other
applications for something like this too.

> > Second: Can we have some discussion on the security model /
> > threat model / trust model / cryptographic key management
> > scheme of your signing mechanism? [I had read through the
> > [0/4] mail you had sent yesterday, but found no relevant
> > discussion on these aspects there.]
> [...]
> An admin would verify the to-be-installed binaries (e.g. by reading the source,
> checking the distribution's package signatures), sign them in a central
> location.  He then distributes those signatures along with the installation
> packages onto his computers. There should only be one key in use at a site the
> public part of which is compiled into the kernel. Any kind of chain-of-trust
> should be handled in userspace by signing or not signing with the site-wide
> key depending on the earlier signatures in the chain.

Ok, so:

1. Admin is trusted. [ This need not mean the same as: "superuser
_account_ is trusted", but let's stay in the real world in for now. ]
2. Signing happens at some central, assumed-to-be-secure location (and say
the private key never leaves that central secure location). And let's say the
admin *repackages* the packages, this time such that the signed files get the
signature-carrying-extended-attributes with them, so the installation
automatically copies them correctly. => nothing wrong with this assumption.
3. Kernel verifies signatures at runtime. => kernel is trusted.
4. Public key needs to be *compiled into* the kernel ... so this is not getting
into mainline, but fair enough as something site administrators would patch in
and build.
5. Chain-of-trust handled in userspace. => userspace is trusted.

Let me know if I got the trust model / key management wrong.

> So far for the initial idea. Perhaps it would be useful to have more than one
> key or some more complex scheme for obtaining the keys and checking their
> validity.  But as all of this would need to be part of the kernel we decided to
> rather keep it as simple as possible, anything complex is better and more
> flexibly done in userspace.

Well, if you're trusting (privileged) userspace already, I'm suddenly
not so sure
as to what new is this patchset bringing to the table in the first
place ... could
you also describe the attack vectors / threats that you had in mind that get
blocked with the proposed scheme?

> > From the patchset, it appears you use a *common* secret key
> > for _all_ signed binaries, and it is set at kernel build-time itself:
> > [...]
> > Anyway, this is *totally* insecure and broken. Do you realize anybody
> > who lays hands on the kernel image can now _trivially_ extract the
> > should-have-been-a-secret key from it and use it to sign his own
> > binaries?
>
> We do realize that this is really really ugly, broken and nasty and nobody
> would or should ever want to use it for anything but playing around as it is
> atm. ;)
>
> We only used HMAC because it was already available inside the kernel, for
> implementing real asymetric cryptography there was simply no time. Of course
> our next objective is to implement that.

Have a look at modsign (signed kernel modules) project too (just the key
management part, specifically the asymmetric crypto and DSA implementation
that they've already ported to the kernel). You could also go through the lkml
archives for whenever that was proposed for inclusion in mainline ...

Satyam

^ permalink raw reply

* Re: [PATCH] Check files' signatures before doing suid/sgid [2/4]
From: Alexander Wuerstlein @ 2007-06-26  0:27 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Alexander Wuerstlein, linux-kernel, Johannes Schlumberger,
	linux-crypto
In-Reply-To: <a781481a0706251653wa2db7e4geb181f3827a10f9d@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4698 bytes --]

On 070626 01:56, Satyam Sharma <satyam.sharma@gmail.com> wrote:
> On 6/25/07, Alexander Wuerstlein
> <snalwuer@cip.informatik.uni-erlangen.de> wrote:
>> On 070622 21:40, Satyam Sharma <satyam.sharma@gmail.com> wrote:
>> > [...]
>> We decided against
>> altering the file itself for that and some other reasons.
>> The limitation to suid/sgid was only due to a limited amount of time we 
>> had for
>> implementing our patch. For the future we are planning further uses like
>> setting capabilities only for signed binaries.
>
> Ok, effectively what you have there is a signature on an entire file stored
> in one of its extended attributes, so I suspect you could think of few other
> applications for something like this too.

Yes, for example one could sign Java's classfiles and employ a special trusted
Java VM which checks the signatures before execution. Also, this is a more
general case of signing kernel modules (as you mentioned below). There are
really numerous applications one could imagine, we just don't really know which
ones are practical. We definitely appreciate further ideas on this.

Also the signature-in-ELF can be used complementary to our approach: for
example NFS is currently unable to handle real extended attributes (nfs does
only posix acls). So for binaries delivered over NFS our approach wouldn't
work.

> Ok, so:
>
> 1. Admin is trusted. [ This need not mean the same as: "superuser
> _account_ is trusted", but let's stay in the real world in for now. ]
> 2. Signing happens at some central, assumed-to-be-secure location (and say
> the private key never leaves that central secure location). And let's say the
> admin *repackages* the packages, this time such that the signed files get the
> signature-carrying-extended-attributes with them, so the installation
> automatically copies them correctly. => nothing wrong with this assumption.
> 3. Kernel verifies signatures at runtime. => kernel is trusted.
> 4. Public key needs to be *compiled into* the kernel ... so this is not 
> getting into mainline, but fair enough as something site administrators would
> patch in and build.

Correct up to here.

> 5. Chain-of-trust handled in userspace. => userspace is trusted.

Nope. I unluckily wrote 'userspace' where I should have said something else:
Chain-of-trust is handled in what I would label 'Adminspace' (Where we do the
signing as in points 1 and 2). There is a very small number of signatures (in
our example one) known to the kernel and only those are trusted, and those are
applied to the binaries by the administrator in your point 2. The kernel does
and should never rely on userspace to tell it which signatures are trustworthy.
Only the administrator may do so by means of the signatures directly compiled
into the kernel.

So in short: Chain-of-trust is handled by the administrator in his secure
central location.

>> So far for the initial idea. Perhaps it would be useful to have more than 
>> one
>> key or some more complex scheme for obtaining the keys and checking their
>> validity.  But as all of this would need to be part of the kernel we 
>> decided to
>> rather keep it as simple as possible, anything complex is better and more
>> flexibly done in userspace.
>
> Well, if you're trusting (privileged) userspace already, I'm suddenly not so
> sure as to what new is this patchset bringing to the table in the first place
> ...

We do not trust any userspace application, see above.

> could you also describe the attack vectors / threats that you had in mind
> that get blocked with the proposed scheme?

We focus on attacks where an attacker may alter some executable file, for
example by altering a mounted nfs-share, manipulating disk-content by simply
pulling a disk, mounting it and writing to it, etc.

This relies on the kernel beeing trustworthy of course, so one would need to
take special measures to protect the kernel-image from beeing similarly
altered. One (somewhat not-so-secure method) would be supplying kernel images
by PXE and forbidding local booting, another measure would be using a TPM
and an appropriate bootloader to check the kernel for unwanted modifications.

> Have a look at modsign (signed kernel modules) project too (just the key
> management part, specifically the asymmetric crypto and DSA implementation
> that they've already ported to the kernel). You could also go through the 
> lkml archives for whenever that was proposed for inclusion in mainline ...

We already thought about that. Using some existing code is definitely preferable
to inventing DSA again :)

Ciao,

Alexander Wuerstlein.

[-- Attachment #2: Type: application/pgp-signature, Size: 185 bytes --]

^ permalink raw reply

* Re: [PATCH] Check files' signatures before doing suid/sgid [2/4]
From: Satyam Sharma @ 2007-06-26  2:13 UTC (permalink / raw)
  To: Alexander Wuerstlein
  Cc: Alexander Wuerstlein, linux-kernel, Johannes Schlumberger,
	linux-crypto
In-Reply-To: <20070626002659.GM9741@cip.informatik.uni-erlangen.de>

On 6/26/07, Alexander Wuerstlein
<snalwuer@cip.informatik.uni-erlangen.de> wrote:
> [...]
> Nope. I unluckily wrote 'userspace' where I should have said something else:
> Chain-of-trust is handled in what I would label 'Adminspace' (Where we do the
> signing as in points 1 and 2). There is a very small number of signatures (in
> our example one) known to the kernel and only those are trusted, and those are
> applied to the binaries by the administrator in your point 2. The kernel does
> and should never rely on userspace to tell it which signatures are trustworthy.
> Only the administrator may do so by means of the signatures directly compiled
> into the kernel.
>
> So in short: Chain-of-trust is handled by the administrator in his secure
> central location.

Ok, so the "trust chain" you're talking about is simply the decision of the
admin to compile-in the (verified and trusted) public keys of known trusted
entities into the kernel at build time. That is not really scalable, but I guess
you might just as well impose such a restriction for sake of simplicity.

[ I initially thought a scenario where a given binary is signed by an
entity whose
corresponding public key is _not_ present in the kernel, but who does possess
a signature -- over its name, id and public key -- by another entity whose
corresponding public key _is_ built into the kernel). Then at the time of
verification there's really no other alternative to *build* the entire
chain at the
_point of verification_ (in-kernel) itself ... but this obviously
introduces huge and
ugly complexities that you'd have a hard time bringing into the kernel :-) That
"signature over name, id and public key" could be a _certificate_ (if you care
about following standards), and building their chains in-kernel ... well. But if
you really want to differentiate between kernel and userspace from security
perspective, and want to give such functionality, I don't see any easy
way out. ]

> >> So far for the initial idea. Perhaps it would be useful to have more than
> >> one
> >> key or some more complex scheme for obtaining the keys and checking their
> >> validity.  But as all of this would need to be part of the kernel we
> >> decided to
> >> rather keep it as simple as possible, anything complex is better and more
> >> flexibly done in userspace.
> >
> > Well, if you're trusting (privileged) userspace already, I'm suddenly not so
> > sure as to what new is this patchset bringing to the table in the first place
> > ...
>
> We do not trust any userspace application, see above.
>
> > could you also describe the attack vectors / threats that you had in mind
> > that get blocked with the proposed scheme?
>
> We focus on attacks where an attacker may alter some executable file, for
> example by altering a mounted nfs-share, manipulating disk-content by simply
> pulling a disk, mounting it and writing to it, etc.
>
> This relies on the kernel beeing trustworthy of course, so one would need to
> take special measures to protect the kernel-image from beeing similarly
> altered. One (somewhat not-so-secure method) would be supplying kernel images
> by PXE and forbidding local booting, another measure would be using a TPM
> and an appropriate bootloader to check the kernel for unwanted modifications.

Kernel-userspace differentiation from security perspective is always tricky
(so this is why I pointed you to the discussions whenever such stuff, such
as asymmetric crypto and modsign etc are proposed to be merged). It's
definitely not impossible to compromise a _running_ kernel from privileged
userspace, if it really wanted to do so ...

Satyam

^ permalink raw reply

* Re: combined mode algorithms
From: Evgeniy Polyakov @ 2007-06-26  9:09 UTC (permalink / raw)
  To: Joy Latten; +Cc: linux-crypto, herbert
In-Reply-To: <1182809638.15699.221.camel@faith.austin.ibm.com>

On Mon, Jun 25, 2007 at 05:13:58PM -0500, Joy Latten (latten@austin.ibm.com) wrote:
> I have been reading IP Encapsulating Payload-(ESP) RFC4303 where use of
> combined mode algorithms are mentioned and accommodated for. 
> In trying to determine how I should handle this, I examined the
> crypto code and could not readily recognize any combined mode
> algorithms. Are there any current plans to implement combined mode
> algorithms?  

I think it should be first supported by ipsec stack at least with state,
where SA cold be configured, integrity check for the data/header is not 
a problem after that changes are stable. sha1/encryption is a poor man's
combined algo after all with hash data being ICV :)

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: combined mode algorithms
From: Joy Latten @ 2007-06-26 15:02 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: herbert, linux-crypto
In-Reply-To: <20070626090956.GA5833@2ka.mipt.ru>

On Tue, 2007-06-26 at 13:09 +0400, Evgeniy Polyakov wrote:
> On Mon, Jun 25, 2007 at 05:13:58PM -0500, Joy Latten (latten@austin.ibm.com) wrote:
> > I have been reading IP Encapsulating Payload-(ESP) RFC4303 where use of
> > combined mode algorithms are mentioned and accommodated for. 
> > In trying to determine how I should handle this, I examined the
> > crypto code and could not readily recognize any combined mode
> > algorithms. Are there any current plans to implement combined mode
> > algorithms?  
> 
> I think it should be first supported by ipsec stack at least with state,
> where SA cold be configured, integrity check for the data/header is not 
> a problem after that changes are stable. sha1/encryption is a poor man's
> combined algo after all with hash data being ICV :)
> 

Ok, thanks. This helps. I can code up the infrastructure for this. I 
am thinking I will eventually need one of the algorithms to test and
complete it though. RFCs 4309 and 4106 specify ESP working with AES-CCM
and AES-GCM.

Regards,
Joy

^ permalink raw reply

* [RFC 0/2] AES ablkcipher driver for SPUs
From: Sebastian Siewior @ 2007-06-26 22:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

Hello Herbert,

This driver adds support for AES on SPU. Patch is for review only because
some parts of the code are not upstream yet.
Patch one contains the main driver (which uses ablkcipher_ctx_cast()), 
patch two is for clarity (parts of the missing API that is used).

Currently only ECB block mode is supported. I plan support for CBC but the
way the IV currently handled is unfavorable (later more).

aes_spu_wrap.c and kspu_helper.c run in the kernel, spu_main.c will run on
a SPU (my hardware for computing :)). SPU can access kernel memory (even
virtual) via asynchronous DMA transfers.
All requests from the crypto user end up in a linked list which is managed
by the kspu module (even no crypto requests will end up there as well but
currently the AES driver is the only user). AES callback function
(aes_queue_work_items()) is called to queue the request in a ring buffer 
which is located on the hardware. Once some requests are enqueued the SPU
is started.
The SPU requests the first couple of blocks via DMA (init_get_data()).
This request may not get satisfied immediately, the command does not
block. Once all requests (DMA_BUFFERS num) are fired up, the SPU waits
for the first buffer to complete and starts processing (via spu_funcs()).
Ideally there are always transfers in the background (copy new data from
main storage to SPU and copy processed data from SPU to main storage)
while the SPU is processing a block of data.
This is where my problems with the IV are starting. Currently I have to
request the IV from main storage, wait for it, than I can use it and once
I processed the block, I must write it back.
What about a different handling of the IV with two functions like
ablk_set_iv()
ablk_get_iv()
With something like this, I could store the IV in the SPU (in my key 
struct for instance) and don't have to transfer it on every request
(similar to what I do now with the key). I don't know if there are any
crypto user that have multiple IVs/key but in such a case, I could cache
IVs like I cache keys now. Any comments on that?

Sebastian
-- 

^ permalink raw reply

* [RFC 1/2] SPU-AES support (kernel side)
From: Sebastian Siewior @ 2007-06-26 23:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: aes-spu-async2.diff --]
[-- Type: text/plain, Size: 17882 bytes --]

This patch implements the AES cipher algorithm which is executed on the
SPU using the crypto async interface. Currently only the ECB mode is
implemented. The AES code that is executed on the SPU has been left
apart (it is not exciting anyway).

Signed-off-by: Sebastian Siewior <bigeasy@breakpoint.cc>
--- a/arch/powerpc/platforms/cell/Makefile
+++ b/arch/powerpc/platforms/cell/Makefile
@@ -22,4 +22,5 @@ obj-$(CONFIG_SPU_BASE)			+= spu_callback
 					   $(spufs-modular-m) \
 					   $(spu-priv1-y) \
 					   $(spu-manage-y) \
+					   crypto/ \
 					   spufs/
--- /dev/null
+++ b/arch/powerpc/platforms/cell/crypto/Kconfig
@@ -0,0 +1,12 @@
+config CRYPTO_AES_SPU
+	tristate "AES cipher algorithm (SPU support)"
+	select CRYPTO_ABLKCIPHER
+	depends on SPU_KERNEL_SUPPORT
+	default m
+	help
+	  AES cipher algorithms (FIPS-197). AES uses the Rijndael
+	  algorithm.
+	  The AES specifies three key sizes: 128, 192 and 256 bits.
+	  See <http://csrc.nist.gov/CryptoToolkit/aes/> for more information.
+
+	  This version of AES performs its work on a SPU core.
--- /dev/null
+++ b/arch/powerpc/platforms/cell/crypto/aes_spu_wrap.c
@@ -0,0 +1,479 @@
+/*
+ * AES interface module for the async crypto API.
+ *
+ * Author: Sebastian Siewior <bigeasy@breakpoint.cc>
+ * License: GPLv2
+ */
+
+#include <asm/byteorder.h>
+#include <asm/system.h>
+#include <asm/kspu/kspu.h>
+#include <asm/kspu/merged_code.h>
+#include <crypto/algapi.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/mutex.h>
+#include <linux/err.h>
+#include <linux/list.h>
+#include <linux/delay.h>
+#include <linux/spinlock.h>
+#include <linux/mm.h>
+#include <linux/scatterlist.h>
+#include <linux/highmem.h>
+#include <linux/vmalloc.h>
+
+#include "aes_vmx_addon.h"
+
+struct map_key_spu {
+	struct list_head list;
+	unsigned int spu_slot;
+	struct aes_ctx *slot_content;
+};
+
+struct aes_ctx {
+	/* the key used for enc|dec purpose */
+	struct aes_key_struct key;
+	/* identify the slot on the SPU */
+	struct map_key_spu *key_mapping;
+	/* identify the SPU that is used */
+	struct async_aes *spe_ctx;
+};
+
+struct async_d_request {
+	enum SPU_FUNCTIONS crypto_operation;
+	 /*
+	  * If src|dst or iv is not properly aligned, we keep here a copy of
+	  * it that is properly aligned.
+	  */
+	struct kspu_work_item kspu_work;
+	unsigned char *al_data;
+/*	unsigned char *aligned_iv; */
+	unsigned char *mapped_src;
+	unsigned char *mapped_dst;
+	unsigned char *real_src;
+	unsigned char *real_dst;
+	unsigned int progress;
+};
+
+struct async_aes {
+	struct kspu_context *ctx;
+	struct map_key_spu mapping_key_spu[SPU_KEY_SLOTS];
+	struct list_head key_ring;
+};
+
+static struct async_aes async_spu;
+
+#define AES_MIN_KEY_SIZE	16
+#define AES_MAX_KEY_SIZE	32
+#define AES_BLOCK_SIZE		16
+#define ALIGN_MASK 15
+#define MAX_TRANSFER_SIZE	(16 * 1024)
+
+static void cleanup_requests(struct ablkcipher_request *req,
+		struct async_d_request *a_d_ctx)
+{
+	char *dst_addr;
+	char *aligned_addr;
+
+	if (a_d_ctx->al_data) {
+		aligned_addr = (char *) ALIGN((unsigned long)
+				a_d_ctx->al_data, ALIGN_MASK+1);
+		dst_addr = a_d_ctx->mapped_dst + req->dst->offset;
+
+		if ((unsigned long) dst_addr & ALIGN_MASK) {
+			memcpy(dst_addr, aligned_addr, req->nbytes);
+		}
+		vfree(a_d_ctx->al_data);
+		kunmap(a_d_ctx->mapped_dst);
+		kunmap(a_d_ctx->mapped_src);
+	}
+#if 0
+	if (a_d_ctx->aligned_iv) {
+		memcpy(req->info, a_d_ctx->aligned_iv, MAX_TRANSFER_SIZE);
+		kfree(a_d_ctx->aligned_iv);
+	}
+#endif
+}
+
+static void aes_finish_callback(struct kspu_work_item *kspu_work)
+{
+	struct async_d_request *a_d_ctx = container_of(kspu_work,
+			struct async_d_request, kspu_work);
+	struct ablkcipher_request *ablk_req = ablkcipher_ctx_cast(a_d_ctx);
+
+	a_d_ctx = ablkcipher_request_ctx(ablk_req);
+	cleanup_requests(ablk_req, a_d_ctx);
+
+	pr_debug("Request %p done, memory cleaned. Now calling crypto user\n",
+			kspu_work);
+	local_bh_disable();
+	ablk_req->base.complete(&ablk_req->base, 0);
+	local_bh_enable();
+	return;
+}
+
+static void update_key_on_spu(struct aes_ctx *aes_ctx)
+{
+	struct list_head *tail;
+	struct map_key_spu *entry;
+	struct aes_update_key *aes_update_key;
+	struct kspu_job *work_item;
+
+	tail = async_spu.key_ring.prev;
+	entry = list_entry(tail, struct map_key_spu, list);
+	list_move(tail, &async_spu.key_ring);
+
+	entry->slot_content = aes_ctx;
+	aes_ctx->key_mapping = entry;
+
+	pr_debug("key for %p is not on the SPU. new slot: %d\n",
+			aes_ctx, entry->spu_slot);
+	work_item = kspu_get_rb_slot(aes_ctx->spe_ctx->ctx);
+	work_item->operation = SPU_FUNC_aes_update_key;
+	work_item->in = (unsigned long long) &aes_ctx->key;
+	work_item->in_size = sizeof(aes_ctx->key);
+
+	aes_update_key = &work_item->aes_update_key;
+	aes_update_key->keyid = entry->spu_slot;
+
+	kspu_mark_rb_slot_ready(aes_ctx->spe_ctx->ctx, NULL);
+}
+
+static int prepare_request_mem(struct ablkcipher_request *req,
+		struct async_d_request *a_d_ctx, struct aes_ctx *aes_ctx)
+{
+	char *src_addr, *dst_addr;
+	char *aligned_addr;
+
+	a_d_ctx->mapped_src = kmap(req->src->page);
+	if (!a_d_ctx->mapped_src)
+		goto err;
+
+	a_d_ctx->mapped_dst = kmap(req->dst->page);
+	if (!a_d_ctx->mapped_dst) {
+		goto err_src;
+	}
+
+	src_addr = a_d_ctx->mapped_src + req->src->offset;
+	dst_addr = a_d_ctx->mapped_dst + req->dst->offset;
+
+	if ((unsigned long) src_addr & ALIGN_MASK ||
+			(unsigned long) dst_addr & ALIGN_MASK) {
+		/*
+		 * vmalloc() is somewhat slower than __get_free_page().
+		 * However, this is the slowpath. I expect the user to align
+		 * properly in first place :).
+		 * The reason for vmalloc() is that req->nbytes may be larger
+		 * than one page and I don't want distinguish later where that
+		 * memory come from.
+		 */
+		a_d_ctx->al_data = (char *) vmalloc(req->nbytes + ALIGN_MASK);
+		if (!a_d_ctx->al_data) {
+			goto err_dst;
+		}
+
+		aligned_addr = (char *) ALIGN((unsigned long)a_d_ctx->
+				al_data, ALIGN_MASK+1);
+		pr_debug("Unaligned data replaced with %p (%p)\n",
+				a_d_ctx->al_data, aligned_addr);
+
+		if ((unsigned long) src_addr & ALIGN_MASK) {
+			memcpy(aligned_addr, src_addr, req->nbytes);
+			a_d_ctx->real_src = aligned_addr;
+		}
+
+		if ((unsigned long) dst_addr & ALIGN_MASK) {
+			a_d_ctx->real_dst = aligned_addr;
+		}
+	} else {
+		a_d_ctx->al_data = NULL;
+		a_d_ctx->real_src = src_addr;
+		a_d_ctx->real_dst = dst_addr;
+	}
+#if 0
+	pr_debug("aligned_IV: %p\n", a_d_ctx->aligned_iv);
+
+	if ((unsigned long) req->info & ALIGN_MASK)
+		a_d_ctx->aligned_iv = NULL;
+	else
+		a_d_ctx->aligned_iv = NULL;
+#endif
+	return 0;
+err_dst:
+	kunmap(a_d_ctx->mapped_dst);
+err_src:
+	kunmap(a_d_ctx->mapped_src);
+err:
+	return -ENOMEM;
+
+}
+
+/*
+ * aes_queue_work_items() is called by kspu to queue the work item on the SPU.
+ * kspu ensures atleast one slot when calling. The function may return 0 if
+ * more slots were required but not available. In this case, kspu will call
+ * again with the same work item. The function has to notice that this work
+ * item has been allready started and continue.
+ * Other return values (!=0) will remove the work item from list.
+ */
+static int aes_queue_work_items(struct kspu_work_item *kspu_work)
+{
+	struct async_d_request *a_d_ctx = container_of(kspu_work,
+			struct async_d_request, kspu_work);
+	struct ablkcipher_request *ablk_req = ablkcipher_ctx_cast(a_d_ctx);
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(ablk_req);
+	struct aes_ctx *aes_ctx = crypto_ablkcipher_ctx(tfm);
+	struct kspu_job *work_item;
+	struct aes_crypt *aes_crypt;
+	int size_left, ret;
+
+	BUG_ON(ablk_req->nbytes & (AES_BLOCK_SIZE-1));
+
+	if (!a_d_ctx->progress) {
+		if (!aes_ctx->key_mapping || aes_ctx !=
+				aes_ctx->key_mapping->slot_content)
+			update_key_on_spu(aes_ctx);
+
+		else
+			list_move(&aes_ctx->key_mapping->list,
+					&async_spu.key_ring);
+
+		ret = prepare_request_mem(ablk_req, a_d_ctx, aes_ctx);
+		if (ret)
+			return 0;
+	}
+
+	do {
+		size_left = ablk_req->nbytes - a_d_ctx->progress;
+
+		if (!size_left) {
+			a_d_ctx->kspu_work.notify = aes_finish_callback;
+			return 1;
+		}
+
+		work_item = kspu_get_rb_slot(aes_ctx->spe_ctx->ctx);
+		if (!work_item)
+			return 0;
+
+		aes_crypt = &work_item->aes_crypt;
+		work_item->operation = a_d_ctx->crypto_operation;
+		work_item->in = (unsigned long int) a_d_ctx->real_src +
+			a_d_ctx->progress;
+		aes_crypt->out = (unsigned long int) a_d_ctx->real_dst +
+			a_d_ctx->progress;
+
+		if (size_left > MAX_TRANSFER_SIZE) {
+			a_d_ctx->progress += MAX_TRANSFER_SIZE;
+			work_item->in_size = MAX_TRANSFER_SIZE;
+		} else {
+			a_d_ctx->progress += size_left;
+			work_item->in_size = size_left;
+		}
+
+		aes_crypt->iv = 0; /* XXX */
+		aes_crypt->keyid = aes_ctx->key_mapping->spu_slot;
+
+		pr_debug("in: %p, out %p, data_size: %u\n",
+				(void *) work_item->in,
+				(void *) aes_crypt->out,
+				work_item->in_size);
+		pr_debug("iv: %p, key slot: %d\n", (void *) aes_crypt->iv,
+				aes_crypt->keyid);
+
+		kspu_mark_rb_slot_ready(aes_ctx->spe_ctx->ctx,
+				a_d_ctx->progress == ablk_req->nbytes ?
+				kspu_work : NULL);
+	} while (1);
+}
+
+static int enqueue_request(struct ablkcipher_request *req,
+		enum SPU_FUNCTIONS op_type)
+{
+	struct async_d_request *asy_d_ctx = ablkcipher_request_ctx(req);
+	struct crypto_ablkcipher *tfm = crypto_ablkcipher_reqtfm(req);
+	struct aes_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+	struct kspu_work_item *work = &asy_d_ctx->kspu_work;
+
+	asy_d_ctx->crypto_operation = op_type;
+	asy_d_ctx->progress = 0;
+	work->enqueue = aes_queue_work_items;
+
+	kspu_enqueue_work_item(ctx->spe_ctx->ctx, &asy_d_ctx->kspu_work);
+	return -EINPROGRESS;
+}
+
+/*
+ * AltiVec and not SPU code is because the key may disappear after calling
+ * this func (for example if it is not properly aligned)
+ */
+static int aes_set_key_async(struct crypto_ablkcipher *parent,
+		const u8 *key, unsigned int keylen)
+{
+	struct aes_ctx *ctx = crypto_ablkcipher_ctx(parent);
+	int ret;
+
+	ctx->spe_ctx = &async_spu;
+	ctx->key.len = keylen / 4;
+	ctx->key_mapping = NULL;
+
+	preempt_disable();
+	enable_kernel_altivec();
+	ret = expand_key(key, keylen / 4, &ctx->key.enc[0], &ctx->key.dec[0]);
+	preempt_enable();
+
+	if (ret == -EINVAL)
+		crypto_ablkcipher_set_flags(parent, CRYPTO_TFM_RES_BAD_KEY_LEN);
+
+	return ret;
+}
+
+static int aes_encrypt_ecb_async(struct ablkcipher_request *req)
+{
+
+	req->info = NULL;
+	return enqueue_request(req, SPU_FUNC_aes_encrypt_ecb);
+}
+
+static int aes_decrypt_ecb_async(struct ablkcipher_request *req)
+{
+
+	req->info = NULL;
+	return enqueue_request(req, SPU_FUNC_aes_decrypt_ecb);
+}
+#if 0
+static int aes_encrypt_cbc_async(struct ablkcipher_request *req)
+{
+	return enqueue_request(req, SPU_FUNC_aes_encrypt_cbc);
+}
+
+static int aes_decrypt_cbc_async(struct ablkcipher_request *req)
+{
+	return enqueue_request(req, SPU_FUNC_aes_decrypt_cbc);
+}
+#endif
+static int async_d_init(struct crypto_tfm *tfm)
+{
+	tfm->crt_ablkcipher.reqsize = sizeof(struct async_d_request);
+	return 0;
+}
+
+static struct crypto_alg aes_ecb_alg_async = {
+	.cra_name		= "ecb(aes)",
+	.cra_driver_name	= "ecb-aes-spu-async",
+	.cra_priority		= 125,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_alignmask		= 15,
+	.cra_ctxsize		= sizeof(struct aes_ctx),
+	.cra_type		= &crypto_ablkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(aes_ecb_alg_async.cra_list),
+	.cra_init		= async_d_init,
+	.cra_u	= {
+		.ablkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.ivsize		= 0,
+			.setkey		= aes_set_key_async,
+			.encrypt	= aes_encrypt_ecb_async,
+			.decrypt	= aes_decrypt_ecb_async,
+		}
+	}
+};
+#if 0
+static struct crypto_alg aes_cbc_alg_async = {
+	.cra_name		= "cbc(aes)",
+	.cra_driver_name	= "cbc-aes-spu-async",
+	.cra_priority		= 125,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_alignmask		= 15,
+	.cra_ctxsize		= sizeof(struct aes_ctx),
+	.cra_type		= &crypto_ablkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(aes_cbc_alg_async.cra_list),
+	.cra_init		= async_d_init,
+	.cra_u	= {
+		.ablkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.ivsize		= AES_BLOCK_SIZE,
+			.setkey		= aes_set_key_async,
+			.encrypt	= aes_encrypt_cbc_async,
+			.decrypt	= aes_decrypt_cbc_async,
+		}
+	}
+};
+#endif
+
+static void init_spu_key_mapping(struct async_aes *spe_ctx)
+{
+	unsigned int i;
+
+	INIT_LIST_HEAD(&spe_ctx->key_ring);
+
+	for (i = 0; i < SPU_KEY_SLOTS; i++) {
+		list_add_tail(&spe_ctx->mapping_key_spu[i].list,
+				&spe_ctx->key_ring);
+		spe_ctx->mapping_key_spu[i].spu_slot = i;
+	}
+}
+
+static int init_async_ctx(struct async_aes *spe_ctx)
+{
+	int ret;
+
+	spe_ctx->ctx = kspu_get_kctx();
+	init_spu_key_mapping(spe_ctx);
+
+	ret = crypto_register_alg(&aes_ecb_alg_async);
+	if (ret) {
+		printk(KERN_ERR "crypto_register_alg(ecb) failed: %d\n", ret);
+		goto err_kthread;
+	}
+#if 0
+	ret = crypto_register_alg(&aes_cbc_alg_async);
+	if (ret) {
+		printk(KERN_ERR "crypto_register_alg(cbc) failed: %d\n", ret);
+		goto fail_cbc;
+	}
+#endif
+	return 0;
+#if 0
+fail_cbc:
+	crypto_unregister_alg(&aes_ecb_alg_async);
+#endif
+err_kthread:
+	return ret;
+}
+
+static void deinit_async_ctx(struct async_aes *async_aes)
+{
+
+	crypto_unregister_alg(&aes_ecb_alg_async);
+/*	crypto_unregister_alg(&aes_cbc_alg_async); */
+}
+
+static int __init aes_init(void)
+{
+	unsigned int ret;
+
+	ret = init_async_ctx(&async_spu);
+	if (ret) {
+		printk(KERN_ERR "async_api_init() failed\n");
+		return ret;
+	}
+	return 0;
+}
+
+static void __exit aes_fini(void)
+{
+	deinit_async_ctx(&async_spu);
+}
+
+module_init(aes_init);
+module_exit(aes_fini);
+
+MODULE_DESCRIPTION("AES Cipher Algorithm with SPU support");
+MODULE_AUTHOR("Sebastian Siewior <bigeasy@breakpoint.cc>");
+MODULE_LICENSE("GPL");
--- a/arch/powerpc/platforms/cell/spufs/spu_main.c
+++ b/arch/powerpc/platforms/cell/spufs/spu_main.c
@@ -13,6 +13,14 @@
 
 spu_operation spu_funcs[TOTAL_SPU_FUNCS] __attribute__((aligned(16))) = {
 	[SPU_FUNC_nop] = spu_nop,
+	[SPU_FUNC_aes_setkey] = spu_aes_setkey,
+	[SPU_FUNC_aes_update_key] = spu_aes_update_key,
+	[SPU_FUNC_aes_encrypt_ecb] = spu_aes_encrypt_ecb,
+	[SPU_FUNC_aes_decrypt_ecb] = spu_aes_decrypt_ecb,
+#if 0
+	[SPU_FUNC_aes_encrypt_cbc] = spu_aes_encrypt_cbc,
+	[SPU_FUNC_aes_decrypt_cbc] = spu_aes_decrypt_cbc,
+#endif
 };
 
 struct kspu_buffers kspu_buff[DMA_BUFFERS];
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -78,4 +78,5 @@ config ZCRYPT_MONOLITHIC
 	  that contains all parts of the crypto device driver (ap bus,
 	  request router and all the card drivers).
 
+source "arch/powerpc/platforms/cell/crypto/Kconfig"
 endmenu
--- /dev/null
+++ b/include/asm-powerpc/kspu/aes.h
@@ -0,0 +1,49 @@
+#ifndef  __SPU_AES_H__
+#define  __SPU_AES_H__
+
+#define MAX_AES_ROUNDS 15
+#define MAX_AES_KEYSIZE_INT (MAX_AES_ROUNDS *4)
+#define MAX_AES_KEYSIZE_BYTE (MAX_AES_KEYSIZE_INT *4)
+#define SPU_KEY_SLOTS 5
+
+struct aes_key_struct {
+	unsigned char enc[MAX_AES_KEYSIZE_BYTE] __attribute__((aligned(16)));
+	unsigned char dec[MAX_AES_KEYSIZE_BYTE] __attribute__((aligned(16)));
+	unsigned int len __attribute__((aligned(16)));
+};
+
+struct aes_set_key {
+	/* in */
+	unsigned long long plain __attribute__((aligned(16)));
+	unsigned int len __attribute__((aligned(16)));
+	unsigned int keyid __attribute__((aligned(16)));
+
+	/* out */
+	unsigned long long keys __attribute__((aligned(16)));
+};
+
+struct aes_update_key {
+	/* copy key from ea to ls into a specific slot */
+	unsigned int keyid __attribute__((aligned(16)));
+};
+
+struct aes_crypt {
+	/* in */
+	unsigned int keyid __attribute__((aligned(16)));
+
+	/* out */
+	unsigned long long iv __attribute__((aligned(16))); /* as well as in */
+	unsigned long long out __attribute__((aligned(16)));
+};
+
+/* exported calls */
+#if 0
+int spu_aes_encrypt_cbc(union possible_arguments *pa);
+int spu_aes_decrypt_cbc(union possible_arguments *pa);
+#endif
+
+int spu_aes_setkey(unsigned int cur, unsigned int cur_buf);
+int spu_aes_update_key(unsigned int cur, unsigned int cur_buf);
+int spu_aes_encrypt_ecb(unsigned int cur, unsigned int cur_buf);
+int spu_aes_decrypt_ecb(unsigned int cur, unsigned int cur_buf);
+#endif
--- a/include/asm-powerpc/kspu/merged_code.h
+++ b/include/asm-powerpc/kspu/merged_code.h
@@ -1,6 +1,7 @@
 #ifndef KSPU_MERGED_CODE_H
 #define KSPU_MERGED_CODE_H
 #include <linux/autoconf.h>
+#include <asm/kspu/aes.h>
 
 #define KSPU_LS_SIZE 0x40000
 
@@ -10,18 +11,30 @@
 #define DMA_BUFF_MASK (DMA_BUFFERS-1)
 #define ALL_DMA_BUFFS ((1 << DMA_BUFFERS)-1)
 
-typedef int (*spu_operation)(unsigned int cur);
+#define RB_MASK (RB_SLOTS-1)
+
+typedef int (*spu_operation)(unsigned int cur_job, unsigned int cur_buf);
 
 enum SPU_FUNCTIONS {
+	SPU_FUNC_nop,
+	SPU_FUNC_aes_setkey,
+	SPU_FUNC_aes_update_key,
+	SPU_FUNC_aes_encrypt_ecb,
+	SPU_FUNC_aes_decrypt_ecb,
+	SPU_FUNC_aes_encrypt_cbc,
+	SPU_FUNC_aes_decrypt_cbc,
 
 	TOTAL_SPU_FUNCS,
 };
 
-struct kspu_job {
+struct kspu_job {
 	enum SPU_FUNCTIONS operation __attribute__((aligned(16)));
 	unsigned long long in __attribute__((aligned(16)));
 	unsigned int in_size __attribute__((aligned(16)));
 	union {
+		struct aes_set_key aes_set_key;
+		struct aes_update_key aes_update_key;
+		struct aes_crypt aes_crypt;
 	} __attribute__((aligned(16)));
 };
 
@@ -32,7 +45,7 @@ struct kspu_ring_data {
 
 struct kernel_spu_data {
 	struct kspu_ring_data kspu_ring_data __attribute__((aligned(16)));
-	struct kspu_job work_item[RB_SLOTS] __attribute__((aligned(16)));
+	struct kspu_job work_item[RB_SLOTS] __attribute__((aligned(16)));
 };
 
 #define KERNEL_SPU_DATA_OFFSET (KSPU_LS_SIZE - sizeof(struct kernel_spu_data))

-- 

^ permalink raw reply

* [RFC 2/2] add kernel support for spu task
From: Sebastian Siewior @ 2007-06-26 23:00 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto

[-- Attachment #1: spufs-add_kernel_spu_support.diff --]
[-- Type: text/plain, Size: 20072 bytes --]

Utilisation of SPUs by the kernel, main implementation. 
The idea behind this implementation is that there are single jobs that are
executed asynchronous on the SPU. The user queues jobs with
enqueue_for_spu() and gets a callback once the job is completed. The
function itself does not block. The job will be queued in a linked list 
(protected by a spinlock, calls from softirq context are possible) and the 
kthread that handles the SPU will be woken up.
The SPU thread takes the first element from the list, and calls the
enqueue function supplied by the user. The user has now the chance to fill
the ring buffer entry and set a callback for notification which will be
called once the SPU code accomplished the task. The queue function has
to ensure alignment & valid transfer size.

Signed-off-by: Sebastian Siewior <bigeasy@breakpoint.cc>
--- /dev/null
+++ b/arch/powerpc/platforms/cell/spufs/kspu_helper.c
@@ -0,0 +1,518 @@
+/*
+ * Interface for accessing SPUs from the kernel.
+ *
+ * Author: Sebastian Siewior <bigeasy@breakpoint.cc>
+ * License: GPLv2
+ */
+
+#include <asm/spu_priv1.h>
+#include <asm/kspu/kspu.h>
+#include <asm/kspu/merged_code.h>
+#include <linux/kthread.h>
+#include <linux/module.h>
+#include <linux/init_task.h>
+#include <linux/hardirq.h>
+#include <linux/kernel.h>
+#include "spufs.h"
+#include "kspu_util.h"
+
+static int free_kspu_context(struct kspu_context *kctx)
+{
+	struct spu_context *spu_ctx = kctx->spu_ctx;
+	int ret;
+
+	if (spu_ctx->owner)
+		spu_forget(spu_ctx);
+	
+	put_spu_context(spu_ctx);
+
+
+	kfree(kctx->notify_cb_info);
+	kfree(kctx);
+
+	return ret;
+}
+
+static void setup_stack(struct kspu_context *kctx)
+{
+	struct spu_context *ctx = kctx->spu_ctx;
+	u8 *ls;
+	u32 *u32p;
+
+	spu_acquire_saved(ctx);
+	ls = ctx->ops->get_ls(ctx);
+
+#define BACKCHAIN (kctx->spu_code->kspu_data_offset - 16)
+#define STACK_GAP 176
+#define INITIAL_STACK (BACKCHAIN - STACK_GAP)
+
+	BUG_ON(INITIAL_STACK > KSPU_LS_SIZE);
+
+	u32p = (u32 *) &ls[BACKCHAIN];
+	u32p[0] = 0;
+	u32p[1] = 0;
+	u32p[2] = 0;
+	u32p[3] = 0;
+
+	u32p = (u32 *) &ls[INITIAL_STACK];
+	u32p[0] = BACKCHAIN;
+	u32p[1] = 0;
+	u32p[2] = 0;
+	u32p[3] = 0;
+
+	ctx->csa.lscsa->gprs[1].slot[0] = INITIAL_STACK;
+	spu_release(ctx);
+	pr_debug("SPU's stack ready 0x%04x\n", INITIAL_STACK);
+}
+
+static struct kspu_context *kcreate_spu_context(int flags,
+		struct kspu_code *spu_code)
+{
+	struct kspu_context *kctx;
+	struct spu_context *ctx;
+	unsigned int ret;
+	u8 *ls;
+
+	flags |= SPU_CREATE_EVENTS_ENABLED;
+	ret = -EINVAL;
+
+	if (flags & (~SPU_CREATE_FLAG_ALL))
+		goto err;
+	/*
+	 * it must be a multiple of 16 because this value is used to calculate
+	 * the initial stack frame which must be 16byte aligned
+	 */
+	if (spu_code->kspu_data_offset & 15)
+		goto err;
+
+	pr_debug("SPU's queue: %d elemets, %d bytes each (%d bytes total)\n",
+			spu_code->queue_mask+1, spu_code->queue_entr_size,
+			(spu_code->queue_mask+1) * spu_code->queue_entr_size);
+
+	ret = -EFBIG;
+	if (spu_code->code_len > KSPU_LS_SIZE)
+		goto err;
+
+	ret = -ENOMEM;
+	kctx = kzalloc(sizeof *kctx, GFP_KERNEL);
+	if (!kctx)
+		goto err;
+
+	kctx->spu_code = spu_code;
+	init_waitqueue_head(&kctx->newitem_wq);
+	spin_lock_init(&kctx->queue_lock);
+	INIT_LIST_HEAD(&kctx->work_queue);
+	kctx->notify_cb_info = kzalloc(sizeof(*kctx->notify_cb_info) *
+			(kctx->spu_code->queue_mask+1), GFP_KERNEL);
+	if (!kctx->notify_cb_info)
+		goto err_notify;
+
+	ctx = kalloc_spu_context();
+	if (!ctx)
+		goto err_spu_ctx;
+
+	kctx->spu_ctx = ctx;
+	ctx->flags = flags;
+
+	spu_acquire(ctx);
+	ls = ctx->ops->get_ls(ctx);
+	memcpy(ls, spu_code->code, spu_code->code_len);
+	spu_release(ctx);
+	setup_stack(kctx);
+
+	return kctx;
+
+err_spu_ctx:
+	kfree(kctx->notify_cb_info);
+
+err_notify:
+	kfree(kctx);
+err:
+	return ERR_PTR(ret);
+}
+
+/**
+ * kspu_get_rb_slot - get a free slot to queue a work request on the SPU.
+ * @kctx:	kspu context, where the free slot is required
+ *
+ * Returns a free slot where a request may be queued on. Repeated calls will
+ * return the same slot until it is marked as taken (by
+ * kspu_mark_rb_slot_ready()).
+ */
+struct kspu_job *kspu_get_rb_slot(struct kspu_context *kctx)
+{
+	struct kspu_ring_data *ring_data;
+	unsigned char *ls;
+	int consumed, outstanding, queue_mask;
+
+	ls = kctx->spu_ctx->ops->get_ls(kctx->spu_ctx);
+	ls += kctx->spu_code->kspu_data_offset;
+	ring_data = (struct kspu_ring_data *) ls;
+
+	queue_mask = kctx->spu_code->queue_mask;
+	consumed = ring_data->consumed;
+	outstanding = ring_data->outstanding;
+
+	outstanding++;
+
+	if ((outstanding & queue_mask) ==
+			(consumed & queue_mask))
+		return NULL;
+
+	outstanding = ring_data->outstanding;
+
+	ls += sizeof (struct kspu_ring_data);
+	/* ls points now to the first queue slot */
+	ls += kctx->spu_code->queue_entr_size * (outstanding & queue_mask);
+
+	pr_debug("Return slot %d, at %p\n", (outstanding&queue_mask), ls);
+	return (struct kspu_job *) ls;
+
+}
+EXPORT_SYMBOL_GPL(kspu_get_rb_slot);
+
+/*
+ * kspu_mark_rb_slot_ready - mark a request valid.
+ * @kctx:	kspu context that the request belongs to
+ * @work:	work item that is used for notification. May be NULL.
+ *
+ * The slot will be marked as valid not returned kspu_get_rb_slot() until
+ * the request is processed. If @work is not NULL, work->notify will be
+ * called to notify the user, that his request is done.
+ */
+void kspu_mark_rb_slot_ready(struct kspu_context *kctx,
+		struct kspu_work_item *work)
+{
+	struct kspu_ring_data *ring_data;
+	unsigned char *ls;
+	int outstanding, queue_mask;
+
+	ls = kctx->spu_ctx->ops->get_ls(kctx->spu_ctx);
+	ls += kctx->spu_code->kspu_data_offset;
+	ring_data = (struct kspu_ring_data *) ls;
+
+	queue_mask = kctx->spu_code->queue_mask;
+	outstanding = ring_data->outstanding;
+	kctx->notify_cb_info[outstanding & queue_mask] = work;
+	pr_debug("item ready: outs %d, notification data %p\n",
+			outstanding &queue_mask, work);
+	outstanding++;
+	BUG_ON(outstanding == ring_data->consumed);
+	ring_data->outstanding = outstanding;
+}
+EXPORT_SYMBOL_GPL(kspu_mark_rb_slot_ready);
+
+static int notify_done_reqs(struct kspu_context *kctx)
+{
+	struct kspu_ring_data *ring_data;
+	struct kspu_work_item *kspu_work;
+	unsigned char *ls;
+	unsigned int current_notify, queue_mask;
+	unsigned ret = 0;
+
+	ls = kctx->spu_ctx->ops->get_ls(kctx->spu_ctx);
+	ls += kctx->spu_code->kspu_data_offset;
+	ring_data = (struct kspu_ring_data *) ls;
+	ls += sizeof (struct kspu_ring_data);
+
+	current_notify = kctx->last_notified;
+	queue_mask = kctx->spu_code->queue_mask;
+	pr_debug("notify| %d | %d\n", current_notify & queue_mask,
+			ring_data->consumed & queue_mask);
+
+	while (ring_data->consumed != current_notify) {
+
+		pr_debug("do notify %d\n", current_notify);
+
+		kspu_work = kctx->notify_cb_info[current_notify & queue_mask];
+		if (likely(kspu_work))
+			kspu_work->notify(kspu_work);
+
+		current_notify++;
+		ret = 1;
+	}
+
+	kctx->last_notified = current_notify;
+	pr_debug("notify done\n");
+	return ret;
+}
+
+static int queue_requests(struct kspu_context *kctx)
+{
+	int ret;
+	int empty;
+	int queued = 0;
+	struct kspu_work_item *work;
+
+	WARN_ON(in_irq());
+
+	do {
+		if (!kspu_get_rb_slot(kctx))
+			break;
+
+		spin_lock_bh(&kctx->queue_lock);
+		empty = list_empty(&kctx->work_queue);
+		if (unlikely(empty)) {
+			work = NULL;
+		} else {
+			work = list_first_entry(&kctx->work_queue,
+					struct kspu_work_item, list);
+			list_del(&work->list);
+		}
+		spin_unlock_bh(&kctx->queue_lock);
+
+		if (!work)
+			break;
+
+		pr_debug("Adding item %p to queue\n", work);
+		ret = work->enqueue(work);
+		if (unlikely(ret == 0)) {
+			pr_debug("Adding item %p again to list.\n", work);
+			spin_lock_bh(&kctx->queue_lock);
+			list_add(&work->list, &kctx->work_queue);
+			spin_unlock_bh(&kctx->queue_lock);
+			break;
+		}
+
+		queued = 1;
+	} while (1);
+	pr_debug("Queue requests done. => %d\n", queued);
+	return queued;
+}
+
+/**
+ * kspu_enqueue_work_item - Enqueue a request that supposed to be queued on the
+ * SPU.
+ * @kctx:	kspu context that should be used.
+ * @work:	Work item that should be placed on the SPU
+ *
+ * The functions puts the work item in a list. Once a SPU slot is available,
+ * work->enqueue will be called from a kthread context. User's enqueue
+ * function may than queue the request on the SPU.
+ * kspu_enqueue_work_item() may be called from softirq.
+ */
+void kspu_enqueue_work_item(struct kspu_context *kctx,
+		struct kspu_work_item *work)
+{
+	spin_lock_bh(&kctx->queue_lock);
+	list_add_tail(&work->list, &kctx->work_queue);
+	spin_unlock_bh(&kctx->queue_lock);
+	wake_up_all(&kctx->newitem_wq);
+}
+EXPORT_SYMBOL_GPL(kspu_enqueue_work_item);
+
+static int pending_spu_work(struct kspu_context *kctx)
+{
+	struct kspu_ring_data *ring_data;
+	unsigned char *ls;
+	int queue_mask;
+
+	ls = kctx->spu_ctx->ops->get_ls(kctx->spu_ctx);
+	ls += kctx->spu_code->kspu_data_offset;
+	ring_data = (struct kspu_ring_data *) ls;
+
+	queue_mask = kctx->spu_code->queue_mask;
+	pr_debug("pending spu work status: %u == %u ?\n",
+			ring_data->consumed & queue_mask,
+			ring_data->outstanding & queue_mask);
+	if (ring_data->consumed == ring_data->outstanding)
+		return 0;
+
+	return 1;
+}
+
+/*
+ * Fill dummy requests in the ring buffer. Dummy requests are required
+ * to let MFC "transfer" data if there are not enough real requests.
+ * Transfers with a size of 0 bytes are nops for the MFC
+ */
+static void kspu_fill_dummy_reqs(struct kspu_context *kctx)
+{
+
+	struct kspu_ring_data *ring_data;
+	unsigned char *ls;
+	unsigned int queue_mask;
+	unsigned int requests;
+	struct kspu_job *kjob;
+	int i;
+
+	ls = kctx->spu_ctx->ops->get_ls(kctx->spu_ctx);
+	ls += kctx->spu_code->kspu_data_offset;
+	ring_data = (struct kspu_ring_data *) ls;
+
+	queue_mask = kctx->spu_code->queue_mask;
+
+	/* check for overflow */
+	requests = ring_data->outstanding - ring_data->consumed;
+
+	if (requests >= DMA_BUFFERS *2)
+		return;
+
+	for (i = requests; i < (DMA_BUFFERS*2); i++) {
+		kjob = kspu_get_rb_slot(kctx);
+		kjob->in_size = 0;
+		kspu_mark_rb_slot_ready(kctx, NULL);
+	}
+}
+
+static int spufs_run_kernel_spu(void *priv)
+{
+	struct kspu_context *kctx = (struct kspu_context *) priv;
+	struct spu_context *ctx = kctx->spu_ctx;
+	int ret;
+	u32 status;
+	int npc = 0;
+	int fastpath;
+	DEFINE_WAIT(wait_for_stop);
+	DEFINE_WAIT(wait_for_ibox);
+	DEFINE_WAIT(wait_for_newitem);
+
+	spu_enable_spu(ctx);
+	ctx->event_return = 0;
+
+	ret = spu_acquire_runnable(ctx, 0);
+	if (ret) {
+		mutex_unlock(&ctx->run_mutex);
+		printk(KERN_ERR "could not obtain runable spu: %d\n", ret);
+		BUG();
+	}
+
+	spu_run_init(ctx, &npc);
+
+	do {
+		fastpath = 0;
+		prepare_to_wait(&ctx->stop_wq, &wait_for_stop,
+				TASK_INTERRUPTIBLE);
+		prepare_to_wait(&ctx->ibox_wq, &wait_for_ibox,
+				TASK_INTERRUPTIBLE);
+		prepare_to_wait(&kctx->newitem_wq, &wait_for_newitem,
+				TASK_INTERRUPTIBLE);
+
+		pr_debug("going to handle class1\n");
+		ret = spufs_handle_class1(ctx);
+		if (unlikely(ret)) {
+			/*
+			 * SPE_EVENT_SPE_DATA_STORAGE => refernce invalid memory
+			 */
+			printk(KERN_ERR "Invalid memory dereferenced by the"
+					"spu: %d\n", ret);
+			BUG();
+		}
+
+		pr_debug("going to process kspu_events\n");
+		/* FIXME BUG: We need a physical SPU to discover
+		 * ctx->spu->class_0_pending. It is not saved on context
+		 * switch. We may lose this on context switch.
+		 */
+		status = ctx->ops->status_read(ctx);
+		if ((ctx->spu && ctx->spu->class_0_pending) ||
+				status & SPU_STATUS_INVALID_INSTR) {
+			printk(KERN_ERR "kspu error, status_register: 0x%08x\n",
+					status);
+			printk(KERN_ERR "event return: 0x%08lx, spu's npc: \
+					0x%08x\n", kctx->spu_ctx->event_return,
+					kctx->spu_ctx->ops->npc_read(
+						kctx->spu_ctx));
+			BUG();
+		}
+
+		if (notify_done_reqs(kctx))
+			fastpath = 1;
+
+		if (queue_requests(kctx))
+			fastpath = 1;
+
+		if (!(status & SPU_STATUS_RUNNING)) {
+			/* spu is currently not running */
+			pr_debug("SPU not running, last stop code was: %08x\n",
+					status >> SPU_STOP_STATUS_SHIFT);
+			if (pending_spu_work(kctx)) {
+				/* spu should run again */
+				pr_debug("Activate SPU\n");
+				kspu_fill_dummy_reqs(kctx);
+				spu_release(ctx);
+				ret = spu_acquire_runnable(ctx, 0);
+				BUG_ON(ret);
+				ret = spu_run_init(ctx, &npc);
+				BUG_ON(ret);
+			} else {
+
+				/* spu probably finished working,  */
+				pr_debug("SPU will remain in stop state\n");
+				ret = spu_run_fini(ctx, &npc, &status);
+				BUG_ON(ret);
+				spu_yield(ctx);
+				spu_acquire(ctx);
+			}
+		}
+
+		if (fastpath)
+			continue;
+
+		spu_release(ctx);
+		schedule();
+		spu_acquire(ctx);
+
+	} while (!kthread_should_stop() || !list_empty(&kctx->work_queue));
+
+	finish_wait(&ctx->stop_wq, &wait_for_stop);
+	finish_wait(&ctx->ibox_wq, &wait_for_ibox);
+	finish_wait(&kctx->newitem_wq, &wait_for_newitem);
+
+	spu_release(ctx);
+	spu_disable_spu(ctx);
+	return 0;
+}
+
+static struct kspu_context *kspu_ctx;
+extern struct kspu_code single_spu_code;
+
+/**
+ * kspu_get_kctx - return a kspu context.
+ *
+ * Returns a kspu_context that identifies the SPU context used by the kernel.
+ * Right now only one static context exist which may be used by multiple users.
+ */
+struct kspu_context *kspu_get_kctx(void)
+{
+	return kspu_ctx;
+}
+EXPORT_SYMBOL_GPL(kspu_get_kctx);
+
+static int __init kspu_init(void)
+{
+	int ret = 0;
+
+	pr_debug("code @%p, len %d, offset 0x%08x, elemets: %d,"
+			"element size: %d\n", single_spu_code.code,
+			single_spu_code.code_len,
+			single_spu_code.kspu_data_offset,
+			single_spu_code.queue_mask,
+			single_spu_code.queue_entr_size);
+	kspu_ctx = kcreate_spu_context(0, &single_spu_code);
+	if (IS_ERR(kspu_ctx))
+		return PTR_ERR(kspu_ctx);
+
+	/* kthread_run */
+	kspu_ctx->thread = kthread_create(spufs_run_kernel_spu, kspu_ctx,
+			"spucode");
+	if (IS_ERR(kspu_ctx->thread)) {
+		ret = PTR_ERR(kspu_ctx->thread);
+		free_kspu_context(kspu_ctx);
+	}
+	wake_up_process(kspu_ctx->thread);
+	return ret;
+}
+
+static void __exit kspu_exit(void)
+{
+	kthread_stop(kspu_ctx->thread);
+	free_kspu_context(kspu_ctx);
+}
+
+module_init(kspu_init);
+module_exit(kspu_exit);
+
+MODULE_DESCRIPTION("KSPU interface module");
+MODULE_AUTHOR("Sebastian Siewior <bigeasy@breakpoint.cc>");
+MODULE_LICENSE("GPL");
--- /dev/null
+++ b/arch/powerpc/platforms/cell/spufs/kspu_util.h
@@ -0,0 +1,29 @@
+#ifndef KSPU_UTIL_H
+#define KSPU_UTIL_H
+#include <linux/wait.h>
+
+struct kspu_code {
+	const unsigned int *code;
+	unsigned int code_len;
+	unsigned int kspu_data_offset;
+	unsigned int queue_mask;
+	unsigned int queue_entr_size;
+};
+
+struct notify_cb_info {
+	void *notify;
+};
+
+struct kspu_context {
+	struct spu_context *spu_ctx;
+	wait_queue_head_t newitem_wq;
+	void **notify_cb_info;
+	unsigned int last_notified;
+	struct kspu_code *spu_code;
+	struct task_struct *thread;
+	struct list_head work_queue;
+	/* access to the work_queue element. May be used from softirq */
+	spinlock_t queue_lock;
+};
+
+#endif
--- /dev/null
+++ b/arch/powerpc/platforms/cell/spufs/spu_main.c
@@ -0,0 +1,100 @@
+/*
+ * This code can be considered as crt0.S
+ * Compile with -O[123S] and make sure that here is only one function
+ * that starts at 0x0
+ * Author: Sebastian Siewior <bigeasy@breakpoint.cc>
+ * License: GPLv2
+ */
+#include <asm/kspu/merged_code.h>
+#include <spu_mfcio.h>
+#include "spu_runtime.h"
+
+#define barrier() __asm__ __volatile__("": : :"memory")
+
+spu_operation spu_funcs[TOTAL_SPU_FUNCS] __attribute__((aligned(16))) = {
+	[SPU_FUNC_nop] = spu_nop,
+};
+
+struct kspu_buffers kspu_buff[DMA_BUFFERS];
+
+void _start(void) __attribute__((noreturn));
+void _start(void)
+{
+	struct kernel_spu_data *spu_data;
+
+	spu_data = (struct kernel_spu_data*) KERNEL_SPU_DATA_OFFSET;
+
+	while (37) {
+		unsigned int consumed, outstanding, cur_req, cur_item, cur_buf;
+		unsigned int i;
+
+		spu_stop(1);
+		/*
+		 * Once started, it is guaranteed that at least DMA_BUFFERS *2 requests are in ring buffer.
+		 * The work order is:
+		 * 1. request DMA_BUFFERS transfers, every in a separate buffer with its own tag.
+		 * 2. process those buffers and request new ones.
+		 * 3. if more than (DMA_BUFFERS *2) are available, than the main loop begins:
+		 *   - wait for tag to finish transfers
+		 *   - notify done work
+		 *   - process request
+		 *   - write back
+		 * 4. if no more request are available, process the last DMA_BUFFERS request that are left,
+		 *    write them back and wait until that transfers completes and spu_stop()
+		 */
+
+		consumed = spu_data->kspu_ring_data.consumed;
+		cur_req = consumed;
+		cur_item = consumed;
+
+		/* 1 */
+		for (cur_buf = 0; cur_buf < DMA_BUFFERS; cur_buf++) {
+			init_get_data(&kspu_buff[cur_buf & DMA_BUFF_MASK].space[0],
+					&spu_data->work_item[cur_req & RB_MASK], cur_buf & DMA_BUFF_MASK);
+			cur_req++;
+		}
+
+		/* 2 */
+		for (cur_buf = 0; cur_buf < DMA_BUFFERS; cur_buf++) {
+			wait_for_buffer(1<< (cur_buf & DMA_BUFF_MASK));
+			spu_funcs[spu_data->work_item[cur_item & RB_MASK].operation]
+				(cur_item & RB_MASK, cur_buf & DMA_BUFF_MASK);
+
+			init_get_data(&kspu_buff[cur_buf & DMA_BUFF_MASK].space[0],
+					&spu_data->work_item[cur_req & RB_MASK], cur_buf & DMA_BUFF_MASK);
+			cur_item++;
+			cur_req++;
+		}
+
+		outstanding = spu_data->kspu_ring_data.outstanding;
+		/* 3 */
+		while (cur_req  != outstanding) {
+			wait_for_buffer(1<< (cur_buf & DMA_BUFF_MASK));
+			spu_data->kspu_ring_data.consumed++;
+			if (spu_stat_out_mbox())
+				spu_write_out_mbox(0x0);
+
+			spu_funcs[spu_data->work_item[cur_item & RB_MASK].operation]
+				(cur_item & RB_MASK, cur_buf & DMA_BUFF_MASK);
+
+			init_get_data(&kspu_buff[cur_buf & DMA_BUFF_MASK].space[0],
+					&spu_data->work_item[cur_req & RB_MASK], cur_buf & DMA_BUFF_MASK);
+			cur_item++;
+			cur_req++;
+			cur_buf++;
+			outstanding = spu_data->kspu_ring_data.outstanding;
+		}
+
+		/* 4 */
+		for (i = 0; i < DMA_BUFFERS; i++) {
+			wait_for_buffer(1<< (cur_buf & DMA_BUFF_MASK));
+			spu_funcs[spu_data->work_item[cur_item & RB_MASK].operation]
+				(cur_item & RB_MASK, cur_buf & DMA_BUFF_MASK);
+			cur_buf++;
+			cur_item++;
+		}
+
+		wait_for_buffer(ALL_DMA_BUFFS);
+		spu_data->kspu_ring_data.consumed = cur_item;
+	}
+}
--- /dev/null
+++ b/include/asm-powerpc/kspu/kspu.h
@@ -0,0 +1,23 @@
+#ifndef KSPU_KSPU_H
+#define KSPU_KSPU_H
+#ifdef __KERNEL__
+#include <linux/list.h>
+
+#define MAX_DMA_TRANSFER (16 * 1024)
+
+struct kspu_work_item {
+	struct list_head list;
+	int (*enqueue)(struct kspu_work_item *);
+	void (*notify)(struct kspu_work_item *);
+};
+
+struct kspu_context;
+
+struct kspu_job *kspu_get_rb_slot(struct kspu_context *kspu);
+void kspu_mark_rb_slot_ready(struct kspu_context *kspu,
+		struct kspu_work_item *work);
+void kspu_enqueue_work_item(struct kspu_context *kctx,
+		struct kspu_work_item *work);
+struct kspu_context *kspu_get_kctx(void);
+#endif
+#endif
--- /dev/null
+++ b/include/asm-powerpc/kspu/merged_code.h
@@ -0,0 +1,40 @@
+#ifndef KSPU_MERGED_CODE_H
+#define KSPU_MERGED_CODE_H
+#include <linux/autoconf.h>
+
+#define KSPU_LS_SIZE 0x40000
+
+#define RB_SLOTS 256
+
+#define DMA_BUFFERS   2
+#define DMA_BUFF_MASK (DMA_BUFFERS-1)
+#define ALL_DMA_BUFFS ((1 << DMA_BUFFERS)-1)
+
+typedef int (*spu_operation)(unsigned int cur);
+
+enum SPU_FUNCTIONS {
+
+	TOTAL_SPU_FUNCS,
+};
+
+struct kspu_job {
+	enum SPU_FUNCTIONS operation __attribute__((aligned(16)));
+	unsigned long long in __attribute__((aligned(16)));
+	unsigned int in_size __attribute__((aligned(16)));
+	union {
+	} __attribute__((aligned(16)));
+};
+
+struct kspu_ring_data {
+	volatile unsigned int consumed __attribute__((aligned(16)));
+	volatile unsigned int outstanding __attribute__((aligned(16)));
+};
+
+struct kernel_spu_data {
+	struct kspu_ring_data kspu_ring_data __attribute__((aligned(16)));
+	struct kspu_job work_item[RB_SLOTS] __attribute__((aligned(16)));
+};
+
+#define KERNEL_SPU_DATA_OFFSET (KSPU_LS_SIZE - sizeof(struct kernel_spu_data))
+
+#endif

-- 

^ permalink raw reply

* Re: [RFC 0/2] AES ablkcipher driver for SPUs
From: Evgeniy Polyakov @ 2007-06-27 10:24 UTC (permalink / raw)
  To: Herbert Xu, linux-crypto; +Cc: Sebastian Siewior
In-Reply-To: <20070626225952.GA4571@Chamillionaire.breakpoint.cc>

On Wed, Jun 27, 2007 at 12:59:52AM +0200, Sebastian Siewior (linux-crypto@ml.breakpoint.cc) wrote:
> This driver adds support for AES on SPU. Patch is for review only because
> some parts of the code are not upstream yet.
> Patch one contains the main driver (which uses ablkcipher_ctx_cast()), 
> patch two is for clarity (parts of the missing API that is used).
> 
> Currently only ECB block mode is supported. I plan support for CBC but the
> way the IV currently handled is unfavorable (later more).

Interesting. Do you have any benchmark of the SPU handling AES crypto?

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [RFC 0/2] AES ablkcipher driver for SPUs
From: Sebastian Siewior @ 2007-06-27 11:41 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Herbert Xu, linux-crypto
In-Reply-To: <20070627102420.GA11016@2ka.mipt.ru>

* Evgeniy Polyakov | 2007-06-27 14:24:20 [+0400]:

>On Wed, Jun 27, 2007 at 12:59:52AM +0200, Sebastian Siewior (linux-crypto@ml.breakpoint.cc) wrote:
>> This driver adds support for AES on SPU. Patch is for review only because
>> some parts of the code are not upstream yet.
>> Patch one contains the main driver (which uses ablkcipher_ctx_cast()), 
>> patch two is for clarity (parts of the missing API that is used).
>> 
>> Currently only ECB block mode is supported. I plan support for CBC but the
>> way the IV currently handled is unfavorable (later more).
>
>Interesting. Do you have any benchmark of the SPU handling AES crypto?
Yes I do. Those number are gathered from a PS3 and with a sync
interface. sync means the SPU is idle, I queue the request, start the
SPU, SPU requests the data, waits from completion, computes it,
transfers it back and finally the SPU stops (idle again). Oh and only
one SPU is used.
The test is generated with a simple module that allocated four pages (16
kb) and calls the SPU crypto code over and over again until approx 156
MB of memory passed/processed. From the time and total size I get my
kb/sec.
Diagram [1] is exactly that. SIMD is my SIMD version of AES on SPU,
generic is the already present version of AES (crypto/aes.c, modified
to fit the required signature for the encryption. decryption has been
left apart since it is the same code, only different tables) also on the
SPU. 
Diagram [2] shows how relevant the transfer size actually is. I still
transfer 156 MB data but in different transfer sizes. Smaller transfer
size means more transfers, waiting, sbox reloading and more start/stop.
Generic-PPU is "wrong". This speed is taken the first diagram. With more
loops I should get slightly slower (at least due to branches). Operation
is ECB+Encryption+128b key

I did not measure how my SIMD code behaves if the buffers are already
there and I have never to start the SPU. Maybe later that week (as well
as fixing/completing diagram 2).

Ach one last thing: Everything is ECB mode. From experience with VMX I
must say that that one little xor operation in CBC makes no difference
at all.

[1] http://breakpoint.cc/spu_aes/spu_code.png
[2] http://breakpoint.cc/spu_aes/spu_sync_blocksize.png

>-- 
>	Evgeniy Polyakov
Sebastian

^ permalink raw reply

* Re: [RFC 0/2] AES ablkcipher driver for SPUs
From: Evgeniy Polyakov @ 2007-06-28 10:50 UTC (permalink / raw)
  To: Herbert Xu, linux-crypto
In-Reply-To: <20070627114159.GA6456@Chamillionaire.breakpoint.cc>

On Wed, Jun 27, 2007 at 01:41:59PM +0200, Sebastian Siewior (linux-crypto@ml.breakpoint.cc) wrote:
> Yes I do. Those number are gathered from a PS3 and with a sync
> interface. sync means the SPU is idle, I queue the request, start the
> SPU, SPU requests the data, waits from completion, computes it,
> transfers it back and finally the SPU stops (idle again). Oh and only
> one SPU is used.
> The test is generated with a simple module that allocated four pages (16
> kb) and calls the SPU crypto code over and over again until approx 156
> MB of memory passed/processed. From the time and total size I get my
> kb/sec.
...
> [1] http://breakpoint.cc/spu_aes/spu_code.png
> [2] http://breakpoint.cc/spu_aes/spu_sync_blocksize.png

Mmm, looks really good. Did powerpc folks acked this changes?

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: [RFC 0/2] AES ablkcipher driver for SPUs
From: Sebastian Siewior @ 2007-06-28 11:24 UTC (permalink / raw)
  To: Evgeniy Polyakov; +Cc: Herbert Xu, linux-crypto
In-Reply-To: <20070628105034.GA3771@2ka.mipt.ru>

* Evgeniy Polyakov | 2007-06-28 14:50:36 [+0400]:

>On Wed, Jun 27, 2007 at 01:41:59PM +0200, Sebastian Siewior (linux-crypto@ml.breakpoint.cc) wrote:
>> Yes I do. Those number are gathered from a PS3 and with a sync
>> interface. sync means the SPU is idle, I queue the request, start the
>> SPU, SPU requests the data, waits from completion, computes it,
>> transfers it back and finally the SPU stops (idle again). Oh and only
>> one SPU is used.
>> The test is generated with a simple module that allocated four pages (16
>> kb) and calls the SPU crypto code over and over again until approx 156
>> MB of memory passed/processed. From the time and total size I get my
>> kb/sec.
>...
>> [1] http://breakpoint.cc/spu_aes/spu_code.png
>> [2] http://breakpoint.cc/spu_aes/spu_sync_blocksize.png
>
>Mmm, looks really good. Did powerpc folks acked this changes?
I submitted some patches last week or two weeks ago to the cbe-oss-dev
ml and I did not get a nack. Just style, naming and this sort of things.
I plan to clean those up (address all issues) and post it once again.
Maybe, the IV problem is solved until then :)

>-- 
>	Evgeniy Polyakov
Sebastian

^ permalink raw reply

* [PATCH 12/12] drivers: PMC MSP71xx security engine driver
From: Marc St-Jean @ 2007-06-28 19:49 UTC (permalink / raw)
  To: davem, herbert; +Cc: brian_oostenbrink, linux-crypto, rod_sillett

[PATCH 12/12] drivers: PMC MSP71xx security engine driver

Patch to add an security engien driver for the PMC-Sierra MSP71xx devices.

Thanks,
Marc

Signed-off-by: Marc St-Jean <Marc_St-Jean@pmc-sierra.com>
---
NOTE: This patch was originally sent on June 15th but does not appear in
the archive.

Changes since last post:
-Added interrupt enable/disable to spin_locks.
-Added checks for dma_alloc_coherent() results.
-Removed sleep functionality.
-Renamed a couple variables for clarity.

 crypto/Kconfig              |   40 
 crypto/Makefile             |    8 
 drivers/crypto/Kconfig      |   18 
 drivers/crypto/Makefile     |    1 
 drivers/crypto/pmcmsp_sec.c | 2379 ++++++++++++++++++++++++++++++++++++++++++++
 5 files changed, 2442 insertions(+), 4 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 086fcec..33bdec6 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -68,12 +68,32 @@ config CRYPTO_MD5
 	help
 	  MD5 message digest algorithm (RFC1321).
 
+config CRYPTO_MD5_HW
+	tristate
+	default n
+
+config CRYPTO_MD5_SW
+	tristate
+	default y if (CRYPTO_MD5_HW = n && CRYPTO_MD5 = y)
+	default m if ((CRYPTO_MD5_HW = m || CRYPTO_MD5_HW = n) && \
+			CRYPTO_MD5 = m)
+
 config CRYPTO_SHA1
 	tristate "SHA1 digest algorithm"
 	select CRYPTO_ALGAPI
 	help
 	  SHA-1 secure hash standard (FIPS 180-1/DFIPS 180-2).
 
+config CRYPTO_SHA1_HW
+	tristate
+	default n
+
+config CRYPTO_SHA1_SW
+	tristate
+	default y if (CRYPTO_SHA1_HW = n && CRYPTO_SHA1 = y)
+	default m if ((CRYPTO_SHA1_HW = m || CRYPTO_SHA1_HW = n) && \
+			CRYPTO_SHA1 = m)
+
 config CRYPTO_SHA256
 	tristate "SHA256 digest algorithm"
 	select CRYPTO_ALGAPI
@@ -177,6 +197,16 @@ config CRYPTO_DES
 	help
 	  DES cipher algorithm (FIPS 46-2), and Triple DES EDE (FIPS 46-3).
 
+config CRYPTO_DES_HW
+	tristate
+	default n
+
+config CRYPTO_DES_SW
+	tristate
+	default y if (CRYPTO_DES_HW = n && CRYPTO_DES = y)
+	default m if ((CRYPTO_DES_HW = m || CRYPTO_DES_HW = n) && \
+			CRYPTO_DES = m)
+
 config CRYPTO_FCRYPT
 	tristate "FCrypt cipher algorithm"
 	select CRYPTO_ALGAPI
@@ -283,6 +313,16 @@ config CRYPTO_AES
 
 	  See <http://csrc.nist.gov/CryptoToolkit/aes/> for more information.
 
+config CRYPTO_AES_HW
+	tristate
+	default n
+
+config CRYPTO_AES_SW
+	tristate
+	default y if (CRYPTO_AES_HW = n && CRYPTO_AES = y)
+	default m if ((CRYPTO_AES_HW = m || CRYPTO_AES_HW = n) && \
+			CRYPTO_AES = m)
+
 config CRYPTO_AES_586
 	tristate "AES cipher algorithms (i586)"
 	depends on (X86 || UML_X86) && !64BIT
diff --git a/crypto/Makefile b/crypto/Makefile
index 12f93f5..b337967 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -17,8 +17,8 @@ obj-$(CONFIG_CRYPTO_MANAGER) += cryptomgr.o
 obj-$(CONFIG_CRYPTO_HMAC) += hmac.o
 obj-$(CONFIG_CRYPTO_XCBC) += xcbc.o
 obj-$(CONFIG_CRYPTO_NULL) += crypto_null.o
-obj-$(CONFIG_CRYPTO_MD4) += md4.o
-obj-$(CONFIG_CRYPTO_MD5) += md5.o
+obj-$(CONFIG_CRYPTO_MD5_SW) += md5.o
+obj-$(CONFIG_CRYPTO_SHA1_SW) += sha1.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1.o
 obj-$(CONFIG_CRYPTO_SHA256) += sha256.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512.o
@@ -29,13 +29,13 @@ obj-$(CONFIG_CRYPTO_ECB) += ecb.o
 obj-$(CONFIG_CRYPTO_CBC) += cbc.o
 obj-$(CONFIG_CRYPTO_PCBC) += pcbc.o
 obj-$(CONFIG_CRYPTO_LRW) += lrw.o
-obj-$(CONFIG_CRYPTO_DES) += des.o
+obj-$(CONFIG_CRYPTO_DES_SW) += des.o
 obj-$(CONFIG_CRYPTO_FCRYPT) += fcrypt.o
 obj-$(CONFIG_CRYPTO_BLOWFISH) += blowfish.o
 obj-$(CONFIG_CRYPTO_TWOFISH) += twofish.o
 obj-$(CONFIG_CRYPTO_TWOFISH_COMMON) += twofish_common.o
 obj-$(CONFIG_CRYPTO_SERPENT) += serpent.o
-obj-$(CONFIG_CRYPTO_AES) += aes.o
+obj-$(CONFIG_CRYPTO_AES_SW) += aes.o
 obj-$(CONFIG_CRYPTO_CAMELLIA) += camellia.o
 obj-$(CONFIG_CRYPTO_CAST5) += cast5.o
 obj-$(CONFIG_CRYPTO_CAST6) += cast6.o
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index ff8c4be..bbda463 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -66,4 +66,22 @@ config CRYPTO_DEV_GEODE
 	  To compile this driver as a module, choose M here: the module
 	  will be called geode-aes.
 
+config CRYPTO_PMCMSP
+	tristate "Support for PMCMSP on-chip IPSEC engine"
+	depends on CRYPTO && PMC_MSP
+
+config CRYPTO_PMCMSP_CIPHER
+	bool "Accelerate ciphers (AES, DES, 3DES)"
+	depends on CRYPTO_PMCMSP
+	default y
+	select CRYPTO_AES_HW
+	select CRYPTO_DES_HW
+
+config CRYPTO_PMCMSP_HASH
+	bool "Accelerate hashes (MD5, SHA1)"
+	depends on CRYPTO_PMCMSP
+	default y
+	select CRYPTO_MD5_HW
+	select CRYPTO_SHA1_HW
+
 endmenu
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index 6059cf8..aa6fdc4 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -2,3 +2,4 @@ obj-$(CONFIG_CRYPTO_DEV_PADLOCK) += padlock.o
 obj-$(CONFIG_CRYPTO_DEV_PADLOCK_AES) += padlock-aes.o
 obj-$(CONFIG_CRYPTO_DEV_PADLOCK_SHA) += padlock-sha.o
 obj-$(CONFIG_CRYPTO_DEV_GEODE) += geode-aes.o
+obj-$(CONFIG_CRYPTO_PMCMSP) += pmcmsp_sec.o
diff --git a/drivers/crypto/pmcmsp_sec.c b/drivers/crypto/pmcmsp_sec.c
new file mode 100644
index 0000000..9ca3134
--- /dev/null
+++ b/drivers/crypto/pmcmsp_sec.c
@@ -0,0 +1,2379 @@
+/*
+ * PMC-Sierra MSP security engine driver for linux
+ *
+ * Copyright 2000-2007 PMC-Sierra, Inc
+ *
+ * Driver for use with second, newer version of PMC security engine.
+ * Implements the Crypto API for aes, des, des3, md5 and sha1.
+ *
+ * This program is free software; you can redistribute  it and/or modify it
+ * under  the terms of  the GNU General  Public License as published by the
+ * Free Software Foundation;  either version 2 of the  License, or (at your
+ * option) any later version.
+ *
+ * THIS  SOFTWARE  IS PROVIDED   ``AS  IS'' AND   ANY  EXPRESS OR IMPLIED
+ * WARRANTIES,   INCLUDING, BUT NOT  LIMITED  TO, THE IMPLIED WARRANTIES OF
+ * MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED.  IN
+ * NO  EVENT  SHALL   THE AUTHOR  BE    LIABLE FOR ANY   DIRECT, INDIRECT,
+ * INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT
+ * NOT LIMITED   TO, PROCUREMENT OF  SUBSTITUTE GOODS  OR SERVICES; LOSS OF
+ * USE, DATA,  OR PROFITS; OR  BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
+ * ANY THEORY OF LIABILITY, WHETHER IN  CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF
+ * THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ *
+ * You should have received a copy of the  GNU General Public License along
+ * with this program; if not, write  to the Free Software Foundation, Inc.,
+ * 675 Mass Ave, Cambridge, MA 02139, USA.
+ */
+
+#include <linux/version.h>
+#include <linux/module.h>
+#include <linux/crypto.h>
+#include <linux/kernel.h>
+#include <linux/delay.h>
+#include <linux/errno.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
+#include <linux/spinlock.h>
+
+#include <asm/atomic.h>
+
+#include <crypto/algapi.h>
+
+#include <msp_regs.h>
+#include <msp_regops.h>
+#include <msp_int.h>
+#include <msp_prom.h>
+
+/**************************************************************************
+ * Constants
+ */
+
+/* switches to turn on manual debug features - normally off */
+/* #define DEBUG */
+/* #define DEBUG_VERBOSE */
+/* #define DUMP_WQ_ENTRIES */
+/* #define DUMP_SA */
+/* #define DUMP_CQ_ENTRIES */
+
+#define PREFIX			"pmcmcsp_sec: "
+
+/* SoC Reset registers */
+#define MSPRST_STS		0x00
+#define MSPRST_SET		0x04
+#define MSPRST_CLR		0x08
+
+/* Random Number Generator registers */
+#define SEC_RNG_CNF		0x084
+#define SEC_RNG_VAL		0x094
+
+/* Security Engine registers */
+#define SEC2_REG		0x200
+
+/* number of hardware queues */
+#define HW_NR_WORK_QUEUES	2
+#define HW_NR_COMP_QUEUES	2
+
+/* flags field values for SA struct */
+#define SAFLG_MODE_MASK		0x7
+#define SAFLG_MODE_ESP_IN	0
+#define SAFLG_MODE_ESP_OUT	1
+#define SAFLG_MODE_HMAC		2
+#define SAFLG_MODE_HASH_PAD	3
+#define SAFLG_MODE_HASH		4
+#define SAFLG_MODE_CRYPT	5
+
+#define SAFLG_SI		0x80	/* increment sequence number */
+#define SAFLG_CRI		0x100	/* Create IV */
+#define SAFLG_CPI		0x200	/* Compare ICV */
+#define SAFLG_EM		0x400	/* ESP Manual Mode */
+#define SAFLG_CV		0x800	/* Use Chaining Variables */
+
+#define SAFLG_HASH_MASK		0xe000
+#define SAFLG_MD5_96		0x0000
+#define SAFLG_MD5		0x2000
+#define SAFLG_SHA1_96		0x4000
+#define SAFLG_SHA1		0x6000
+#define SAFLG_HASHNULL		0x8000
+
+#define SAFLG_KEYS_MASK		0x70000
+#define SAFLG_DES_K1_DECRYPT	0x10000
+#define SAFLG_DES_K2_DECRYPT	0x20000
+#define SAFLG_DES_K3_DECRYPT	0x40000
+
+#define SAFLG_AES_DECRYPT	SAFLG_DES_K1_DECRYPT
+#define SAFLG_AES_ENCRYPT	0
+#define SAFLG_DES_DECRYPT	SAFLG_DES_K1_DECRYPT
+#define SAFLG_DES_ENCRYPT	0
+#define SAFLG_EDE_ENCRYPT	(SAFLG_DES_K2_DECRYPT)
+#define SAFLG_EDE_DECRYPT	(SAFLG_DES_K1_DECRYPT | SAFLG_DES_K3_DECRYPT)
+
+#define SAFLG_BLK_MASK		0x380000
+#define SAFLG_ECB		0
+#define SAFLG_CTR		0x080000
+#define SAFLG_CBC_ENCRYPT	0x100000
+#define SAFLG_CBC_DECRYPT	0x180000
+#define SAFLG_CFB_ENCRYPT	0x200000
+#define SAFLG_CFB_DECRYPT	0x280000
+#define SAFLG_OFB		0x300000
+
+#define SAFLG_CRYPT_TYPE_MASK	0x1C00000
+#define SAFLG_DES		0
+#define SAFLG_DES3		0x0400000
+#define SAFLG_AES_128		0x0800000
+#define SAFLG_AES_192		0x0C00000
+#define SAFLG_AES_256		0x1000000
+#define SAFLG_CRYPTNULL		0x1400000
+
+/* control word */
+#define SEC2_WE_CTRL_SZ		0x0ff
+#define SEC2_WE_CTRL_CQ		0x100
+#define SEC2_WE_CTRL_GI		0x800
+#define SEC2_WE_CTRL_AKO	0x8000
+#define SEC2_WE_CTRL_NXTHDR_SHF	16
+#define SEC2_WE_CTRL_PADLEN_SHF	24
+
+/* scatter/gather flags */
+#define SEC2_WE_SG_SCATTER	0x80000000
+#define SEC2_WE_SG_SOP		0x40000000
+#define SEC2_WE_SG_EOD		0x20000000
+#define SEC2_WE_SG_EOP		0x10000000
+#define SEC2_WE_SG_SIZE		0x00001FFF
+
+/* queue sizes must be powers of two */
+#define SEC_WORK_Q_SIZE		256
+#define SEC_WORK_Q_MASK		(SEC_WORK_Q_SIZE - 1)
+#define SEC_COMP_Q_SIZE		512
+#define SEC_COMP_Q_MASK		(SEC_COMP_Q_SIZE - 1)
+
+#define WQE_MAGIC	0x11223344/* use to validate work element */
+#define CQE_SIZE	(4 * 4)	/* size completion element in bytes */
+#define WQE_MAX_BUF	16	/* max number of scatter/gather bufs */
+#define WQE_HDR_SIZE	4	/* size of work desc header in words */
+#define WQE_DESC_SIZE(sg_count)		(WQE_HDR_SIZE + ((sg_count) * 2))
+				/* work descriptor size in words */
+#define WQE_DESC_SIZE_BYTES(sg_count)	(WQE_DESC_SIZE(sg_count) << 2)
+				/* work descriptor size in bytes */
+#define WQE_LAST	1	/* signals last scatter/gather buffer */
+
+/* crypt directions and modes */
+#define CRYPT_DIRECTION_ENCRYPT	0x00000000
+#define CRYPT_DIRECTION_DECRYPT	0x00000001
+
+#define CRYPTO_TFM_MODE_OTHER	0x00000000
+#define CRYPTO_TFM_MODE_ECB	0x00000001
+#define CRYPTO_TFM_MODE_CBC	0x00000002
+
+/**************************************************************************
+ * Structures
+ */
+
+/*
+ * Requests to the hardware are placed in a "work queue".
+ * indications of completion are placed in a "completion queue".
+ *
+ * This structure describes the hardware's picture of a queue.
+ */
+struct sec2_q_regs {
+	/*
+	 * The registers live across a bus; shadow the registers
+	 * whenever possible, access them only when necessary.
+	 */
+	unsigned int	*ofst_ptr;	/* Hardware writes a copy of the in
+					 * or out register to the location
+					 * pointed to by this register (out
+					 * for work queue, in for completion
+					 * queue). Software uses this as a
+					 * shadow of register in main mem.
+					 */
+
+	unsigned int	avail;	/* space available in queue */
+
+	unsigned char	*base;	/* base address of queue
+				 * Must be aligned on the boundary
+				 * of the size of the buffer.
+				 * i.e. base & (size-1) == 0
+				 */
+	unsigned int	size;	/* size of buffer */
+	unsigned int	in;	/* offset of in address */
+				/* actual in is at base + in */
+	unsigned int	out;	/* offset of head address */
+				/* actual out is at base + out */
+};
+
+struct sec2_regs {
+	unsigned int		res1[5];
+
+	unsigned int		sis;	/* Solo Interupt Status */
+
+				#define SEC2_INT_CQ0		0x000001
+				#define SEC2_INT_CQ1		0x000002
+				#define SEC2_INT_BAD_ADDR	0x000004
+				#define SEC2_INT_HASH_NON_64	0x000008
+				#define SEC2_INT_DES_NON_8	0x000010
+				#define SEC2_INT_AES_NON_16	0x000020
+				#define SEC2_INT_WQ0_HIGH	0x000040
+				#define SEC2_INT_WQ1_HIGH	0x000080
+				#define SEC2_INT_CQ0_HIGH	0x000100
+				#define SEC2_INT_CQ1_HIGH	0x000200
+				#define SEC2_INT_WQ0_FULL	0x000400
+				#define SEC2_INT_WQ1_FULL	0x000800
+				#define SEC2_INT_CQ0_FULL	0x001000
+				#define SEC2_INT_CQ1_FULL	0x002000
+				#define SEC2_INT_WQ0_EMPTY	0x004000
+				#define SEC2_INT_WQ1_EMPTY	0x008000
+				#define SEC2_INT_CQ0_EMPTY	0x010000
+				#define SEC2_INT_CQ1_EMPTY	0x020000
+				#define SEC2_INT_BAD_GATHER	0x040000
+				#define SEC2_INT_ICV_COMP_ERR	0x080000
+				#define SEC2_INT_MBX_ENABLE	0x100000
+				#define SEC2_INT_OFFSET_ERR	0x10000000
+				#define SEC2_INT_GS_BALANCE_ERR	0x20000000
+				#define SEC2_INT_EOD_MARK_ERR	0x40000000
+
+	unsigned int		esr;	/* Engine Status Register */
+
+				#define SEC2_ESR_DMA_IDLE	0x01
+				#define SEC2_ESR_DMA_DONE	0x02
+				#define SEC2_ESR_HASH_IDLE	0x04
+				#define SEC2_ESR_HASH_DONE	0x08
+				#define SEC2_ESR_DES_IDLE	0x10
+				#define SEC2_ESR_DES_DONE	0x20
+				#define SEC2_ESR_AES_IDLE	0x40
+				#define SEC2_ESR_AES_DONE	0x80
+
+	unsigned int		ier;	/* Interrupt Enable Register */
+
+				/*
+				 * ier uses same bits as sis
+				 */
+
+	unsigned int		res2[3];
+	unsigned int		rst;	/* Reset Register */
+
+				#define SEC2_RST_DMA		0x01
+				#define SEC2_RST_HASH		0x02
+				#define SEC2_RST_DES		0x04
+				#define SEC2_RST_AES		0x08
+				#define SEC2_RST_MASTER		0x0F
+
+	unsigned int		res3;
+	struct sec2_q_regs	wq[2];	/* work queues */
+	struct sec2_q_regs	cq[2];	/* completion queues */
+	unsigned int		dwpd;	/* "Duet Write Protection Disable" */
+	unsigned int		sget;	/* "SRAM GSE End Tag" */
+	unsigned int		aesc[4];/* AES Counter mode Counter */
+	unsigned int		aesk[8];/* AES Last Expanded Key */
+};
+
+/* security association structure */
+struct sec2_sa {
+	unsigned int		flags;
+	unsigned int		esp_spi;
+	unsigned int		esp_sequence;
+	unsigned int		hash_chain_a[5];
+	unsigned int		crypt_keys[8];
+	unsigned int		hash_chain_b[5];
+	unsigned int		hash_init_len[2];
+	unsigned int		crypt_iv[4];
+	unsigned int		proto_ip[5];
+};
+
+/* local state structures maintained by Crypto API */
+struct msp_crypto {
+	struct sec2_sa		sa;
+	struct sec2_sa		aes_decrypt_sa;
+	struct sec2_sa		des_decrypt_sa;
+	unsigned int		keysize;
+};
+
+/* local state structures maintained by Crypto API */
+struct msp_hash {
+	struct sec2_sa		sa;
+	unsigned int		hmac_init_done;
+	unsigned int		resultsize;
+	u8			*data;
+	unsigned int		data_size;
+	int			data_needs_free;
+};
+
+/* local structure used to control work queue */
+struct workq {
+	spinlock_t workq_lock;	/* lock to protect work queue */
+	volatile struct sec2_q_regs *wq_regs;
+				/* ptr to hw regs for this queue */
+	unsigned char		*base; /* ptr to slowpath base of queue */
+	dma_addr_t		base_dma_addr;
+				/* dma bus address of base of queue */
+	unsigned int		in; /* new desc written at this offset */
+	wait_queue_head_t	space_wait;
+				/* tasks waiting to write into queue */
+	unsigned int		low_water;
+				/* when avail space reaches this, wake tasks */
+};
+
+/* local structure used to control completion queue */
+struct compq {
+	spinlock_t compq_lock;	/* lock to protect completion queue */
+	volatile struct sec2_q_regs *cq_regs;
+				/* ptr to hw regs for this queue */
+	unsigned char		*base; /* ptr to slowpath base of queue */
+	dma_addr_t		base_dma_addr; /* dma bus address of queue */
+	unsigned int		out; /* new desc read from this offset */
+};
+
+/* scatter/gather info */
+struct scat_gath {
+	unsigned int		ctrl;	/* buffer control flags */
+	dma_addr_t		buf_dma_addr;
+				/* bus address of scatter/gather buffer */
+};
+
+/*
+ * Local structure used to control work descriptor while being
+ * processed by engine.
+ */
+struct desc_tent {
+	unsigned int		magic;
+				/* used to confirm really is a desc_tent */
+
+	/* temporary variables used while building work element */
+	unsigned int		is_first;
+				/* set if first gather or scatter */
+	unsigned int		do_eod_correction;
+				/* set if EOD must be 2nd to last */
+	unsigned int		ctrl;
+				/* work element control flags */
+
+	/* dma addresses needed to do dma_unmap when done */
+	dma_addr_t		sa_dma_addr;	/* bus address of SA */
+	unsigned int		sg_count;	/* count of sg buffers */
+	struct scat_gath	sg[WQE_MAX_BUF];/* list of buffers */
+
+	/* info needed to sleep or poll on result */
+	wait_queue_head_t	wait_q; /* for waiting on completion queue */
+	atomic_t		work_complete;	/* set wait is over */
+
+	/* completion status read from IPSEC engine. 0 if success */
+	unsigned int		comp_status;
+};
+
+/**************************************************************************
+ * Private functions
+ */
+
+static int msp_crypto_setkey(struct crypto_tfm *tfm,
+				const u8 *key, unsigned int key_len);
+static void msp_crypto_encrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
+static int msp_crypto_ecb_encrypt(struct blkcipher_desc *desc,
+		       struct scatterlist *dst, struct scatterlist *src,
+		       unsigned int nbytes);
+static int msp_crypto_cbc_encrypt(struct blkcipher_desc *desc,
+		       struct scatterlist *dst, struct scatterlist *src,
+		       unsigned int nbytes);
+static void msp_crypto_decrypt(struct crypto_tfm *tfm, u8 *out, const u8 *in);
+static int msp_crypto_ecb_decrypt(struct blkcipher_desc *desc,
+		       struct scatterlist *dst, struct scatterlist *src,
+		       unsigned int nbytes);
+static int msp_crypto_cbc_decrypt(struct blkcipher_desc *desc,
+		       struct scatterlist *dst, struct scatterlist *src,
+		       unsigned int nbytes);
+
+static void msp_crypto_md5_init(struct crypto_tfm *tfm);
+static void msp_crypto_md5_update(struct crypto_tfm *tfm,
+				const u8 *data, unsigned int len);
+static void msp_crypto_md5_final(struct crypto_tfm *tfm, u8 *out);
+
+static void msp_crypto_sha1_init(struct crypto_tfm *tfm);
+static void msp_crypto_sha1_update(struct crypto_tfm *tfm,
+				const u8 *data, unsigned int len);
+static void msp_crypto_sha1_final(struct crypto_tfm *tfm, u8 *out);
+
+static irqreturn_t msp_secv2_interrupt(int irq, void *dev_id);
+static int poll_completion(void);
+
+#ifdef DEBUG
+#define DBG_SEC(a1, a2...)	printk(KERN_DEBUG "SEC: " a1, ##a2)
+#else
+#define DBG_SEC(a...)
+#endif
+
+static void dump_sec_regs(void);
+#if defined(DEBUG)
+#define debug_dump_sec_regs dump_sec_regs
+#else
+#define debug_dump_sec_regs()
+#endif
+
+#ifdef DUMP_WQ_ENTRIES
+static void dump_wq_entry(struct workq *wq);
+#else
+#define dump_wq_entry(wq)
+#endif
+
+#ifdef DUMP_CQ_ENTRIES
+static void dump_cq_entry(struct compq *cq);
+#else
+#define dump_cq_entry(cq)
+#endif
+
+#ifdef DUMP_SA
+static void dump_sa(const struct sec2_sa *const sa);
+#else
+#define dump_sa(sa)
+#endif
+
+/**************************************************************************
+ * Private data
+ */
+
+/*
+ * Define structures used to register IPSec engine with
+ * Linux Crypto API - this is the only public interface
+ * to this driver!
+ */
+
+/* Crypto API glue for AES functions */
+#define AES_MIN_KEY_SIZE	16 /* in u8 units */
+#define AES_MAX_KEY_SIZE	32
+#define AES_BLOCK_SIZE		16
+
+static struct crypto_alg msp_aes_alg = {
+	.cra_name		= "aes",
+	.cra_driver_name	= "aes-pmcmsp",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_aes_alg.cra_list),
+	.cra_u = {
+		.cipher = {
+			.cia_min_keysize = AES_MIN_KEY_SIZE,
+			.cia_max_keysize = AES_MAX_KEY_SIZE,
+			.cia_setkey	 = msp_crypto_setkey,
+			.cia_encrypt	 = msp_crypto_encrypt,
+			.cia_decrypt	 = msp_crypto_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_ecb_aes_alg = {
+	.cra_name		= "ecb(aes)",
+	.cra_driver_name	= "ecb-aes-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_ecb_aes_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_ecb_encrypt,
+			.decrypt	= msp_crypto_ecb_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_cbc_aes_alg = {
+	.cra_name		= "cbc(aes)",
+	.cra_driver_name	= "cbc-aes-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_cbc_aes_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= AES_MIN_KEY_SIZE,
+			.max_keysize	= AES_MAX_KEY_SIZE,
+			.ivsize		= AES_BLOCK_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_cbc_encrypt,
+			.decrypt	= msp_crypto_cbc_decrypt,
+		}
+	}
+};
+
+/* Crypto API glue for DES functions */
+#define DES_KEY_SIZE		8
+#define DES_BLOCK_SIZE		8
+
+static struct crypto_alg msp_des_alg = {
+	.cra_name		= "des",
+	.cra_driver_name	= "des-pmcmsp",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= DES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_des_alg.cra_list),
+	.cra_u = {
+		.cipher = {
+			.cia_min_keysize = DES_KEY_SIZE,
+			.cia_max_keysize = DES_KEY_SIZE,
+			.cia_setkey	 = msp_crypto_setkey,
+			.cia_encrypt	 = msp_crypto_encrypt,
+			.cia_decrypt	 = msp_crypto_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_ecb_des_alg = {
+	.cra_name		= "ecb(des)",
+	.cra_driver_name	= "ecb-des-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= DES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_ecb_des_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= DES_KEY_SIZE,
+			.max_keysize	= DES_KEY_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_ecb_encrypt,
+			.decrypt	= msp_crypto_ecb_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_cbc_des_alg = {
+	.cra_name		= "cbc(des)",
+	.cra_driver_name	= "cbc-des-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= DES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_cbc_des_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= DES_KEY_SIZE,
+			.max_keysize	= DES_KEY_SIZE,
+			.ivsize		= DES_BLOCK_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_cbc_encrypt,
+			.decrypt	= msp_crypto_cbc_decrypt,
+		}
+	}
+};
+
+/* Crypto API glue for DES3 functions */
+#define DES3_KEY_SIZE		(3 * DES_KEY_SIZE)
+#define DES3_BLOCK_SIZE		DES_BLOCK_SIZE
+
+static struct crypto_alg msp_des3_alg = {
+	.cra_name		= "des3_ede",
+	.cra_driver_name	= "des3_ede-pmcmsp",
+	.cra_priority		= 300,
+	.cra_flags		= CRYPTO_ALG_TYPE_CIPHER,
+	.cra_blocksize		= DES3_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_des3_alg.cra_list),
+	.cra_u = {
+		.cipher = {
+			.cia_min_keysize = DES3_KEY_SIZE,
+			.cia_max_keysize = DES3_KEY_SIZE,
+			.cia_setkey	 = msp_crypto_setkey,
+			.cia_encrypt	 = msp_crypto_encrypt,
+			.cia_decrypt	 = msp_crypto_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_ecb_des3_alg = {
+	.cra_name		= "ecb(des3_ede)",
+	.cra_driver_name	= "ecb-des3_ede-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= DES3_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_ecb_des3_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= DES3_KEY_SIZE,
+			.max_keysize	= DES3_KEY_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_ecb_encrypt,
+			.decrypt	= msp_crypto_ecb_decrypt,
+		}
+	}
+};
+
+static struct crypto_alg msp_cbc_des3_alg = {
+	.cra_name		= "cbc(des3_ede)",
+	.cra_driver_name	= "cbc-des3_ede-pmcmsp",
+	.cra_priority		= 400,
+	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
+	.cra_blocksize		= DES3_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct msp_crypto),
+	.cra_alignmask		= 3,
+	.cra_type		= &crypto_blkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_list		= LIST_HEAD_INIT(msp_cbc_des3_alg.cra_list),
+	.cra_u = {
+		.blkcipher = {
+			.min_keysize	= DES3_KEY_SIZE,
+			.max_keysize	= DES3_KEY_SIZE,
+			.ivsize		= DES3_BLOCK_SIZE,
+			.setkey		= msp_crypto_setkey,
+			.encrypt	= msp_crypto_cbc_encrypt,
+			.decrypt	= msp_crypto_cbc_decrypt,
+		}
+	}
+};
+
+/* Crypto API glue for MD5 functions */
+#define MD5_BLOCKSIZE	64
+#define MD5_DIGESTSIZE	16
+
+static struct crypto_alg msp_md5_alg = {
+	.cra_name	 = "md5",
+	.cra_driver_name = "md5-pmcmsp",
+	.cra_flags	 = CRYPTO_ALG_TYPE_DIGEST,
+	.cra_blocksize	 = MD5_BLOCKSIZE,
+	.cra_ctxsize	 = sizeof(struct msp_crypto),
+	.cra_module	 = THIS_MODULE,
+	.cra_list	 = LIST_HEAD_INIT(msp_md5_alg.cra_list),
+	.cra_u = {
+		.digest = {
+			.dia_digestsize	= MD5_DIGESTSIZE,
+			.dia_init	= msp_crypto_md5_init,
+			.dia_update	= msp_crypto_md5_update,
+			.dia_final	= msp_crypto_md5_final,
+		}
+	}
+};
+
+/* Crypto API glue for SHA1 functions */
+#define SHA1_BLOCKSIZE	64
+#define SHA1_DIGESTSIZE	20
+
+static struct crypto_alg msp_sha1_alg = {
+	.cra_name	 = "sha1",
+	.cra_driver_name = "sha1-pmcmsp",
+	.cra_flags	 = CRYPTO_ALG_TYPE_DIGEST,
+	.cra_blocksize	 = SHA1_BLOCKSIZE,
+	.cra_ctxsize	 = sizeof(struct msp_crypto),
+	.cra_module	 = THIS_MODULE,
+	.cra_list	 = LIST_HEAD_INIT(msp_sha1_alg.cra_list),
+	.cra_u = {
+		.digest = {
+			.dia_digestsize	= SHA1_DIGESTSIZE,
+			.dia_init	= msp_crypto_sha1_init,
+			.dia_update	= msp_crypto_sha1_update,
+			.dia_final	= msp_crypto_sha1_final,
+		}
+	}
+};
+
+/* local structures used to control work and completion queues */
+static struct workq sec_work_queues[HW_NR_WORK_QUEUES];
+static struct compq sec_comp_queues[HW_NR_COMP_QUEUES];
+
+/* IO mapped hardware registers */
+static volatile struct sec2_regs *sec2_regs;
+
+/*
+ * IPSEC engine updates head & tail registers AND copies these updates
+ * directly to SDRAM. On some architectures it is faster to access the
+ * SDRAM copies. On other architectures it is faster to access the
+ * registers directly. The SDRAM copies are not currently used in this
+ * implemention but a dummy SDRAM location must still be provided to
+ * engine.
+ */
+static void *status_ptr;
+static dma_addr_t status_dma_addr;
+
+/**************************************************************************
+ * Functions
+ */
+
+static void
+sec_destroy_queues(void)
+{
+	int i;
+	
+	for (i = 0; i < HW_NR_COMP_QUEUES; i++) {
+		struct compq *cq = &sec_comp_queues[i];
+		dma_free_coherent(NULL, SEC_COMP_Q_SIZE,
+				cq->base, cq->base_dma_addr);
+	}
+
+	for (i = 0; i < HW_NR_WORK_QUEUES; i++) {
+		struct compq *wq = &sec_comp_queues[i];
+		dma_free_coherent(NULL, SEC_WORK_Q_SIZE,
+				wq->base, wq->base_dma_addr);
+	}
+	
+	dma_free_coherent(NULL, sizeof(int), status_ptr, status_dma_addr);
+}
+
+static int
+sec_init_queues(void)
+{
+	int i;
+	struct workq *wq;
+	struct compq *cq;
+
+	/*
+	 * Allocate uncached space for hw_ptr values.
+	 * NOTE: status ptr value is not currently used.
+	 */
+	status_ptr = dma_alloc_coherent(NULL, sizeof(int), &status_dma_addr,
+					GFP_KERNEL);
+	DBG_SEC("Allocated status ptr memory at 0x%p (0x%08x)\n",
+			status_ptr, status_dma_addr);
+	if (!status_ptr)
+		return -ENOMEM;
+
+	for (i = 0; i < HW_NR_COMP_QUEUES; i++) {
+		void *base; /* slowpath virtual address of base */
+		dma_addr_t base_dma_addr; /* DMA bus address of base */
+
+		base = dma_alloc_coherent(NULL, SEC_COMP_Q_SIZE,
+				&base_dma_addr, GFP_KERNEL);
+		DBG_SEC("Allocated CQ%d at 0x%p (0x%08x)\n",
+			i, base, base_dma_addr);
+		if (!base)
+			return -ENOMEM;
+
+		cq = &sec_comp_queues[i];
+
+		cq->compq_lock = SPIN_LOCK_UNLOCKED;
+		cq->cq_regs = &sec2_regs->cq[i];
+		cq->base = base;
+		cq->base_dma_addr = base_dma_addr;
+		cq->out = 0;
+
+		cq->cq_regs->ofst_ptr = (unsigned int *)status_dma_addr;
+		cq->cq_regs->base = (unsigned char *)cq->base_dma_addr;
+		cq->cq_regs->size = SEC_COMP_Q_SIZE;
+		cq->cq_regs->in = 0;
+		cq->cq_regs->out = 0;
+	}
+
+	for (i = 0; i < HW_NR_WORK_QUEUES; i++) {
+		void *base; /* slowpath virtual address of base */
+		dma_addr_t base_dma_addr; /* DMA bus address of base */
+
+		base = dma_alloc_coherent(NULL, SEC_WORK_Q_SIZE,
+					&base_dma_addr, GFP_KERNEL);
+		DBG_SEC("Allocated WQ%d at 0x%p (0x%08x)\n",
+			i, base, base_dma_addr);
+		if (!base)
+			return -ENOMEM;
+
+		wq = &sec_work_queues[i];
+
+		init_waitqueue_head(&wq->space_wait);
+
+		wq->workq_lock = SPIN_LOCK_UNLOCKED;
+		wq->wq_regs = &sec2_regs->wq[i];
+		wq->base = base;
+		wq->base_dma_addr = base_dma_addr;
+		wq->in = 0;
+		wq->low_water = SEC_WORK_Q_SIZE >> 1; /* wake when half full */
+
+		wq->wq_regs->ofst_ptr = (unsigned int *)status_dma_addr;
+		wq->wq_regs->base = (unsigned char *)wq->base_dma_addr;
+		wq->wq_regs->size = SEC_WORK_Q_SIZE;
+		wq->wq_regs->in = 0;
+		wq->wq_regs->out = 0;
+	}
+	
+	debug_dump_sec_regs();
+
+	return 0;
+}
+
+static int __init
+msp_secv2_init(void)
+{
+	void *rstaddr, *rngaddr;
+	int rc = -ENOMEM;
+	char secid = identify_sec();
+
+	switch (secid) {
+	case SEC_POLO:
+		printk(KERN_ERR PREFIX
+			"Security engine found\n");
+		break;
+	case FEATURE_NOEXIST:
+		printk(KERN_ERR PREFIX
+			"Security engine not specified in "
+			"FEATURES env param\n");
+		return 0;
+	default:
+		printk(KERN_ERR PREFIX
+			"Security engine '%c' not supported\n", secid);
+		return -ENODEV;
+	}
+
+	/* Temporarily IO remap SoC and RNG registers */
+	rstaddr = ioremap_nocache(MSP_RST_BASE, MSP_RST_SIZE);
+	if (!rstaddr) {
+		printk(KERN_ERR PREFIX
+			"Unable to ioremap address 0x%08x\n", MSP_RST_BASE);
+		goto err_ioremap;
+	}
+	rngaddr = ioremap_nocache(MSP_SEC_BASE + SEC_RNG_CNF, sizeof(u32));
+	if (!rngaddr) {
+		printk(KERN_ERR PREFIX
+			"Unable to ioremap address 0x%08x\n",
+			MSP_SEC_BASE + SEC_RNG_CNF);
+		goto err_ioremap;
+	}
+
+	/* IO remap the security engine registers */
+	sec2_regs = ioremap_nocache(MSP_SEC_BASE + SEC2_REG,
+					sizeof(*sec2_regs));
+	if (!sec2_regs) {
+		printk(KERN_ERR PREFIX
+			"Unable to ioremap address 0x%08x\n",
+			MSP_SEC_BASE + SEC2_REG);
+		goto err_ioremap;
+	}
+
+	/* SoC Reset */
+	if (__raw_readl(rstaddr + MSPRST_STS) & MSP_SE_RST) {
+		__raw_writel(MSP_SE_RST, rstaddr + MSPRST_CLR);
+		while (__raw_readl(rstaddr + MSPRST_STS) & MSP_SE_RST)
+			udelay(5);
+	}
+
+	/* Software reset */
+	sec2_regs->rst |= SEC2_RST_MASTER;
+	while (sec2_regs->rst)
+		udelay(10);
+
+	/* Start random number generator */
+	__raw_writel(0x00010000, rngaddr);
+	__raw_writel(0x00000101, rngaddr);
+
+	DBG_SEC("================ Installing IPSEC Driver ================\n");
+	rc = sec_init_queues();
+	if (rc) {
+		printk(KERN_ERR PREFIX "Queue initialization failed\n");
+		goto err_queue_init;
+	}
+
+	rc = request_irq(MSP_INT_MBOX, msp_secv2_interrupt,
+			SA_SAMPLE_RANDOM, "pmcmsp_sec_hi",
+			(void *)sec2_regs);
+	if (rc) {
+		printk(KERN_WARNING PREFIX "Unable to get IRQ %d (rc=%d).\n",
+			MSP_INT_MBOX, rc);
+		goto err_high_int;
+	}
+
+	sec2_regs->ier = ~0;
+
+#ifdef CONFIG_CRYPTO_PMCMSP_CIPHER
+	/* Register AES with crypto API */
+	rc = crypto_register_alg(&msp_aes_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register AES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_aes;
+	}
+	rc = crypto_register_alg(&msp_ecb_aes_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register ECB-AES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_ecb_aes;
+	}
+	rc = crypto_register_alg(&msp_cbc_aes_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register CBC-AES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_cbc_aes;
+	}
+
+	/* Register DES with crypto API */
+	rc = crypto_register_alg(&msp_des_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register DES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_des;
+	}
+	rc = crypto_register_alg(&msp_ecb_des_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register ECB-DES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_ecb_des;
+	}
+	rc = crypto_register_alg(&msp_cbc_des_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register CBC-DES cipher "
+			"(software algorithm already loaded)\n");
+		goto err_cbc_des;
+	}
+
+	/* Register DES3 with crypto API */
+	rc = crypto_register_alg(&msp_des3_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register DES3_EDE cipher "
+			"(software algorithm already loaded)\n");
+		goto err_des3;
+	}
+	rc = crypto_register_alg(&msp_ecb_des3_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register ECB-DES3_EDE cipher "
+			"(software algorithm already loaded)\n");
+		goto err_ecb_des3;
+	}
+	rc = crypto_register_alg(&msp_cbc_des3_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register CBC-DES3_EDE cipher "
+			"(software algorithm already loaded)\n");
+		goto err_cbc_des3;
+	}
+#endif /* CONFIG_CRYPTO_PMCMSP_CIPHER */
+
+#ifdef CONFIG_CRYPTO_PMCMSP_HASH
+	/* Register MD5/SHA-1 with crypto API */
+	rc = crypto_register_alg(&msp_md5_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register MD5 hash "
+			"(software algorithm already loaded)\n");
+		goto err_md5;
+	}
+	rc = crypto_register_alg(&msp_sha1_alg);
+	if (rc) {
+		printk(KERN_ERR PREFIX
+			"Could not register SHA-1 hash "
+			"(software algorithm already loaded)\n");
+		goto err_sha1;
+	}
+#endif /* CONFIG_CRYPTO_PMCMSP_HASH */
+
+	iounmap(rngaddr);
+	iounmap(rstaddr);
+		
+	/* Okay! */
+	return 0;
+
+#ifdef CONFIG_CRYPTO_PMCMSP_HASH
+	crypto_unregister_alg(&msp_sha1_alg);
+err_sha1:
+	crypto_unregister_alg(&msp_md5_alg);
+err_md5:
+#endif /* CONFIG_CRYPTO_PMCMSP_HASH */
+
+#ifdef CONFIG_CRYPTO_PMCMSP_CIPHER
+	crypto_unregister_alg(&msp_cbc_des3_alg);
+err_cbc_des3:
+	crypto_unregister_alg(&msp_ecb_des3_alg);
+err_ecb_des3:
+	crypto_unregister_alg(&msp_des3_alg);
+err_des3:
+	crypto_unregister_alg(&msp_cbc_des_alg);
+err_cbc_des:
+	crypto_unregister_alg(&msp_ecb_des_alg);
+err_ecb_des:
+	crypto_unregister_alg(&msp_des_alg);
+err_des:
+	crypto_unregister_alg(&msp_cbc_aes_alg);
+err_cbc_aes:
+	crypto_unregister_alg(&msp_ecb_aes_alg);
+err_ecb_aes:
+	crypto_unregister_alg(&msp_aes_alg);
+err_aes:
+#endif /* CONFIG_CRYPTO_PMCMSP_CIPHER */
+	free_irq(MSP_INT_MBOX, (void *)sec2_regs);
+
+err_high_int:
+	sec_destroy_queues();
+err_queue_init:
+	iounmap(sec2_regs);
+err_ioremap:
+	if (rngaddr)
+		iounmap(rngaddr);
+	if (rstaddr)
+		iounmap(rstaddr);
+	
+	return rc;
+}
+
+static void
+msp_secv2_exit(void)
+{
+	crypto_unregister_alg(&msp_sha1_alg);
+	crypto_unregister_alg(&msp_md5_alg);
+	crypto_unregister_alg(&msp_cbc_des3_alg);
+	crypto_unregister_alg(&msp_ecb_des3_alg);
+	crypto_unregister_alg(&msp_des3_alg);
+	crypto_unregister_alg(&msp_cbc_des_alg);
+	crypto_unregister_alg(&msp_ecb_des_alg);
+	crypto_unregister_alg(&msp_des_alg);
+	crypto_unregister_alg(&msp_cbc_aes_alg);
+	crypto_unregister_alg(&msp_ecb_aes_alg);
+	crypto_unregister_alg(&msp_aes_alg);
+	
+	free_irq(MSP_INT_MBOX, (void *)sec2_regs);
+	free_irq(MSP_INT_CIC_SEC, (void *)sec2_regs);
+	
+	sec_destroy_queues();
+	
+	iounmap(sec2_regs);
+}
+
+static irqreturn_t
+msp_secv2_interrupt(int irq, void *dev_id)
+{
+	/*
+	 * TODO: This clears all interrupts, and assumes
+	 * that the cause was a completion queue update.
+	 */
+	unsigned int status;
+
+	status = sec2_regs->sis;
+	sec2_regs->sis = /* ~status */ 0;
+
+	DBG_SEC("interrupt irq %d status was %x\n", irq, status);
+
+	poll_completion();
+
+	return IRQ_HANDLED;
+}
+
+
+/*
+ * sync_for_fastpath_read - sync point before reading shared structure
+ *				via fastpath
+ *
+ * input:
+ *
+ * returns:
+ *
+ * NOTE:
+ * This call is necessary if a shared control structure is accessed via
+ * uncached, fastpath. This call is not needed if uncached, slowpath is
+ * used instead.
+ *
+ * Typical call sequence:
+ * 1. read peripheral register to see if new info
+ * 2. call sync_for_fastpath_read
+ * 3. read structure via uncached, fastpath access
+ */
+static inline void
+sync_for_fastpath_read(void)
+{
+	/*
+	 * compiler memory barrier to ensure read below not moved by compiler
+	 */
+	barrier();
+
+	/*
+	 * Do a dummy read of slowpath SDRAM to ensure that share
+	 * control structure has made it all the way to SDRAM.
+	 */
+	blocking_read_reg32((u32 *)0xb0000000);
+
+	/*
+	 * memory barrier to ensure reads above complete
+	 */
+	rmb();
+}
+
+/*
+ * sync_for_fastpath_write - sync point before writing shared structure
+ *				via fastpath
+ *
+ * input:
+ *
+ * returns:
+ *
+ * NOTE:
+ * This call is necessary if a shared control structure is accessed via
+ * uncached, fastpath. This call is not needed if uncached, slowpath is
+ * used instead.
+ *
+ * Typical call sequence:
+ * 1. write shared structure via uncached, fastpath
+ * 2. call sync_for_fastpath_write
+ * 3. update peripheral register to let device know there is new info
+ *
+ */
+static inline void
+sync_for_fastpath_write(void)
+{
+	/*
+	 * compiler memory barrier to ensure read below not moved by compiler
+	 */
+	barrier();
+
+	/*
+	 * Do a dummy read of fastpath to ensure that share
+	 * control structure has made it all the way to SDRAM.
+	 */
+	blocking_read_reg32((u32 *)0xa0000000);
+
+	/*
+	 * barrier to ensure above reads/writes complete before below
+	 */
+	mb();
+}
+
+
+/*
+ * desc_start - starts creating work element
+ *
+ * input:
+ *	e_ptr - ptr to work element being built
+ *	sa_ptr - ptr to security association
+ *	is_new_sa - true if SA has changed since last call. false otherwise.
+ *
+ * returns:
+ *
+ * note:
+ */
+static inline void
+desc_start(
+	struct desc_tent *const e_ptr,
+	const struct sec2_sa *const sa_ptr,
+	bool is_new_sa)
+{
+	/* check if EOD must be in 2nd to last gather */
+	e_ptr->do_eod_correction =
+		(sa_ptr->flags & SAFLG_MODE_MASK) == SAFLG_MODE_ESP_IN &&
+		(sa_ptr->flags & SAFLG_HASH_MASK) != SAFLG_HASHNULL;
+
+	/* flush SA and save dma bus address */
+	if (is_new_sa)
+		e_ptr->sa_dma_addr = dma_map_single(NULL, (void *)sa_ptr,
+						sizeof(struct sec2_sa),
+						DMA_BIDIRECTIONAL);
+	else
+		e_ptr->sa_dma_addr = virt_to_phys((void *)sa_ptr);
+
+	e_ptr->sg_count = 0;
+	e_ptr->is_first = 1;	/* expect first gather buf next */
+
+	dump_sa(sa_ptr);
+}
+
+/*
+ * desc_add_gather - adds gather buffer to work element
+ *
+ * input:
+ *	e_ptr - work element being built
+ *	buf_ptr - pointer to gather buffer
+ *	length - length of buffer in bytes
+ *	is_last - set if last gather buffer
+ *
+ * returns:
+ *
+ * NOTE:
+ *	The gather buffer is READ by the IPSEC engine
+ *	All gather buffers must be added before any scatter buffers.
+ */
+
+static inline void
+desc_add_gather(
+	struct desc_tent *const e_ptr,
+	const void *const buf_ptr,
+	unsigned int length,
+	unsigned int is_last)
+{
+	struct scat_gath *g_ptr;	/* ptr to gather buffer */
+	unsigned int ctrl;		/* gather buffer control flags */
+
+	g_ptr = &e_ptr->sg[e_ptr->sg_count];
+
+	/* flush buffer and save dma bus address */
+	ctrl = length & SEC2_WE_SG_SIZE;
+	g_ptr->buf_dma_addr = dma_map_single(NULL, (void *)buf_ptr,
+						ctrl, DMA_TO_DEVICE);
+
+	/* set flag bits needed by IPSEC engine */
+	if (e_ptr->is_first) {
+		e_ptr->is_first = 0;
+		ctrl |= SEC2_WE_SG_SOP;
+	}
+	if (is_last) {
+		e_ptr->is_first = 1; /* expect first scatter buf next */
+		ctrl |= SEC2_WE_SG_EOP;
+
+		if (e_ptr->do_eod_correction && e_ptr->sg_count != 0) {
+			/* set EOD in 2nd to last gather */
+			g_ptr[-1].ctrl |= SEC2_WE_SG_EOD;
+		} else
+			ctrl |= SEC2_WE_SG_EOD;
+	}
+
+	g_ptr->ctrl = ctrl;
+	e_ptr->sg_count++;
+}
+
+/*
+ * desc_add_scatter - adds scatter buffer to work element
+ *
+ * input:
+ *	e_ptr - work element being built
+ *	buf_ptr - pointer to scatter buffer
+ *	length - length of buffer in bytes
+ *	is_last - set if last scatter buffer
+ *
+ * returns:
+ *
+ * note:
+ *	The scatter buffer is WRITTEN by the IPSEC engine
+ *	All scatter buffers must be added after any gather buffers.
+ */
+static inline void
+desc_add_scatter(
+	struct desc_tent *const e_ptr,
+	const void *const buf_ptr,
+	unsigned int length,
+	unsigned int is_last)
+{
+	struct scat_gath *s_ptr;	/* ptr to scatter buffer */
+	unsigned int ctrl;		/* scatter buffer control flags */
+
+	s_ptr = &e_ptr->sg[e_ptr->sg_count];
+
+	/* invalidate buffer and save dma bus address */
+	ctrl = length & SEC2_WE_SG_SIZE;
+	s_ptr->buf_dma_addr = dma_map_single(NULL, (void *)buf_ptr,
+						ctrl, DMA_FROM_DEVICE);
+
+	/* set flag bits needed by IPSEC engine */
+	if (e_ptr->is_first) {
+		e_ptr->is_first = 0;
+		ctrl |= SEC2_WE_SG_SOP;
+	}
+	if (is_last)
+		ctrl |= SEC2_WE_SG_EOP | SEC2_WE_SG_EOD;
+
+	s_ptr->ctrl = ctrl | SEC2_WE_SG_SCATTER;
+	e_ptr->sg_count++;
+}
+
+/*
+ * desc_finish - finished creating work element
+ *
+ * input:
+ *	e_ptr - work element being built
+ *	ctrl - work element control flags
+ *
+ * returns:
+ *
+ * note
+ */
+static inline void
+desc_finish(struct desc_tent *const e_ptr, unsigned int ctrl)
+{
+	/* set descriptor size */
+	e_ptr->ctrl = ctrl | (WQE_DESC_SIZE(e_ptr->sg_count) - 1);
+	e_ptr->magic = WQE_MAGIC;
+}
+
+/*
+ * desc_write - write work element to IPSEC engine's work queue
+ *
+ * input:
+ *	wq - ptr to work queue
+ *	e_ptr - ptr to work element to add to work queue
+ *
+ * returns:
+ *
+ * note
+ */
+
+#define WQ_PUT_INT(base, in, val) \
+	do { \
+		*(unsigned int *)&base[in] = (unsigned int)(val); \
+		in = (in + sizeof(int)) & SEC_WORK_Q_MASK; \
+	} while (0)
+
+static inline void
+desc_write(struct workq *const wq, const struct desc_tent *const e_ptr)
+{
+	unsigned char *base_ptr;
+	unsigned int in;
+	int i;
+
+	/*
+	 * It is assumed that the avail register was just read to check
+	 * there is room in the queue for this descriptor.
+	 */
+	base_ptr = wq->base;
+	in = wq->in;
+	WQ_PUT_INT(base_ptr, in, e_ptr->sa_dma_addr);
+	WQ_PUT_INT(base_ptr, in, e_ptr->ctrl);
+	WQ_PUT_INT(base_ptr, in, e_ptr); /* write ptr to work element */
+	WQ_PUT_INT(base_ptr, in, 0);	/* unused */
+	for (i = 0; i < e_ptr->sg_count; i++) {
+		WQ_PUT_INT(base_ptr, in, e_ptr->sg[i].buf_dma_addr);
+		WQ_PUT_INT(base_ptr, in, e_ptr->sg[i].ctrl);
+	}
+
+	dump_wq_entry(wq);
+
+	/*
+	 * Ensure that descriptor data gets all the way to SDRAM BEFORE
+	 * incrementing the hardware register offset.
+	 */
+	sync_for_fastpath_write();
+
+	/*
+	 * Update hardware in offset so IPSEC engine sees new
+	 * work descriptor.
+	 */
+	wq->wq_regs->in = wq->in = in;
+}
+
+/*
+ * desc_read - read work descriptor from IPSEC engine's completion queue
+ *
+ * input:
+ *	cq - pointer to completion queue to read
+ *
+ * returns:
+ *	ptr to work descriptor or NULL if not valid
+ *
+ * NOTE:
+ *	This function handles the syncronization between engine and CPU.
+ *
+ *	Contents:
+ *		word 0: virtual kernel address of work element
+ *		word 1: unused
+ *		word 2: completion status
+ *		word 3: reserved
+ */
+static inline struct desc_tent *
+desc_read(struct compq *const cq)
+{
+	unsigned int out;
+	const unsigned int *int_ptr;
+	struct desc_tent *we_ptr;
+
+	/*
+	 * It is assumed that the avail register was just read to check
+	 * if a descriptor was really in the completion queue. Register
+	 * accesses are always slowpath so they are not syncronized to
+	 * fastpath reads (at all)!
+	 */
+
+	/*
+	 * Ensure that descriptor is really in SDRAM before reading from
+	 * fastpath.
+	 */
+	sync_for_fastpath_read();
+
+	/* read pointer to work element out of completion queue */
+	out = cq->out;
+	int_ptr = (unsigned int *)&cq->base[out];
+	we_ptr = (struct desc_tent *)int_ptr[0];
+	if (we_ptr != NULL && we_ptr->magic == WQE_MAGIC) {
+		/* read completion status out of completion queue */
+		we_ptr->comp_status = int_ptr[2];
+	} else {
+		/* ERROR: Not a valid pointer to work element! */
+		we_ptr = NULL;
+	}
+
+	/*
+	 * barrier to ensure above reads complete before below
+	 */
+	rmb();
+
+	out = (out + CQE_SIZE) & SEC_COMP_Q_MASK;
+	cq->cq_regs->out = cq->out = out;
+
+	return we_ptr;
+}
+
+/*
+ * desc_cleanup - cleanup work entry after work completed
+ *
+ * input:
+ *
+ * returns:
+ *
+ * note
+ */
+static inline void
+desc_cleanup(struct desc_tent *const e_ptr)
+{
+	int i;
+
+	dma_unmap_single(NULL, e_ptr->sa_dma_addr,
+			sizeof(struct sec2_sa), DMA_BIDIRECTIONAL);
+
+	for (i = 0; i < e_ptr->sg_count; i++) {
+		struct scat_gath *sg_ptr = &e_ptr->sg[0];
+		unsigned int buf_size;
+		unsigned int buf_ctrl;
+
+		buf_ctrl = sg_ptr->ctrl;
+		buf_size = buf_ctrl & SEC2_WE_SG_SIZE;
+		if (buf_ctrl & SEC2_WE_SG_SCATTER)
+			dma_unmap_single(NULL, sg_ptr->buf_dma_addr,
+					buf_size, DMA_FROM_DEVICE);
+		else
+			dma_unmap_single(NULL, sg_ptr->buf_dma_addr,
+					buf_size, DMA_TO_DEVICE);
+
+		sg_ptr++;
+	}
+
+	/* filter out warnings that we are not interested in */
+	e_ptr->comp_status &=
+		(SEC2_INT_BAD_ADDR |
+		SEC2_INT_HASH_NON_64 |
+		SEC2_INT_DES_NON_8 |
+		SEC2_INT_AES_NON_16 |
+		SEC2_INT_BAD_GATHER |
+#if 0
+		/* TODO: ICV_COMP_ERR showing up erroneously */
+		SEC2_INT_ICV_COMP_ERR |
+#endif
+		SEC2_INT_OFFSET_ERR |
+		SEC2_INT_GS_BALANCE_ERR |
+		SEC2_INT_EOD_MARK_ERR);
+}
+
+/*
+ * desc_do_work - queues work element to engine and waits for completion
+ *
+ * input:
+ *	e_ptr - ptr to work element to add to queue
+ *
+ * returns:
+ *
+ * note
+ */
+static unsigned int
+desc_do_work(struct desc_tent *e_ptr)
+{
+	unsigned int work_q;	/* index to work queue */
+	struct workq *wq;	/* ptr to work queue control structure */
+	unsigned int comp_q_mask; /* completion queue poll mask */
+	unsigned long flags;	/* interrupt flags */
+	int i = 0;
+
+	work_q = 0; /* only use first work queue and comp queue for now */
+	wq = &sec_work_queues[work_q];
+	comp_q_mask = 1; /* compq zero */
+
+	atomic_set(&e_ptr->work_complete, 0);
+	e_ptr->comp_status = ~0;
+
+	spin_lock_irqsave(&wq->workq_lock, flags);
+
+	/*
+	 * Read engine register to check if there is room in work queue
+	 * for new descriptor.
+	 */
+	while (wq->wq_regs->avail <= WQE_DESC_SIZE_BYTES(e_ptr->sg_count)) {
+		DBG_SEC("waiting for room:\n" );
+		debug_dump_sec_regs();
+	}
+
+	/* write work entry to IPSEC engine and advance hw pointer */
+	desc_write(wq, e_ptr);
+
+	spin_unlock_irqrestore(&wq->workq_lock, flags);
+
+#ifdef DEBUG_VERBOSE
+	DBG_SEC("Registers after submission:\n");
+	dump_sec_regs();
+#endif
+
+	/* poll for work descriptor to be marked as complete */
+	DBG_SEC( "polling for work completion\n" );
+	while (atomic_read(&e_ptr->work_complete) == 0) {
+		int rc = poll_completion();
+		if (rc == -1 && i++ > 10000000) {
+			printk(KERN_ERR
+				"******** SEC: OPERATION TIMED OUT ******\n");
+			dump_sec_regs();
+			break;
+		}
+	}
+
+	desc_cleanup(e_ptr);
+
+	DBG_SEC("Returning with status %x\n", e_ptr->comp_status);
+
+	return e_ptr->comp_status;
+}
+
+static int
+msp_sec2_set_aes_decrypt_key(
+	struct sec2_sa *sa,
+	int workq,
+	int compq)
+{
+	struct sec2_sa tmp_sa;
+	static char junk_buf[16];
+	unsigned int status;
+	struct desc_tent w; /* work queue element */
+
+	if ((unsigned int)workq > 1)
+		return -1;
+	if ((unsigned int)compq > 1)
+		return -1;
+
+	memset(&tmp_sa, 0, sizeof(tmp_sa));
+
+	tmp_sa.flags = sa->flags & SAFLG_CRYPT_TYPE_MASK;
+
+	/* MUST be AES type */
+	if (tmp_sa.flags < SAFLG_AES_128)
+		return -1;
+
+	tmp_sa.flags |= SAFLG_MODE_CRYPT | SAFLG_ECB;
+
+	memcpy(tmp_sa.crypt_keys, sa->crypt_keys, sizeof(sa->crypt_keys));
+
+	desc_start(&w, &tmp_sa, true);
+	desc_add_gather(&w, junk_buf, 16, WQE_LAST);
+	/* size -- ALWAYS 32 for SEC2_WE_CTRL_AKO */
+	desc_add_scatter(&w, sa->crypt_keys, 32, WQE_LAST);
+	desc_finish(&w, SEC2_WE_CTRL_AKO);
+	status = desc_do_work(&w);
+	if (status) {
+		DBG_SEC("status 0x%x from hash in hmac preprocess(2)\n",
+			status);
+		return -1;
+	}
+
+	return 0;
+}
+
+static int
+poll_completion(void)
+{
+	struct compq *cq;
+	int flags;
+	int work_ct = 0;
+	
+	/*
+	 * Check IPSEC engine register to see if at least one
+	 * completion element is in completion queue.
+	 */
+	cq = sec_comp_queues;
+	spin_lock_irqsave(&cq->compq_lock, flags);
+	while ((SEC_COMP_Q_SIZE - cq->cq_regs->avail) >= CQE_SIZE) {
+		struct desc_tent *e_ptr;
+
+		DBG_SEC("Getting compq entry from engine at 0x%08x\n",
+			cq->out);
+		dump_cq_entry(cq);
+
+		/* read work element from comp queue and advance HW ptr */
+		e_ptr = desc_read(cq);
+#ifdef DEBUG_VERBOSE
+		dump_sec_regs();
+#endif
+		if (e_ptr != NULL) {
+			/* mark work descriptor as complete to wakeup poller */
+			atomic_set(&e_ptr->work_complete, 1);
+		}
+		work_ct++;
+	}
+	spin_unlock_irqrestore(&cq->compq_lock, flags);
+
+	if (work_ct)
+		return 0; /* work done, now empty */
+
+	return -1; /* there was nothing to do */
+}
+
+/* Crypto API calls */
+static int
+msp_crypto_setkey(
+	struct crypto_tfm *tfm,
+	const u8 *key,
+	unsigned int key_len)
+{
+	struct msp_crypto *ctx = crypto_tfm_ctx(tfm);
+	u32 *flags = &tfm->crt_flags;
+
+	DBG_SEC("Setting %u-byte key...\n", key_len);
+
+	if (key_len % 8) {
+		printk(KERN_ERR PREFIX "Key length must be 16, 24, or 32\n");
+		
+		*flags |= CRYPTO_TFM_RES_BAD_KEY_LEN;
+		return -EINVAL;
+	}
+
+	ctx->keysize = key_len;
+
+	memcpy((u8 *)(ctx->sa.crypt_keys), key, key_len);
+
+	/* Set AES decrypt key as well */
+	if (key_len >= 16) {
+		memcpy((u8 *)ctx->aes_decrypt_sa.crypt_keys, key, key_len);
+		switch (key_len) {
+		case 16:
+			ctx->aes_decrypt_sa.flags = SAFLG_AES_128;
+			break;
+		case 24:
+			ctx->aes_decrypt_sa.flags = SAFLG_AES_192;
+			break;
+		case 32:
+			ctx->aes_decrypt_sa.flags = SAFLG_AES_256;
+			break;
+		}
+		DBG_SEC("Pre-calculating %u-byte key...\n", key_len);
+		msp_sec2_set_aes_decrypt_key(&(ctx->aes_decrypt_sa), 0, 0);
+	}
+
+	/* Store reversed DES3 key */
+	if (key_len == 24) {
+		ctx->des_decrypt_sa.crypt_keys[0] = ctx->sa.crypt_keys[4];
+		ctx->des_decrypt_sa.crypt_keys[1] = ctx->sa.crypt_keys[5];
+		ctx->des_decrypt_sa.crypt_keys[2] = ctx->sa.crypt_keys[2];
+		ctx->des_decrypt_sa.crypt_keys[3] = ctx->sa.crypt_keys[3];
+		ctx->des_decrypt_sa.crypt_keys[4] = ctx->sa.crypt_keys[0];
+		ctx->des_decrypt_sa.crypt_keys[5] = ctx->sa.crypt_keys[1];
+	};
+
+	return 0;
+}
+
+static void
+msp_crypto_setalg(struct msp_crypto *ctx, const char *algname)
+{
+	struct sec2_sa *sa = &ctx->sa;
+	sa->flags &= ~SAFLG_CRYPT_TYPE_MASK;
+
+	if (strstr(algname, "aes")) {
+		switch (ctx->keysize) {
+		case 16:
+			sa->flags |= SAFLG_AES_128;
+			break;
+		case 24:
+			sa->flags |= SAFLG_AES_192;
+			break;
+		case 32:
+			sa->flags |= SAFLG_AES_256;
+			break;
+		}
+	} else if (strstr(algname, "des3_ede")) {
+		sa->flags |= SAFLG_DES3;
+	} else if (strstr(algname, "des")) {
+		sa->flags |= SAFLG_DES;
+	} else {
+		printk(KERN_WARNING PREFIX
+			"Unknown algorithm '%s', defaulting to CRYPTNULL\n",
+			algname);
+		sa->flags |= SAFLG_CRYPTNULL;
+	}
+}
+
+static u8 *
+msp_crypto_cipher(
+	struct crypto_tfm *tfm,
+	u8 *out,
+	const u8 *in,
+	unsigned int nbytes,
+	const u8 *iv,
+	unsigned int direction,
+	unsigned int mode)
+{
+	struct msp_crypto *ctx = crypto_tfm_ctx(tfm);
+	struct sec2_sa *sa = &ctx->sa;
+	u32 alg;
+
+	const u8 *sptr = in;
+	u8 *dptr = out;
+
+	int crypt_modsize = crypto_tfm_alg_blocksize(tfm);
+	int maxbsize = 0xfff;
+
+	unsigned int bytesleft = nbytes;
+
+	DBG_SEC("Doing crypt of %d bytes from 0x%p to 0x%p\n",
+		nbytes, in, out);
+	msp_crypto_setalg(ctx, crypto_tfm_alg_name(tfm));
+	alg = sa->flags & SAFLG_CRYPT_TYPE_MASK;
+	if (direction == CRYPT_DIRECTION_DECRYPT) {
+		if (alg == SAFLG_AES_128 ||
+		    alg == SAFLG_AES_192 ||
+		    alg == SAFLG_AES_256)
+			/* Use Pre-calculated AES decrypt key */
+			sa = &ctx->aes_decrypt_sa;
+		else if (alg == SAFLG_DES3)
+			/* Use reversed DES decrypt key */
+			sa = &ctx->des_decrypt_sa;
+		sa->flags = alg;
+	}
+
+	sa->flags |= SAFLG_MODE_CRYPT;
+
+	if (direction == CRYPT_DIRECTION_ENCRYPT) {
+		switch (alg) {
+		case SAFLG_DES:
+			sa->flags |= SAFLG_DES_ENCRYPT;
+			break;
+		case SAFLG_AES_128:
+		case SAFLG_AES_192:
+		case SAFLG_AES_256:
+			sa->flags |= SAFLG_AES_ENCRYPT;
+			break;
+		case SAFLG_DES3:
+			sa->flags |= SAFLG_EDE_ENCRYPT;
+			break;
+		}
+	} else {
+		switch (alg) {
+		case SAFLG_DES:
+			sa->flags |= SAFLG_DES_DECRYPT;
+			break;
+		case SAFLG_AES_128:
+		case SAFLG_AES_192:
+		case SAFLG_AES_256:
+			sa->flags |= SAFLG_AES_DECRYPT;
+			break;
+		case SAFLG_DES3:
+			sa->flags |= SAFLG_EDE_DECRYPT;
+			break;
+		}
+	}
+
+	switch (mode) {
+	case CRYPTO_TFM_MODE_ECB:
+		sa->flags |= SAFLG_ECB;
+		break;
+	case CRYPTO_TFM_MODE_CBC:
+		if (direction == CRYPT_DIRECTION_ENCRYPT)
+			sa->flags |= SAFLG_CBC_ENCRYPT;
+		else
+			sa->flags |= SAFLG_CBC_DECRYPT;
+		/* Copy in IV */
+		memcpy((u8 *)sa->crypt_iv, iv, crypt_modsize);
+		break;
+	default:
+		break;
+	}
+
+	/* Do the acual operation now */
+	while (bytesleft > 0) {
+		/*
+		 * TODO: Maybe use s/g to actually pipeline these if there
+		 * are more than one?
+		 */
+		struct desc_tent w;
+		unsigned int status;
+		int bsize;
+
+		bsize = (bytesleft > maxbsize) ? maxbsize : bytesleft;
+		bsize -= bsize % crypt_modsize;
+
+		DBG_SEC("Doing crypt on %d/%d bytes\n", bsize, nbytes);
+		desc_start(&w, sa, true);
+		desc_add_gather(&w, sptr, bsize, WQE_LAST);
+		desc_add_scatter(&w, dptr, bsize, WQE_LAST);
+		desc_finish(&w, 0);
+		status = desc_do_work(&w);
+		if (status != 0)
+			printk(KERN_ERR "Encrypt/decrypt failed, "
+				"status 0x%08x\n", status);
+		sptr += bsize;
+		dptr += bsize;
+		bytesleft -= bsize;
+
+		if (bytesleft < crypt_modsize)
+			break;
+	}
+	DBG_SEC("Crypt operation complete (%d left)\n", (bytesleft));
+
+	return (u8 *)sa->crypt_iv;
+}
+
+static void
+msp_crypto_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	msp_crypto_cipher(tfm, dst, src, 1, NULL,
+			CRYPT_DIRECTION_ENCRYPT, CRYPTO_TFM_MODE_OTHER);
+}
+
+static int
+msp_crypto_ecb_encrypt(
+	struct blkcipher_desc *desc,
+	struct scatterlist *dst,
+	struct scatterlist *src,
+	unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	int err;
+	unsigned int blksize = crypto_blkcipher_blocksize(desc->tfm);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt(desc, &walk);
+
+	while ((nbytes = walk.nbytes)) {
+		msp_crypto_cipher(&desc->tfm->base,
+			walk.dst.virt.addr, walk.src.virt.addr, nbytes,
+			NULL, CRYPT_DIRECTION_ENCRYPT, CRYPTO_TFM_MODE_ECB);
+
+		nbytes &= blksize - 1;
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+
+	return err;
+}
+
+static int
+msp_crypto_cbc_encrypt(
+	struct blkcipher_desc *desc,
+	struct scatterlist *dst,
+	struct scatterlist *src,
+	unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	int err;
+	unsigned int blksize = crypto_blkcipher_blocksize(desc->tfm);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt(desc, &walk);
+
+	while ((nbytes = walk.nbytes)) {
+		u8 *iv = msp_crypto_cipher(&desc->tfm->base,
+			walk.dst.virt.addr, walk.src.virt.addr, nbytes,
+			walk.iv, CRYPT_DIRECTION_ENCRYPT, CRYPTO_TFM_MODE_CBC);
+
+		memcpy(walk.iv, iv, blksize);
+		nbytes &= blksize - 1;
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+
+	return err;
+}
+
+static void
+msp_crypto_decrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
+{
+	msp_crypto_cipher(tfm, dst, src, 1, NULL,
+			CRYPT_DIRECTION_DECRYPT, CRYPTO_TFM_MODE_OTHER);
+}
+
+static int
+msp_crypto_ecb_decrypt(
+	struct blkcipher_desc *desc,
+	struct scatterlist *dst,
+	struct scatterlist *src,
+	unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	int err;
+	unsigned int blksize = crypto_blkcipher_blocksize(desc->tfm);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt(desc, &walk);
+
+	while ((nbytes = walk.nbytes)) {
+		msp_crypto_cipher(&desc->tfm->base,
+			walk.dst.virt.addr, walk.src.virt.addr, nbytes,
+			NULL, CRYPT_DIRECTION_DECRYPT, CRYPTO_TFM_MODE_ECB);
+
+		nbytes &= blksize - 1;
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+
+	return err;
+}
+
+static int
+msp_crypto_cbc_decrypt(
+	struct blkcipher_desc *desc,
+	struct scatterlist *dst,
+	struct scatterlist *src,
+	unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	int err;
+	unsigned int blksize = crypto_blkcipher_blocksize(desc->tfm);
+
+	blkcipher_walk_init(&walk, dst, src, nbytes);
+	err = blkcipher_walk_virt(desc, &walk);
+
+	while ((nbytes = walk.nbytes)) {
+		msp_crypto_cipher(&desc->tfm->base,
+			walk.dst.virt.addr, walk.src.virt.addr, nbytes,
+			walk.iv, CRYPT_DIRECTION_DECRYPT, CRYPTO_TFM_MODE_CBC);
+
+		nbytes &= blksize - 1;
+		err = blkcipher_walk_done(desc, &walk, nbytes);
+	}
+
+	return err;
+}
+
+static void
+msp_crypto_hash_init(void *ctx_arg)
+{
+	struct msp_hash *ctx = ctx_arg;
+	struct sec2_sa *sa = &ctx->sa;
+
+	DBG_SEC("Starting hash op\n");
+	sa->flags |= SAFLG_MODE_HASH_PAD;
+
+	memset(sa->hash_chain_a, 0, 20);
+
+	if (ctx->data != NULL) {
+		kfree(ctx->data);
+		ctx->data = NULL;
+	}
+	ctx->data_size = 0;
+}
+
+static void
+msp_crypto_hash_update(void *ctx_arg, const u8 *data, unsigned int len)
+{
+	struct msp_hash *ctx = ctx_arg;
+
+	if (len == 0)
+		return;
+
+	DBG_SEC("Adding %d bytes of data from 0x%p\n", len, data);
+
+	if (ctx->data == NULL) {
+		/*
+		 * First time you call hash_update, allocate and
+		 * copy the data.
+		 */
+		ctx->data = kmalloc(len, GFP_KERNEL);
+		memcpy(ctx->data, data, len);
+		ctx->data_size = len;
+	} else {
+		/* Second time, re-alloc and copy */
+		u8 *tmp = ctx->data;
+		ctx->data = kmalloc(ctx->data_size + len, GFP_KERNEL);
+		memcpy(ctx->data, tmp, ctx->data_size);
+		memcpy(ctx->data + ctx->data_size, data, len);
+		ctx->data_size += len;
+		kfree(tmp);
+	}
+}
+
+static void
+msp_crypto_hash_final(void *ctx_arg, u8 *out)
+{
+	struct msp_hash *ctx = ctx_arg;
+	struct sec2_sa *sa = &ctx->sa;
+	struct desc_tent w;
+	unsigned int status;
+
+	if (ctx->data_size == 0)
+		return;
+
+	desc_start(&w, sa, true);
+	desc_add_gather(&w, ctx->data, ctx->data_size, WQE_LAST);
+	desc_add_scatter(&w, out, ctx->resultsize, WQE_LAST);
+	desc_finish(&w, 0);
+	status = desc_do_work(&w);
+	if (status != 0)
+		printk(KERN_ERR "Hash update failed, status 0x%08x\n", status);
+	DBG_SEC("Hash operation complete\n");
+
+	if (ctx->data != NULL) {
+		kfree(ctx->data);
+		ctx->data = NULL;
+	}
+	ctx->data_size = 0;
+}
+
+static void
+msp_crypto_md5_init(struct crypto_tfm *tfm)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	struct sec2_sa *sa = &ctx->sa;
+	
+	sa->flags = SAFLG_MD5;
+	ctx->resultsize = 16;
+	msp_crypto_hash_init(ctx);
+}
+
+static void
+msp_crypto_md5_update(
+	struct crypto_tfm *tfm, const u8 *data, unsigned int len)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	
+	msp_crypto_hash_update(ctx, data, len);
+}
+
+static void
+msp_crypto_md5_final(struct crypto_tfm *tfm, u8 *out)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	
+	msp_crypto_hash_final(ctx, out);
+}
+
+static void
+msp_crypto_sha1_init(struct crypto_tfm *tfm)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	struct sec2_sa *sa = &ctx->sa;
+	
+	sa->flags = SAFLG_SHA1;
+	ctx->resultsize = 20;
+	msp_crypto_hash_init(ctx);
+}
+
+static void
+msp_crypto_sha1_update(
+	struct crypto_tfm *tfm, const u8 *data, unsigned int len)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	
+	msp_crypto_hash_update(ctx, data, len);
+}
+
+static void
+msp_crypto_sha1_final(struct crypto_tfm *tfm, u8 *out)
+{
+	struct msp_hash *ctx = crypto_tfm_ctx(tfm);
+	
+	msp_crypto_hash_final(ctx, out);
+}
+
+/***********************************************************************
+ *
+ * IPSEC Debug Utilities - Not normally compiled in
+ *
+ ***********************************************************************/
+static void
+dump_sec_regs(void)
+{
+	int i;
+
+	printk(KERN_INFO "SEC: " "IPSEC register start\n");
+	printk(KERN_INFO "SEC:  " "%08x  sis (interrupt status)\n",
+				sec2_regs->sis);
+	printk(KERN_INFO "SEC:  " "%08x  ier (interrupt enable)\n",
+				sec2_regs->ier);
+	printk(KERN_INFO "SEC:  " "%08x  esr (engine status)\n",
+				sec2_regs->esr);
+	for (i = 0; i < HW_NR_WORK_QUEUES; i++) {
+		printk(KERN_INFO "SEC: " "----------\n");
+		printk(KERN_INFO "SEC:  " "%08x  wq%d ofst_ptr\n",
+			(int)sec2_regs->wq[i].ofst_ptr, i);
+		printk(KERN_INFO "SEC:  " "%08x  wq%d avail\n",
+			sec2_regs->wq[i].avail, i);
+		printk(KERN_INFO "SEC:  " "%08x  wq%d base\n",
+			(int)sec2_regs->wq[i].base, i);
+		printk(KERN_INFO "SEC:  " "%08x  wq%d size\n",
+			sec2_regs->wq[i].size, i);
+		printk(KERN_INFO "SEC:  " "%08x  wq%d in\n",
+			sec2_regs->wq[i].in, i);
+		printk(KERN_INFO "SEC:  " "%08x  wq%d out\n",
+			sec2_regs->wq[i].out, i);
+	}
+
+	for (i = 0; i < HW_NR_COMP_QUEUES; i++) {
+		printk(KERN_INFO "SEC: " "----------\n");
+		printk(KERN_INFO "SEC:  " "%08x  cq%d ofst_ptr\n",
+			(int)sec2_regs->cq[i].ofst_ptr, i);
+		printk(KERN_INFO "SEC:  " "%08x  cq%d avail\n",
+			sec2_regs->cq[i].avail, i);
+		printk(KERN_INFO "SEC:  " "%08x  cq%d base\n",
+			(int)sec2_regs->cq[i].base, i);
+		printk(KERN_INFO "SEC:  " "%08x  cq%d size\n",
+			sec2_regs->cq[i].size, i);
+		printk(KERN_INFO "SEC:  " "%08x  cq%d in\n",
+			sec2_regs->cq[i].in, i);
+		printk(KERN_INFO "SEC:  " "%08x  cq%d out\n",
+			sec2_regs->cq[i].out, i);
+	}
+	printk(KERN_INFO "SEC: " "IPSEC register end\n");
+}
+
+#ifdef DUMP_WQ_ENTRIES
+#define GET_INT(base, idx, val) \
+	do { \
+		val = *(unsigned int *)((base) + idx); \
+		idx = (idx + 4) & SEC_WORK_Q_MASK; \
+	} while (0)
+
+static void
+dump_wq_entry(struct workq *wq)
+{
+	int idx, i;
+	unsigned int val;
+	unsigned int desc_size;
+	unsigned int sg_size;
+
+	idx = wq->in;
+
+	printk(KERN_INFO "Work_desc_start, "
+		"sw_in=%d, hw_in=%d, hw_out=%d, avail=%d\n",
+		idx, wq->wq_regs->in, wq->wq_regs->out, wq->wq_regs->avail);
+
+	GET_INT(wq->base, idx, val);
+	printk(KERN_INFO "  %08x SA ptr\n", val);
+
+	GET_INT(wq->base, idx, val);
+	printk(KERN_INFO "  %08x ctrl, pad=%d, next_hdr=%d,",
+		val, (val >> 24) & 0xff, (val >> 16) & 0xff);
+	if (val & SEC2_WE_CTRL_AKO)
+		printk(KERN_INFO " AKO,");
+	if (val & SEC2_WE_CTRL_GI)
+		printk(KERN_INFO " GI,");
+	if (val & SEC2_WE_CTRL_CQ)
+		printk(KERN_INFO " CQ,");
+
+	desc_size = val & 0xff;
+	sg_size = desc_size - 3;
+	printk(KERN_INFO " desc_size=%d, sg_size=%d\n", desc_size, sg_size);
+
+	GET_INT(wq->base, idx, val);
+	printk(KERN_INFO "  %08x desc_tent_ptr\n", val);
+
+	GET_INT(wq->base, idx, val);
+	printk(KERN_INFO "  %08x unused\n", val);
+
+	for (i = 0; i < sg_size; i += 2) {
+		GET_INT(wq->base, idx, val);
+		printk(KERN_INFO "  %08x", val);
+
+		GET_INT(wq->base, idx, val);
+		printk(KERN_INFO " %08x ", val);
+		if (val & SEC2_WE_SG_SCATTER)
+			printk(KERN_INFO " Scat,");
+		else
+			printk(KERN_INFO " Gath,");
+
+		if (val & SEC2_WE_SG_SOP)
+			printk(KERN_INFO " SOP,");
+		if (val & SEC2_WE_SG_EOD)
+			printk(KERN_INFO " EOD,");
+		if (val & SEC2_WE_SG_EOP)
+			printk(KERN_INFO " EOP,");
+
+		printk(KERN_INFO " len=%d\n", val & 0x7ff);
+	}
+	printk(KERN_INFO "Work_desc_end, sw_in=%d, hw_in=%d\n",
+			idx, wq->wq_regs->in);
+}
+#endif
+
+#ifdef DUMP_SA
+static void
+dump_sa(const struct sec2_sa *const sa)
+{
+	unsigned int eng_mode;
+	int i;
+
+	printk(KERN_INFO "SA start\n");
+	printk(KERN_INFO " flags     esp_spi   esp_seq\n");
+	printk(KERN_INFO " %08x  %08x  %08x\n",
+		sa->flags, sa->esp_spi, sa->esp_sequence);
+	switch (eng_mode = sa->flags & SAFLG_MODE_MASK) {
+	case SAFLG_MODE_ESP_IN:
+		printk(KERN_INFO " ESP_IN ");
+		break;
+	case SAFLG_MODE_ESP_OUT:
+		printk(KERN_INFO " ESP_OUT ");
+		break;
+	case SAFLG_MODE_HMAC:
+		printk(KERN_INFO " HMAC ");
+		break;
+	case SAFLG_MODE_HASH_PAD:
+		printk(KERN_INFO " HASH+PAD ");
+		break;
+	case SAFLG_MODE_HASH:
+		printk(KERN_INFO " HASH ");
+		break;
+	case SAFLG_MODE_CRYPT:
+		printk(KERN_INFO " CRYPT ");
+		break;
+	default:
+		printk(KERN_INFO "*BAD*ENG*MODE*");
+		break;
+	}
+
+	if (eng_mode == SAFLG_MODE_ESP_OUT) {
+		printk((sa->flags & SAFLG_SI) ? " SI " : " NO_SI ");
+		printk((sa->flags & SAFLG_CRI) ? " CRI " : " NO_CRI ");
+		printk((sa->flags & SAFLG_EM) ? " EM " : "");
+	}
+
+	if (eng_mode == SAFLG_MODE_ESP_IN)
+		printk((sa->flags & SAFLG_CPI) ? " CPI " : " NO_CPI ");
+
+	if (eng_mode != SAFLG_MODE_CRYPT) {
+		printk((sa->flags & SAFLG_CV) ? " CV " : " NO_CV ");
+
+		switch (sa->flags & SAFLG_HASH_MASK) {
+		case SAFLG_MD5_96:
+			printk(KERN_INFO " MD5-96  ");
+			break;
+		case SAFLG_MD5:
+			printk(KERN_INFO " MD5 ");
+			break;
+		case SAFLG_SHA1_96:
+			printk(KERN_INFO " SHA1-96 ");
+			break;
+		case SAFLG_SHA1:
+			printk(KERN_INFO " SHA1 ");
+			break;
+		case SAFLG_HASHNULL:
+			printk(KERN_INFO " HSH_NULL ");
+			break;
+		default:
+			printk(KERN_INFO " *BAD*HASH* ");
+			break;
+		}
+	}
+
+	if (eng_mode <= SAFLG_MODE_ESP_OUT || eng_mode == SAFLG_MODE_CRYPT) {
+		switch (sa->flags & SAFLG_CRYPT_TYPE_MASK) {
+		case SAFLG_DES:
+			printk(KERN_INFO " DES-");
+			break;
+		case SAFLG_DES3:
+			printk(KERN_INFO " DES3-");
+			break;
+		case SAFLG_AES_128:
+			printk(KERN_INFO " AES128-");
+			break;
+		case SAFLG_AES_192:
+			printk(KERN_INFO " AES192-");
+			break;
+		case SAFLG_AES_256:
+			printk(KERN_INFO " AES256-");
+			break;
+		case SAFLG_CRYPTNULL:
+			printk(KERN_INFO " CRYPTNULL-");
+			break;
+		default:
+			printk(KERN_INFO " BADCRYPT-");
+			break;
+		}
+		printk((sa->flags & SAFLG_DES_K1_DECRYPT) ? "D" : "E");
+		if ((sa->flags & SAFLG_CRYPT_TYPE_MASK) == SAFLG_DES3) {
+			printk((sa->flags & SAFLG_DES_K2_DECRYPT)? "D" : "E");
+			printk((sa->flags & SAFLG_DES_K3_DECRYPT)? "D" : "E");
+		}
+
+		switch (sa->flags & SAFLG_BLK_MASK) {
+		case SAFLG_ECB:
+			printk(KERN_INFO " ECB\n");
+			break;
+		case SAFLG_CTR:
+			printk(KERN_INFO " CTR\n");
+			break;
+		case SAFLG_CBC_ENCRYPT:
+			printk(KERN_INFO " CBC-ENCRYPT\n");
+			break;
+		case SAFLG_CBC_DECRYPT:
+			printk(KERN_INFO " CBC-DECRYPT\n");
+			break;
+		case SAFLG_CFB_ENCRYPT:
+			printk(KERN_INFO " CFB-ENCRYPT\n");
+			break;
+		case SAFLG_CFB_DECRYPT:
+			printk(KERN_INFO " CFB-DECRYPT\n");
+			break;
+		case SAFLG_OFB:
+			printk(KERN_INFO " OFB\n");
+			break;
+		default:
+			printk(KERN_INFO " BAD*BLOCK*MODE\n");
+			break;
+		}
+	} else
+		printk(KERN_INFO "\n");
+
+	printk(KERN_INFO " hash_chain_a:");
+	for (i = 0; i < 0x5; i++) {
+		if (i % 6 == 0)
+			printk(KERN_INFO "\n   %04x  ", i * 4);
+		printk(KERN_INFO "%08x  ", sa->hash_chain_a[i]);
+	}
+	printk(KERN_INFO "\n");
+
+	printk(KERN_INFO " hash_chain_b:");
+	for (i = 0; i < 0x5; i++) {
+		if (i % 6 == 0)
+			printk(KERN_INFO "\n   %04x  ", i * 4);
+		printk(KERN_INFO "%08x  ", sa->hash_chain_b[i]);
+	}
+	printk(KERN_INFO "\n");
+
+	printk(KERN_INFO " encryption keys:");
+	for (i = 0; i < 0x8; i++) {
+		if (i % 4 == 0)
+			printk(KERN_INFO "\n   %04x  ", i * 4);
+		printk(KERN_INFO "%08x ", sa->crypt_keys[i]);
+	}
+	printk(KERN_INFO "\n");
+
+	printk(KERN_INFO " Hash Initial Length:  %08x %08x\n",
+		sa->hash_init_len[0], sa->hash_init_len[1]);
+
+	printk(KERN_INFO " IV:");
+	for (i = 0; i < 0x4; i++) {
+		if (i % 4 == 0)
+			printk(KERN_INFO "\n   %04x  ", i * 4);
+		printk(KERN_INFO "%08x ", sa->crypt_iv[i]);
+	}
+	printk(KERN_INFO "\n");
+
+	printk(KERN_INFO "SA end\n");
+}
+#endif
+
+#ifdef DUMP_CQ_ENTRIES
+static void
+dump_cq_entry(struct compq *cq)
+{
+	unsigned int *ip;
+
+	ip = (unsigned int *)(cq->base + cq->out);
+
+	printk(KERN_INFO " -----> Completion entry  %08x %08x %08x %08x\n",
+		ip[0], ip[1], ip[2], ip[3]);
+}
+#endif
+
+
+module_init(msp_secv2_init);
+module_exit(msp_secv2_exit);
+
+MODULE_DESCRIPTION("PMC MSP Security Accelerator");
+MODULE_LICENSE("GPL")

^ permalink raw reply related

* Re: [PATCH 12/12] drivers: PMC MSP71xx security engine driver
From: Evgeniy Polyakov @ 2007-06-29  9:50 UTC (permalink / raw)
  To: Marc St-Jean; +Cc: davem, herbert, brian_oostenbrink, linux-crypto, rod_sillett
In-Reply-To: <200706281949.l5SJnDdH029612@pasqua.pmc-sierra.bc.ca>

Hi Marc.

On Thu, Jun 28, 2007 at 01:49:13PM -0600, Marc St-Jean (stjeanma@pmc-sierra.com) wrote:
> +static int
> +sec_init_queues(void)
> +{
> +	int i;
> +	struct workq *wq;
> +	struct compq *cq;
> +
> +	/*
> +	 * Allocate uncached space for hw_ptr values.
> +	 * NOTE: status ptr value is not currently used.
> +	 */
> +	status_ptr = dma_alloc_coherent(NULL, sizeof(int), &status_dma_addr,
> +					GFP_KERNEL);
> +	DBG_SEC("Allocated status ptr memory at 0x%p (0x%08x)\n",
> +			status_ptr, status_dma_addr);
> +	if (!status_ptr)
> +		return -ENOMEM;
> +
> +	for (i = 0; i < HW_NR_COMP_QUEUES; i++) {
> +		void *base; /* slowpath virtual address of base */
> +		dma_addr_t base_dma_addr; /* DMA bus address of base */
> +
> +		base = dma_alloc_coherent(NULL, SEC_COMP_Q_SIZE,
> +				&base_dma_addr, GFP_KERNEL);
> +		DBG_SEC("Allocated CQ%d at 0x%p (0x%08x)\n",
> +			i, base, base_dma_addr);
> +		if (!base)
> +			return -ENOMEM;

This leaks allocations.

> +		cq = &sec_comp_queues[i];
> +
> +		cq->compq_lock = SPIN_LOCK_UNLOCKED;
> +		cq->cq_regs = &sec2_regs->cq[i];
> +		cq->base = base;
> +		cq->base_dma_addr = base_dma_addr;
> +		cq->out = 0;
> +
> +		cq->cq_regs->ofst_ptr = (unsigned int *)status_dma_addr;
> +		cq->cq_regs->base = (unsigned char *)cq->base_dma_addr;
> +		cq->cq_regs->size = SEC_COMP_Q_SIZE;
> +		cq->cq_regs->in = 0;
> +		cq->cq_regs->out = 0;
> +	}
> +
> +	for (i = 0; i < HW_NR_WORK_QUEUES; i++) {
> +		void *base; /* slowpath virtual address of base */
> +		dma_addr_t base_dma_addr; /* DMA bus address of base */
> +
> +		base = dma_alloc_coherent(NULL, SEC_WORK_Q_SIZE,
> +					&base_dma_addr, GFP_KERNEL);
> +		DBG_SEC("Allocated WQ%d at 0x%p (0x%08x)\n",
> +			i, base, base_dma_addr);
> +		if (!base)
> +			return -ENOMEM;

This too.

> +		wq = &sec_work_queues[i];
> +
> +		init_waitqueue_head(&wq->space_wait);
> +
> +		wq->workq_lock = SPIN_LOCK_UNLOCKED;
> +		wq->wq_regs = &sec2_regs->wq[i];
> +		wq->base = base;
> +		wq->base_dma_addr = base_dma_addr;
> +		wq->in = 0;
> +		wq->low_water = SEC_WORK_Q_SIZE >> 1; /* wake when half full */
> +
> +		wq->wq_regs->ofst_ptr = (unsigned int *)status_dma_addr;
> +		wq->wq_regs->base = (unsigned char *)wq->base_dma_addr;
> +		wq->wq_regs->size = SEC_WORK_Q_SIZE;
> +		wq->wq_regs->in = 0;
> +		wq->wq_regs->out = 0;
> +	}
> +	
> +	debug_dump_sec_regs();
> +
> +	return 0;
> +}
> +
> +static int __init
> +msp_secv2_init(void)

Shouldn't this and other places be marked as __devinit?

...

> +static irqreturn_t
> +msp_secv2_interrupt(int irq, void *dev_id)
> +{
> +	/*
> +	 * TODO: This clears all interrupts, and assumes
> +	 * that the cause was a completion queue update.
> +	 */
> +	unsigned int status;
> +
> +	status = sec2_regs->sis;
> +	sec2_regs->sis = /* ~status */ 0;
> +
> +	DBG_SEC("interrupt irq %d status was %x\n", irq, status);
> +
> +	poll_completion();
> +
> +	return IRQ_HANDLED;
> +}

Irqs can not be shared?

...

> +static int
> +poll_completion(void)
> +{
> +	struct compq *cq;
> +	int flags;
> +	int work_ct = 0;
> +	
> +	/*
> +	 * Check IPSEC engine register to see if at least one
> +	 * completion element is in completion queue.
> +	 */
> +	cq = sec_comp_queues;
> +	spin_lock_irqsave(&cq->compq_lock, flags);

This lock seems not to protect against desc_do_work() for example, but
there are register/mmio access under both - what is a locking rules
there?

-- 
	Evgeniy Polyakov

^ permalink raw reply

* RSA support into kernel?
From: Gautam Singaraju @ 2007-07-05 22:48 UTC (permalink / raw)
  To: linux-crypto

Is there any attempts being made to provide software based RSA
cryptographic support in kernel level? I see that 2.6.21 supports
Hardware devices such as VIA Padlock ACE. Has anybody had a change to
use such a system?

-GS

^ permalink raw reply

* Re: RSA support into kernel?
From: Evgeniy Polyakov @ 2007-07-06 10:37 UTC (permalink / raw)
  To: Gautam Singaraju; +Cc: linux-crypto
In-Reply-To: <f8b2ccd40707051548w275a1575oaaddffd40d9a9163@mail.gmail.com>

On Thu, Jul 05, 2007 at 03:48:51PM -0700, Gautam Singaraju (gautam.singaraju@gmail.com) wrote:
> Is there any attempts being made to provide software based RSA
> cryptographic support in kernel level? I see that 2.6.21 supports
> Hardware devices such as VIA Padlock ACE. Has anybody had a change to
> use such a system?

VIA padlock engine or RSA? The former is heavily used in the wild, but
why would anyone want to use RSA in the kernel?

> -GS

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: RSA support into kernel?
From: David Miller @ 2007-07-06 11:05 UTC (permalink / raw)
  To: johnpol; +Cc: gautam.singaraju, linux-crypto
In-Reply-To: <20070706103731.GA10033@2ka.mipt.ru>

From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
Date: Fri, 6 Jul 2007 14:37:31 +0400

> On Thu, Jul 05, 2007 at 03:48:51PM -0700, Gautam Singaraju (gautam.singaraju@gmail.com) wrote:
> > Is there any attempts being made to provide software based RSA
> > cryptographic support in kernel level? I see that 2.6.21 supports
> > Hardware devices such as VIA Padlock ACE. Has anybody had a change to
> > use such a system?
> 
> VIA padlock engine or RSA? The former is heavily used in the wild, but
> why would anyone want to use RSA in the kernel?

Automatic SSL done in-kernel on user data for socket I/O, with
hardware offload from the crypto layer when available.

Solaris has done this for quite some time and it helps a lot for
things like the VIA and Niagara.

^ permalink raw reply

* Re: RSA support into kernel?
From: Evgeniy Polyakov @ 2007-07-06 12:10 UTC (permalink / raw)
  To: David Miller; +Cc: gautam.singaraju, linux-crypto
In-Reply-To: <20070706.040533.30182871.davem@davemloft.net>

On Fri, Jul 06, 2007 at 04:05:33AM -0700, David Miller (davem@davemloft.net) wrote:
> From: Evgeniy Polyakov <johnpol@2ka.mipt.ru>
> Date: Fri, 6 Jul 2007 14:37:31 +0400
> 
> > On Thu, Jul 05, 2007 at 03:48:51PM -0700, Gautam Singaraju (gautam.singaraju@gmail.com) wrote:
> > > Is there any attempts being made to provide software based RSA
> > > cryptographic support in kernel level? I see that 2.6.21 supports
> > > Hardware devices such as VIA Padlock ACE. Has anybody had a change to
> > > use such a system?
> > 
> > VIA padlock engine or RSA? The former is heavily used in the wild, but
> > why would anyone want to use RSA in the kernel?
> 
> Automatic SSL done in-kernel on user data for socket I/O, with
> hardware offload from the crypto layer when available.
> 
> Solaris has done this for quite some time and it helps a lot for
> things like the VIA and Niagara.

I.e. for userspace stuff? That is obviously the right usage, but Linux
cryptoapi does not have userspace interface, so was my question.
Actually I was several times already asked after acrypto was closed, how
userspace can use new hardware drivers, and frankly I do not know what
the best userspace API would look like (in one of the projects I already 
used all three methods one-by-one and failed to determine the best). 
Simple char device read/write or ioctl, or blocking/nonblocking syscall 
over file descriptor, or anything else?

-- 
	Evgeniy Polyakov

^ permalink raw reply

* Re: RSA support into kernel?
From: Herbert Xu @ 2007-07-06 13:12 UTC (permalink / raw)
  To: David Miller; +Cc: johnpol, gautam.singaraju, linux-crypto
In-Reply-To: <20070706.040533.30182871.davem@davemloft.net>

David Miller <davem@davemloft.net> wrote:
>> 
>> VIA padlock engine or RSA? The former is heavily used in the wild, but
>> why would anyone want to use RSA in the kernel?
> 
> Automatic SSL done in-kernel on user data for socket I/O, with
> hardware offload from the crypto layer when available.

AFAIK asymmetric crypto is only used for SSL key exchange and not
on the data transfers so I'm not sure whether this would be that
useful.  This is pretty much the same situation with IPsec where
we delegate the key exchange to the userspace KMs.

Now having in-kernel SSL data exchange support using the crypto
API would be pretty cool and would provide the same level of
crypto support to SSL users as we do for IPsec.

So far the only proposed user for RSA in-kernel seems to be
module signing and I'm staying well away from that debate :)

Cheers,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: RSA support into kernel?
From: Michael Halcrow @ 2007-07-06 13:36 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Miller, johnpol, gautam.singaraju, linux-crypto
In-Reply-To: <E1I6ncC-0007e3-00@gondolin.me.apana.org.au>

[-- Attachment #1: Type: text/plain, Size: 669 bytes --]

On Fri, Jul 06, 2007 at 09:12:52PM +0800, Herbert Xu wrote:
> So far the only proposed user for RSA in-kernel seems to be module
> signing and I'm staying well away from that debate :)

eCryptfs uses RSA.

Right now it has to defer to a userspace daemon to perform the
operation.

Mike
.___________________________________________________________________.
                         Michael A. Halcrow                          
       Security Software Engineer, IBM Linux Technology Center       
GnuPG Fingerprint: 419C 5B1E 948A FA73 A54C  20F5 DB40 8531 6DCA 8769

"This is about humans being human."                                  
 - Carl Sagan 

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 481 bytes --]

^ permalink raw reply

* Re: RSA support into kernel?
From: Gautam Singaraju @ 2007-07-06 14:41 UTC (permalink / raw)
  To: Michael Halcrow; +Cc: Herbert Xu, David Miller, johnpol, linux-crypto
In-Reply-To: <20070706133636.GH23191@halcrow.us>

I am considering RSA as an option for research purposes; though I need
it only for decryption purposes. Any specific reason for running the
daemon in user space?

Gautam
On 7/6/07, Michael Halcrow <mike@halcrow.us> wrote:
> On Fri, Jul 06, 2007 at 09:12:52PM +0800, Herbert Xu wrote:
> > So far the only proposed user for RSA in-kernel seems to be module
> > signing and I'm staying well away from that debate :)
>
> eCryptfs uses RSA.
>
> Right now it has to defer to a userspace daemon to perform the
> operation.
>
> Mike
> .___________________________________________________________________.
>                          Michael A. Halcrow
>        Security Software Engineer, IBM Linux Technology Center
> GnuPG Fingerprint: 419C 5B1E 948A FA73 A54C  20F5 DB40 8531 6DCA 8769
>
> "This is about humans being human."
>  - Carl Sagan
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iQEVAwUBRo5FZNtAhTFtyodpAQO9jAgAmCPiGap1u/Qd5Zogb/hxtpoNF8/7Vx+z
> FWnIbzI9jK8g1PBxXGkYVASQ/rPbT/yhX3Zg53jrJm+8RqDAQDY/Ca1qAUvDtD57
> R5Mo/eSSlwuvAMVsLFDYYINeER3fpIX7wdrwB5VTN6YKz9eJFhsNqMUSQ8mCSbbV
> qEzFUq8EdcYsaxSZ56uIXSSphneKXIDAzWCu5hjbLtr71WSkvXKe4kVZKElb1LrB
> SGxxPajTCnjuw1z9VL5Tp2pOfyX3pWRSnoiCxUjwl2Aco0hu+Nl+0X2qsVkkmBSx
> 0MoeTrZ+FAY0QBDlPbtR6N5kD4NvV94WXnfQPi5DC66730nwLufk4Q==
> =4vHb
> -----END PGP SIGNATURE-----
>
>


-- 
---
Gautam

^ permalink raw reply

* Re: RSA support into kernel?
From: Herbert Xu @ 2007-07-06 16:01 UTC (permalink / raw)
  To: Michael Halcrow; +Cc: David Miller, johnpol, gautam.singaraju, linux-crypto
In-Reply-To: <20070706133636.GH23191@halcrow.us>

On Fri, Jul 06, 2007 at 08:36:37AM -0500, Michael Halcrow wrote:
>
> eCryptfs uses RSA.
> 
> Right now it has to defer to a userspace daemon to perform the
> operation.

OK that'd be the most convincing case for me then.

Thanks,
-- 
Visit Openswan at http://www.openswan.org/
Email: Herbert Xu ~{PmV>HI~} <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox