Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* [PATCH] crypto: testmgr: Use linear alias for test input
From: Laura Abbott @ 2016-12-19 23:37 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Ard Biesheuvel, Giovanni Cabiddu
  Cc: Laura Abbott, linux-crypto, linux-kernel, linux-arm-kernel,
	Mark Rutland, Christopher Covington

Christopher Covington reported a crash on aarch64 on recent Fedora
kernels:

kernel BUG at ./include/linux/scatterlist.h:140!
Internal error: Oops - BUG: 0 [#1] PREEMPT SMP
Modules linked in:
CPU: 2 PID: 752 Comm: cryptomgr_test Not tainted 4.9.0-11815-ge93b1cc #162
Hardware name: linux,dummy-virt (DT)
task: ffff80007c650080 task.stack: ffff800008910000
PC is at sg_init_one+0xa0/0xb8
LR is at sg_init_one+0x24/0xb8
...
[<ffff000008398db8>] sg_init_one+0xa0/0xb8
[<ffff000008350a44>] test_acomp+0x10c/0x438
[<ffff000008350e20>] alg_test_comp+0xb0/0x118
[<ffff00000834f28c>] alg_test+0x17c/0x2f0
[<ffff00000834c6a4>] cryptomgr_test+0x44/0x50
[<ffff0000080dac70>] kthread+0xf8/0x128
[<ffff000008082ec0>] ret_from_fork+0x10/0x50

The test vectors used for input are part of the kernel image. These
inputs are passed as a buffer to sg_init_one which eventually blows up
with BUG_ON(!virt_addr_valid(buf)). On arm64, virt_addr_valid returns
false for the kernel image since virt_to_page will not return the
correct page. The kernel image is also aliased to the linear map so get
the linear alias and pass that to the scatterlist instead.

Reported-by: Christopher Covington <cov@codeaurora.org>
Fixes: d7db7a882deb ("crypto: acomp - update testmgr with support for acomp")
Signed-off-by: Laura Abbott <labbott@redhat.com>
---
x86 supports virt_addr_valid working on kernel image addresses but arm64 is
more strict. This is the direction things have been moving with my
CONFIG_DEBUG_VIRTUAL series for arm64 which is tightening the definition of
__pa/__pa_symbol.
---
 crypto/testmgr.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index f616ad7..f5bac10 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -1464,7 +1464,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 
 		memset(output, 0, dlen);
 		init_completion(&result.completion);
-		sg_init_one(&src, ctemplate[i].input, ilen);
+		sg_init_one(&src, __va(__pa_symbol(ctemplate[i].input)), ilen);
 		sg_init_one(&dst, output, dlen);
 
 		req = acomp_request_alloc(tfm);
@@ -1513,7 +1513,7 @@ static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
 
 		memset(output, 0, dlen);
 		init_completion(&result.completion);
-		sg_init_one(&src, dtemplate[i].input, ilen);
+		sg_init_one(&src, __va(__pa_symbol(dtemplate[i].input)), ilen);
 		sg_init_one(&dst, output, dlen);
 
 		req = acomp_request_alloc(tfm);
-- 
2.7.4

^ permalink raw reply related

* Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-19 21:00 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds
In-Reply-To: <CAGiyFdduUNSGq24zfsk0ZU=hnOCmewAw8vw6XvDoS-3f+3UPKQ@mail.gmail.com>

Hi JP,

On Mon, Dec 19, 2016 at 9:49 PM, Jean-Philippe Aumasson
<jeanphilippe.aumasson@gmail.com> wrote:
>
> On Mon, Dec 19, 2016 at 6:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>
>> Hi JP,
>>
>> With the threads getting confusing, I've been urged to try and keep
>> the topics and threads more closely constrained. Here's where we're
>> at, and here's the current pressing security concern. It'd be helpful
>> to have a definitive statement on what you think is best, so we can
>> just build on top of that, instead of getting lost in the chorus of
>> opinions.
>>
>> 1) Anything that requires actual long-term security will use
>> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
>> things like TCP sequence numbers. This seems pretty uncontroversial to
>> me. Seem okay to you?
>
>
>
> Right, since 2012 when we published SipHash many cryptanalysts attempted to
> break SipHash-2-4 with a 128-bit key, for various notions of "break", and
> nothing worth worrying was ever found. I'm totally confident that
> SipHash-2-4 will live up to its security promises.
>
> Don't use something weaker for things like TCP sequence numbers or RNGs. Use
> SipHash2-4 for those. That is the correct choice.
>
>>
>>
>> 2) People seem to want something competitive, performance-wise, with
>> jhash if it's going to replace jhash. The kernel community
>> instinctively pushes back on anything that could harm performance,
>> especially in networking and in critical data structures, so there
>> have been some calls for something faster than SipHash. So, questions
>> regarding this:
>>
>
> No free lunch I guess: either go with a cryptographically secure,
> time-proved keyed hash such as SipHash, or go with some simpler hash deemed
> secure cos its designer can't break it :) #DontRollYourOwnCrypto
>
>> 2a) George thinks that HalfSipHash on 32-bit systems will have roughly
>> comparable speed as SipHash on 64-bit systems, so the idea would be to
>> use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
>> systems' hash tables. The big obvious question is: does HalfSipHash
>> have a sufficient security margin for hashtable usage and hashtable
>> attacks? I'm not wondering about the security margin for other usages,
>> but just of the hashtable usage. In your opinion, does HalfSipHash cut
>> it?
>
>
> HalfSipHash takes its core function from Chaskey and uses the same
> construction as SipHash, so it *should* be secure. Nonetheless it hasn't
> received the same amount of attention as 64-bit SipHash did. So I'm less
> confident about its security than about SipHash's, but it obviously inspires
> a lot more confidence than non-crypto hashes.
>
> Too, HalfSipHash only has a 64-bit key, not a 128-bit key like SipHash, so
> only use this as a mitigation for hash-flooding attacks, where the output of
> the hash function is never directly shown to the caller. Do not use
> HalfSipHash for TCP sequence numbers or RNGs.
>
>
>>
>>
>> 2b) While I certainly wouldn't consider making the use case in
>> question (1) employ a weaker function, for this question (2), there
>> has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
>> 64-bit) instead of 2-4. So, the same question is therefore posed:
>> would using HalfSipHash1-3 give a sufficient security margin for
>> hashtable usage and hashtable attacks?
>
>
> My educated guess is that yes, it will, but that it may not withhold
> cryptanalysis as a pseudorandom function (PRF). For example I wouldn't be
> surprised if there were a "distinguishing attack" that detects non-random
> patterns in HalfSipHash-1-3's output. But most of the non-crypto hashes I've
> seen have obvious distinguishing attacks. So the upshot is that HSH will get
> you better security that AnyWeakHash even with 1 & 3 rounds.
>
> So, if you're willing to compromise on security, but still want something
> not completely unreasonable, you might be able to get away with using
> HalfSipHash1-3 as a replacement for jhash—in circumstances where the output
> of the hash function is kept secret—in order to mitigate hash-flooding
> attacks.
>

Thanks for the detailed response. I will continue exactly how you've specified.

1. SipHash2-4 for TCP sequence numbers, syncookies, and RNG. IOW, the
things that MD5 is used for now.

2. HalfSipHash1-3 for hash tables where the output is not revealed,
for jhash replacements. On 64-bit this will alias to SipHash1-3.

3. I will write Documentation/siphash.txt detailing this.

4. I'll continue to discourage other kernel developers from rolling
their own crypto or departing from the tried&true in substantial ways.

Thanks again,
Jason

^ permalink raw reply

* Re: HalfSipHash Acceptable Usage
From: Jean-Philippe Aumasson @ 2016-12-19 20:49 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds
In-Reply-To: <CAHmME9rPmH=wP_eHYopt8ZPG9TSN7bos3fGOuqKL2HjQW-2SWA@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 4442 bytes --]

On Mon, Dec 19, 2016 at 6:32 PM Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> Hi JP,
>
> With the threads getting confusing, I've been urged to try and keep
> the topics and threads more closely constrained. Here's where we're
> at, and here's the current pressing security concern. It'd be helpful
> to have a definitive statement on what you think is best, so we can
> just build on top of that, instead of getting lost in the chorus of
> opinions.
>
> 1) Anything that requires actual long-term security will use
> SipHash2-4, with the 64-bit output and the 128-bit key. This includes
> things like TCP sequence numbers. This seems pretty uncontroversial to
> me. Seem okay to you?
>

Right, since 2012 when we published SipHash many cryptanalysts attempted to
break SipHash-2-4 with a 128-bit key, for various notions of "break", and
nothing worth worrying was ever found. I'm totally confident that
SipHash-2-4 will live up to its security promises.

Don't use something weaker for things like TCP sequence numbers or RNGs.
Use SipHash2-4 for those. That is the correct choice.

>
> 2) People seem to want something competitive, performance-wise, with
> jhash if it's going to replace jhash. The kernel community
> instinctively pushes back on anything that could harm performance,
> especially in networking and in critical data structures, so there
> have been some calls for something faster than SipHash. So, questions
> regarding this:
>
>
No free lunch I guess: either go with a cryptographically secure,
time-proved keyed hash such as SipHash, or go with some simpler hash deemed
secure cos its designer can't break it :) #DontRollYourOwnCrypto

2a) George thinks that HalfSipHash on 32-bit systems will have roughly
> comparable speed as SipHash on 64-bit systems, so the idea would be to
> use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
> systems' hash tables. The big obvious question is: does HalfSipHash
> have a sufficient security margin for hashtable usage and hashtable
> attacks? I'm not wondering about the security margin for other usages,
> but just of the hashtable usage. In your opinion, does HalfSipHash cut
> it?
>

HalfSipHash takes its core function from Chaskey and uses the same
construction as SipHash, so it *should* be secure. Nonetheless it hasn't
received the same amount of attention as 64-bit SipHash did. So I'm less
confident about its security than about SipHash's, but it obviously
inspires a lot more confidence than non-crypto hashes.

Too, HalfSipHash only has a 64-bit key, not a 128-bit key like SipHash, so
only use this as a mitigation for hash-flooding attacks, where the output
of the hash function is never directly shown to the caller. Do not use
HalfSipHash for TCP sequence numbers or RNGs.

>
> 2b) While I certainly wouldn't consider making the use case in
> question (1) employ a weaker function, for this question (2), there
> has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
> 64-bit) instead of 2-4. So, the same question is therefore posed:
> would using HalfSipHash1-3 give a sufficient security margin for
> hashtable usage and hashtable attacks?
>

My educated guess is that yes, it will, but that it may not withhold
cryptanalysis as a pseudorandom function (PRF). For example I wouldn't be
surprised if there were a "distinguishing attack" that detects non-random
patterns in HalfSipHash-1-3's output. But most of the non-crypto hashes
I've seen have obvious distinguishing attacks. So the upshot is that HSH
will get you better security that AnyWeakHash even with 1 & 3 rounds.

So, if you're willing to compromise on security, but still want something
not completely unreasonable, you might be able to get away with using
HalfSipHash1-3 as a replacement for jhash—in circumstances where the output
of the hash function is kept secret—in order to mitigate hash-flooding
attacks.

>
> My plan is essentially to implement things according to your security
> recommendation. The thread started with me pushing a heavy duty
> security solution -- SipHash2-4 -- for _everything_. I've received
> understandable push back on that notion for certain use cases. So now
> I'm trying to discover what the most acceptable compromise is. Your
> answers on (2a) and (2b) will direct that compromise.
>
> Thanks again,
> Jason

>

[-- Attachment #2: Type: text/html, Size: 6561 bytes --]

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jean-Philippe Aumasson @ 2016-12-19 20:18 UTC (permalink / raw)
  To: George Spelvin, David.Laight, tom
  Cc: ak, davem, djb, ebiggers3, hannes, Jason, kernel-hardening,
	linux-crypto, linux-kernel, luto, netdev, torvalds, tytso,
	vegard.nossum
In-Reply-To: <20161219181040.25441.qmail@ns.sciencehorizons.net>

[-- Attachment #1: Type: text/plain, Size: 3984 bytes --]

FTR, I agree that SipHash may benefit from security/performance
optimizations (there's always optimizations), but I'm note sure that it's
the right time/place to discuss it. Cryptanalysis is hard, and any major
change should be backed by a rigorous security analysis and/or security
proof. For example, the redundancy in the initial state prevents trivial
weaknesses that would occur if four key words were used (something I've
seen proposed in this thread).

In the paper at http://aumasson.jp/siphash/siphash.pdf we explain rationale
behind most of our design choices. The cryptanalysis paper at
https://eprint.iacr.org/2014/722.pdf analyzes differences propagations and
presents attacks on reduced versions of SipHash.


On Mon, Dec 19, 2016 at 7:10 PM George Spelvin <linux@sciencehorizons.net>
wrote:

> David Laight wrote:
> > From: George Spelvin
> ...
> >> uint32_t
> >> hsiphash24(char const *in, size_t len, uint32_t const key[2])
> >> {
> >>      uint32_t c = key[0];
> >>      uint32_t d = key[1];
> >>      uint32_t a =     0x6c796765 ^ 0x736f6d65;
> >>      uint32_t b = d ^ 0x74656462 ^ 0x646f7261;
>
> > I've not looked closely, but is that (in some sense) duplicating
> > the key length?
> > So you could set a = key[2] and b = key[3] and still have an
> > working hash - albeit not exactly the one specified.
>
> That's tempting, but not necessarily effective.  (A similar unsuccesful
> idea can be found in discussions of "DES with independent round keys".
> Or see the design discussion of Salsa20 and the constants in its input.)
>
> You can increase the key size, but that might not increase the *security*
> any.
>
> The big issue is that there are a *lot* of square root attack in
> cryptanalysis.  Because SipHash's state is twice the size of the key,
> such an attack will have the same complexity as key exhaustion and need
> not be considered.  To make a stronger security claim, you need to start
> working through them all and show that they don't apply.
>
> For SipHash in particular, an important property is asymmetry of the
> internal state.  That's what duplicating the key with XORs guarantees.
> If the two halves of the state end up identical, the mixing is much
> weaker.
>
> Now the probability of ending up in a "mirror state" is the square
> root of the state size (1 in 2^64 for HalfSipHash's 128-bit state),
> which is the same probability as guessing a key, so it's not a
> problem that has to be considered when making a 64-bit security claim.
>
> But if you want a higher security level, you have to think about
> what can happen.
>
> That said, I have been thinking very hard about
>
>         a = c ^ 0x48536970;     /* 'HSip' */
>         d = key[2];
>
> By guaranteeing that a and c are different, we get the desired
> asymmetry, and the XOR of b and d is determined by the first word of
> the message anyway, so this isn't weakening anything.
>
> 96 bits is far beyond the reach of any brute-force attack, and if a
> more sophisticated 64-bit attack exists, it's at least out of the reach
> of the script kiddies, and will almost certainly have a non-negligible
> constant factor and more limits in when it can be applied.
>
> > Is it worth using the 32bit hash for IP addresses on 64bit systems that
> > can't do misaligned accessed?
>
> Not a good idea.  To hash 64 bits of input:
>
> * Full SipHash has to do two loads, a shift, an or, and two rounds of
> mixing.
> * HalfSipHash has to do a load, two rounds, another load, and two more
> rounds.
>
> In other words, in addition to being less secure, it's half the speed.
>
> Also, what systems are you thinking about?  x86, ARMv8, PowerPC, and
> S390 (and ia64, if anyone cares) all handle unaligned loads.  MIPS has
> efficient support.  Alpha and HPPA are for retrocomputing fans, not
> people who care about performance.
>
> So you're down to SPARC.  Which conveniently has the same maintainer as
> the networking code, so I figure DaveM can take care of that himself. :-)
>

[-- Attachment #2: Type: text/html, Size: 6131 bytes --]

^ permalink raw reply

* RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-19 18:10 UTC (permalink / raw)
  To: David.Laight, linux, tom
  Cc: ak, davem, djb, ebiggers3, hannes, Jason, jeanphilippe.aumasson,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev,
	torvalds, tytso, vegard.nossum
In-Reply-To: <063D6719AE5E284EB5DD2968C1650D6DB0242669@AcuExch.aculab.com>

David Laight wrote:
> From: George Spelvin
...
>> uint32_t
>> hsiphash24(char const *in, size_t len, uint32_t const key[2])
>> {
>> 	uint32_t c = key[0];
>> 	uint32_t d = key[1];
>> 	uint32_t a =     0x6c796765 ^ 0x736f6d65;
>> 	uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

> I've not looked closely, but is that (in some sense) duplicating
> the key length?
> So you could set a = key[2] and b = key[3] and still have an
> working hash - albeit not exactly the one specified.

That's tempting, but not necessarily effective.  (A similar unsuccesful
idea can be found in discussions of "DES with independent round keys".
Or see the design discussion of Salsa20 and the constants in its input.)

You can increase the key size, but that might not increase the *security*
any.

The big issue is that there are a *lot* of square root attack in
cryptanalysis.  Because SipHash's state is twice the size of the key,
such an attack will have the same complexity as key exhaustion and need
not be considered.  To make a stronger security claim, you need to start
working through them all and show that they don't apply.

For SipHash in particular, an important property is asymmetry of the
internal state.  That's what duplicating the key with XORs guarantees.
If the two halves of the state end up identical, the mixing is much
weaker.

Now the probability of ending up in a "mirror state" is the square
root of the state size (1 in 2^64 for HalfSipHash's 128-bit state),
which is the same probability as guessing a key, so it's not a
problem that has to be considered when making a 64-bit security claim.

But if you want a higher security level, you have to think about
what can happen.

That said, I have been thinking very hard about

	a = c ^ 0x48536970;	/* 'HSip' */
	d = key[2];

By guaranteeing that a and c are different, we get the desired
asymmetry, and the XOR of b and d is determined by the first word of
the message anyway, so this isn't weakening anything.

96 bits is far beyond the reach of any brute-force attack, and if a
more sophisticated 64-bit attack exists, it's at least out of the reach
of the script kiddies, and will almost certainly have a non-negligible
constant factor and more limits in when it can be applied.

> Is it worth using the 32bit hash for IP addresses on 64bit systems that
> can't do misaligned accessed?

Not a good idea.  To hash 64 bits of input:

* Full SipHash has to do two loads, a shift, an or, and two rounds of mixing.
* HalfSipHash has to do a load, two rounds, another load, and two more rounds.

In other words, in addition to being less secure, it's half the speed.  

Also, what systems are you thinking about?  x86, ARMv8, PowerPC, and
S390 (and ia64, if anyone cares) all handle unaligned loads.  MIPS has
efficient support.  Alpha and HPPA are for retrocomputing fans, not
people who care about performance.

So you're down to SPARC.  Which conveniently has the same maintainer as
the networking code, so I figure DaveM can take care of that himself. :-)

^ permalink raw reply

* HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-19 17:32 UTC (permalink / raw)
  To: Jean-Philippe Aumasson
  Cc: Theodore Ts'o, Hannes Frederic Sowa, LKML, Eric Biggers,
	Daniel J . Bernstein, David Laight, David Miller, Andi Kleen,
	George Spelvin, kernel-hardening, Andy Lutomirski,
	Linux Crypto Mailing List, Tom Herbert, Vegard Nossum, Netdev,
	Linus Torvalds

Hi JP,

With the threads getting confusing, I've been urged to try and keep
the topics and threads more closely constrained. Here's where we're
at, and here's the current pressing security concern. It'd be helpful
to have a definitive statement on what you think is best, so we can
just build on top of that, instead of getting lost in the chorus of
opinions.

1) Anything that requires actual long-term security will use
SipHash2-4, with the 64-bit output and the 128-bit key. This includes
things like TCP sequence numbers. This seems pretty uncontroversial to
me. Seem okay to you?

2) People seem to want something competitive, performance-wise, with
jhash if it's going to replace jhash. The kernel community
instinctively pushes back on anything that could harm performance,
especially in networking and in critical data structures, so there
have been some calls for something faster than SipHash. So, questions
regarding this:

2a) George thinks that HalfSipHash on 32-bit systems will have roughly
comparable speed as SipHash on 64-bit systems, so the idea would be to
use HalfSipHash on 32-bit systems' hash tables and SipHash on 64-bit
systems' hash tables. The big obvious question is: does HalfSipHash
have a sufficient security margin for hashtable usage and hashtable
attacks? I'm not wondering about the security margin for other usages,
but just of the hashtable usage. In your opinion, does HalfSipHash cut
it?

2b) While I certainly wouldn't consider making the use case in
question (1) employ a weaker function, for this question (2), there
has been some discussion about using HalfSipHash1-3 (or SipHash1-3 on
64-bit) instead of 2-4. So, the same question is therefore posed:
would using HalfSipHash1-3 give a sufficient security margin for
hashtable usage and hashtable attacks?

My plan is essentially to implement things according to your security
recommendation. The thread started with me pushing a heavy duty
security solution -- SipHash2-4 -- for _everything_. I've received
understandable push back on that notion for certain use cases. So now
I'm trying to discover what the most acceptable compromise is. Your
answers on (2a) and (2b) will direct that compromise.

Thanks again,
Jason

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-19 17:21 UTC (permalink / raw)
  To: Theodore Ts'o, kernel-hardening, Jason, George Spelvin,
	Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
	Eric Biggers, Hannes Frederic Sowa, Jean-Philippe Aumasson,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Vegard Nossum
In-Reply-To: <20161217154152.5oug7mzb4tmfknwv@thunk.org>

Hi Ted,

On Sat, Dec 17, 2016 at 4:41 PM, Theodore Ts'o <tytso@mit.edu> wrote:
> On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
>> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
>> >>   get_random_long() that we all like.
>
> We don't have to find the most optimal solution right away; we can
> approach this incrementally, after all.

Thanks to your call for moderation. This is the impression I have too.
And with all the back and forth of these threads, I fear nothing will
get done. I'm going to collect the best ideas amongst all of these,
and try to get it merged. Then after that we can incrementally improve
on it.

David Miller -- would you merge something into your 4.11 tree? Seems
like you might be the guy for this, since the changes primarily affect
net/*.

Latest patches are here:
https://git.zx2c4.com/linux-dev/log/?h=siphash


> So long as we replace get_random_{long,int}() with something which is
> (a) strictly better in terms of security given today's use of MD5, and
> (b) which is strictly *faster* than the current construction on 32-bit
> and 64-bit systems, we can do that, and can try to make it be faster
> while maintaining some minimum level of security which is sufficient
> for all current users of get_random_{long,int}() and which can be
> clearly artificulated for future users of get_random_{long,int}().
>
> The main worry at this point I have is benchmarking siphash on a
> 32-bit system.  It may be that simply batching the chacha20 output so
> that we're using the urandom construction more efficiently is the
> better way to go, since that *does* meet the criteron of strictly more
> secure and strictly faster than the current MD5 solution.  I'm open to
> using siphash, but I want to see the the 32-bit numbers first.

Sure, I'll run some benchmarks and report back.

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.
>
> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.

The only acceptable usage of HalfSipHash is for hash table lookups
where top security isn't actually a goal. I'm still a bit queasy about
going with it, but George S has very aggressively been pursuing a
"save every last cycle" agenda, which makes sense given how
performance sensitive certain hash tables are, and so I suspect this
could be an okay compromise between performance and security. But,
only for hash tables. Certainly not for the RNG.

>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

The current implemention of get_random_int/long in my branch uses
128-bit key siphash, so the chances of compromising the key are pretty
much nil. The periodic rekeying is to protect against direct-ram
attacks or other kernel exploits -- a concern brought up by Andy.

With siphash, which takes a 128-bit key, if you got an RNG output once
every picosecond, I believe it would take approximately 10^19 years...

Jason

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jean-Philippe Aumasson @ 2016-12-19 17:19 UTC (permalink / raw)
  To: Jason A. Donenfeld, noloader
  Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
	David Laight, Ted Tso, Hannes Frederic Sowa, Linus Torvalds,
	Eric Biggers, Tom Herbert, George Spelvin, Vegard Nossum,
	Andi Kleen, David Miller, Andy Lutomirski, Daniel J . Bernstein
In-Reply-To: <CAHmME9o6CFuQJEp3QJgkh78r5Z2F8NJnpTCCFUiSnAiJxPGCJQ@mail.gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1388 bytes --]

Yeah you can use the PRF properties to build a DRBG, but that may not be
optimal in terms of performance.
On Mon, 19 Dec 2016 at 18:08, Jason A. Donenfeld <Jason@zx2c4.com> wrote:

> On Sat, Dec 17, 2016 at 3:55 PM, Jeffrey Walton <noloader@gmail.com>
> wrote:
> > It may be prudent to include the endian reversal in the test to ensure
> > big endian machines produce expected results. Some closely related
> > testing on an old Apple PowerMac G5 revealed that result needed to be
> > reversed before returning it to a caller.
>
> The function [1] returns a u64. Originally I had it returning a
> __le64, but that was considered unnecessary by many prior reviewers on
> the list. It returns an integer. If you want uniform bytes out of it,
> then use the endian conversion function, the same as you would do with
> any other type of integer.
>
> Additionally, this function is *not* meant for af_alg or any of the
> crypto/* code. It's very unlikely to find a use there.
>
>
> > Forgive my ignorance... I did not find reading on using the primitive
> > in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
> > Aumasson's site does not seem to discuss the use case:
>
> He's on this thread so I suppose he can speak up for himself. But in
> my conversations with him, the primary take-away was, "seems okay to
> me!". But please -- JP - correct me if I've misinterpreted.
>

[-- Attachment #2: Type: text/html, Size: 2216 bytes --]

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-19 17:08 UTC (permalink / raw)
  To: noloader
  Cc: Netdev, kernel-hardening, LKML, Linux Crypto Mailing List,
	David Laight, Ted Tso, Hannes Frederic Sowa, Linus Torvalds,
	Eric Biggers, Tom Herbert, George Spelvin, Vegard Nossum,
	Andi Kleen, David Miller, Andy Lutomirski, Jean-Philippe Aumasson,
	Daniel J . Bernstein
In-Reply-To: <CAH8yC8nE5CLfbv8gCRCeiNW3JLCAB3062-6pNO8mEXJYcsFUsw@mail.gmail.com>

On Sat, Dec 17, 2016 at 3:55 PM, Jeffrey Walton <noloader@gmail.com> wrote:
> It may be prudent to include the endian reversal in the test to ensure
> big endian machines produce expected results. Some closely related
> testing on an old Apple PowerMac G5 revealed that result needed to be
> reversed before returning it to a caller.

The function [1] returns a u64. Originally I had it returning a
__le64, but that was considered unnecessary by many prior reviewers on
the list. It returns an integer. If you want uniform bytes out of it,
then use the endian conversion function, the same as you would do with
any other type of integer.

Additionally, this function is *not* meant for af_alg or any of the
crypto/* code. It's very unlikely to find a use there.

> Forgive my ignorance... I did not find reading on using the primitive
> in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
> Aumasson's site does not seem to discuss the use case:

He's on this thread so I suppose he can speak up for himself. But in
my conversations with him, the primary take-away was, "seems okay to
me!". But please -- JP - correct me if I've misinterpreted.

^ permalink raw reply

* Re: [RFC PATCH 1/3] crypto: zip - Add ThunderX ZIP driver core
From: Sasha Levin @ 2016-12-19 16:12 UTC (permalink / raw)
  To: Jan Glauber, alexander.levin
  Cc: Herbert Xu, linux-crypto, linux-kernel@vger.kernel.org List,
	David S . Miller, Mahipal Challa, Vishnu Nair
In-Reply-To: <20161212150439.18627-2-jglauber@cavium.com>

On Mon, Dec 12, 2016 at 10:04 AM, Jan Glauber <jglauber@cavium.com> wrote:
> +/* error messages */
> +#define zip_err(fmt, args...) pr_err("ZIP ERR:%s():%d: " \
> +                             fmt "\n", __func__, __LINE__, ## args)
> +
> +#ifdef MSG_ENABLE
> +/* Enable all messages */
> +#define zip_msg(fmt, args...) pr_info("ZIP_MSG:" fmt "\n", ## args)
> +#else
> +#define zip_msg(fmt, args...)
> +#endif
> +
> +#if defined(ZIP_DEBUG_ENABLE) && defined(MSG_ENABLE)
> +
> +#ifdef DEBUG_LEVEL
> +
> +#define FILE_NAME (strrchr(__FILE__, '/') ? strrchr(__FILE__, '/') + 1 : \
> +       strrchr(__FILE__, '\\') ? strrchr(__FILE__, '\\') + 1 : __FILE__)
> +
> +#if DEBUG_LEVEL >= 4
> +
> +#define zip_dbg(fmt, args...) pr_info("ZIP DBG: %s: %s() : %d: " \
> +                             fmt "\n", FILE_NAME, __func__, __LINE__, ## args)
> +
> +#define zip_dbg_enter(fmt, args...) pr_info("ZIP_DBG: %s() in %s" \
> +                                   fmt "\n", __func__, FILE_NAME, ## args)
> +
> +#define zip_dbg_exit(fmt, args...) pr_info("ZIP_DBG:Exit %s() in %s" \
> +                                  fmt "\n", __func__, FILE_NAME, ## args)
> +
> +#elif DEBUG_LEVEL >= 3
> +
> +#define zip_dbg(fmt, args...) pr_info("ZIP DBG: %s: %s() : %d: " \
> +                             fmt "\n", FILE_NAME, __func__, __LINE__, ## args)
> +
> +#elif DEBUG_LEVEL >= 2
> +
> +#define zip_dbg(fmt, args...) pr_info("ZIP DBG: %s() : %d: " \
> +                             fmt "\n", __func__, __LINE__, ## args)
> +
> +#else
> +
> +#define zip_dbg(fmt, args...) pr_info("ZIP DBG:" fmt "\n", ## args)
> +
> +#endif /* DEBUG LEVEL >= */
> +
> +#if DEBUG_LEVEL <= 3
> +
> +#define zip_dbg_enter(fmt, args...)
> +#define zip_dbg_exit(fmt, args...)
> +
> +#endif /* DEBUG_LEVEL <= 3 */
> +#else
> +
> +#define zip_dbg(fmt, args...) pr_info("ZIP DBG:" fmt "\n", ## args)
> +
> +#define zip_dbg_enter(fmt, args...)
> +#define zip_dbg_exit(fmt, args...)
> +
> +#endif /* DEBUG_LEVEL */
> +#else
> +
> +#define zip_dbg(fmt, args...)
> +#define zip_dbg_enter(fmt, args...)
> +#define zip_dbg_exit(fmt, args...)
> +
> +#endif /* ZIP_DEBUG_ENABLE */

The whole zip_dbg_[enter,exit] thing can be just done with tracepoints
instead of reinventing the wheel. No reason to carry that code

> +u32 zip_load_instr(union zip_inst_s *instr,
> +                  struct zip_device *zip_dev)
> +{
> +       union zip_quex_doorbell dbell;
> +       u32 queue = 0;
> +       u32 consumed = 0;
> +       u64 *ncb_ptr = NULL;
> +       union zip_nptr_s ncp;
> +
> +       /*
> +        * Distribute the instructions between the enabled queues based on
> +        * the CPU id.
> +        */
> +       if (raw_smp_processor_id() % 2 == 0)
> +               queue = 0;
> +       else
> +               queue = 1;

Is this just simplistic load balancing scheme? There's no guarantee
that the cpu won't change later on.

> +
> +       /* Poll for the IQ cmd completion code */
> +       zip_dbg_exit();

The comment doesn't match with what actually happens?

> +static inline u64 zip_depth(void)
> +{
> +       struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> +
> +       if (!zip_dev)
> +               return -ENODEV;
> +
> +       return zip_dev->depth;
> +}
> +
> +static inline u64 zip_onfsize(void)
> +{
> +       struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> +
> +       if (!zip_dev)
> +               return -ENODEV;
> +
> +       return zip_dev->onfsize;
> +}
> +
> +static inline u64 zip_ctxsize(void)
> +{
> +       struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> +
> +       if (!zip_dev)
> +               return -ENODEV;
> +
> +       return zip_dev->ctxsize;
> +}

Where is it all used?

> +/*
> + * Allocates new ZIP device structure
> + * Returns zip_device pointer or NULL if cannot allocate memory for zip_device
> + */
> +static struct zip_device *zip_alloc_device(struct pci_dev *pdev)
> +{
> +       struct zip_device *zip = NULL;
> +       int idx = 0;
> +
> +       for (idx = 0; idx < MAX_ZIP_DEVICES; idx++) {
> +               if (!zip_dev[idx])
> +                       break;
> +       }
> +
> +       zip = kzalloc(sizeof(*zip), GFP_KERNEL);
> +
> +       if (!zip)
> +               return NULL;
> +
> +       zip_dev[idx] = zip;

If for some odd reason we try to allocate more than MAX_ZIP_DEVICES
this will deref an invalid pointer.

> +/**
> + * zip_get_device - Get ZIP device based on node id of cpu
> + *
> + * @node: Node id of the current cpu
> + * Return: Pointer to Zip device structure
> + */
> +struct zip_device *zip_get_device(int node)
> +{
> +       if ((node < MAX_ZIP_DEVICES) && (node >= 0))
> +               return zip_dev[node];
> +
> +       zip_err("ZIP device not found for node id %d\n", node);
> +       return NULL;
> +}
> +
> +/**
> + * zip_get_node_id - Get the node id of the current cpu
> + *
> + * Return: Node id of the current cpu
> + */
> +int zip_get_node_id(void)
> +{
> +       return cpu_to_node(raw_smp_processor_id());
> +}

This looks off: why are you calling raw_smp_processor_id() instead of
just smp_processor_id()?

I guess you saw warnings from smp_processor_id(), but they happen to
warn you that the cpu might change under you, making this node number
incorrect.


> +#ifndef __ZIP_MAIN_H__
> +#define __ZIP_MAIN_H__
> +
> +#include "zip_device.h"
> +#include "zip_regs.h"
> +
> +/* PCI device IDs */
> +#define PCI_DEVICE_ID_THUNDERX_ZIP   0xA01A
> +
> +/* ZIP device BARs */
> +#define PCI_CFG_ZIP_PF_BAR0   0  /* Base addr for normal regs */
> +#define PCI_CFG_ZIP_PF_BAR4   4  /* Base addr for MSI-X regs  */
> +
> +/* Maximum available zip queues */
> +#define ZIP_MAX_NUM_QUEUES    8
> +#define ZIP_MAXQ_PER_ZIPENG   4
> +#define ZIP_NUMENG_CN88XX     2
> +
> +#define ZIP_128B_ALIGN        7
> +
> +/* Buffer size and alignment */
> +#define ZIP_CMD_QBUF_SIZE     (8064 + 8)
> +#define ZIP_CMD_QBUF_ALIGN    128
> +#define ZIP_DATA_BUF_ALIGN    8
>
> +/*
> + * There will be max of 2^20 zip cmds in the zip instruction queue.
> + * So no of zip Chunk buffers = ((2^20) / ((2*1024)/64))
> + */
> +#define ZIP_MAX_CMD           (1024 * 1024)
> +#define ZIP_CMD_PER_BUF       (ZIP_CMD_QBUF_SIZE / 64)
> +#define ZIP_CMD_QBUF_MAX_CNT  (1 * 1024)
> +
> +/* Data buffer size 64K for time being */
> +#define ZIP_DATA_BUF_SIZE     (64 * 1024)
> +
> +/* Number of data buffers */
> +#define ZIP_DATA_BUF_CNT      (32 * 1024)

Bunch of these defines aren't used.

> +/**
> + * zip_cmd_qbuf_alloc - Allocates a cmd buffer for ZIP Instruction Queue
> + * @zip: Pointer to zip device structure
> + * @q:   Queue number to allocate bufffer to
> + * Return: 0 if successful, -ENOMEM otherwise
> + */
> +int zip_cmd_qbuf_alloc(struct zip_device *zip, int q)
> +{
> +       zip_dbg_enter();
> +
> +       zip->iq[q].sw_head = (u64 *)__get_free_pages((GFP_KERNEL | GFP_DMA),
> +                                               get_order(ZIP_CMD_QBUF_SIZE));
> +
> +       if (!zip->iq[q].sw_head)
> +               return -ENOMEM;
> +
> +       memset(zip->iq[q].sw_head, 0, ZIP_CMD_QBUF_SIZE);
> +
> +       zip_dbg("cmd_qbuf_alloc[%d] Success : %p\n", q, zip->iq[q].sw_head);
> +       zip_dbg_exit();
> +       return 0;
> +}
> +
> +/**
> + * zip_cmd_qbuf_free - Frees the cmd Queue buffer
> + * @zip: Pointer to zip device structure
> + * @q:   Queue number to free buffer of
> + */
> +void zip_cmd_qbuf_free(struct zip_device *zip, int q)
> +{
> +       zip_dbg("Freeing cmd_qbuf 0x%lx\n", zip->iq[q].sw_tail);
> +
> +       free_pages((u64)zip->iq[q].sw_tail, get_order(ZIP_CMD_QBUF_SIZE));
> +}

Why are you going through __get_free_pages() for constant size
allocations? kmemcache would be better here.

> +/**
> + * zip_data_buf_alloc - Allocates memory for a data bufffer
> + * @size:   Size of the buffer to allocate
> + * Returns: Pointer to the buffer allocated
> + */
> +u8 *zip_data_buf_alloc(u64 size)
> +{
> +       u8 *ptr;
> +
> +       zip_dbg_enter();
> +
> +       ptr = (u8 *)__get_free_pages((GFP_ATOMIC | GFP_DMA),
> +                                       get_order(size));
> +
> +       if (!ptr)
> +               return NULL;
> +
> +       memset(ptr, 0, size);
> +
> +       zip_dbg("Data buffer allocation success\n");
> +       zip_dbg_exit();
> +       return ptr;
> +}
> +
> +/**
> + * zip_data_buf_free - Frees the memory of a data buffer
> + * @ptr:  Pointer to the buffer
> + * @size: Buffer size
> + */
> +void zip_data_buf_free(u8 *ptr, u64 size)
> +{
> +       zip_dbg("Freeing data buffer 0x%lx\n", ptr);
> +
> +       free_pages((u64)ptr, get_order(size));
> +}

Same question here; you end up allocating constant sizes - why not kmemcache?

Also, this is quite a lot to allocate out of the atomic pool, is there
a reason for that?


Thanks,
Sasha

^ permalink raw reply

* RE: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: David Laight @ 2016-12-19 14:14 UTC (permalink / raw)
  To: 'George Spelvin', tom@herbertland.com
  Cc: ak@linux.intel.com, davem@davemloft.net, djb@cr.yp.to,
	ebiggers3@gmail.com, hannes@stressinduktion.org, Jason@zx2c4.com,
	jeanphilippe.aumasson@gmail.com,
	kernel-hardening@lists.openwall.com, linux-crypto@vger.kernel.org,
	linux-kernel@vger.kernel.org, luto@amacapital.net,
	netdev@vger.kernel.org, torvalds@linux-foundation.org,
	tytso@mit.edu, vegard.nossum@gmail.com
In-Reply-To: <20161217152122.19677.qmail@ns.sciencehorizons.net>

From: George Spelvin
> Sent: 17 December 2016 15:21
...
> uint32_t
> hsiphash24(char const *in, size_t len, uint32_t const key[2])
> {
> 	uint32_t c = key[0];
> 	uint32_t d = key[1];
> 	uint32_t a =     0x6c796765 ^ 0x736f6d65;
> 	uint32_t b = d ^ 0x74656462 ^ 0x646f7261;

I've not looked closely, but is that (in some sense) duplicating
the key length?
So you could set a = key[2] and b = key[3] and still have an
working hash - albeit not exactly the one specified.

I'll add another comment here...
Is it worth using the 32bit hash for IP addresses on 64bit systems that
can't do misaligned accessed?

	David

^ permalink raw reply

* Re: [RFC PATCH 1/3] crypto: zip - Add ThunderX ZIP driver core
From: Jan Glauber @ 2016-12-19 12:29 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: Herbert Xu, linux-crypto, linux-kernel, David S . Miller,
	Mahipal Challa, Vishnu Nair
In-Reply-To: <20161213133900.GA10647@Red>

Hi Corentin,

thanks for your review! Your comments all look reasonable to me,
Mahipal will address them.

Since I posted this series at the beginning of the merge window
I'd like to wait for some more time before we post an updated version.

--Jan

On Tue, Dec 13, 2016 at 02:39:00PM +0100, Corentin Labbe wrote:
> Hello
> 
> I have some comment below
> 
> On Mon, Dec 12, 2016 at 04:04:37PM +0100, Jan Glauber wrote:
> > From: Mahipal Challa <Mahipal.Challa@cavium.com>
> > 
> [...]
> > --- a/drivers/crypto/Makefile
> > +++ b/drivers/crypto/Makefile
> > @@ -27,6 +27,7 @@ obj-$(CONFIG_CRYPTO_DEV_MXC_SCC) += mxc-scc.o
> >  obj-$(CONFIG_CRYPTO_DEV_TALITOS) += talitos.o
> >  obj-$(CONFIG_CRYPTO_DEV_UX500) += ux500/
> >  obj-$(CONFIG_CRYPTO_DEV_QAT) += qat/
> > +obj-$(CONFIG_CRYPTO_DEV_CAVIUM_ZIP) += cavium/
> >  obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
> >  obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
> >  obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
> 
> Try to keep some alphabetical order
> 
> [...]
> > +/* ZIP invocation result completion status codes */
> > +#define ZIP_NOTDONE		0x0
> > +
> > +/* Successful completion. */
> > +#define ZIP_SUCCESS		0x1
> > +
> > +/* Output truncated */
> > +#define ZIP_DTRUNC		0x2
> > +
> > +/* Dynamic Stop */
> > +#define ZIP_DYNAMIC_STOP	0x3
> > +
> > +/* Uncompress ran out of input data when IWORD0[EF] was set */
> > +#define ZIP_ITRUNC		0x4
> > +
> > +/* Uncompress found the reserved block type 3 */
> > +#define ZIP_RBLOCK		0x5
> > +
> > +/* Uncompress found LEN != ZIP_NLEN in an uncompressed block in the input */
> > +#define ZIP_NLEN		0x6
> > +
> > +/* Uncompress found a bad code in the main Huffman codes. */
> > +#define ZIP_BADCODE		0x7
> > +
> > +/* Uncompress found a bad code in the 19 Huffman codes encoding lengths. */
> > +#define ZIP_BADCODE2	        0x8
> > +
> > +/* Compress found a zero-length input. */
> > +#define ZIP_ZERO_LEN	        0x9
> > +
> > +/* The compress or decompress encountered an internal parity error. */
> > +#define ZIP_PARITY		0xA
> 
> Perhaps all errors could begin with ZIP_ERR_xxx ?
> 
> [...]
> > +static inline u64 zip_depth(void)
> > +{
> > +	struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> > +
> > +	if (!zip_dev)
> > +		return -ENODEV;
> > +
> > +	return zip_dev->depth;
> > +}
> 
> This function is not used.
> 
> > +
> > +static inline u64 zip_onfsize(void)
> > +{
> > +	struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> > +
> > +	if (!zip_dev)
> > +		return -ENODEV;
> > +
> > +	return zip_dev->onfsize;
> > +}
> 
> Same for this
> 
> > +
> > +static inline u64 zip_ctxsize(void)
> > +{
> > +	struct zip_device *zip_dev = zip_get_device(zip_get_node_id());
> > +
> > +	if (!zip_dev)
> > +		return -ENODEV;
> > +
> > +	return zip_dev->ctxsize;
> > +}
> 
> Again
> 
> [...]
> > +
> > +/*
> > + * Allocates new ZIP device structure
> > + * Returns zip_device pointer or NULL if cannot allocate memory for zip_device
> > + */
> > +static struct zip_device *zip_alloc_device(struct pci_dev *pdev)
> 
> pdev is not used, so you can remove it from arglist.
> Or keep it and use devm_kzalloc for allocating zip
> 
> > +{
> > +	struct zip_device *zip = NULL;
> > +	int idx = 0;
> > +
> > +	for (idx = 0; idx < MAX_ZIP_DEVICES; idx++) {
> > +		if (!zip_dev[idx])
> > +			break;
> > +	}
> > +
> > +	zip = kzalloc(sizeof(*zip), GFP_KERNEL);
> > +
> > +	if (!zip)
> > +		return NULL;
> > +
> > +	zip_dev[idx] = zip;
> > +	zip->index = idx;
> > +	return zip;
> > +}
> 
> [...]
> > +static int zip_probe(struct pci_dev *pdev, const struct pci_device_id *ent)
> > +{
> > +	struct device *dev = &pdev->dev;
> > +	struct zip_device *zip = NULL;
> > +	int    err;
> > +
> > +	zip_dbg_enter();
> > +
> > +	zip = zip_alloc_device(pdev);
> > +
> > +	if (!zip)
> > +		return -ENOMEM;
> > +
> > +	pr_info("Found ZIP device %d %x:%x on Node %d\n", zip->index,
> > +		pdev->vendor, pdev->device, dev_to_node(dev));
> 
> You use lots of pr_info, why not using more dev_info/dev_err ?
> 
> > +
> > +	zip->pdev = pdev;
> > +
> > +	pci_set_drvdata(pdev, zip);
> > +
> > +	err = pci_enable_device(pdev);
> > +	if (err) {
> > +		zip_err("Failed to enable PCI device");
> > +		goto err_free_device;
> > +	}
> > +
> > +	err = pci_request_regions(pdev, DRV_NAME);
> > +	if (err) {
> > +		zip_err("PCI request regions failed 0x%x", err);
> > +		goto err_disable_device;
> > +	}
> > +
> > +	err = pci_set_dma_mask(pdev, DMA_BIT_MASK(48));
> > +	if (err) {
> > +		dev_err(dev, "Unable to get usable DMA configuration\n");
> > +		goto err_release_regions;
> > +	}
> > +
> > +	err = pci_set_consistent_dma_mask(pdev, DMA_BIT_MASK(48));
> > +	if (err) {
> > +		dev_err(dev, "Unable to get 48-bit DMA for allocations\n");
> > +		goto err_release_regions;
> > +	}
> > +
> > +	/* MAP configuration registers */
> > +	zip->reg_base = pci_ioremap_bar(pdev, PCI_CFG_ZIP_PF_BAR0);
> > +	if (!zip->reg_base) {
> > +		zip_err("ZIP: Cannot map BAR0 CSR memory space, aborting");
> > +		err = -ENOMEM;
> > +		goto err_release_regions;
> > +	}
> > +
> > +	/* Initialize ZIP Hardware */
> > +	err = zip_init_hw(zip);
> > +	if (err)
> > +		goto err_release_regions;
> > +
> > +	return 0;
> > +
> > +err_release_regions:
> > +	if (zip->reg_base)
> > +		iounmap(zip->reg_base);
> > +	pci_release_regions(pdev);
> > +
> > +err_disable_device:
> > +	pci_disable_device(pdev);
> > +
> > +err_free_device:
> > +	pci_set_drvdata(pdev, NULL);
> > +
> > +	/* remove zip_dev from zip_device list, free the zip_device memory */
> > +	zip_dev[zip->index] = NULL;
> > +	kfree(zip);
> > +
> > +	zip_dbg_exit();
> > +	return err;
> > +}
> 
> [...]
> > +static int __init zip_init_module(void)
> > +{
> > +	int ret;
> > +
> > +	memset(&zip_dev, 0, sizeof(zip_dev));
> 
> A static variable is already zeroed
> 
> > +
> > +	zip_msg("%s\n", DRV_NAME);
> > +
> > +	ret = pci_register_driver(&zip_driver);
> > +	if (ret < 0) {
> > +		zip_err("ZIP: pci_register_driver() returned %d\n", ret);
> > +		return ret;
> > +	}
> > +
> > +	/* Register with the Kernel Crypto Interface */
> > +	ret = zip_register_compression_device();
> > +	if (ret < 0) {
> > +		zip_err("ZIP: Kernel Crypto Registration failed\n");
> > +		return 1;
> 
> 1 is not a good error code. And you quit without unregistering.
> 
> > +	}
> > +
> > +	return ret;
> > +}
> > +
> > +static void __exit zip_cleanup_module(void)
> > +{
> > +	/* Unregister this driver for pci zip devices */
> > +	pci_unregister_driver(&zip_driver);
> > +
> > +	/* Unregister from the kernel crypto interface */
> > +	zip_unregister_compression_device();
> 
> I think you must do the opposite. (unregister crypto first)
> 
> > +
> > +	pr_info("ThunderX-ZIP driver is removed successfully\n");
> > +}
> > +
> > +module_init(zip_init_module);
> > +module_exit(zip_cleanup_module);
> 
> Why not using module_pci_driver ?
> 
> [...]
> > +/**
> > + * enum zip_comp_e - ZIP Completion Enumeration, enumerates the values of
> > + * ZIP_ZRES_S[COMPCODE].
> > + */
> > +enum zip_comp_e {
> > +	ZIP_COMP_E_BADCODE = 0x7,
> > +	ZIP_COMP_E_BADCODE2 = 0x8,
> > +	ZIP_COMP_E_DTRUNC = 0x2,
> > +	ZIP_COMP_E_FATAL = 0xb,
> > +	ZIP_COMP_E_ITRUNC = 0x4,
> > +	ZIP_COMP_E_NLEN = 0x6,
> > +	ZIP_COMP_E_NOTDONE = 0x0,
> > +	ZIP_COMP_E_PARITY = 0xa,
> > +	ZIP_COMP_E_RBLOCK = 0x5,
> > +	ZIP_COMP_E_STOP = 0x3,
> > +	ZIP_COMP_E_SUCCESS = 0x1,
> > +	ZIP_COMP_E_ZERO_LEN = 0x9,
> > +	ZIP_COMP_E_ENUM_LAST = 0xc,
> 
> Why not using already declared define ? ZIP_COMP_E_BADCODE = ZIP_BADCODE 
> 
> Regards
> Corentin Labbe

^ permalink raw reply

* Re: Test AEAD/authenc algorithms from userspace
From: Harsh Jain @ 2016-12-19 10:38 UTC (permalink / raw)
  To: Stephan Mueller, Herbert Xu; +Cc: linux-crypto
In-Reply-To: <2943969.IiWKeGvEyD@tauon.atsec.com>

Hi Herbert,

TLS default mode of operation is MAC-then-Encrypt for Authenc algos.
Currently framework only supports EtM used in IPSec. User space
programs like openssl cannot use af-alg interface to encrypt/decrypt
in TLS mode.
Are we going to support Mac-then-Encrypt mode in future kernel releases?


Regards
Harsh Jain

On Tue, May 31, 2016 at 12:35 PM, Stephan Mueller <smueller@chronox.de> wrote:
> Am Dienstag, 31. Mai 2016, 12:31:16 schrieb Harsh Jain:
>
> Hi Harsh,
>
>> Hi All,
>>
>> How can we open socket of type "authenc(hmac(sha256),cbc(aes))" from
>> userspace program.I check libkcapi library. It has test programs for
>> GCM/CCM. There are 3 types of approaches to Authenticated Encryption,
>> Which of them is supported in crypto framework.
>>
>> 1) Encrypt-then-MAC (EtM)
>>      The plaintext is first encrypted, then a MAC is produced based on
>> the resulting ciphertext. The ciphertext and its MAC are sent
>> together.
>> 2) Encrypt-and-MAC (E&M)
>>      A MAC is produced based on the plaintext, and the plaintext is
>> encrypted without the MAC. The plaintext's MAC and the ciphertext are
>> sent together.
>>
>> 3) MAC-then-Encrypt (MtE)
>>      A MAC is produced based on the plaintext, then the plaintext and
>> MAC are together encrypted to produce a ciphertext based on both. The
>> ciphertext (containing an encrypted MAC) is sent.
>
> The cipher types you mention refer to the implementation of authenc(). IIRC,
> authenc implements EtM as this is mandated by IPSEC.
>
> When you use libkcapi, you should simply be able to use your cipher name with
> the AEAD API. I.e. use the examples you see for CCM or GCM and use those with
> the chosen authenc() cipher. Do you experience any issues?
>
> Ciao
> Stephan

^ permalink raw reply

* [PATCH v3 2/2] crypto: mediatek - add DT bindings documentation
From: Ryder Lee @ 2016-12-19  2:20 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Matthias Brugger
  Cc: devicetree-u79uwXL29TY76Z2rM5mHXA, Ryder Lee, Sean Wang,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Roy Luo,
	linux-mediatek-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r
In-Reply-To: <1482114045-18716-1-git-send-email-ryder.lee-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>

Add DT bindings documentation for the crypto driver

Signed-off-by: Ryder Lee <ryder.lee-NuS5LvNUpcJWk0Htik3J/w@public.gmane.org>
---
 .../devicetree/bindings/crypto/mediatek-crypto.txt | 27 ++++++++++++++++++++++
 1 file changed, 27 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/mediatek-crypto.txt

diff --git a/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
new file mode 100644
index 0000000..c204725
--- /dev/null
+++ b/Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
@@ -0,0 +1,27 @@
+MediaTek cryptographic accelerators
+
+Required properties:
+- compatible: Should be "mediatek,eip97-crypto"
+- reg: Address and length of the register set for the device
+- interrupts: Should contain the five crypto engines interrupts in numeric
+	order. These are global system and four descriptor rings.
+- clocks: the clock used by the core
+- clock-names: the names of the clock listed in the clocks property. These are
+	"ethif", "cryp"
+- power-domains: Must contain a reference to the PM domain.
+
+
+Example:
+	crypto: crypto@1b240000 {
+		compatible = "mediatek,eip97-crypto";
+		reg = <0 0x1b240000 0 0x20000>;
+		interrupts = <GIC_SPI 82 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 83 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 84 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 91 IRQ_TYPE_LEVEL_LOW>,
+			     <GIC_SPI 97 IRQ_TYPE_LEVEL_LOW>;
+		clocks = <&topckgen CLK_TOP_ETHIF_SEL>,
+			 <&ethsys CLK_ETHSYS_CRYPTO>;
+		clock-names = "ethif","cryp";
+		power-domains = <&scpsys MT2701_POWER_DOMAIN_ETH>;
+	};
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 1/2] Add crypto driver support for some MediaTek chips
From: Ryder Lee @ 2016-12-19  2:20 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Matthias Brugger
  Cc: devicetree, linux-mediatek, linux-kernel, linux-crypto,
	linux-arm-kernel, Sean Wang, Roy Luo, Ryder Lee
In-Reply-To: <1482114045-18716-1-git-send-email-ryder.lee@mediatek.com>

This adds support for the MediaTek hardware accelerator on
mt7623/mt2701/mt8521p SoC.

This driver currently implement:
- SHA1 and SHA2 family(HMAC) hash algorithms.
- AES block cipher in CBC/ECB mode with 128/196/256 bits keys.

Signed-off-by: Ryder Lee <ryder.lee@mediatek.com>
---
 drivers/crypto/Kconfig                 |   17 +
 drivers/crypto/Makefile                |    1 +
 drivers/crypto/mediatek/Makefile       |    2 +
 drivers/crypto/mediatek/mtk-aes.c      |  765 +++++++++++++++++
 drivers/crypto/mediatek/mtk-platform.c |  604 ++++++++++++++
 drivers/crypto/mediatek/mtk-platform.h |  238 ++++++
 drivers/crypto/mediatek/mtk-regs.h     |  194 +++++
 drivers/crypto/mediatek/mtk-sha.c      | 1437 ++++++++++++++++++++++++++++++++
 8 files changed, 3258 insertions(+)
 create mode 100644 drivers/crypto/mediatek/Makefile
 create mode 100644 drivers/crypto/mediatek/mtk-aes.c
 create mode 100644 drivers/crypto/mediatek/mtk-platform.c
 create mode 100644 drivers/crypto/mediatek/mtk-platform.h
 create mode 100644 drivers/crypto/mediatek/mtk-regs.h
 create mode 100644 drivers/crypto/mediatek/mtk-sha.c

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f..937039d 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -553,6 +553,23 @@ config CRYPTO_DEV_ROCKCHIP
 	  This driver interfaces with the hardware crypto accelerator.
 	  Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode.
 
+config CRYPTO_DEV_MEDIATEK
+	tristate "MediaTek's EIP97 Cryptographic Engine driver"
+	depends on ARM && (ARCH_MEDIATEK || COMPILE_TEST)
+	select NEON
+	select KERNEL_MODE_NEON
+	select ARM_CRYPTO
+	select CRYPTO_AES
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_SHA1_ARM_NEON
+	select CRYPTO_SHA256_ARM
+	select CRYPTO_SHA512_ARM
+	select CRYPTO_HMAC
+	help
+	  This driver allows you to utilize the hardware crypto accelerator
+	  EIP97 which can be found on the MT7623 MT2701, MT8521p, etc ....
+	  Select this if you want to use it for AES/SHA1/SHA2 algorithms.
+
 source "drivers/crypto/chelsio/Kconfig"
 
 endif # CRYPTO_HW
diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
index ad7250f..272b51a 100644
--- a/drivers/crypto/Makefile
+++ b/drivers/crypto/Makefile
@@ -10,6 +10,7 @@ obj-$(CONFIG_CRYPTO_DEV_IMGTEC_HASH) += img-hash.o
 obj-$(CONFIG_CRYPTO_DEV_IXP4XX) += ixp4xx_crypto.o
 obj-$(CONFIG_CRYPTO_DEV_MV_CESA) += mv_cesa.o
 obj-$(CONFIG_CRYPTO_DEV_MARVELL_CESA) += marvell/
+obj-$(CONFIG_CRYPTO_DEV_MEDIATEK) += mediatek/
 obj-$(CONFIG_CRYPTO_DEV_MXS_DCP) += mxs-dcp.o
 obj-$(CONFIG_CRYPTO_DEV_NIAGARA2) += n2_crypto.o
 n2_crypto-y := n2_core.o n2_asm.o
diff --git a/drivers/crypto/mediatek/Makefile b/drivers/crypto/mediatek/Makefile
new file mode 100644
index 0000000..187be79
--- /dev/null
+++ b/drivers/crypto/mediatek/Makefile
@@ -0,0 +1,2 @@
+obj-$(CONFIG_CRYPTO_DEV_MEDIATEK) += mtk-crypto.o
+mtk-crypto-objs:= mtk-platform.o mtk-aes.o mtk-sha.o
diff --git a/drivers/crypto/mediatek/mtk-aes.c b/drivers/crypto/mediatek/mtk-aes.c
new file mode 100644
index 0000000..3271471
--- /dev/null
+++ b/drivers/crypto/mediatek/mtk-aes.c
@@ -0,0 +1,765 @@
+/*
+ * Cryptographic API.
+ *
+ * Driver for EIP97 AES acceleration.
+ *
+ * Copyright (c) 2016 Ryder Lee <ryder.lee@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Some ideas are from atmel-aes.c drivers.
+ */
+
+#include <crypto/aes.h>
+#include "mtk-platform.h"
+
+#define AES_QUEUE_SIZE		512
+#define AES_BUF_ORDER		2
+#define AES_BUF_SIZE		((PAGE_SIZE << AES_BUF_ORDER) \
+				& ~(AES_BLOCK_SIZE - 1))
+
+/* AES command token */
+#define AES_CT_SIZE_ECB		2
+#define AES_CT_SIZE_CBC		3
+#define AES_CT_CTRL_HDR		cpu_to_le32(0x00220000)
+#define AES_COMMAND0		cpu_to_le32(0x05000000)
+#define AES_COMMAND1		cpu_to_le32(0x2d060000)
+#define AES_COMMAND2		cpu_to_le32(0xe4a63806)
+
+/* AES transform information */
+#define AES_TFM_ECB		cpu_to_le32(0x0 << 0)
+#define AES_TFM_CBC		cpu_to_le32(0x1 << 0)
+#define AES_TFM_DECRYPT		cpu_to_le32(0x5 << 0)
+#define AES_TFM_ENCRYPT		cpu_to_le32(0x4 << 0)
+#define AES_TFM_SIZE(x)		cpu_to_le32((x) << 8)
+#define AES_TFM_128BITS		cpu_to_le32(0xb << 16)
+#define AES_TFM_192BITS		cpu_to_le32(0xd << 16)
+#define AES_TFM_256BITS		cpu_to_le32(0xf << 16)
+#define AES_TFM_FULL_IV		cpu_to_le32(0xf << 5)
+
+/* AES flags */
+#define AES_FLAGS_MODE_MSK	0x7
+#define AES_FLAGS_ECB		BIT(0)
+#define AES_FLAGS_CBC		BIT(1)
+#define AES_FLAGS_ENCRYPT	BIT(2)
+#define AES_FLAGS_BUSY		BIT(3)
+
+/**
+ * mtk_aes_ct is a set of hardware instructions(command token)
+ * that are used to control engine's processing flow of AES.
+ */
+struct mtk_aes_ct {
+	__le32 ct_ctrl0;
+	__le32 ct_ctrl1;
+	__le32 ct_ctrl2;
+};
+
+/**
+ * mtk_aes_tfm is used to define AES transform state
+ * and contains all keys and initial vectors.
+ */
+struct mtk_aes_tfm {
+	__le32 tfm_ctrl0;
+	__le32 tfm_ctrl1;
+	__le32 state[SIZE_IN_WORDS(AES_KEYSIZE_256 + AES_BLOCK_SIZE)];
+};
+
+/**
+ * mtk_aes_info consists of command token and transform state of AES,
+ * which should be encapsulated in command and result descriptors.
+ *
+ * The engine requires this information to do:
+ * - Commands decoding and control of the engine's data path.
+ * - Coordinating hardware data fetch and store operations.
+ * - Result token construction and output.
+ */
+struct mtk_aes_info {
+	struct mtk_aes_ct ct;
+	struct mtk_aes_tfm tfm;
+};
+
+struct mtk_aes_reqctx {
+	u64 mode;
+};
+
+struct mtk_aes_ctx {
+	struct mtk_cryp *cryp;
+	struct mtk_aes_info info;
+	u32 keylen;
+};
+
+struct mtk_aes_drv {
+	struct list_head dev_list;
+	/* Device list lock */
+	spinlock_t lock;
+};
+
+static struct mtk_aes_drv mtk_aes = {
+	.dev_list = LIST_HEAD_INIT(mtk_aes.dev_list),
+	.lock = __SPIN_LOCK_UNLOCKED(mtk_aes.lock),
+};
+
+static inline u32 mtk_aes_read(struct mtk_cryp *cryp, u32 offset)
+{
+	return readl_relaxed(cryp->base + offset);
+}
+
+static inline void mtk_aes_write(struct mtk_cryp *cryp,
+				 u32 offset, u32 value)
+{
+	writel_relaxed(value, cryp->base + offset);
+}
+
+static struct mtk_cryp *mtk_aes_find_dev(struct mtk_aes_ctx *ctx)
+{
+	struct mtk_cryp *cryp = NULL;
+	struct mtk_cryp *tmp;
+
+	spin_lock_bh(&mtk_aes.lock);
+	if (!ctx->cryp) {
+		list_for_each_entry(tmp, &mtk_aes.dev_list, aes_list) {
+			cryp = tmp;
+			break;
+		}
+		ctx->cryp = cryp;
+	} else {
+		cryp = ctx->cryp;
+	}
+	spin_unlock_bh(&mtk_aes.lock);
+
+	return cryp;
+}
+
+static inline size_t mtk_aes_padlen(size_t len)
+{
+	len &= AES_BLOCK_SIZE - 1;
+	return len ? AES_BLOCK_SIZE - len : 0;
+}
+
+static bool mtk_aes_check_aligned(struct scatterlist *sg, size_t len,
+				  struct mtk_aes_dma *dma)
+{
+	int nents;
+
+	if (!IS_ALIGNED(len, AES_BLOCK_SIZE))
+		return false;
+
+	for (nents = 0; sg; sg = sg_next(sg), ++nents) {
+		if (!IS_ALIGNED(sg->offset, sizeof(u32)))
+			return false;
+
+		if (len <= sg->length) {
+			if (!IS_ALIGNED(len, AES_BLOCK_SIZE))
+				return false;
+
+			dma->nents = nents + 1;
+			dma->remainder = sg->length - len;
+			sg->length = len;
+			return true;
+		}
+
+		if (!IS_ALIGNED(sg->length, AES_BLOCK_SIZE))
+			return false;
+
+		len -= sg->length;
+	}
+
+	return false;
+}
+
+/* Initialize and map transform information of AES */
+static int mtk_aes_info_map(struct mtk_cryp *cryp,
+			    struct mtk_aes_rec *aes,
+			    size_t len)
+{
+	struct mtk_aes_ctx *ctx = crypto_ablkcipher_ctx(
+			crypto_ablkcipher_reqtfm(aes->req));
+	struct mtk_aes_info *info = aes->info;
+	struct mtk_aes_ct *ct = &info->ct;
+	struct mtk_aes_tfm *tfm = &info->tfm;
+
+	aes->ct_hdr = AES_CT_CTRL_HDR | cpu_to_le32(len);
+
+	if (aes->flags & AES_FLAGS_ENCRYPT)
+		tfm->tfm_ctrl0 = AES_TFM_ENCRYPT;
+	else
+		tfm->tfm_ctrl0 = AES_TFM_DECRYPT;
+
+	if (ctx->keylen == SIZE_IN_WORDS(AES_KEYSIZE_128))
+		tfm->tfm_ctrl0 |= AES_TFM_128BITS;
+	else if (ctx->keylen == SIZE_IN_WORDS(AES_KEYSIZE_256))
+		tfm->tfm_ctrl0 |= AES_TFM_256BITS;
+	else if (ctx->keylen == SIZE_IN_WORDS(AES_KEYSIZE_192))
+		tfm->tfm_ctrl0 |= AES_TFM_192BITS;
+
+	ct->ct_ctrl0 = AES_COMMAND0 | cpu_to_le32(len);
+	ct->ct_ctrl1 = AES_COMMAND1;
+
+	if (aes->flags & AES_FLAGS_CBC) {
+		const u32 *iv = (const u32 *)aes->req->info;
+		u32 *iv_state = tfm->state + ctx->keylen;
+		int i;
+
+		aes->ct_size = AES_CT_SIZE_CBC;
+		ct->ct_ctrl2 = AES_COMMAND2;
+
+		tfm->tfm_ctrl0 |= AES_TFM_SIZE(ctx->keylen +
+				  SIZE_IN_WORDS(AES_BLOCK_SIZE));
+		tfm->tfm_ctrl1 = AES_TFM_CBC | AES_TFM_FULL_IV;
+
+		for (i = 0; i < SIZE_IN_WORDS(AES_BLOCK_SIZE); i++)
+			iv_state[i] = cpu_to_le32(iv[i]);
+
+	} else if (aes->flags & AES_FLAGS_ECB) {
+		aes->ct_size = AES_CT_SIZE_ECB;
+		tfm->tfm_ctrl0 |= AES_TFM_SIZE(ctx->keylen);
+		tfm->tfm_ctrl1 = AES_TFM_ECB;
+	}
+
+	aes->ct_dma = dma_map_single(cryp->dev, info, sizeof(*info),
+					DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(cryp->dev, aes->ct_dma))) {
+		dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
+		return -EINVAL;
+	}
+	aes->tfm_dma = aes->ct_dma + sizeof(*ct);
+
+	return 0;
+}
+
+static int mtk_aes_xmit(struct mtk_cryp *cryp, struct mtk_aes_rec *aes)
+{
+	struct mtk_ring *ring = cryp->ring[aes->id];
+	struct mtk_desc *cmd = NULL, *res = NULL;
+	struct scatterlist *ssg, *dsg;
+	u32 len = aes->src.sg_len;
+	int nents;
+
+	/* Fill in the command/result descriptors */
+	for (nents = 0; nents < len; ++nents) {
+		ssg = &aes->src.sg[nents];
+		dsg = &aes->dst.sg[nents];
+
+		cmd = ring->cmd_base + ring->pos;
+		cmd->hdr = MTK_DESC_BUF_LEN(ssg->length);
+		cmd->buf = cpu_to_le32(sg_dma_address(ssg));
+
+		res = ring->res_base + ring->pos;
+		res->hdr = MTK_DESC_BUF_LEN(dsg->length);
+		res->buf = cpu_to_le32(sg_dma_address(dsg));
+
+		if (nents == 0) {
+			res->hdr |= MTK_DESC_FIRST;
+			cmd->hdr |= MTK_DESC_FIRST |
+				    MTK_DESC_CT_LEN(aes->ct_size);
+			cmd->ct = cpu_to_le32(aes->ct_dma);
+			cmd->ct_hdr = aes->ct_hdr;
+			cmd->tfm = cpu_to_le32(aes->tfm_dma);
+		}
+
+		if (++ring->pos == MTK_DESC_NUM)
+			ring->pos = 0;
+	}
+
+	cmd->hdr |= MTK_DESC_LAST;
+	res->hdr |= MTK_DESC_LAST;
+
+	/*
+	 * Make sure that all changes to the DMA ring are done before we
+	 * start engine.
+	 */
+	wmb();
+	/* Start DMA transfer */
+	mtk_aes_write(cryp, RDR_PREP_COUNT(aes->id), MTK_DESC_CNT(len));
+	mtk_aes_write(cryp, CDR_PREP_COUNT(aes->id), MTK_DESC_CNT(len));
+
+	return -EINPROGRESS;
+}
+
+static inline void mtk_aes_restore_sg(const struct mtk_aes_dma *dma)
+{
+	struct scatterlist *sg = dma->sg;
+	int nents = dma->nents;
+
+	if (!dma->remainder)
+		return;
+
+	while (--nents > 0 && sg)
+		sg = sg_next(sg);
+
+	if (!sg)
+		return;
+
+	sg->length += dma->remainder;
+}
+
+static int mtk_aes_map(struct mtk_cryp *cryp, struct mtk_aes_rec *aes)
+{
+	struct scatterlist *src = aes->req->src;
+	struct scatterlist *dst = aes->req->dst;
+	size_t len = aes->req->nbytes;
+	size_t padlen = 0;
+	bool src_aligned, dst_aligned;
+
+	aes->total = len;
+	aes->src.sg = src;
+	aes->dst.sg = dst;
+	aes->real_dst = dst;
+
+	src_aligned = mtk_aes_check_aligned(src, len, &aes->src);
+	if (src == dst)
+		dst_aligned = src_aligned;
+	else
+		dst_aligned = mtk_aes_check_aligned(dst, len, &aes->dst);
+
+	if (!src_aligned || !dst_aligned) {
+		padlen = mtk_aes_padlen(len);
+
+		if (len + padlen > AES_BUF_SIZE)
+			return -ENOMEM;
+
+		if (!src_aligned) {
+			sg_copy_to_buffer(src, sg_nents(src), aes->buf, len);
+			aes->src.sg = &aes->aligned_sg;
+			aes->src.nents = 1;
+			aes->src.remainder = 0;
+		}
+
+		if (!dst_aligned) {
+			aes->dst.sg = &aes->aligned_sg;
+			aes->dst.nents = 1;
+			aes->dst.remainder = 0;
+		}
+
+		sg_init_table(&aes->aligned_sg, 1);
+		sg_set_buf(&aes->aligned_sg, aes->buf, len + padlen);
+	}
+
+	if (aes->src.sg == aes->dst.sg) {
+		aes->src.sg_len = dma_map_sg(cryp->dev, aes->src.sg,
+				aes->src.nents, DMA_BIDIRECTIONAL);
+		aes->dst.sg_len = aes->src.sg_len;
+		if (unlikely(!aes->src.sg_len))
+			return -EFAULT;
+	} else {
+		aes->src.sg_len = dma_map_sg(cryp->dev, aes->src.sg,
+				aes->src.nents, DMA_TO_DEVICE);
+		if (unlikely(!aes->src.sg_len))
+			return -EFAULT;
+
+		aes->dst.sg_len = dma_map_sg(cryp->dev, aes->dst.sg,
+				aes->dst.nents, DMA_FROM_DEVICE);
+		if (unlikely(!aes->dst.sg_len)) {
+			dma_unmap_sg(cryp->dev, aes->src.sg,
+				     aes->src.nents, DMA_TO_DEVICE);
+			return -EFAULT;
+		}
+	}
+
+	return mtk_aes_info_map(cryp, aes, len + padlen);
+}
+
+static int mtk_aes_handle_queue(struct mtk_cryp *cryp, u8 id,
+				struct ablkcipher_request *req)
+{
+	struct mtk_aes_rec *aes = cryp->aes[id];
+	struct crypto_async_request *areq, *backlog;
+	struct mtk_aes_reqctx *rctx;
+	struct mtk_aes_ctx *ctx;
+	unsigned long flags;
+	int err, ret = 0;
+
+	spin_lock_irqsave(&aes->lock, flags);
+	if (req)
+		ret = ablkcipher_enqueue_request(&aes->queue, req);
+	if (aes->flags & AES_FLAGS_BUSY) {
+		spin_unlock_irqrestore(&aes->lock, flags);
+		return ret;
+	}
+	backlog = crypto_get_backlog(&aes->queue);
+	areq = crypto_dequeue_request(&aes->queue);
+	if (areq)
+		aes->flags |= AES_FLAGS_BUSY;
+	spin_unlock_irqrestore(&aes->lock, flags);
+
+	if (!areq)
+		return ret;
+
+	if (backlog)
+		backlog->complete(backlog, -EINPROGRESS);
+
+	req = ablkcipher_request_cast(areq);
+	ctx = crypto_ablkcipher_ctx(crypto_ablkcipher_reqtfm(req));
+	rctx = ablkcipher_request_ctx(req);
+	rctx->mode &= AES_FLAGS_MODE_MSK;
+	/* Assign new request to device */
+	aes->req = req;
+	aes->info = &ctx->info;
+	aes->flags = (aes->flags & ~AES_FLAGS_MODE_MSK) | rctx->mode;
+
+	err = mtk_aes_map(cryp, aes);
+	if (err)
+		return err;
+
+	return mtk_aes_xmit(cryp, aes);
+}
+
+static void mtk_aes_unmap(struct mtk_cryp *cryp, struct mtk_aes_rec *aes)
+{
+	dma_unmap_single(cryp->dev, aes->ct_dma,
+			 sizeof(struct mtk_aes_info), DMA_TO_DEVICE);
+
+	if (aes->src.sg == aes->dst.sg) {
+		dma_unmap_sg(cryp->dev, aes->src.sg,
+			     aes->src.nents, DMA_BIDIRECTIONAL);
+
+		if (aes->src.sg != &aes->aligned_sg)
+			mtk_aes_restore_sg(&aes->src);
+	} else {
+		dma_unmap_sg(cryp->dev, aes->dst.sg,
+			     aes->dst.nents, DMA_FROM_DEVICE);
+
+		if (aes->dst.sg != &aes->aligned_sg)
+			mtk_aes_restore_sg(&aes->dst);
+
+		dma_unmap_sg(cryp->dev, aes->src.sg,
+			     aes->src.nents, DMA_TO_DEVICE);
+
+		if (aes->src.sg != &aes->aligned_sg)
+			mtk_aes_restore_sg(&aes->src);
+	}
+
+	if (aes->dst.sg == &aes->aligned_sg)
+		sg_copy_from_buffer(aes->real_dst,
+				    sg_nents(aes->real_dst),
+				    aes->buf, aes->total);
+}
+
+static inline void mtk_aes_complete(struct mtk_cryp *cryp,
+				    struct mtk_aes_rec *aes)
+{
+	aes->flags &= ~AES_FLAGS_BUSY;
+
+	aes->req->base.complete(&aes->req->base, 0);
+
+	/* Handle new request */
+	mtk_aes_handle_queue(cryp, aes->id, NULL);
+}
+
+/* Check and set the AES key to transform state buffer */
+static int mtk_aes_setkey(struct crypto_ablkcipher *tfm,
+			  const u8 *key, u32 keylen)
+{
+	struct mtk_aes_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+	const u32 *key_tmp = (const u32 *)key;
+	u32 *key_state = ctx->info.tfm.state;
+	int i;
+
+	if (keylen != AES_KEYSIZE_128 &&
+	    keylen != AES_KEYSIZE_192 &&
+	    keylen != AES_KEYSIZE_256) {
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	ctx->keylen = SIZE_IN_WORDS(keylen);
+
+	for (i = 0; i < ctx->keylen; i++)
+		key_state[i] = cpu_to_le32(key_tmp[i]);
+
+	return 0;
+}
+
+static int mtk_aes_crypt(struct ablkcipher_request *req, u64 mode)
+{
+	struct mtk_aes_ctx *ctx = crypto_ablkcipher_ctx(
+			crypto_ablkcipher_reqtfm(req));
+	struct mtk_aes_reqctx *rctx = ablkcipher_request_ctx(req);
+
+	rctx->mode = mode;
+
+	return mtk_aes_handle_queue(ctx->cryp,
+			!(mode & AES_FLAGS_ENCRYPT), req);
+}
+
+static int mtk_ecb_encrypt(struct ablkcipher_request *req)
+{
+	return mtk_aes_crypt(req, AES_FLAGS_ENCRYPT | AES_FLAGS_ECB);
+}
+
+static int mtk_ecb_decrypt(struct ablkcipher_request *req)
+{
+	return mtk_aes_crypt(req, AES_FLAGS_ECB);
+}
+
+static int mtk_cbc_encrypt(struct ablkcipher_request *req)
+{
+	return mtk_aes_crypt(req, AES_FLAGS_ENCRYPT | AES_FLAGS_CBC);
+}
+
+static int mtk_cbc_decrypt(struct ablkcipher_request *req)
+{
+	return mtk_aes_crypt(req, AES_FLAGS_CBC);
+}
+
+static int mtk_aes_cra_init(struct crypto_tfm *tfm)
+{
+	struct mtk_aes_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct mtk_cryp *cryp = NULL;
+
+	tfm->crt_ablkcipher.reqsize = sizeof(struct mtk_aes_reqctx);
+
+	cryp = mtk_aes_find_dev(ctx);
+	if (!cryp) {
+		pr_err("can't find crypto device\n");
+		return -ENODEV;
+	}
+
+	return 0;
+}
+
+static struct crypto_alg aes_algs[] = {
+{
+	.cra_name		=	"cbc(aes)",
+	.cra_driver_name	=	"cbc-aes-mtk",
+	.cra_priority		=	400,
+	.cra_flags		=	CRYPTO_ALG_TYPE_ABLKCIPHER |
+						CRYPTO_ALG_ASYNC,
+	.cra_init		=	mtk_aes_cra_init,
+	.cra_blocksize		=	AES_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct mtk_aes_ctx),
+	.cra_alignmask		=	15,
+	.cra_type		=	&crypto_ablkcipher_type,
+	.cra_module		=	THIS_MODULE,
+	.cra_u.ablkcipher	=	{
+		.min_keysize	=	AES_MIN_KEY_SIZE,
+		.max_keysize	=	AES_MAX_KEY_SIZE,
+		.setkey		=	mtk_aes_setkey,
+		.encrypt	=	mtk_cbc_encrypt,
+		.decrypt	=	mtk_cbc_decrypt,
+		.ivsize		=	AES_BLOCK_SIZE,
+	}
+},
+{
+	.cra_name		=	"ecb(aes)",
+	.cra_driver_name	=	"ecb-aes-mtk",
+	.cra_priority		=	400,
+	.cra_flags		=	CRYPTO_ALG_TYPE_ABLKCIPHER |
+						CRYPTO_ALG_ASYNC,
+	.cra_init		=	mtk_aes_cra_init,
+	.cra_blocksize		=	AES_BLOCK_SIZE,
+	.cra_ctxsize		=	sizeof(struct mtk_aes_ctx),
+	.cra_alignmask		=	15,
+	.cra_type		=	&crypto_ablkcipher_type,
+	.cra_module		=	THIS_MODULE,
+	.cra_u.ablkcipher	=	{
+		.min_keysize	=	AES_MIN_KEY_SIZE,
+		.max_keysize	=	AES_MAX_KEY_SIZE,
+		.setkey		=	mtk_aes_setkey,
+		.encrypt	=	mtk_ecb_encrypt,
+		.decrypt	=	mtk_ecb_decrypt,
+	}
+},
+};
+
+static void mtk_aes_enc_task(unsigned long data)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)data;
+	struct mtk_aes_rec *aes = cryp->aes[0];
+
+	mtk_aes_unmap(cryp, aes);
+	mtk_aes_complete(cryp, aes);
+}
+
+static void mtk_aes_dec_task(unsigned long data)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)data;
+	struct mtk_aes_rec *aes = cryp->aes[1];
+
+	mtk_aes_unmap(cryp, aes);
+	mtk_aes_complete(cryp, aes);
+}
+
+static irqreturn_t mtk_aes_enc_irq(int irq, void *dev_id)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)dev_id;
+	struct mtk_aes_rec *aes = cryp->aes[0];
+	u32 val = mtk_aes_read(cryp, RDR_STAT(RING0));
+
+	mtk_aes_write(cryp, RDR_STAT(RING0), val);
+
+	if (likely(AES_FLAGS_BUSY & aes->flags)) {
+		mtk_aes_write(cryp, RDR_PROC_COUNT(RING0), MTK_CNT_RST);
+		mtk_aes_write(cryp, RDR_THRESH(RING0),
+			      MTK_RDR_PROC_THRESH | MTK_RDR_PROC_MODE);
+
+		tasklet_schedule(&aes->task);
+	} else {
+		dev_warn(cryp->dev, "AES interrupt when no active requests.\n");
+	}
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mtk_aes_dec_irq(int irq, void *dev_id)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)dev_id;
+	struct mtk_aes_rec *aes = cryp->aes[1];
+	u32 val = mtk_aes_read(cryp, RDR_STAT(RING1));
+
+	mtk_aes_write(cryp, RDR_STAT(RING1), val);
+
+	if (likely(AES_FLAGS_BUSY & aes->flags)) {
+		mtk_aes_write(cryp, RDR_PROC_COUNT(RING1), MTK_CNT_RST);
+		mtk_aes_write(cryp, RDR_THRESH(RING1),
+			      MTK_RDR_PROC_THRESH | MTK_RDR_PROC_MODE);
+
+		tasklet_schedule(&aes->task);
+	} else {
+		dev_warn(cryp->dev, "AES interrupt when no active requests.\n");
+	}
+	return IRQ_HANDLED;
+}
+
+/*
+ * The purpose of creating encryption and decryption records is
+ * to process outbound/inbound data in parallel, it can improve
+ * performance in most use cases, such as IPSec VPN, especially
+ * under heavy network traffic.
+ */
+static int mtk_aes_record_init(struct mtk_cryp *cryp)
+{
+	struct mtk_aes_rec **aes = cryp->aes;
+	int i, err = -ENOMEM;
+
+	for (i = 0; i < MTK_REC_NUM; i++) {
+		aes[i] = kzalloc(sizeof(**aes), GFP_KERNEL);
+		if (!aes[i])
+			goto err_cleanup;
+
+		aes[i]->buf = (void *)__get_free_pages(GFP_KERNEL,
+						AES_BUF_ORDER);
+		if (!aes[i]->buf)
+			goto err_cleanup;
+
+		aes[i]->id = i;
+
+		spin_lock_init(&aes[i]->lock);
+		crypto_init_queue(&aes[i]->queue, AES_QUEUE_SIZE);
+	}
+
+	tasklet_init(&aes[0]->task, mtk_aes_enc_task, (unsigned long)cryp);
+	tasklet_init(&aes[1]->task, mtk_aes_dec_task, (unsigned long)cryp);
+
+	return 0;
+
+err_cleanup:
+	for (; i--; ) {
+		free_page((unsigned long)aes[i]->buf);
+		kfree(aes[i]);
+	}
+
+	return err;
+}
+
+static void mtk_aes_record_free(struct mtk_cryp *cryp)
+{
+	int i;
+
+	for (i = 0; i < MTK_REC_NUM; i++) {
+		tasklet_kill(&cryp->aes[i]->task);
+		free_page((unsigned long)cryp->aes[i]->buf);
+		kfree(cryp->aes[i]);
+	}
+}
+
+static void mtk_aes_unregister_algs(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(aes_algs); i++)
+		crypto_unregister_alg(&aes_algs[i]);
+}
+
+static int mtk_aes_register_algs(void)
+{
+	int err, i;
+
+	for (i = 0; i < ARRAY_SIZE(aes_algs); i++) {
+		err = crypto_register_alg(&aes_algs[i]);
+		if (err)
+			goto err_aes_algs;
+	}
+
+	return 0;
+
+err_aes_algs:
+	for (; i--; )
+		crypto_unregister_alg(&aes_algs[i]);
+
+	return err;
+}
+
+int mtk_cipher_alg_register(struct mtk_cryp *cryp)
+{
+	int ret;
+
+	INIT_LIST_HEAD(&cryp->aes_list);
+
+	/* Initialize two cipher records */
+	ret = mtk_aes_record_init(cryp);
+	if (ret)
+		goto err_record;
+
+	/* Ring0 is use by encryption record */
+	ret = devm_request_irq(cryp->dev, cryp->irq[RING0], mtk_aes_enc_irq,
+			       IRQF_TRIGGER_LOW, "mtk-aes", cryp);
+	if (ret) {
+		dev_err(cryp->dev, "unable to request AES encryption irq.\n");
+		goto err_res;
+	}
+
+	/* Ring1 is use by decryption record */
+	ret = devm_request_irq(cryp->dev, cryp->irq[RING1], mtk_aes_dec_irq,
+			       IRQF_TRIGGER_LOW, "mtk-aes", cryp);
+	if (ret) {
+		dev_err(cryp->dev, "unable to request AES decryption irq.\n");
+		goto err_res;
+	}
+
+	/* Enable ring0 and ring1 interrupt */
+	mtk_aes_write(cryp, AIC_ENABLE_SET(RING0), MTK_IRQ_RDR0);
+	mtk_aes_write(cryp, AIC_ENABLE_SET(RING1), MTK_IRQ_RDR1);
+
+	spin_lock(&mtk_aes.lock);
+	list_add_tail(&cryp->aes_list, &mtk_aes.dev_list);
+	spin_unlock(&mtk_aes.lock);
+
+	ret = mtk_aes_register_algs();
+	if (ret)
+		goto err_algs;
+
+	return 0;
+
+err_algs:
+	spin_lock(&mtk_aes.lock);
+	list_del(&cryp->aes_list);
+	spin_unlock(&mtk_aes.lock);
+err_res:
+	mtk_aes_record_free(cryp);
+err_record:
+
+	dev_err(cryp->dev, "mtk-aes initialization failed.\n");
+	return ret;
+}
+
+void mtk_cipher_alg_release(struct mtk_cryp *cryp)
+{
+	spin_lock(&mtk_aes.lock);
+	list_del(&cryp->aes_list);
+	spin_unlock(&mtk_aes.lock);
+
+	mtk_aes_unregister_algs();
+	mtk_aes_record_free(cryp);
+}
diff --git a/drivers/crypto/mediatek/mtk-platform.c b/drivers/crypto/mediatek/mtk-platform.c
new file mode 100644
index 0000000..286296f
--- /dev/null
+++ b/drivers/crypto/mediatek/mtk-platform.c
@@ -0,0 +1,604 @@
+/*
+ * Driver for EIP97 cryptographic accelerator.
+ *
+ * Copyright (c) 2016 Ryder Lee <ryder.lee@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#include <linux/clk.h>
+#include <linux/init.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/pm_runtime.h>
+#include "mtk-platform.h"
+
+#define MTK_BURST_SIZE_MSK		GENMASK(7, 4)
+#define MTK_BURST_SIZE(x)		((x) << 4)
+#define MTK_DESC_SIZE(x)		((x) << 0)
+#define MTK_DESC_OFFSET(x)		((x) << 16)
+#define MTK_DESC_FETCH_SIZE(x)		((x) << 0)
+#define MTK_DESC_FETCH_THRESH(x)	((x) << 16)
+#define MTK_DESC_OVL_IRQ_EN		BIT(25)
+#define MTK_DESC_ATP_PRESENT		BIT(30)
+
+#define MTK_DFSE_IDLE			GENMASK(3, 0)
+#define MTK_DFSE_THR_CTRL_EN		BIT(30)
+#define MTK_DFSE_THR_CTRL_RESET		BIT(31)
+#define MTK_DFSE_RING_ID(x)		(((x) >> 12) & GENMASK(3, 0))
+#define MTK_DFSE_MIN_DATA(x)		((x) << 0)
+#define MTK_DFSE_MAX_DATA(x)		((x) << 8)
+#define MTK_DFE_MIN_CTRL(x)		((x) << 16)
+#define MTK_DFE_MAX_CTRL(x)		((x) << 24)
+
+#define MTK_IN_BUF_MIN_THRESH(x)	((x) << 8)
+#define MTK_IN_BUF_MAX_THRESH(x)	((x) << 12)
+#define MTK_OUT_BUF_MIN_THRESH(x)	((x) << 0)
+#define MTK_OUT_BUF_MAX_THRESH(x)	((x) << 4)
+#define MTK_IN_TBUF_SIZE(x)		(((x) >> 4) & GENMASK(3, 0))
+#define MTK_IN_DBUF_SIZE(x)		(((x) >> 8) & GENMASK(3, 0))
+#define MTK_OUT_DBUF_SIZE(x)		(((x) >> 16) & GENMASK(3, 0))
+#define MTK_CMD_FIFO_SIZE(x)		(((x) >> 8) & GENMASK(3, 0))
+#define MTK_RES_FIFO_SIZE(x)		(((x) >> 12) & GENMASK(3, 0))
+
+#define MTK_PE_TK_LOC_AVL		BIT(2)
+#define MTK_PE_PROC_HELD		BIT(14)
+#define MTK_PE_TK_TIMEOUT_EN		BIT(22)
+#define MTK_PE_INPUT_DMA_ERR		BIT(0)
+#define MTK_PE_OUTPUT_DMA_ERR		BIT(1)
+#define MTK_PE_PKT_PORC_ERR		BIT(2)
+#define MTK_PE_PKT_TIMEOUT		BIT(3)
+#define MTK_PE_FATAL_ERR		BIT(14)
+#define MTK_PE_INPUT_DMA_ERR_EN		BIT(16)
+#define MTK_PE_OUTPUT_DMA_ERR_EN	BIT(17)
+#define MTK_PE_PKT_PORC_ERR_EN		BIT(18)
+#define MTK_PE_PKT_TIMEOUT_EN		BIT(19)
+#define MTK_PE_FATAL_ERR_EN		BIT(30)
+#define MTK_PE_INT_OUT_EN		BIT(31)
+
+#define MTK_HIA_SIGNATURE		((u16)0x35ca)
+#define MTK_HIA_DATA_WIDTH(x)		(((x) >> 25) & GENMASK(1, 0))
+#define MTK_HIA_DMA_LENGTH(x)		(((x) >> 20) & GENMASK(4, 0))
+#define MTK_CDR_STAT_CLR		GENMASK(4, 0)
+#define MTK_RDR_STAT_CLR		GENMASK(7, 0)
+
+#define MTK_AIC_INT_MSK			GENMASK(5, 0)
+#define MTK_AIC_VER_MSK			(GENMASK(15, 0) | GENMASK(27, 20))
+#define MTK_AIC_VER11			0x011036c9
+#define MTK_AIC_VER12			0x012036c9
+#define MTK_AIC_G_CLR			GENMASK(30, 20)
+
+/**
+ * EIP97 is an integrated security subsystem to accelerate cryptographic
+ * functions and protocols to offload the host processor.
+ * Some important hardware modules are briefly introduced below:
+ *
+ * Host Interface Adapter(HIA) - the main interface between the host
+ * system and the hardware subsystem. It is responsible for attaching
+ * processing engine to the specific host bus interface and provides a
+ * standardized software view for off loading tasks to the engine.
+ *
+ * Command Descriptor Ring Manager(CDR Manager) - keeps track of how many
+ * CD the host has prepared in the CDR. It monitors the fill level of its
+ * CD-FIFO and if there's sufficient space for the next block of descriptors,
+ * then it fires off a DMA request to fetch a block of CDs.
+ *
+ * Data fetch engine(DFE) - It is responsible for parsing the CD and
+ * setting up the required control and packet data DMA transfers from
+ * system memory to the processing engine.
+ *
+ * Result Descriptor Ring Manager(RDR Manager) - same as CDR Manager,
+ * but target is result descriptors, Moreover, it also handles the RD
+ * updates under control of the DSE. For each packet data segment
+ * processed, the DSE triggers the RDR Manager to write the updated RD.
+ * If triggered to update, the RDR Manager sets up a DMA operation to
+ * copy the RD from the DSE to the correct location in the RDR.
+ *
+ * Data Store Engine(DSE) - It is responsible for parsing the prepared RD
+ * and setting up the required control and packet data DMA transfers from
+ * the processing engine to system memory.
+ *
+ * Advanced Interrupt Controllers(AICs) - receive interrupt request signals
+ * from various sources and combine them into one interrupt output.
+ * The AICs are used by:
+ * - One for the HIA global and processing engine interrupts.
+ * - The others for the descriptor ring interrupts.
+ */
+
+/* Cryptographic engine capabilities */
+struct mtk_sys_cap {
+	/* host interface adapter */
+	u32 hia_ver;
+	u32 hia_opt;
+	/* packet engine */
+	u32 pkt_eng_opt;
+	/* global hardware */
+	u32 hw_opt;
+};
+
+static void mtk_desc_ring_link(struct mtk_cryp *cryp, u32 mask)
+{
+	/* Assign rings to DFE/DSE thread and enable it */
+	writel(MTK_DFSE_THR_CTRL_EN | mask, cryp->base + DFE_THR_CTRL);
+	writel(MTK_DFSE_THR_CTRL_EN | mask, cryp->base + DSE_THR_CTRL);
+}
+
+static void mtk_dfe_dse_buf_setup(struct mtk_cryp *cryp,
+				  struct mtk_sys_cap *cap)
+{
+	u32 width = MTK_HIA_DATA_WIDTH(cap->hia_opt) + 2;
+	u32 len = MTK_HIA_DMA_LENGTH(cap->hia_opt) - 1;
+	u32 ipbuf = min((u32)MTK_IN_DBUF_SIZE(cap->hw_opt) + width, len);
+	u32 opbuf = min((u32)MTK_OUT_DBUF_SIZE(cap->hw_opt) + width, len);
+	u32 itbuf = min((u32)MTK_IN_TBUF_SIZE(cap->hw_opt) + width, len);
+
+	writel(MTK_DFSE_MIN_DATA(ipbuf - 1) |
+	       MTK_DFSE_MAX_DATA(ipbuf) |
+	       MTK_DFE_MIN_CTRL(itbuf - 1) |
+	       MTK_DFE_MAX_CTRL(itbuf),
+	       cryp->base + DFE_CFG);
+
+	writel(MTK_DFSE_MIN_DATA(opbuf - 1) |
+	       MTK_DFSE_MAX_DATA(opbuf),
+	       cryp->base + DSE_CFG);
+
+	writel(MTK_IN_BUF_MIN_THRESH(ipbuf - 1) |
+	       MTK_IN_BUF_MAX_THRESH(ipbuf),
+	       cryp->base + PE_IN_DBUF_THRESH);
+
+	writel(MTK_IN_BUF_MIN_THRESH(itbuf - 1) |
+	       MTK_IN_BUF_MAX_THRESH(itbuf),
+	       cryp->base + PE_IN_TBUF_THRESH);
+
+	writel(MTK_OUT_BUF_MIN_THRESH(opbuf - 1) |
+	       MTK_OUT_BUF_MAX_THRESH(opbuf),
+	       cryp->base + PE_OUT_DBUF_THRESH);
+
+	writel(0, cryp->base + PE_OUT_TBUF_THRESH);
+	writel(0, cryp->base + PE_OUT_BUF_CTRL);
+}
+
+static int mtk_dfe_dse_state_check(struct mtk_cryp *cryp)
+{
+	int ret = -EINVAL;
+	u32 val;
+
+	/* Check for completion of all DMA transfers */
+	val = readl(cryp->base + DFE_THR_STAT);
+	if (MTK_DFSE_RING_ID(val) == MTK_DFSE_IDLE) {
+		val = readl(cryp->base + DSE_THR_STAT);
+		if (MTK_DFSE_RING_ID(val) == MTK_DFSE_IDLE)
+			ret = 0;
+	}
+
+	if (!ret) {
+		/* Take DFE/DSE thread out of reset */
+		writel(0, cryp->base + DFE_THR_CTRL);
+		writel(0, cryp->base + DSE_THR_CTRL);
+	} else {
+		return -EBUSY;
+	}
+
+	return 0;
+}
+
+static int mtk_dfe_dse_reset(struct mtk_cryp *cryp)
+{
+	int err;
+
+	/* Reset DSE/DFE and correct system priorities for all rings. */
+	writel(MTK_DFSE_THR_CTRL_RESET, cryp->base + DFE_THR_CTRL);
+	writel(0, cryp->base + DFE_PRIO_0);
+	writel(0, cryp->base + DFE_PRIO_1);
+	writel(0, cryp->base + DFE_PRIO_2);
+	writel(0, cryp->base + DFE_PRIO_3);
+
+	writel(MTK_DFSE_THR_CTRL_RESET, cryp->base + DSE_THR_CTRL);
+	writel(0, cryp->base + DSE_PRIO_0);
+	writel(0, cryp->base + DSE_PRIO_1);
+	writel(0, cryp->base + DSE_PRIO_2);
+	writel(0, cryp->base + DSE_PRIO_3);
+
+	err = mtk_dfe_dse_state_check(cryp);
+	if (err)
+		return err;
+
+	return 0;
+}
+
+static void mtk_cmd_desc_ring_setup(struct mtk_cryp *cryp,
+				    int i, struct mtk_sys_cap *cap)
+{
+	/* Full descriptor that fits FIFO minus one */
+	u32 count =
+		((1 << MTK_CMD_FIFO_SIZE(cap->hia_opt)) / MTK_DESC_SZ) - 1;
+
+	/* Temporarily disable external triggering */
+	writel(0, cryp->base + CDR_CFG(i));
+
+	/* Clear CDR count */
+	writel(MTK_CNT_RST, cryp->base + CDR_PREP_COUNT(i));
+	writel(MTK_CNT_RST, cryp->base + CDR_PROC_COUNT(i));
+
+	writel(0, cryp->base + CDR_PREP_PNTR(i));
+	writel(0, cryp->base + CDR_PROC_PNTR(i));
+	writel(0, cryp->base + CDR_DMA_CFG(i));
+
+	/* Configure CDR host address space */
+	writel(0, cryp->base + CDR_BASE_ADDR_HI(i));
+	writel(cryp->ring[i]->cmd_dma, cryp->base + CDR_BASE_ADDR_LO(i));
+
+	writel(MTK_DESC_RING_SZ, cryp->base + CDR_RING_SIZE(i));
+
+	/* Clear and disable all CDR interrupts */
+	writel(MTK_CDR_STAT_CLR, cryp->base + CDR_STAT(i));
+
+	/*
+	 * Set command descriptor offset and enable additional
+	 * token present in descriptor.
+	 */
+	writel(MTK_DESC_SIZE(MTK_DESC_SZ) |
+		   MTK_DESC_OFFSET(MTK_DESC_OFF) |
+	       MTK_DESC_ATP_PRESENT,
+	       cryp->base + CDR_DESC_SIZE(i));
+
+	writel(MTK_DESC_FETCH_SIZE(count * MTK_DESC_OFF) |
+		   MTK_DESC_FETCH_THRESH(count * MTK_DESC_SZ),
+		   cryp->base + CDR_CFG(i));
+}
+
+static void mtk_res_desc_ring_setup(struct mtk_cryp *cryp,
+				    int i, struct mtk_sys_cap *cap)
+{
+	u32 rndup = 2;
+	u32 count = ((1 << MTK_RES_FIFO_SIZE(cap->hia_opt)) / rndup) - 1;
+
+	/* Temporarily disable external triggering */
+	writel(0, cryp->base + RDR_CFG(i));
+
+	/* Clear RDR count */
+	writel(MTK_CNT_RST, cryp->base + RDR_PREP_COUNT(i));
+	writel(MTK_CNT_RST, cryp->base + RDR_PROC_COUNT(i));
+
+	writel(0, cryp->base + RDR_PREP_PNTR(i));
+	writel(0, cryp->base + RDR_PROC_PNTR(i));
+	writel(0, cryp->base + RDR_DMA_CFG(i));
+
+	/* Configure RDR host address space */
+	writel(0, cryp->base + RDR_BASE_ADDR_HI(i));
+	writel(cryp->ring[i]->res_dma, cryp->base + RDR_BASE_ADDR_LO(i));
+
+	writel(MTK_DESC_RING_SZ, cryp->base + RDR_RING_SIZE(i));
+	writel(MTK_RDR_STAT_CLR, cryp->base + RDR_STAT(i));
+
+	/*
+	 * RDR manager generates update interrupts on a per-completed-packet,
+	 * and the rd_proc_thresh_irq interrupt is fired when proc_pkt_count
+	 * for the RDR exceeds the number of packets.
+	 */
+	writel(MTK_RDR_PROC_THRESH | MTK_RDR_PROC_MODE,
+	       cryp->base + RDR_THRESH(i));
+
+	/*
+	 * Configure a threshold and time-out value for the processed
+	 * result descriptors (or complete packets) that are written to
+	 * the RDR.
+	 */
+	writel(MTK_DESC_SIZE(MTK_DESC_SZ) | MTK_DESC_OFFSET(MTK_DESC_OFF),
+	       cryp->base + RDR_DESC_SIZE(i));
+
+	/*
+	 * Configure HIA fetch size and fetch threshold that are used to
+	 * fetch blocks of multiple descriptors.
+	 */
+	writel(MTK_DESC_FETCH_SIZE(count * MTK_DESC_OFF) |
+	       MTK_DESC_FETCH_THRESH(count * rndup) |
+	       MTK_DESC_OVL_IRQ_EN,
+		   cryp->base + RDR_CFG(i));
+}
+
+static int mtk_packet_engine_setup(struct mtk_cryp *cryp)
+{
+	struct mtk_sys_cap cap;
+	int i, err;
+	u32 val;
+
+	cap.hia_ver = readl(cryp->base + HIA_VERSION);
+	cap.hia_opt = readl(cryp->base + HIA_OPTIONS);
+	cap.hw_opt = readl(cryp->base + EIP97_OPTIONS);
+
+	if (!(((u16)cap.hia_ver) == MTK_HIA_SIGNATURE))
+		return -EINVAL;
+
+	/* Configure endianness conversion method for master (DMA) interface */
+	writel(0, cryp->base + EIP97_MST_CTRL);
+
+	/* Set HIA burst size */
+	val = readl(cryp->base + HIA_MST_CTRL);
+	val &= ~MTK_BURST_SIZE_MSK;
+	val |= MTK_BURST_SIZE(5);
+	writel(val, cryp->base + HIA_MST_CTRL);
+
+	err = mtk_dfe_dse_reset(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Failed to reset DFE and DSE.\n");
+		return err;
+	}
+
+	mtk_dfe_dse_buf_setup(cryp, &cap);
+
+	/* Enable the 4 rings for the packet engines. */
+	mtk_desc_ring_link(cryp, 0xf);
+
+	for (i = 0; i < RING_MAX; i++) {
+		mtk_cmd_desc_ring_setup(cryp, i, &cap);
+		mtk_res_desc_ring_setup(cryp, i, &cap);
+	}
+
+	writel(MTK_PE_TK_LOC_AVL | MTK_PE_PROC_HELD | MTK_PE_TK_TIMEOUT_EN,
+	       cryp->base + PE_TOKEN_CTRL_STAT);
+
+	/* Clear all pending interrupts */
+	writel(MTK_AIC_G_CLR, cryp->base + AIC_G_ACK);
+	writel(MTK_PE_INPUT_DMA_ERR | MTK_PE_OUTPUT_DMA_ERR |
+	       MTK_PE_PKT_PORC_ERR | MTK_PE_PKT_TIMEOUT |
+	       MTK_PE_FATAL_ERR | MTK_PE_INPUT_DMA_ERR_EN |
+	       MTK_PE_OUTPUT_DMA_ERR_EN | MTK_PE_PKT_PORC_ERR_EN |
+	       MTK_PE_PKT_TIMEOUT_EN | MTK_PE_FATAL_ERR_EN |
+	       MTK_PE_INT_OUT_EN,
+	       cryp->base + PE_INTERRUPT_CTRL_STAT);
+
+	return 0;
+}
+
+static int mtk_aic_cap_check(struct mtk_cryp *cryp, int hw)
+{
+	u32 val;
+
+	if (hw == RING_MAX)
+		val = readl(cryp->base + AIC_G_VERSION);
+	else
+		val = readl(cryp->base + AIC_VERSION(hw));
+
+	val &= MTK_AIC_VER_MSK;
+	if (val != MTK_AIC_VER11 && val != MTK_AIC_VER12)
+		return -ENXIO;
+
+	if (hw == RING_MAX)
+		val = readl(cryp->base + AIC_G_OPTIONS);
+	else
+		val = readl(cryp->base + AIC_OPTIONS(hw));
+
+	val &= MTK_AIC_INT_MSK;
+	if (!val || val > 32)
+		return -ENXIO;
+
+	return 0;
+}
+
+static int mtk_aic_init(struct mtk_cryp *cryp, int hw)
+{
+	int err;
+
+	err = mtk_aic_cap_check(cryp, hw);
+	if (err)
+		return err;
+
+	/* Disable all interrupts and set initial configuration */
+	if (hw == RING_MAX) {
+		writel(0, cryp->base + AIC_G_ENABLE_CTRL);
+		writel(0, cryp->base + AIC_G_POL_CTRL);
+		writel(0, cryp->base + AIC_G_TYPE_CTRL);
+		writel(0, cryp->base + AIC_G_ENABLE_SET);
+	} else {
+		writel(0, cryp->base + AIC_ENABLE_CTRL(hw));
+		writel(0, cryp->base + AIC_POL_CTRL(hw));
+		writel(0, cryp->base + AIC_TYPE_CTRL(hw));
+		writel(0, cryp->base + AIC_ENABLE_SET(hw));
+	}
+
+	return 0;
+}
+
+static int mtk_accelerator_init(struct mtk_cryp *cryp)
+{
+	int i, err;
+
+	/* Initialize advanced interrupt controller(AIC) */
+	for (i = 0; i < MTK_IRQ_NUM; i++) {
+		err = mtk_aic_init(cryp, i);
+		if (err) {
+			dev_err(cryp->dev, "Failed to initialize AIC.\n");
+			return err;
+		}
+	}
+
+	/* Initialize packet engine */
+	err = mtk_packet_engine_setup(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Failed to configure packet engine.\n");
+		return err;
+	}
+
+	return 0;
+}
+
+static void mtk_desc_dma_free(struct mtk_cryp *cryp)
+{
+	int i;
+
+	for (i = 0; i < RING_MAX; i++) {
+		dma_free_coherent(cryp->dev, MTK_DESC_RING_SZ,
+				  cryp->ring[i]->res_base,
+				  cryp->ring[i]->res_dma);
+		dma_free_coherent(cryp->dev, MTK_DESC_RING_SZ,
+				  cryp->ring[i]->cmd_base,
+				  cryp->ring[i]->cmd_dma);
+		kfree(cryp->ring[i]);
+	}
+}
+
+static int mtk_desc_ring_alloc(struct mtk_cryp *cryp)
+{
+	struct mtk_ring **ring = cryp->ring;
+	int i, err = ENOMEM;
+
+	for (i = 0; i < RING_MAX; i++) {
+		ring[i] = kzalloc(sizeof(**ring), GFP_KERNEL);
+		if (!ring[i])
+			goto err_cleanup;
+
+		ring[i]->cmd_base = dma_zalloc_coherent(cryp->dev,
+					   MTK_DESC_RING_SZ,
+					   &ring[i]->cmd_dma,
+					   GFP_KERNEL);
+		if (!ring[i]->cmd_base)
+			goto err_cleanup;
+
+		ring[i]->res_base = dma_zalloc_coherent(cryp->dev,
+					   MTK_DESC_RING_SZ,
+					   &ring[i]->res_dma,
+					   GFP_KERNEL);
+		if (!ring[i]->res_base)
+			goto err_cleanup;
+	}
+	return 0;
+
+err_cleanup:
+	for (; i--; ) {
+		dma_free_coherent(cryp->dev, MTK_DESC_RING_SZ,
+				  ring[i]->res_base, ring[i]->res_dma);
+		dma_free_coherent(cryp->dev, MTK_DESC_RING_SZ,
+				  ring[i]->cmd_base, ring[i]->cmd_dma);
+		kfree(ring[i]);
+	}
+	return err;
+}
+
+static int mtk_crypto_probe(struct platform_device *pdev)
+{
+	struct resource *res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	struct mtk_cryp *cryp;
+	int i, err;
+
+	cryp = devm_kzalloc(&pdev->dev, sizeof(*cryp), GFP_KERNEL);
+	if (!cryp)
+		return -ENOMEM;
+
+	cryp->base = devm_ioremap_resource(&pdev->dev, res);
+	if (IS_ERR(cryp->base))
+		return PTR_ERR(cryp->base);
+
+	for (i = 0; i < MTK_IRQ_NUM; i++) {
+		cryp->irq[i] = platform_get_irq(pdev, i);
+		if (cryp->irq[i] < 0) {
+			dev_err(cryp->dev, "no IRQ:%d resource info\n", i);
+			return -ENXIO;
+		}
+	}
+
+	cryp->clk_ethif = devm_clk_get(&pdev->dev, "ethif");
+	cryp->clk_cryp = devm_clk_get(&pdev->dev, "cryp");
+	if (IS_ERR(cryp->clk_ethif) || IS_ERR(cryp->clk_cryp))
+		return -EPROBE_DEFER;
+
+	cryp->dev = &pdev->dev;
+	pm_runtime_enable(cryp->dev);
+	pm_runtime_get_sync(cryp->dev);
+
+	err = clk_prepare_enable(cryp->clk_ethif);
+	if (err)
+		goto err_clk_ethif;
+
+	err = clk_prepare_enable(cryp->clk_cryp);
+	if (err)
+		goto err_clk_cryp;
+
+	/* Allocate four command/result descriptor rings */
+	err = mtk_desc_ring_alloc(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Unable to allocate descriptor rings.\n");
+		goto err_resource;
+	}
+
+	/* Initialize hardware modules */
+	err = mtk_accelerator_init(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Failed to initialize cryptographic engine.\n");
+		goto err_engine;
+	}
+
+	err = mtk_cipher_alg_register(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Unable to register cipher algorithm.\n");
+		goto err_cipher;
+	}
+
+	err = mtk_hash_alg_register(cryp);
+	if (err) {
+		dev_err(cryp->dev, "Unable to register hash algorithm.\n");
+		goto err_hash;
+	}
+
+	platform_set_drvdata(pdev, cryp);
+	return 0;
+
+err_hash:
+	mtk_cipher_alg_release(cryp);
+err_cipher:
+	mtk_dfe_dse_reset(cryp);
+err_engine:
+	mtk_desc_dma_free(cryp);
+err_resource:
+	clk_disable_unprepare(cryp->clk_cryp);
+err_clk_cryp:
+	clk_disable_unprepare(cryp->clk_ethif);
+err_clk_ethif:
+	pm_runtime_put_sync(cryp->dev);
+	pm_runtime_disable(cryp->dev);
+
+	return err;
+}
+
+static int mtk_crypto_remove(struct platform_device *pdev)
+{
+	struct mtk_cryp *cryp = platform_get_drvdata(pdev);
+
+	mtk_hash_alg_release(cryp);
+	mtk_cipher_alg_release(cryp);
+	mtk_desc_dma_free(cryp);
+
+	clk_disable_unprepare(cryp->clk_cryp);
+	clk_disable_unprepare(cryp->clk_ethif);
+
+	pm_runtime_put_sync(cryp->dev);
+	pm_runtime_disable(cryp->dev);
+	platform_set_drvdata(pdev, NULL);
+
+	return 0;
+}
+
+const struct of_device_id of_crypto_id[] = {
+	{ .compatible = "mediatek,eip97-crypto" },
+	{},
+};
+MODULE_DEVICE_TABLE(of, of_crypto_id);
+
+static struct platform_driver mtk_crypto_driver = {
+	.probe = mtk_crypto_probe,
+	.remove = mtk_crypto_remove,
+	.driver = {
+		   .name = "mtk-crypto",
+		   .owner = THIS_MODULE,
+		   .of_match_table = of_crypto_id,
+	},
+};
+module_platform_driver(mtk_crypto_driver);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Ryder Lee <ryder.lee@mediatek.com>");
+MODULE_DESCRIPTION("Cryptographic accelerator driver for EIP97");
diff --git a/drivers/crypto/mediatek/mtk-platform.h b/drivers/crypto/mediatek/mtk-platform.h
new file mode 100644
index 0000000..4d4309a
--- /dev/null
+++ b/drivers/crypto/mediatek/mtk-platform.h
@@ -0,0 +1,238 @@
+/*
+ * Driver for EIP97 cryptographic accelerator.
+ *
+ * Copyright (c) 2016 Ryder Lee <ryder.lee@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ */
+
+#ifndef __MTK_PLATFORM_H_
+#define __MTK_PLATFORM_H_
+
+#include <crypto/algapi.h>
+#include <crypto/internal/hash.h>
+#include <crypto/scatterwalk.h>
+#include <linux/crypto.h>
+#include <linux/dma-mapping.h>
+#include <linux/interrupt.h>
+#include <linux/scatterlist.h>
+#include "mtk-regs.h"
+
+#define MTK_RDR_PROC_THRESH	BIT(0)
+#define MTK_RDR_PROC_MODE	BIT(23)
+#define MTK_CNT_RST		BIT(31)
+#define MTK_IRQ_RDR0		BIT(1)
+#define MTK_IRQ_RDR1		BIT(3)
+#define MTK_IRQ_RDR2		BIT(5)
+#define MTK_IRQ_RDR3		BIT(7)
+
+#define SIZE_IN_WORDS(x)	((x) >> 2)
+
+/**
+ * Ring 0/1 are used by AES encrypt and decrypt.
+ * Ring 2/3 are used by SHA.
+ */
+enum {
+	RING0 = 0,
+	RING1,
+	RING2,
+	RING3,
+	RING_MAX,
+};
+
+#define MTK_REC_NUM		(RING_MAX / 2)
+#define MTK_IRQ_NUM		5
+
+/**
+ * struct mtk_desc - DMA descriptor
+ * @hdr:	the descriptor control header
+ * @buf:	DMA address of input buffer segment
+ * @ct:		DMA address of command token that control operation flow
+ * @ct_hdr:	the command token control header
+ * @tag:	the user-defined field
+ * @tfm:	DMA address of transform state
+ * @bound:	align descriptors offset boundary
+ *
+ * Structure passed to the crypto engine to describe where source
+ * data needs to be fetched and how it needs to be processed.
+ */
+struct mtk_desc {
+	__le32 hdr;
+	__le32 buf;
+	__le32 ct;
+	__le32 ct_hdr;
+	__le32 tag;
+	__le32 tfm;
+	__le32 bound[2];
+};
+
+#define MTK_DESC_NUM		512
+#define MTK_DESC_OFF		SIZE_IN_WORDS(sizeof(struct mtk_desc))
+#define MTK_DESC_SZ		(MTK_DESC_OFF - 2)
+#define MTK_DESC_RING_SZ	((sizeof(struct mtk_desc) * MTK_DESC_NUM))
+#define MTK_DESC_CNT(x)		((MTK_DESC_OFF * (x)) << 2)
+#define MTK_DESC_LAST		cpu_to_le32(BIT(22))
+#define MTK_DESC_FIRST		cpu_to_le32(BIT(23))
+#define MTK_DESC_BUF_LEN(x)	cpu_to_le32(x)
+#define MTK_DESC_CT_LEN(x)	cpu_to_le32((x) << 24)
+
+/**
+ * struct mtk_ring - Descriptor ring
+ * @cmd_base:	pointer to command descriptor ring base
+ * @cmd_dma:	DMA address of command descriptor ring
+ * @res_base:	pointer to result descriptor ring base
+ * @res_dma:	DMA address of result descriptor ring
+ * @pos:	current position in the ring
+ *
+ * A descriptor ring is a circular buffer that is used to manage
+ * one or more descriptors. There are two type of descriptor rings;
+ * the command descriptor ring and result descriptor ring.
+ */
+struct mtk_ring {
+	struct mtk_desc *cmd_base;
+	dma_addr_t cmd_dma;
+	struct mtk_desc *res_base;
+	dma_addr_t res_dma;
+	u32 pos;
+};
+
+/**
+ * struct mtk_aes_dma - Structure that holds sg list info
+ * @sg:		pointer to scatter-gather list
+ * @nents:	number of entries in the sg list
+ * @remainder:	remainder of sg list
+ * @sg_len:	number of entries in the sg mapped list
+ */
+struct mtk_aes_dma {
+	struct scatterlist *sg;
+	int nents;
+	u32 remainder;
+	u32 sg_len;
+};
+
+/**
+ * struct mtk_aes_rec - AES operation record
+ * @queue:	crypto request queue
+ * @req:	pointer to ablkcipher request
+ * @task:	the tasklet is use in AES interrupt
+ * @src:	the structure that holds source sg list info
+ * @dst:	the structure that holds destination sg list info
+ * @aligned_sg:	the scatter list is use to alignment
+ * @real_dst:	pointer to the destination sg list
+ * @total:	request buffer length
+ * @buf:	pointer to page buffer
+ * @info:	pointer to AES transform state and command token
+ * @ct_hdr:	AES command token control field
+ * @ct_size:	size of AES command token
+ * @ct_dma:	DMA address of AES command token
+ * @tfm_dma:	DMA address of AES transform state
+ * @id:		record identification
+ * @flags:	it's describing AES operation state
+ * @lock:	the ablkcipher queue lock
+ *
+ * Structure used to record AES execution state.
+ */
+struct mtk_aes_rec {
+	struct crypto_queue queue;
+	struct ablkcipher_request *req;
+	struct tasklet_struct task;
+	struct mtk_aes_dma src;
+	struct mtk_aes_dma dst;
+
+	struct scatterlist aligned_sg;
+	struct scatterlist *real_dst;
+
+	size_t total;
+	void *buf;
+
+	void *info;
+	__le32 ct_hdr;
+	u32 ct_size;
+	dma_addr_t ct_dma;
+	dma_addr_t tfm_dma;
+
+	u8 id;
+	unsigned long flags;
+	/* queue lock */
+	spinlock_t lock;
+};
+
+/**
+ * struct mtk_sha_rec - SHA operation record
+ * @queue:	crypto request queue
+ * @req:	pointer to ahash request
+ * @task:	the tasklet is use in SHA interrupt
+ * @info:	pointer to SHA transform state and command token
+ * @ct_hdr:	SHA command token control field
+ * @ct_size:	size of SHA command token
+ * @ct_dma:	DMA address of SHA command token
+ * @tfm_dma:	DMA address of SHA transform state
+ * @id:		record identification
+ * @flags:	it's describing SHA operation state
+ * @lock:	the ablkcipher queue lock
+ *
+ * Structure used to record SHA execution state.
+ */
+struct mtk_sha_rec {
+	struct crypto_queue queue;
+	struct ahash_request *req;
+	struct tasklet_struct task;
+
+	void *info;
+	__le32 ct_hdr;
+	u32 ct_size;
+	dma_addr_t ct_dma;
+	dma_addr_t tfm_dma;
+
+	u8 id;
+	unsigned long flags;
+	/* queue lock */
+	spinlock_t lock;
+};
+
+/**
+ * struct mtk_cryp - Cryptographic device
+ * @base:	pointer to mapped register I/O base
+ * @dev:	pointer to device
+ * @clk_ethif:	pointer to ethif clock
+ * @clk_cryp:	pointer to crypto clock
+ * @irq:	global system and rings IRQ
+ * @ring:	pointer to execution state of AES
+ * @aes:	pointer to execution state of SHA
+ * @sha:	each execution record map to a ring
+ * @aes_list:	device list of AES
+ * @sha_list:	device list of SHA
+ * @tmp:	pointer to temporary buffer for internal use
+ * @tmp_dma:	DMA address of temporary buffer
+ * @rec:	it's used to select SHA record for tfm
+ *
+ * Structure storing cryptographic device information.
+ */
+struct mtk_cryp {
+	void __iomem *base;
+	struct device *dev;
+	struct clk *clk_ethif;
+	struct clk *clk_cryp;
+	int irq[MTK_IRQ_NUM];
+
+	struct mtk_ring *ring[RING_MAX];
+	struct mtk_aes_rec *aes[MTK_REC_NUM];
+	struct mtk_sha_rec *sha[MTK_REC_NUM];
+
+	struct list_head aes_list;
+	struct list_head sha_list;
+
+	void *tmp;
+	dma_addr_t tmp_dma;
+	bool rec;
+};
+
+int mtk_cipher_alg_register(struct mtk_cryp *cryp);
+void mtk_cipher_alg_release(struct mtk_cryp *cryp);
+int mtk_hash_alg_register(struct mtk_cryp *cryp);
+void mtk_hash_alg_release(struct mtk_cryp *cryp);
+
+#endif /* __MTK_PLATFORM_H_ */
diff --git a/drivers/crypto/mediatek/mtk-regs.h b/drivers/crypto/mediatek/mtk-regs.h
new file mode 100644
index 0000000..94f4eb8
--- /dev/null
+++ b/drivers/crypto/mediatek/mtk-regs.h
@@ -0,0 +1,194 @@
+/*
+ * Support for MediaTek cryptographic accelerator.
+ *
+ * Copyright (c) 2016 MediaTek Inc.
+ * Author: Ryder Lee <ryder.lee@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License.
+ *
+ */
+
+#ifndef __MTK_REGS_H__
+#define __MTK_REGS_H__
+
+/* HIA, Command Descriptor Ring Manager */
+#define CDR_BASE_ADDR_LO(x)		(0x0 + ((x) << 12))
+#define CDR_BASE_ADDR_HI(x)		(0x4 + ((x) << 12))
+#define CDR_DATA_BASE_ADDR_LO(x)	(0x8 + ((x) << 12))
+#define CDR_DATA_BASE_ADDR_HI(x)	(0xC + ((x) << 12))
+#define CDR_ACD_BASE_ADDR_LO(x)		(0x10 + ((x) << 12))
+#define CDR_ACD_BASE_ADDR_HI(x)		(0x14 + ((x) << 12))
+#define CDR_RING_SIZE(x)		(0x18 + ((x) << 12))
+#define CDR_DESC_SIZE(x)		(0x1C + ((x) << 12))
+#define CDR_CFG(x)			(0x20 + ((x) << 12))
+#define CDR_DMA_CFG(x)			(0x24 + ((x) << 12))
+#define CDR_THRESH(x)			(0x28 + ((x) << 12))
+#define CDR_PREP_COUNT(x)		(0x2C + ((x) << 12))
+#define CDR_PROC_COUNT(x)		(0x30 + ((x) << 12))
+#define CDR_PREP_PNTR(x)		(0x34 + ((x) << 12))
+#define CDR_PROC_PNTR(x)		(0x38 + ((x) << 12))
+#define CDR_STAT(x)			(0x3C + ((x) << 12))
+
+/* HIA, Result Descriptor Ring Manager */
+#define RDR_BASE_ADDR_LO(x)		(0x800 + ((x) << 12))
+#define RDR_BASE_ADDR_HI(x)		(0x804 + ((x) << 12))
+#define RDR_DATA_BASE_ADDR_LO(x)	(0x808 + ((x) << 12))
+#define RDR_DATA_BASE_ADDR_HI(x)	(0x80C + ((x) << 12))
+#define RDR_ACD_BASE_ADDR_LO(x)		(0x810 + ((x) << 12))
+#define RDR_ACD_BASE_ADDR_HI(x)		(0x814 + ((x) << 12))
+#define RDR_RING_SIZE(x)		(0x818 + ((x) << 12))
+#define RDR_DESC_SIZE(x)		(0x81C + ((x) << 12))
+#define RDR_CFG(x)			(0x820 + ((x) << 12))
+#define RDR_DMA_CFG(x)			(0x824 + ((x) << 12))
+#define RDR_THRESH(x)			(0x828 + ((x) << 12))
+#define RDR_PREP_COUNT(x)		(0x82C + ((x) << 12))
+#define RDR_PROC_COUNT(x)		(0x830 + ((x) << 12))
+#define RDR_PREP_PNTR(x)		(0x834 + ((x) << 12))
+#define RDR_PROC_PNTR(x)		(0x838 + ((x) << 12))
+#define RDR_STAT(x)			(0x83C + ((x) << 12))
+
+/* HIA, Ring AIC */
+#define AIC_POL_CTRL(x)			(0xE000 - ((x) << 12))
+#define	AIC_TYPE_CTRL(x)		(0xE004 - ((x) << 12))
+#define	AIC_ENABLE_CTRL(x)		(0xE008 - ((x) << 12))
+#define	AIC_RAW_STAL(x)			(0xE00C - ((x) << 12))
+#define	AIC_ENABLE_SET(x)		(0xE00C - ((x) << 12))
+#define	AIC_ENABLED_STAT(x)		(0xE010 - ((x) << 12))
+#define	AIC_ACK(x)			(0xE010 - ((x) << 12))
+#define	AIC_ENABLE_CLR(x)		(0xE014 - ((x) << 12))
+#define	AIC_OPTIONS(x)			(0xE018 - ((x) << 12))
+#define	AIC_VERSION(x)			(0xE01C - ((x) << 12))
+
+/* HIA, Global AIC */
+#define AIC_G_POL_CTRL			0xF800
+#define AIC_G_TYPE_CTRL			0xF804
+#define AIC_G_ENABLE_CTRL		0xF808
+#define AIC_G_RAW_STAT			0xF80C
+#define AIC_G_ENABLE_SET		0xF80C
+#define AIC_G_ENABLED_STAT		0xF810
+#define AIC_G_ACK			0xF810
+#define AIC_G_ENABLE_CLR		0xF814
+#define AIC_G_OPTIONS			0xF818
+#define AIC_G_VERSION			0xF81C
+
+/* HIA, Data Fetch Engine */
+#define DFE_CFG				0xF000
+#define DFE_PRIO_0			0xF010
+#define DFE_PRIO_1			0xF014
+#define DFE_PRIO_2			0xF018
+#define DFE_PRIO_3			0xF01C
+
+/* HIA, Data Fetch Engine access monitoring for CDR */
+#define DFE_RING_REGION_LO(x)		(0xF080 + ((x) << 3))
+#define DFE_RING_REGION_HI(x)		(0xF084 + ((x) << 3))
+
+/* HIA, Data Fetch Engine thread control and status for thread */
+#define DFE_THR_CTRL			0xF200
+#define DFE_THR_STAT			0xF204
+#define DFE_THR_DESC_CTRL		0xF208
+#define DFE_THR_DESC_DPTR_LO		0xF210
+#define DFE_THR_DESC_DPTR_HI		0xF214
+#define DFE_THR_DESC_ACDPTR_LO		0xF218
+#define DFE_THR_DESC_ACDPTR_HI		0xF21C
+
+/* HIA, Data Store Engine */
+#define DSE_CFG				0xF400
+#define DSE_PRIO_0			0xF410
+#define DSE_PRIO_1			0xF414
+#define DSE_PRIO_2			0xF418
+#define DSE_PRIO_3			0xF41C
+
+/* HIA, Data Store Engine access monitoring for RDR */
+#define DSE_RING_REGION_LO(x)		(0xF480 + ((x) << 3))
+#define DSE_RING_REGION_HI(x)		(0xF484 + ((x) << 3))
+
+/* HIA, Data Store Engine thread control and status for thread */
+#define DSE_THR_CTRL			0xF600
+#define DSE_THR_STAT			0xF604
+#define DSE_THR_DESC_CTRL		0xF608
+#define DSE_THR_DESC_DPTR_LO		0xF610
+#define DSE_THR_DESC_DPTR_HI		0xF614
+#define DSE_THR_DESC_S_DPTR_LO		0xF618
+#define DSE_THR_DESC_S_DPTR_HI		0xF61C
+#define DSE_THR_ERROR_STAT		0xF620
+
+/* HIA Global */
+#define HIA_MST_CTRL			0xFFF4
+#define HIA_OPTIONS			0xFFF8
+#define HIA_VERSION			0xFFFC
+
+/* Processing Engine Input Side, Processing Engine */
+#define PE_IN_DBUF_THRESH		0x10000
+#define PE_IN_TBUF_THRESH		0x10100
+
+/* Packet Engine Configuration / Status Registers */
+#define PE_TOKEN_CTRL_STAT		0x11000
+#define PE_FUNCTION_EN			0x11004
+#define PE_CONTEXT_CTRL			0x11008
+#define PE_INTERRUPT_CTRL_STAT		0x11010
+#define PE_CONTEXT_STAT			0x1100C
+#define PE_OUT_TRANS_CTRL_STAT		0x11018
+#define PE_OUT_BUF_CTRL			0x1101C
+
+/* Packet Engine PRNG Registers */
+#define PE_PRNG_STAT			0x11040
+#define PE_PRNG_CTRL			0x11044
+#define PE_PRNG_SEED_L			0x11048
+#define PE_PRNG_SEED_H			0x1104C
+#define PE_PRNG_KEY_0_L			0x11050
+#define PE_PRNG_KEY_0_H			0x11054
+#define PE_PRNG_KEY_1_L			0x11058
+#define PE_PRNG_KEY_1_H			0x1105C
+#define PE_PRNG_RES_0			0x11060
+#define PE_PRNG_RES_1			0x11064
+#define PE_PRNG_RES_2			0x11068
+#define PE_PRNG_RES_3			0x1106C
+#define PE_PRNG_LFSR_L			0x11070
+#define PE_PRNG_LFSR_H			0x11074
+
+/* Packet Engine AIC */
+#define PE_EIP96_AIC_POL_CTRL		0x113C0
+#define PE_EIP96_AIC_TYPE_CTRL		0x113C4
+#define PE_EIP96_AIC_ENABLE_CTRL	0x113C8
+#define PE_EIP96_AIC_RAW_STAT		0x113CC
+#define PE_EIP96_AIC_ENABLE_SET		0x113CC
+#define PE_EIP96_AIC_ENABLED_STAT	0x113D0
+#define PE_EIP96_AIC_ACK		0x113D0
+#define PE_EIP96_AIC_ENABLE_CLR		0x113D4
+#define PE_EIP96_AIC_OPTIONS		0x113D8
+#define PE_EIP96_AIC_VERSION		0x113DC
+
+/* Packet Engine Options & Version Registers */
+#define PE_EIP96_OPTIONS		0x113F8
+#define PE_EIP96_VERSION		0x113FC
+
+/* Processing Engine Output Side */
+#define PE_OUT_DBUF_THRESH		0x11C00
+#define PE_OUT_TBUF_THRESH		0x11D00
+
+/* Processing Engine Local AIC */
+#define PE_AIC_POL_CTRL			0x11F00
+#define PE_AIC_TYPE_CTRL		0x11F04
+#define PE_AIC_ENABLE_CTRL		0x11F08
+#define PE_AIC_RAW_STAT			0x11F0C
+#define PE_AIC_ENABLE_SET		0x11F0C
+#define PE_AIC_ENABLED_STAT		0x11F10
+#define PE_AIC_ENABLE_CLR		0x11F14
+#define PE_AIC_OPTIONS			0x11F18
+#define PE_AIC_VERSION			0x11F1C
+
+/* Processing Engine General Configuration and Version */
+#define PE_IN_FLIGHT			0x11FF0
+#define PE_OPTIONS			0x11FF8
+#define PE_VERSION			0x11FFC
+
+/* EIP-97 - Global */
+#define EIP97_CLOCK_STATE		0x1FFE4
+#define EIP97_FORCE_CLOCK_ON		0x1FFE8
+#define EIP97_FORCE_CLOCK_OFF		0x1FFEC
+#define EIP97_MST_CTRL			0x1FFF4
+#define EIP97_OPTIONS			0x1FFF8
+#define EIP97_VERSION			0x1FFFC
+#endif /* __MTK_REGS_H__ */
diff --git a/drivers/crypto/mediatek/mtk-sha.c b/drivers/crypto/mediatek/mtk-sha.c
new file mode 100644
index 0000000..8951363
--- /dev/null
+++ b/drivers/crypto/mediatek/mtk-sha.c
@@ -0,0 +1,1437 @@
+/*
+ * Cryptographic API.
+ *
+ * Driver for EIP97 SHA1/SHA2(HMAC) acceleration.
+ *
+ * Copyright (c) 2016 Ryder Lee <ryder.lee@mediatek.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Some ideas are from atmel-sha.c and omap-sham.c drivers.
+ */
+
+#include <crypto/sha.h>
+#include "mtk-platform.h"
+
+#define SHA_ALIGN_MSK		(sizeof(u32) - 1)
+#define SHA_QUEUE_SIZE		512
+#define SHA_TMP_BUF_SIZE	512
+#define SHA_BUF_SIZE		((u32)PAGE_SIZE)
+
+#define SHA_OP_UPDATE		1
+#define SHA_OP_FINAL		2
+
+#define SHA_DATA_LEN_MSK	cpu_to_le32(GENMASK(16, 0))
+
+/* SHA command token */
+#define SHA_CT_SIZE		5
+#define SHA_CT_CTRL_HDR		cpu_to_le32(0x02220000)
+#define SHA_COMMAND0		cpu_to_le32(0x03020000)
+#define SHA_COMMAND1		cpu_to_le32(0x21060000)
+#define SHA_COMMAND2		cpu_to_le32(0xe0e63802)
+
+/* SHA transform information */
+#define SHA_TFM_HASH		cpu_to_le32(0x2 << 0)
+#define SHA_TFM_INNER_DIG	cpu_to_le32(0x1 << 21)
+#define SHA_TFM_SIZE(x)		cpu_to_le32((x) << 8)
+#define SHA_TFM_START		cpu_to_le32(0x1 << 4)
+#define SHA_TFM_CONTINUE	cpu_to_le32(0x1 << 5)
+#define SHA_TFM_HASH_STORE	cpu_to_le32(0x1 << 19)
+#define SHA_TFM_SHA1		cpu_to_le32(0x2 << 23)
+#define SHA_TFM_SHA256		cpu_to_le32(0x3 << 23)
+#define SHA_TFM_SHA224		cpu_to_le32(0x4 << 23)
+#define SHA_TFM_SHA512		cpu_to_le32(0x5 << 23)
+#define SHA_TFM_SHA384		cpu_to_le32(0x6 << 23)
+#define SHA_TFM_DIGEST(x)	cpu_to_le32(((x) & GENMASK(3, 0)) << 24)
+
+/* SHA flags */
+#define SHA_FLAGS_BUSY		BIT(0)
+#define	SHA_FLAGS_FINAL		BIT(1)
+#define SHA_FLAGS_FINUP		BIT(2)
+#define SHA_FLAGS_SG		BIT(3)
+#define SHA_FLAGS_ALGO_MSK	GENMASK(8, 4)
+#define SHA_FLAGS_SHA1		BIT(4)
+#define SHA_FLAGS_SHA224	BIT(5)
+#define SHA_FLAGS_SHA256	BIT(6)
+#define SHA_FLAGS_SHA384	BIT(7)
+#define SHA_FLAGS_SHA512	BIT(8)
+#define SHA_FLAGS_HMAC		BIT(9)
+#define SHA_FLAGS_PAD		BIT(10)
+
+/**
+ * mtk_sha_ct is a set of hardware instructions(command token)
+ * that are used to control engine's processing flow of SHA,
+ * and it contains the first two words of transform state.
+ */
+struct mtk_sha_ct {
+	__le32 tfm_ctrl0;
+	__le32 tfm_ctrl1;
+	__le32 ct_ctrl0;
+	__le32 ct_ctrl1;
+	__le32 ct_ctrl2;
+};
+
+/**
+ * mtk_sha_tfm is used to define SHA transform state
+ * and store result digest that produced by engine.
+ */
+struct mtk_sha_tfm {
+	__le32 tfm_ctrl0;
+	__le32 tfm_ctrl1;
+	__le32 digest[SIZE_IN_WORDS(SHA512_DIGEST_SIZE)];
+};
+
+/**
+ * mtk_sha_info consists of command token and transform state
+ * of SHA, its role is similar to mtk_aes_info.
+ */
+struct mtk_sha_info {
+	struct mtk_sha_ct ct;
+	struct mtk_sha_tfm tfm;
+};
+
+struct mtk_sha_reqctx {
+	struct mtk_sha_info info;
+	unsigned long flags;
+	unsigned long op;
+
+	u64 digcnt;
+	bool start;
+	size_t bufcnt;
+	dma_addr_t dma_addr;
+
+	/* Walk state */
+	struct scatterlist *sg;
+	u32 offset;	/* Offset in current sg */
+	u32 total;	/* Total request */
+	size_t ds;
+	size_t bs;
+
+	u8 *buffer;
+};
+
+struct mtk_sha_hmac_ctx {
+	struct crypto_shash	*shash;
+	u8 ipad[SHA512_BLOCK_SIZE] __aligned(sizeof(u32));
+	u8 opad[SHA512_BLOCK_SIZE] __aligned(sizeof(u32));
+};
+
+struct mtk_sha_ctx {
+	struct mtk_cryp *cryp;
+	unsigned long flags;
+	u8 id;
+	u8 buf[SHA_BUF_SIZE] __aligned(sizeof(u32));
+
+	struct mtk_sha_hmac_ctx	base[0];
+};
+
+struct mtk_sha_drv {
+	struct list_head dev_list;
+	/* Device list lock */
+	spinlock_t lock;
+};
+
+static struct mtk_sha_drv mtk_sha = {
+	.dev_list = LIST_HEAD_INIT(mtk_sha.dev_list),
+	.lock = __SPIN_LOCK_UNLOCKED(mtk_sha.lock),
+};
+
+static int mtk_sha_handle_queue(struct mtk_cryp *cryp, u8 id,
+				struct ahash_request *req);
+
+static inline u32 mtk_sha_read(struct mtk_cryp *cryp, u32 offset)
+{
+	return readl_relaxed(cryp->base + offset);
+}
+
+static inline void mtk_sha_write(struct mtk_cryp *cryp,
+				 u32 offset, u32 value)
+{
+	writel_relaxed(value, cryp->base + offset);
+}
+
+static struct mtk_cryp *mtk_sha_find_dev(struct mtk_sha_ctx *tctx)
+{
+	struct mtk_cryp *cryp = NULL;
+	struct mtk_cryp *tmp;
+
+	spin_lock_bh(&mtk_sha.lock);
+	if (!tctx->cryp) {
+		list_for_each_entry(tmp, &mtk_sha.dev_list, sha_list) {
+			cryp = tmp;
+			break;
+		}
+		tctx->cryp = cryp;
+	} else {
+		cryp = tctx->cryp;
+	}
+
+	/*
+	 * Assign record id to tfm in round-robin fashion, and this
+	 * will help tfm to bind  to corresponding descriptor rings.
+	 */
+	tctx->id = cryp->rec;
+	cryp->rec = !cryp->rec;
+
+	spin_unlock_bh(&mtk_sha.lock);
+
+	return cryp;
+}
+
+static int mtk_sha_append_sg(struct mtk_sha_reqctx *ctx)
+{
+	size_t count;
+
+	while ((ctx->bufcnt < SHA_BUF_SIZE) && ctx->total) {
+		count = min(ctx->sg->length - ctx->offset, ctx->total);
+		count = min(count, SHA_BUF_SIZE - ctx->bufcnt);
+
+		if (count <= 0) {
+			/*
+			 * Check if count <= 0 because the buffer is full or
+			 * because the sg length is 0. In the latest case,
+			 * check if there is another sg in the list, a 0 length
+			 * sg doesn't necessarily mean the end of the sg list.
+			 */
+			if ((ctx->sg->length == 0) && !sg_is_last(ctx->sg)) {
+				ctx->sg = sg_next(ctx->sg);
+				continue;
+			} else {
+				break;
+			}
+		}
+
+		scatterwalk_map_and_copy(ctx->buffer + ctx->bufcnt, ctx->sg,
+					 ctx->offset, count, 0);
+
+		ctx->bufcnt += count;
+		ctx->offset += count;
+		ctx->total -= count;
+
+		if (ctx->offset == ctx->sg->length) {
+			ctx->sg = sg_next(ctx->sg);
+			if (ctx->sg)
+				ctx->offset = 0;
+			else
+				ctx->total = 0;
+		}
+	}
+
+	return 0;
+}
+
+/*
+ * The purpose of this padding is to ensure that the padded message is a
+ * multiple of 512 bits (SHA1/SHA224/SHA256) or 1024 bits (SHA384/SHA512).
+ * The bit "1" is appended at the end of the message followed by
+ * "padlen-1" zero bits. Then a 64 bits block (SHA1/SHA224/SHA256) or
+ * 128 bits block (SHA384/SHA512) equals to the message length in bits
+ * is appended.
+ *
+ * For SHA1/SHA224/SHA256, padlen is calculated as followed:
+ *  - if message length < 56 bytes then padlen = 56 - message length
+ *  - else padlen = 64 + 56 - message length
+ *
+ * For SHA384/SHA512, padlen is calculated as followed:
+ *  - if message length < 112 bytes then padlen = 112 - message length
+ *  - else padlen = 128 + 112 - message length
+ */
+static void mtk_sha_fill_padding(struct mtk_sha_reqctx *ctx, u32 len)
+{
+	u32 index, padlen;
+	u64 bits[2];
+	u64 size = ctx->digcnt;
+
+	size += ctx->bufcnt;
+	size += len;
+
+	bits[1] = cpu_to_be64(size << 3);
+	bits[0] = cpu_to_be64(size >> 61);
+
+	if (ctx->flags & (SHA_FLAGS_SHA384 | SHA_FLAGS_SHA512)) {
+		index = ctx->bufcnt & 0x7f;
+		padlen = (index < 112) ? (112 - index) : ((128 + 112) - index);
+		*(ctx->buffer + ctx->bufcnt) = 0x80;
+		memset(ctx->buffer + ctx->bufcnt + 1, 0, padlen - 1);
+		memcpy(ctx->buffer + ctx->bufcnt + padlen, bits, 16);
+		ctx->bufcnt += padlen + 16;
+		ctx->flags |= SHA_FLAGS_PAD;
+	} else {
+		index = ctx->bufcnt & 0x3f;
+		padlen = (index < 56) ? (56 - index) : ((64 + 56) - index);
+		*(ctx->buffer + ctx->bufcnt) = 0x80;
+		memset(ctx->buffer + ctx->bufcnt + 1, 0, padlen - 1);
+		memcpy(ctx->buffer + ctx->bufcnt + padlen, &bits[1], 8);
+		ctx->bufcnt += padlen + 8;
+		ctx->flags |= SHA_FLAGS_PAD;
+	}
+}
+
+/* Initialize basic transform information of SHA */
+static void mtk_sha_info_init(struct mtk_sha_rec *sha,
+			      struct mtk_sha_reqctx *ctx)
+{
+	struct mtk_sha_info *info = sha->info;
+	struct mtk_sha_ct *ct = &info->ct;
+	struct mtk_sha_tfm *tfm = &info->tfm;
+
+	sha->ct_hdr = SHA_CT_CTRL_HDR;
+	sha->ct_size = SHA_CT_SIZE;
+
+	tfm->tfm_ctrl0 = SHA_TFM_HASH | SHA_TFM_INNER_DIG |
+			 SHA_TFM_SIZE(SIZE_IN_WORDS(ctx->ds));
+
+	switch (ctx->flags & SHA_FLAGS_ALGO_MSK) {
+	case SHA_FLAGS_SHA1:
+		tfm->tfm_ctrl0 |= SHA_TFM_SHA1;
+		break;
+	case SHA_FLAGS_SHA224:
+		tfm->tfm_ctrl0 |= SHA_TFM_SHA224;
+		break;
+	case SHA_FLAGS_SHA256:
+		tfm->tfm_ctrl0 |= SHA_TFM_SHA256;
+		break;
+	case SHA_FLAGS_SHA384:
+		tfm->tfm_ctrl0 |= SHA_TFM_SHA384;
+		break;
+	case SHA_FLAGS_SHA512:
+		tfm->tfm_ctrl0 |= SHA_TFM_SHA512;
+		break;
+
+	default:
+		/* Should not happen... */
+		return;
+	}
+
+	tfm->tfm_ctrl1 = SHA_TFM_HASH_STORE;
+	ct->tfm_ctrl0 = tfm->tfm_ctrl0 | SHA_TFM_CONTINUE | SHA_TFM_START;
+	ct->tfm_ctrl1 = tfm->tfm_ctrl1;
+
+	ct->ct_ctrl0 = SHA_COMMAND0;
+	ct->ct_ctrl1 = SHA_COMMAND1;
+	ct->ct_ctrl2 = SHA_COMMAND2 | SHA_TFM_DIGEST(SIZE_IN_WORDS(ctx->ds));
+}
+
+/*
+ * Update input data length field of transform information and
+ * map it to DMA region.
+ */
+static int mtk_sha_info_map(struct mtk_cryp *cryp,
+			    struct mtk_sha_rec *sha,
+			    size_t len)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(sha->req);
+	struct mtk_sha_info *info = sha->info;
+	struct mtk_sha_ct *ct = &info->ct;
+
+	if (ctx->start)
+		ctx->start = false;
+	else
+		ct->tfm_ctrl0 &= ~SHA_TFM_START;
+
+	sha->ct_hdr &= ~SHA_DATA_LEN_MSK;
+	sha->ct_hdr |= cpu_to_le32(len);
+	ct->ct_ctrl0 &= ~SHA_DATA_LEN_MSK;
+	ct->ct_ctrl0 |= cpu_to_le32(len);
+
+	ctx->digcnt += len;
+
+	sha->ct_dma = dma_map_single(cryp->dev, info, sizeof(*info),
+				      DMA_BIDIRECTIONAL);
+	if (unlikely(dma_mapping_error(cryp->dev, sha->ct_dma))) {
+		dev_err(cryp->dev, "dma %d bytes error\n", sizeof(*info));
+		return -EINVAL;
+	}
+	sha->tfm_dma = sha->ct_dma + sizeof(*ct);
+
+	return 0;
+}
+
+/*
+ * Because of hardware limitation, we must pre-calculate the inner
+ * and outer digest that need to be processed firstly by engine, then
+ * apply the result digest to the input message. These complex hashing
+ * procedures limits HMAC performance, so we use fallback SW encoding.
+ */
+static int mtk_sha_finish_hmac(struct ahash_request *req)
+{
+	struct mtk_sha_ctx *tctx = crypto_tfm_ctx(req->base.tfm);
+	struct mtk_sha_hmac_ctx *bctx = tctx->base;
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	SHASH_DESC_ON_STACK(shash, bctx->shash);
+
+	shash->tfm = bctx->shash;
+	shash->flags = 0; /* not CRYPTO_TFM_REQ_MAY_SLEEP */
+
+	return crypto_shash_init(shash) ?:
+	       crypto_shash_update(shash, bctx->opad, ctx->bs) ?:
+	       crypto_shash_finup(shash, req->result, ctx->ds, req->result);
+}
+
+/* Initialize request context */
+static int mtk_sha_init(struct ahash_request *req)
+{
+	struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+	struct mtk_sha_ctx *tctx = crypto_ahash_ctx(tfm);
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	ctx->flags = 0;
+	ctx->ds = crypto_ahash_digestsize(tfm);
+
+	switch (ctx->ds) {
+	case SHA1_DIGEST_SIZE:
+		ctx->flags |= SHA_FLAGS_SHA1;
+		ctx->bs = SHA1_BLOCK_SIZE;
+		break;
+	case SHA224_DIGEST_SIZE:
+		ctx->flags |= SHA_FLAGS_SHA224;
+		ctx->bs = SHA224_BLOCK_SIZE;
+		break;
+	case SHA256_DIGEST_SIZE:
+		ctx->flags |= SHA_FLAGS_SHA256;
+		ctx->bs = SHA256_BLOCK_SIZE;
+		break;
+	case SHA384_DIGEST_SIZE:
+		ctx->flags |= SHA_FLAGS_SHA384;
+		ctx->bs = SHA384_BLOCK_SIZE;
+		break;
+	case SHA512_DIGEST_SIZE:
+		ctx->flags |= SHA_FLAGS_SHA512;
+		ctx->bs = SHA512_BLOCK_SIZE;
+		break;
+	default:
+		return -EINVAL;
+	}
+
+	ctx->bufcnt = 0;
+	ctx->digcnt = 0;
+	ctx->buffer = tctx->buf;
+	ctx->start = true;
+
+	if (tctx->flags & SHA_FLAGS_HMAC) {
+		struct mtk_sha_hmac_ctx *bctx = tctx->base;
+
+		memcpy(ctx->buffer, bctx->ipad, ctx->bs);
+		ctx->bufcnt = ctx->bs;
+		ctx->flags |= SHA_FLAGS_HMAC;
+	}
+
+	return 0;
+}
+
+static int mtk_sha_xmit(struct mtk_cryp *cryp, struct mtk_sha_rec *sha,
+			dma_addr_t addr, size_t len)
+{
+	struct mtk_ring *ring = cryp->ring[sha->id];
+	struct mtk_desc *cmd = ring->cmd_base + ring->pos;
+	struct mtk_desc *res = ring->res_base + ring->pos;
+	int err;
+
+	err = mtk_sha_info_map(cryp, sha, len);
+	if (err)
+		return err;
+
+	/* Fill in the command/result descriptors */
+	res->hdr = MTK_DESC_FIRST |
+		   MTK_DESC_LAST |
+		   MTK_DESC_BUF_LEN(len);
+
+	res->buf = cpu_to_le32(cryp->tmp_dma);
+
+	cmd->hdr = MTK_DESC_FIRST |
+		   MTK_DESC_LAST |
+		   MTK_DESC_BUF_LEN(len) |
+		   MTK_DESC_CT_LEN(sha->ct_size);
+
+	cmd->buf = cpu_to_le32(addr);
+	cmd->ct = cpu_to_le32(sha->ct_dma);
+	cmd->ct_hdr = sha->ct_hdr;
+	cmd->tfm = cpu_to_le32(sha->tfm_dma);
+
+	if (++ring->pos == MTK_DESC_NUM)
+		ring->pos = 0;
+
+	/*
+	 * Make sure that all changes to the DMA ring are done before we
+	 * start engine.
+	 */
+	wmb();
+	/* Start DMA transfer */
+	mtk_sha_write(cryp, RDR_PREP_COUNT(sha->id), MTK_DESC_CNT(1));
+	mtk_sha_write(cryp, CDR_PREP_COUNT(sha->id), MTK_DESC_CNT(1));
+
+	return -EINPROGRESS;
+}
+
+static int mtk_sha_xmit2(struct mtk_cryp *cryp,
+			 struct mtk_sha_rec *sha,
+			 struct mtk_sha_reqctx *ctx,
+			 size_t len1, size_t len2)
+{
+	struct mtk_ring *ring = cryp->ring[sha->id];
+	struct mtk_desc *cmd = ring->cmd_base + ring->pos;
+	struct mtk_desc *res = ring->res_base + ring->pos;
+	int err;
+
+	err = mtk_sha_info_map(cryp, sha, len1 + len2);
+	if (err)
+		return err;
+
+	/* Fill in the command/result descriptors */
+	res->hdr = MTK_DESC_BUF_LEN(len1) | MTK_DESC_FIRST;
+	res->buf = cpu_to_le32(cryp->tmp_dma);
+
+	cmd->hdr = MTK_DESC_BUF_LEN(len1) |
+		   MTK_DESC_FIRST |
+		   MTK_DESC_CT_LEN(sha->ct_size);
+	cmd->buf = cpu_to_le32(sg_dma_address(ctx->sg));
+	cmd->ct = cpu_to_le32(sha->ct_dma);
+	cmd->ct_hdr = sha->ct_hdr;
+	cmd->tfm = cpu_to_le32(sha->tfm_dma);
+
+	if (++ring->pos == MTK_DESC_NUM)
+		ring->pos = 0;
+
+	cmd = ring->cmd_base + ring->pos;
+	res = ring->res_base + ring->pos;
+
+	res->hdr = MTK_DESC_BUF_LEN(len2) | MTK_DESC_LAST;
+	res->buf = cpu_to_le32(cryp->tmp_dma);
+
+	cmd->hdr = MTK_DESC_BUF_LEN(len2) | MTK_DESC_LAST;
+	cmd->buf = cpu_to_le32(ctx->dma_addr);
+
+	if (++ring->pos == MTK_DESC_NUM)
+		ring->pos = 0;
+
+	/*
+	 * Make sure that all changes to the DMA ring are done before we
+	 * start engine.
+	 */
+	wmb();
+	/* Start DMA transfer */
+	mtk_sha_write(cryp, RDR_PREP_COUNT(sha->id), MTK_DESC_CNT(2));
+	mtk_sha_write(cryp, CDR_PREP_COUNT(sha->id), MTK_DESC_CNT(2));
+
+	return -EINPROGRESS;
+}
+
+static int mtk_sha_dma_map(struct mtk_cryp *cryp,
+			   struct mtk_sha_rec *sha,
+			   struct mtk_sha_reqctx *ctx,
+			   size_t count)
+{
+	ctx->dma_addr = dma_map_single(cryp->dev, ctx->buffer,
+				SHA_BUF_SIZE, DMA_TO_DEVICE);
+	if (unlikely(dma_mapping_error(cryp->dev, ctx->dma_addr))) {
+		dev_err(cryp->dev, "dma map error\n");
+		return -EINVAL;
+	}
+
+	ctx->flags &= ~SHA_FLAGS_SG;
+
+	return mtk_sha_xmit(cryp, sha, ctx->dma_addr, count);
+}
+
+static int mtk_sha_update_slow(struct mtk_cryp *cryp,
+			       struct mtk_sha_rec *sha)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(sha->req);
+	size_t count;
+	u32 final;
+
+	mtk_sha_append_sg(ctx);
+
+	final = (ctx->flags & SHA_FLAGS_FINUP) && !ctx->total;
+
+	dev_dbg(cryp->dev, "slow: bufcnt: %u\n", ctx->bufcnt);
+
+	if (final) {
+		sha->flags |= SHA_FLAGS_FINAL;
+		mtk_sha_fill_padding(ctx, 0);
+	}
+
+	if (final || (ctx->bufcnt == SHA_BUF_SIZE && ctx->total)) {
+		count = ctx->bufcnt;
+		ctx->bufcnt = 0;
+
+		return mtk_sha_dma_map(cryp, sha, ctx, count);
+	}
+	return 0;
+}
+
+static int mtk_sha_update_start(struct mtk_cryp *cryp,
+				struct mtk_sha_rec *sha)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(sha->req);
+	u32 len, final, tail;
+	struct scatterlist *sg;
+
+	if (!ctx->total)
+		return 0;
+
+	if (ctx->bufcnt || ctx->offset)
+		return mtk_sha_update_slow(cryp, sha);
+
+	sg = ctx->sg;
+
+	if (!IS_ALIGNED(sg->offset, sizeof(u32)))
+		return mtk_sha_update_slow(cryp, sha);
+
+	if (!sg_is_last(sg) && !IS_ALIGNED(sg->length, ctx->bs))
+		/* size is not ctx->bs aligned */
+		return mtk_sha_update_slow(cryp, sha);
+
+	len = min(ctx->total, sg->length);
+
+	if (sg_is_last(sg)) {
+		if (!(ctx->flags & SHA_FLAGS_FINUP)) {
+			/* not last sg must be ctx->bs aligned */
+			tail = len & (ctx->bs - 1);
+			len -= tail;
+		}
+	}
+
+	ctx->total -= len;
+	ctx->offset = len; /* offset where to start slow */
+
+	final = (ctx->flags & SHA_FLAGS_FINUP) && !ctx->total;
+
+	/* Add padding */
+	if (final) {
+		size_t count;
+
+		tail = len & (ctx->bs - 1);
+		len -= tail;
+		ctx->total += tail;
+		ctx->offset = len; /* offset where to start slow */
+
+		sg = ctx->sg;
+		mtk_sha_append_sg(ctx);
+		mtk_sha_fill_padding(ctx, len);
+
+		ctx->dma_addr = dma_map_single(cryp->dev, ctx->buffer,
+			SHA_BUF_SIZE, DMA_TO_DEVICE);
+		if (unlikely(dma_mapping_error(cryp->dev, ctx->dma_addr))) {
+			dev_err(cryp->dev, "dma map bytes error\n");
+			return -EINVAL;
+		}
+
+		sha->flags |= SHA_FLAGS_FINAL;
+		count = ctx->bufcnt;
+		ctx->bufcnt = 0;
+
+		if (len == 0) {
+			ctx->flags &= ~SHA_FLAGS_SG;
+			return mtk_sha_xmit(cryp, sha, ctx->dma_addr, count);
+
+		} else {
+			ctx->sg = sg;
+			if (!dma_map_sg(cryp->dev, ctx->sg, 1, DMA_TO_DEVICE)) {
+				dev_err(cryp->dev, "dma_map_sg error\n");
+				return -EINVAL;
+			}
+
+			ctx->flags |= SHA_FLAGS_SG;
+			return mtk_sha_xmit2(cryp, sha, ctx, len, count);
+		}
+	}
+
+	if (!dma_map_sg(cryp->dev, ctx->sg, 1, DMA_TO_DEVICE)) {
+		dev_err(cryp->dev, "dma_map_sg  error\n");
+		return -EINVAL;
+	}
+
+	ctx->flags |= SHA_FLAGS_SG;
+
+	return mtk_sha_xmit(cryp, sha, sg_dma_address(ctx->sg), len);
+}
+
+static int mtk_sha_final_req(struct mtk_cryp *cryp,
+			     struct mtk_sha_rec *sha)
+{
+	struct ahash_request *req = sha->req;
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+	size_t count;
+
+	mtk_sha_fill_padding(ctx, 0);
+
+	sha->flags |= SHA_FLAGS_FINAL;
+	count = ctx->bufcnt;
+	ctx->bufcnt = 0;
+
+	return mtk_sha_dma_map(cryp, sha, ctx, count);
+}
+
+/* Copy ready hash (+ finalize hmac) */
+static int mtk_sha_finish(struct ahash_request *req)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+	u32 *digest = ctx->info.tfm.digest;
+	u32 *result = (u32 *)req->result;
+	int i;
+
+	/* Get the hash from the digest buffer */
+	for (i = 0; i < SIZE_IN_WORDS(ctx->ds); i++)
+		result[i] = le32_to_cpu(digest[i]);
+
+	if (ctx->flags & SHA_FLAGS_HMAC)
+		return mtk_sha_finish_hmac(req);
+
+	return 0;
+}
+
+static void mtk_sha_finish_req(struct mtk_cryp *cryp,
+			       struct mtk_sha_rec *sha, int err)
+{
+	if (likely(!err && (SHA_FLAGS_FINAL & sha->flags)))
+		err = mtk_sha_finish(sha->req);
+
+	sha->flags &= ~(SHA_FLAGS_BUSY | SHA_FLAGS_FINAL);
+
+	sha->req->base.complete(&sha->req->base, err);
+
+	/* Handle new request */
+	mtk_sha_handle_queue(cryp, sha->id - RING2, NULL);
+}
+
+static int mtk_sha_handle_queue(struct mtk_cryp *cryp, u8 id,
+				struct ahash_request *req)
+{
+	struct mtk_sha_rec *sha = cryp->sha[id];
+	struct crypto_async_request *async_req, *backlog;
+	struct mtk_sha_reqctx *ctx;
+	unsigned long flags;
+	int err = 0, ret = 0;
+
+	spin_lock_irqsave(&sha->lock, flags);
+	if (req)
+		ret = ahash_enqueue_request(&sha->queue, req);
+
+	if (SHA_FLAGS_BUSY & sha->flags) {
+		spin_unlock_irqrestore(&sha->lock, flags);
+		return ret;
+	}
+
+	backlog = crypto_get_backlog(&sha->queue);
+	async_req = crypto_dequeue_request(&sha->queue);
+	if (async_req)
+		sha->flags |= SHA_FLAGS_BUSY;
+	spin_unlock_irqrestore(&sha->lock, flags);
+
+	if (!async_req)
+		return ret;
+
+	if (backlog)
+		backlog->complete(backlog, -EINPROGRESS);
+
+	req = ahash_request_cast(async_req);
+	ctx = ahash_request_ctx(req);
+
+	sha->req = req;
+	sha->info = &ctx->info;
+
+	mtk_sha_info_init(sha, ctx);
+
+	if (ctx->op == SHA_OP_UPDATE) {
+		err = mtk_sha_update_start(cryp, sha);
+		if (err != -EINPROGRESS && (ctx->flags & SHA_FLAGS_FINUP))
+			/* No final() after finup() */
+			err = mtk_sha_final_req(cryp, sha);
+	} else if (ctx->op == SHA_OP_FINAL) {
+		err = mtk_sha_final_req(cryp, sha);
+	}
+
+	if (unlikely(err != -EINPROGRESS))
+		/* Task will not finish it, so do it here */
+		mtk_sha_finish_req(cryp, sha, err);
+
+	return ret;
+}
+
+static int mtk_sha_enqueue(struct ahash_request *req, u32 op)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+	struct mtk_sha_ctx *tctx = crypto_tfm_ctx(req->base.tfm);
+
+	ctx->op = op;
+
+	return mtk_sha_handle_queue(tctx->cryp, tctx->id, req);
+}
+
+static void mtk_sha_unmap(struct mtk_cryp *cryp, struct mtk_sha_rec *sha)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(sha->req);
+
+	dma_unmap_single(cryp->dev, sha->ct_dma,
+			 sizeof(struct mtk_sha_info), DMA_BIDIRECTIONAL);
+
+	if (ctx->flags & SHA_FLAGS_SG) {
+		dma_unmap_sg(cryp->dev, ctx->sg, 1, DMA_TO_DEVICE);
+		if (ctx->sg->length == ctx->offset) {
+			ctx->sg = sg_next(ctx->sg);
+			if (ctx->sg)
+				ctx->offset = 0;
+		}
+		if (ctx->flags & SHA_FLAGS_PAD) {
+			dma_unmap_single(cryp->dev, ctx->dma_addr,
+					 SHA_BUF_SIZE, DMA_TO_DEVICE);
+		}
+	} else
+		dma_unmap_single(cryp->dev, ctx->dma_addr,
+				 SHA_BUF_SIZE, DMA_TO_DEVICE);
+}
+
+static void mtk_sha_complete(struct mtk_cryp *cryp,
+			     struct mtk_sha_rec *sha)
+{
+	int err = 0;
+
+	err = mtk_sha_update_start(cryp, sha);
+	if (err != -EINPROGRESS)
+		mtk_sha_finish_req(cryp, sha, err);
+}
+
+static int mtk_sha_update(struct ahash_request *req)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	ctx->total = req->nbytes;
+	ctx->sg = req->src;
+	ctx->offset = 0;
+
+	if ((ctx->bufcnt + ctx->total < SHA_BUF_SIZE) &&
+	    !(ctx->flags & SHA_FLAGS_FINUP))
+		return mtk_sha_append_sg(ctx);
+
+	return mtk_sha_enqueue(req, SHA_OP_UPDATE);
+}
+
+static int mtk_sha_final(struct ahash_request *req)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	ctx->flags |= SHA_FLAGS_FINUP;
+
+	if (ctx->flags & SHA_FLAGS_PAD)
+		return mtk_sha_finish(req);
+
+	return mtk_sha_enqueue(req, SHA_OP_FINAL);
+}
+
+static int mtk_sha_finup(struct ahash_request *req)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+	int err1, err2;
+
+	ctx->flags |= SHA_FLAGS_FINUP;
+
+	err1 = mtk_sha_update(req);
+	if (err1 == -EINPROGRESS || err1 == -EBUSY)
+		return err1;
+	/*
+	 * final() has to be always called to cleanup resources
+	 * even if update() failed
+	 */
+	err2 = mtk_sha_final(req);
+
+	return err1 ?: err2;
+}
+
+static int mtk_sha_digest(struct ahash_request *req)
+{
+	return mtk_sha_init(req) ?: mtk_sha_finup(req);
+}
+
+static int mtk_sha_setkey(struct crypto_ahash *tfm,
+			  const unsigned char *key, u32 keylen)
+{
+	struct mtk_sha_ctx *tctx = crypto_ahash_ctx(tfm);
+	struct mtk_sha_hmac_ctx *bctx = tctx->base;
+	size_t bs = crypto_shash_blocksize(bctx->shash);
+	size_t ds = crypto_shash_digestsize(bctx->shash);
+	int err, i;
+
+	SHASH_DESC_ON_STACK(shash, bctx->shash);
+
+	shash->tfm = bctx->shash;
+	shash->flags = crypto_shash_get_flags(bctx->shash) &
+			CRYPTO_TFM_REQ_MAY_SLEEP;
+
+	if (keylen > bs) {
+		err = crypto_shash_digest(shash, key, keylen, bctx->ipad);
+		if (err)
+			return err;
+		keylen = ds;
+	} else {
+		memcpy(bctx->ipad, key, keylen);
+	}
+
+	memset(bctx->ipad + keylen, 0, bs - keylen);
+	memcpy(bctx->opad, bctx->ipad, bs);
+
+	for (i = 0; i < bs; i++) {
+		bctx->ipad[i] ^= 0x36;
+		bctx->opad[i] ^= 0x5c;
+	}
+
+	return err;
+}
+
+static int mtk_sha_export(struct ahash_request *req, void *out)
+{
+	const struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	memcpy(out, ctx, sizeof(*ctx));
+	return 0;
+}
+
+static int mtk_sha_import(struct ahash_request *req, const void *in)
+{
+	struct mtk_sha_reqctx *ctx = ahash_request_ctx(req);
+
+	memcpy(ctx, in, sizeof(*ctx));
+	return 0;
+}
+
+static int mtk_sha_cra_init_alg(struct crypto_tfm *tfm,
+				const char *alg_base)
+{
+	struct mtk_sha_ctx *tctx = crypto_tfm_ctx(tfm);
+	struct mtk_cryp *cryp = NULL;
+
+	cryp = mtk_sha_find_dev(tctx);
+	if (!cryp)
+		return -ENODEV;
+
+	crypto_ahash_set_reqsize(__crypto_ahash_cast(tfm),
+				 sizeof(struct mtk_sha_reqctx));
+
+	if (alg_base) {
+		struct mtk_sha_hmac_ctx *bctx = tctx->base;
+
+		tctx->flags |= SHA_FLAGS_HMAC;
+		bctx->shash = crypto_alloc_shash(alg_base, 0,
+					CRYPTO_ALG_NEED_FALLBACK);
+		if (IS_ERR(bctx->shash)) {
+			pr_err("base driver %s could not be loaded.\n",
+			       alg_base);
+
+			return PTR_ERR(bctx->shash);
+		}
+	}
+	return 0;
+}
+
+static int mtk_sha_cra_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, NULL);
+}
+
+static int mtk_sha_cra_sha1_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, "sha1");
+}
+
+static int mtk_sha_cra_sha224_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, "sha224");
+}
+
+static int mtk_sha_cra_sha256_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, "sha256");
+}
+
+static int mtk_sha_cra_sha384_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, "sha384");
+}
+
+static int mtk_sha_cra_sha512_init(struct crypto_tfm *tfm)
+{
+	return mtk_sha_cra_init_alg(tfm, "sha512");
+}
+
+static void mtk_sha_cra_exit(struct crypto_tfm *tfm)
+{
+	struct mtk_sha_ctx *tctx = crypto_tfm_ctx(tfm);
+
+	if (tctx->flags & SHA_FLAGS_HMAC) {
+		struct mtk_sha_hmac_ctx *bctx = tctx->base;
+
+		crypto_free_shash(bctx->shash);
+	}
+}
+
+static struct ahash_alg algs_sha1_sha224_sha256[] = {
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.halg.digestsize	= SHA1_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "sha1",
+		.cra_driver_name	= "mtk-sha1",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= SHA1_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.halg.digestsize	= SHA224_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "sha224",
+		.cra_driver_name	= "mtk-sha224",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= SHA224_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.halg.digestsize	= SHA256_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "sha256",
+		.cra_driver_name	= "mtk-sha256",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.setkey		= mtk_sha_setkey,
+	.halg.digestsize	= SHA1_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "hmac(sha1)",
+		.cra_driver_name	= "mtk-hmac-sha1",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_NEED_FALLBACK,
+		.cra_blocksize		= SHA1_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx) +
+					sizeof(struct mtk_sha_hmac_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_sha1_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.setkey		= mtk_sha_setkey,
+	.halg.digestsize	= SHA224_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "hmac(sha224)",
+		.cra_driver_name	= "mtk-hmac-sha224",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_NEED_FALLBACK,
+		.cra_blocksize		= SHA224_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx) +
+					sizeof(struct mtk_sha_hmac_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_sha224_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.setkey		= mtk_sha_setkey,
+	.halg.digestsize	= SHA256_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "hmac(sha256)",
+		.cra_driver_name	= "mtk-hmac-sha256",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_NEED_FALLBACK,
+		.cra_blocksize		= SHA256_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx) +
+					sizeof(struct mtk_sha_hmac_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_sha256_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+};
+
+static struct ahash_alg algs_sha384_sha512[] = {
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.halg.digestsize	= SHA384_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "sha384",
+		.cra_driver_name	= "mtk-sha384",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= SHA384_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.halg.digestsize	= SHA512_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "sha512",
+		.cra_driver_name	= "mtk-sha512",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC,
+		.cra_blocksize		= SHA512_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.setkey		= mtk_sha_setkey,
+	.halg.digestsize	= SHA384_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "hmac(sha384)",
+		.cra_driver_name	= "mtk-hmac-sha384",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_NEED_FALLBACK,
+		.cra_blocksize		= SHA384_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx) +
+					sizeof(struct mtk_sha_hmac_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_sha384_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+{
+	.init		= mtk_sha_init,
+	.update		= mtk_sha_update,
+	.final		= mtk_sha_final,
+	.finup		= mtk_sha_finup,
+	.digest		= mtk_sha_digest,
+	.export		= mtk_sha_export,
+	.import		= mtk_sha_import,
+	.setkey		= mtk_sha_setkey,
+	.halg.digestsize	= SHA512_DIGEST_SIZE,
+	.halg.statesize = sizeof(struct mtk_sha_reqctx),
+	.halg.base	= {
+		.cra_name		= "hmac(sha512)",
+		.cra_driver_name	= "mtk-hmac-sha512",
+		.cra_priority		= 400,
+		.cra_flags		= CRYPTO_ALG_ASYNC |
+					  CRYPTO_ALG_NEED_FALLBACK,
+		.cra_blocksize		= SHA512_BLOCK_SIZE,
+		.cra_ctxsize		= sizeof(struct mtk_sha_ctx) +
+					sizeof(struct mtk_sha_hmac_ctx),
+		.cra_alignmask		= SHA_ALIGN_MSK,
+		.cra_module		= THIS_MODULE,
+		.cra_init		= mtk_sha_cra_sha512_init,
+		.cra_exit		= mtk_sha_cra_exit,
+	}
+},
+};
+
+static void mtk_sha_task0(unsigned long data)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)data;
+	struct mtk_sha_rec *sha = cryp->sha[0];
+
+	mtk_sha_unmap(cryp, sha);
+	mtk_sha_complete(cryp, sha);
+}
+
+static void mtk_sha_task1(unsigned long data)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)data;
+	struct mtk_sha_rec *sha = cryp->sha[1];
+
+	mtk_sha_unmap(cryp, sha);
+	mtk_sha_complete(cryp, sha);
+}
+
+static irqreturn_t mtk_sha_ring2_irq(int irq, void *dev_id)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)dev_id;
+	struct mtk_sha_rec *sha = cryp->sha[0];
+	u32 val = mtk_sha_read(cryp, RDR_STAT(RING2));
+
+	mtk_sha_write(cryp, RDR_STAT(RING2), val);
+
+	if (likely((SHA_FLAGS_BUSY & sha->flags))) {
+		mtk_sha_write(cryp, RDR_PROC_COUNT(RING2), MTK_CNT_RST);
+		mtk_sha_write(cryp, RDR_THRESH(RING2),
+			      MTK_RDR_PROC_THRESH | MTK_RDR_PROC_MODE);
+
+		tasklet_schedule(&sha->task);
+	} else {
+		dev_warn(cryp->dev, "AES interrupt when no active requests.\n");
+	}
+	return IRQ_HANDLED;
+}
+
+static irqreturn_t mtk_sha_ring3_irq(int irq, void *dev_id)
+{
+	struct mtk_cryp *cryp = (struct mtk_cryp *)dev_id;
+	struct mtk_sha_rec *sha = cryp->sha[1];
+	u32 val = mtk_sha_read(cryp, RDR_STAT(RING3));
+
+	mtk_sha_write(cryp, RDR_STAT(RING3), val);
+
+	if (likely((SHA_FLAGS_BUSY & sha->flags))) {
+		mtk_sha_write(cryp, RDR_PROC_COUNT(RING3), MTK_CNT_RST);
+		mtk_sha_write(cryp, RDR_THRESH(RING3),
+			      MTK_RDR_PROC_THRESH | MTK_RDR_PROC_MODE);
+
+		tasklet_schedule(&sha->task);
+	} else {
+		dev_warn(cryp->dev, "AES interrupt when no active requests.\n");
+	}
+	return IRQ_HANDLED;
+}
+
+/*
+ * The purpose of two SHA records is used to get extra performance.
+ * It is similar to mtk_aes_record_init().
+ */
+static int mtk_sha_record_init(struct mtk_cryp *cryp)
+{
+	struct mtk_sha_rec **sha = cryp->sha;
+	int i, err = -ENOMEM;
+
+	for (i = 0; i < MTK_REC_NUM; i++) {
+		sha[i] = kzalloc(sizeof(**sha), GFP_KERNEL);
+		if (!sha[i])
+			goto err_cleanup;
+
+		sha[i]->id = i + RING2;
+
+		spin_lock_init(&sha[i]->lock);
+		crypto_init_queue(&sha[i]->queue, SHA_QUEUE_SIZE);
+	}
+
+	tasklet_init(&sha[0]->task, mtk_sha_task0, (unsigned long)cryp);
+	tasklet_init(&sha[1]->task, mtk_sha_task1, (unsigned long)cryp);
+
+	cryp->rec = 1;
+
+	return 0;
+
+err_cleanup:
+	for (; i--; )
+		kfree(sha[i]);
+	return err;
+}
+
+static void mtk_sha_record_free(struct mtk_cryp *cryp)
+{
+	int i;
+
+	for (i = 0; i < MTK_REC_NUM; i++) {
+		tasklet_kill(&cryp->sha[i]->task);
+		kfree(cryp->sha[i]);
+	}
+}
+
+static void mtk_sha_unregister_algs(void)
+{
+	int i;
+
+	for (i = 0; i < ARRAY_SIZE(algs_sha1_sha224_sha256); i++)
+		crypto_unregister_ahash(&algs_sha1_sha224_sha256[i]);
+
+	for (i = 0; i < ARRAY_SIZE(algs_sha384_sha512); i++)
+		crypto_unregister_ahash(&algs_sha384_sha512[i]);
+}
+
+static int mtk_sha_register_algs(void)
+{
+	int err, i;
+
+	for (i = 0; i < ARRAY_SIZE(algs_sha1_sha224_sha256); i++) {
+		err = crypto_register_ahash(&algs_sha1_sha224_sha256[i]);
+		if (err)
+			goto err_sha_224_256_algs;
+	}
+
+	for (i = 0; i < ARRAY_SIZE(algs_sha384_sha512); i++) {
+		err = crypto_register_ahash(&algs_sha384_sha512[i]);
+		if (err)
+			goto err_sha_384_512_algs;
+	}
+
+	return 0;
+
+err_sha_384_512_algs:
+	for (; i--; )
+		crypto_unregister_ahash(&algs_sha384_sha512[i]);
+	i = ARRAY_SIZE(algs_sha1_sha224_sha256);
+err_sha_224_256_algs:
+	for (; i--; )
+		crypto_unregister_ahash(&algs_sha1_sha224_sha256[i]);
+
+	return err;
+}
+
+int mtk_hash_alg_register(struct mtk_cryp *cryp)
+{
+	int err;
+
+	INIT_LIST_HEAD(&cryp->sha_list);
+
+	/* Initialize two hash records */
+	err = mtk_sha_record_init(cryp);
+	if (err)
+		goto err_record;
+
+	/* Ring2 is use by SHA record0 */
+	err = devm_request_irq(cryp->dev, cryp->irq[RING2],
+			       mtk_sha_ring2_irq, IRQF_TRIGGER_LOW,
+			       "mtk-sha", cryp);
+	if (err) {
+		dev_err(cryp->dev, "unable to request sha irq0.\n");
+		goto err_res;
+	}
+
+	/* Ring3 is use by SHA record1 */
+	err = devm_request_irq(cryp->dev, cryp->irq[RING3],
+			       mtk_sha_ring3_irq, IRQF_TRIGGER_LOW,
+			       "mtk-sha", cryp);
+	if (err) {
+		dev_err(cryp->dev, "unable to request sha irq1.\n");
+		goto err_res;
+	}
+
+	/* Enable ring2 and ring3 interrupt for hash */
+	mtk_sha_write(cryp, AIC_ENABLE_SET(RING2), MTK_IRQ_RDR2);
+	mtk_sha_write(cryp, AIC_ENABLE_SET(RING3), MTK_IRQ_RDR3);
+
+	cryp->tmp = dma_alloc_coherent(cryp->dev, SHA_TMP_BUF_SIZE,
+					&cryp->tmp_dma, GFP_KERNEL);
+	if (!cryp->tmp) {
+		dev_err(cryp->dev, "unable to allocate tmp buffer.\n");
+		err = -EINVAL;
+		goto err_res;
+	}
+
+	spin_lock(&mtk_sha.lock);
+	list_add_tail(&cryp->sha_list, &mtk_sha.dev_list);
+	spin_unlock(&mtk_sha.lock);
+
+	err = mtk_sha_register_algs();
+	if (err)
+		goto err_algs;
+
+	return 0;
+
+err_algs:
+	spin_lock(&mtk_sha.lock);
+	list_del(&cryp->sha_list);
+	spin_unlock(&mtk_sha.lock);
+	dma_free_coherent(cryp->dev, SHA_TMP_BUF_SIZE,
+			  cryp->tmp, cryp->tmp_dma);
+err_res:
+	mtk_sha_record_free(cryp);
+err_record:
+
+	dev_err(cryp->dev, "mtk-sha initialization failed.\n");
+	return err;
+}
+
+void mtk_hash_alg_release(struct mtk_cryp *cryp)
+{
+	spin_lock(&mtk_sha.lock);
+	list_del(&cryp->sha_list);
+	spin_unlock(&mtk_sha.lock);
+
+	mtk_sha_unregister_algs();
+	dma_free_coherent(cryp->dev, SHA_TMP_BUF_SIZE,
+			  cryp->tmp, cryp->tmp_dma);
+	mtk_sha_record_free(cryp);
+}
-- 
1.9.1

^ permalink raw reply related

* [PATCH v3 0/2] Add MediaTek crypto accelerator driver
From: Ryder Lee @ 2016-12-19  2:20 UTC (permalink / raw)
  To: Herbert Xu, David S. Miller, Matthias Brugger
  Cc: devicetree, linux-mediatek, linux-kernel, linux-crypto,
	linux-arm-kernel, Sean Wang, Roy Luo, Ryder Lee

Hello,

This adds support for the MediaTek hardware accelerator on
some SoCs.

This driver currently implement: 
- SHA1 and SHA2 family(HMAC) hash algorithms.
- AES block cipher in CBC/ECB mode with 128/196/256 bits keys.

Chances since v3:
-remove unused structure member
-drop interrupt-parent from DT bindings documentation

Changes since v2:
- use byteorder conversion macros and type identifiers for descriptors
- revise register definition macros to make it more clear
- revise DT compatiable string

Changes since v1:
- remove EXPORT_SYMBOL
- remove unused PRNG setting
- sort headers in alphabetical order
- add a definition for IRQ unmber
- replace ambiguous definition
- add more annotation and function comment
- add COMPILE_TEST in Kconfig

Ryder Lee (2):
  Add crypto driver support for some MediaTek chips
  crypto: mediatek - add DT bindings documentation

 .../devicetree/bindings/crypto/mediatek-crypto.txt |   27 +
 drivers/crypto/Kconfig                             |   17 +
 drivers/crypto/Makefile                            |    1 +
 drivers/crypto/mediatek/Makefile                   |    2 +
 drivers/crypto/mediatek/mtk-aes.c                  |  765 +++++++++++
 drivers/crypto/mediatek/mtk-platform.c             |  604 ++++++++
 drivers/crypto/mediatek/mtk-platform.h             |  238 ++++
 drivers/crypto/mediatek/mtk-regs.h                 |  194 +++
 drivers/crypto/mediatek/mtk-sha.c                  | 1437 ++++++++++++++++++++
 9 files changed, 3285 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/crypto/mediatek-crypto.txt
 create mode 100644 drivers/crypto/mediatek/Makefile
 create mode 100644 drivers/crypto/mediatek/mtk-aes.c
 create mode 100644 drivers/crypto/mediatek/mtk-platform.c
 create mode 100644 drivers/crypto/mediatek/mtk-platform.h
 create mode 100644 drivers/crypto/mediatek/mtk-regs.h
 create mode 100644 drivers/crypto/mediatek/mtk-sha.c

-- 
1.9.1

^ permalink raw reply

* (unknown), 
From: linux-crypto @ 2016-12-18 10:35 UTC (permalink / raw)
  To: linux-crypto
  Cc: pndy, 24405943805281, ewtge, 37894869, vcnx, izdb, 0992929564

[-- Attachment #1: BALLANCE-4318393604847.zip --]
[-- Type: application/zip, Size: 14527 bytes --]

^ permalink raw reply

* Re: [PATCH v3 1/3] siphash: add cryptographically secure hashtable function
From: Christian Kujau @ 2016-12-18  0:06 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Tom Herbert, Netdev, kernel-hardening, LKML,
	Linux Crypto Mailing List, Jean-Philippe Aumasson,
	Daniel J . Bernstein, Linus Torvalds, Eric Biggers, David Laight
In-Reply-To: <CAHmME9qNcsXtdWO_rmngSXXeBsTbA9B_33oLJ_pWOWcO7P2JZg@mail.gmail.com>

On Thu, 15 Dec 2016, Jason A. Donenfeld wrote:
> > I'd still drop the "24" unless you really think we're going to have
> > multiple variants coming into the kernel.
> 
> Okay. I don't have a problem with this, unless anybody has some reason
> to the contrary.

What if the 2/4-round version falls and we need more rounds to withstand 
future cryptoanalysis? We'd then have siphash_ and siphash48_ functions, 
no? My amateurish bike-shedding argument would be "let's keep the 24 then" :-)

C.
-- 
BOFH excuse #354:

Chewing gum on /dev/sd3c

^ permalink raw reply

* Re: Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jeffrey Walton @ 2016-12-17 16:14 UTC (permalink / raw)
  To: Theodore Ts'o, kernel-hardening, Jason A. Donenfeld,
	George Spelvin, ak, davem, David Laight, D. J. Bernstein,
	Eric Biggers, Hannes Frederic Sowa, Jean-Philippe Aumasson,
	linux-crypto, LKML, luto, Netdev, Tom Herbert, Linus Torvalds,
	Vegard Nossum
In-Reply-To: <20161217154152.5oug7mzb4tmfknwv@thunk.org>

> As far as half-siphash is concerned, it occurs to me that the main
> problem will be those users who need to guarantee that output can't be
> guessed over a long period of time.  For example, if you have a
> long-running process, then the output needs to remain unguessable over
> potentially months or years, or else you might be weakening the ASLR
> protections.  If on the other hand, the hash table or the process will
> be going away in a matter of seconds or minutes, the requirements with
> respect to cryptographic strength go down significantly.

Perhaps SipHash-4-8 should be used instead of SipHash-2-4. I believe
SipHash-4-8 is recommended for the security conscious who want to be
more conservative in their security estimates.

SipHash-4-8 does not add much more processing. If you are clocking
SipHash-2-4 at 2.0 or 2.5 cpb, then SipHash-4-8 will run at 3.0 to
4.0. Both are well below MD5 times. (At least with the data sets I've
tested).

> Now, maybe this doesn't matter that much if we can guarantee (or make
> assumptions) that the attacker doesn't have unlimited access the
> output stream of get_random_{long,int}(), or if it's being used in an
> anti-DOS use case where it ultimately only needs to be harder than
> alternate ways of attacking the system.
>
> Rekeying every five minutes doesn't necessarily help the with respect
> to ASLR, but it might reduce the amount of the output stream that
> would be available to the attacker in order to be able to attack the
> get_random_{long,int}() generator, and it also reduces the value of
> doing that attack to only compromising the ASLR for those processes
> started within that five minute window.

Forgive my ignorance... I did not find reading on using the primitive
in a PRNG. Does anyone know what Aumasson or Bernstein have to say?
Aumasson's site does not seem to discuss the use case:
https://www.google.com/search?q=siphash+rng+site%3A131002.net. (And
their paper only mentions random-number once in a different context).

Making the leap from internal hash tables and short-lived network
packets to the rng case may leave something to be desired, especially
if the bits get used in unanticipated ways, like creating long term
private keys.

Jeff

^ permalink raw reply

* Re: Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Theodore Ts'o @ 2016-12-17 15:41 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Jason, linux, ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, linux-crypto, linux-kernel, luto, netdev,
	tom, torvalds, vegard.nossum
In-Reply-To: <20161217021503.32767.qmail@ns.sciencehorizons.net>

On Fri, Dec 16, 2016 at 09:15:03PM -0500, George Spelvin wrote:
> >> - Ted, Andy Lutorminski and I will try to figure out a construction of
> >>   get_random_long() that we all like.

We don't have to find the most optimal solution right away; we can
approach this incrementally, after all.

So long as we replace get_random_{long,int}() with something which is
(a) strictly better in terms of security given today's use of MD5, and
(b) which is strictly *faster* than the current construction on 32-bit
and 64-bit systems, we can do that, and can try to make it be faster
while maintaining some minimum level of security which is sufficient
for all current users of get_random_{long,int}() and which can be
clearly artificulated for future users of get_random_{long,int}().

The main worry at this point I have is benchmarking siphash on a
32-bit system.  It may be that simply batching the chacha20 output so
that we're using the urandom construction more efficiently is the
better way to go, since that *does* meet the criteron of strictly more
secure and strictly faster than the current MD5 solution.  I'm open to
using siphash, but I want to see the the 32-bit numbers first.

As far as half-siphash is concerned, it occurs to me that the main
problem will be those users who need to guarantee that output can't be
guessed over a long period of time.  For example, if you have a
long-running process, then the output needs to remain unguessable over
potentially months or years, or else you might be weakening the ASLR
protections.  If on the other hand, the hash table or the process will
be going away in a matter of seconds or minutes, the requirements with
respect to cryptographic strength go down significantly.

Now, maybe this doesn't matter that much if we can guarantee (or make
assumptions) that the attacker doesn't have unlimited access the
output stream of get_random_{long,int}(), or if it's being used in an
anti-DOS use case where it ultimately only needs to be harder than
alternate ways of attacking the system.

Rekeying every five minutes doesn't necessarily help the with respect
to ASLR, but it might reduce the amount of the output stream that
would be available to the attacker in order to be able to attack the
get_random_{long,int}() generator, and it also reduces the value of
doing that attack to only compromising the ASLR for those processes
started within that five minute window.

Cheers,

						- Ted

P.S.  I'm using ASLR as an example use case, above; of course we will
need to make similar eximainations of the other uses of
get_random_{long,int}().

P.P.S.  We might also want to think about potentially defining
get_random_{long,int}() to be unambiguously strong, and then creating
a get_weak_random_{long,int}() which on platforms where performance
might be a consideration, it uses a weaker algorithm perhaps with some
kind of rekeying interval.

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-17 15:21 UTC (permalink / raw)
  To: tom
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, torvalds, tytso, vegard.nossum
In-Reply-To: <CALx6S351VFRZmEQphRQy6YtmZYPnOtTN7=XiNrJmhWJGv4HUBg@mail.gmail.com>

To follow up on my comments that your benchmark results were peculiar,
here's my benchmark code.

It just computes the hash of all n*(n+1)/2 possible non-empty substrings
of a buffer of n (called "max" below) bytes.  "cpb" is "cycles per byte".

(The average length is (n+2)/3, c.f. https://oeis.org/A000292)

On x86-32, HSipHash is asymptotically twice the speed of SipHash,
rising to 2.5x for short strings:

SipHash/HSipHash benchmark, sizeof(long) = 4
 SipHash: max=   4 cycles=     10495 cpb=524.7500 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      3400 cpb=170.0000 (sum=146a863e)
 SipHash: max=   8 cycles=     24468 cpb=203.9000 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      9237 cpb= 76.9750 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     94622 cpb=115.9583 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     34499 cpb= 42.2782 (sum=16bb7475)
 SipHash: max=  32 cycles=    418767 cpb= 69.9811 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    156695 cpb= 26.1857 (sum=eed00fcb)
 SipHash: max=  64 cycles=   2119152 cpb= 46.3101 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=   1008678 cpb= 22.0428 (sum=99b9f4f)
 SipHash: max= 128 cycles=  12728659 cpb= 35.5788 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   5452931 cpb= 15.2419 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  38931946 cpb= 13.7615 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  13807312 cpb=  4.8805 (sum=ceeafcc1)
 SipHash: max= 512 cycles= 205537380 cpb=  9.1346 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles= 103420960 cpb=  4.5963 (sum=7f15a313)
 SipHash: max=1024 cycles=1540259472 cpb=  8.5817 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 796090824 cpb=  4.4355 (sum=d8f3374f)

On x86-64, SipHash is consistently faster, asymptotically approaching 2x
for long strings:

SipHash/HSipHash benchmark, sizeof(long) = 8
 SipHash: max=   4 cycles=      2642 cpb=132.1000 (sum=47a4f5554869fa97)
HSipHash: max=   4 cycles=      2498 cpb=124.9000 (sum=146a863e)
 SipHash: max=   8 cycles=      5270 cpb= 43.9167 (sum=21c41a86355affcc)
HSipHash: max=   8 cycles=      7140 cpb= 59.5000 (sum=d3b5e0cd)
 SipHash: max=  16 cycles=     19950 cpb= 24.4485 (sum=26d816b72721e48f)
HSipHash: max=  16 cycles=     23546 cpb= 28.8554 (sum=16bb7475)
 SipHash: max=  32 cycles=     80188 cpb= 13.4004 (sum=dd5a97694b8a832d)
HSipHash: max=  32 cycles=    101218 cpb= 16.9148 (sum=eed00fcb)
 SipHash: max=  64 cycles=    373286 cpb=  8.1575 (sum=a2a725aecc09ed00)
HSipHash: max=  64 cycles=    535568 cpb= 11.7038 (sum=99b9f4f)
 SipHash: max= 128 cycles=   2075224 cpb=  5.8006 (sum=420878cd20272817)
HSipHash: max= 128 cycles=   3336820 cpb=  9.3270 (sum=f1f4ad18)
 SipHash: max= 256 cycles=  14276278 cpb=  5.0463 (sum=e05dfb28b90dfd98)
HSipHash: max= 256 cycles=  28847880 cpb= 10.1970 (sum=ceeafcc1)
 SipHash: max= 512 cycles=  50135180 cpb=  2.2281 (sum=7d129d4de145fbea)
HSipHash: max= 512 cycles=  86145916 cpb=  3.8286 (sum=7f15a313)
 SipHash: max=1024 cycles= 334111900 cpb=  1.8615 (sum=cca7cbdc778ca8af)
HSipHash: max=1024 cycles= 640432452 cpb=  3.5682 (sum=d8f3374f)


Here's the code; compile with -DSELFTEST.  (The main purpose of
printing the sum is to prevent dead code elimination.)


#if SELFTEST
#include <stdint.h>
#include <stdlib.h>

static inline uint64_t rol64(uint64_t word, unsigned int shift)
{
	return word << shift | word >> (64 - shift);
}

static inline uint32_t rol32(uint32_t word, unsigned int shift)
{
	return word << shift | word >> (32 - shift);
}

static inline uint64_t get_unaligned_le64(void const *p)
{
	return *(uint64_t const *)p;
}

static inline uint32_t get_unaligned_le32(void const *p)
{
	return *(uint32_t const *)p;
}

static inline uint64_t le64_to_cpup(uint64_t const *p)
{
	return *p;
}

static inline uint32_t le32_to_cpup(uint32_t const *p)
{
	return *p;
}


#else
#include <linux/bitops.h>	/* For rol64 */
#include <linux/cryptohash.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>
#endif

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *		i386	i386	i386	x86_64	x86_64	x86_64	x86_64
 * Length	small	unroll  halfmd4 small	unroll	halfmd4 teahash
 * 1..4		   1069	   1029	   1608	    195	    160	    399	    690
 * 1..8		   2483	   2381	   3851	    410	    360	    988	   1659
 * 1..12           4303	   4152	   6207	    690	    618	   1642	   2690
 * 1..16           6122	   5931	   8668	    968	    876	   2363	   3786
 * 1..20           8348	   8137	  11245	   1323	   1185	   3162	   5567
 * 1..24          10580	  10327	  13935	   1657	   1504	   4066	   7635
 * 1..28          13211	  12956	  16803	   2069	   1871	   5028	   9759
 * 1..32          15843	  15572	  19725	   2470	   2260	   6084	  11932
 * 1..36          18864	  18609	  24259	   2934	   2678	   7566	  14794
 * 1..1024      5890194	6130242	10264816 881933	 881244	3617392	7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
	uint64_t a = 0x736f6d6570736575;	/* somepseu */
	uint64_t b = 0x646f72616e646f6d;	/* dorandom */
	uint64_t c = 0x6c7967656e657261;	/* lygenera */
	uint64_t d = 0x7465646279746573;	/* tedbytes */
	uint64_t m = 0;
	uint8_t padbyte = len;

	m = seed[2] | (uint64_t)seed[3] << 32;
	b ^= m;
	d ^= m;
	m = seed[0] | (uint64_t)seed[1] << 32;
	/* a ^= m; is done in loop below */
	c ^= m;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/8 + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le64(in);
			in += 8;
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 64-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 7;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 7;

				if (offset + bytes <= 8) {
					m = le64_to_cpup((uint64_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le64(in);
				}
				m &= ((uint64_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint64_t)padbyte << 56;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		SIP_ROUND(a, b, c, d);
		SIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
}

#undef SIP_ROUND
#undef SIP_MIX


#define HSIP_MIX(a, b, s) ((a) += (b), (b) = rol32(b, s), (b) ^= (a))

/*
 * These are the PRELIMINARY rotate constants suggested by
 * Jean-Philippe Aumasson.  Update to final when available.
 */
#define HSIP_ROUND(a, b, c, d) \
	(HSIP_MIX(a, b,  5), HSIP_MIX(c, d,  8), (a) = rol32(a, 16), \
	 HSIP_MIX(c, b,  7), HSIP_MIX(a, d, 13), (c) = rol32(c, 16))

uint32_t
hsiphash24(char const *in, size_t len, uint32_t const key[2])
{
	uint32_t c = key[0];
	uint32_t d = key[1];
	uint32_t a =     0x6c796765 ^ 0x736f6d65;
	uint32_t b = d ^ 0x74656462 ^ 0x646f7261;
	uint32_t m = c;
	uint8_t padbyte = len;

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/sizeof(m) + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le32(in);
			in += sizeof(m);
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 32-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 3;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 3;

				if (offset + bytes <= 4) {
					m = le32_to_cpup((uint32_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le32(in);
				}
				m &= ((uint32_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint32_t)padbyte << 24;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		HSIP_ROUND(a, b, c, d);
		HSIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
	// return c + d;
}

#undef HSIP_ROUND
#undef HSIP_MIX

/*
 * No objection to EXPORT_SYMBOL, but we should probably figure out
 * how the seed[] array should work first.  Homework for the first
 * person to want to call it from a module!
 */

#if SELFTEST

#include <stdio.h>

static uint64_t rdtsc()
{
	uint32_t eax, edx;

	asm volatile ("rdtsc" : "=a" (eax), "=d" (edx));
	return (uint64_t)edx << 32 | eax;
}

int
main(void)
{
	static char const buf[1024] = { 0 };
	unsigned max;
	static const uint32_t key[4] = { 1, 2, 3, 4 };

	printf("SipHash/HSipHash benchmark, sizeof(long) = %u\n",
		(unsigned)sizeof(long));
	for (unsigned max = 4; max <= 1024; max *= 2) {
		uint64_t sum1 = 0;
		uint32_t sum2 = 0;
		uint64_t cycles;
		uint32_t bytes = 0;

		/* A less lazy person could figure out the closed form */
		for (int i = 1; i <= max; i++)
			bytes += i * (max + 1 - i);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum1 += siphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;

		printf(" SipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%llx)\n",
			max, cycles, (double)cycles/bytes, sum1);

		cycles = rdtsc();
		for (int i = 1; i <= max; i++)
			for (int j = 0; j <= max-i; j++)
				sum2 += hsiphash24(buf+j, i, key);
		cycles = rdtsc() - cycles;
		printf("HSipHash: max=%4u cycles=%10llu cpb=%8.4f (sum=%lx)\n",
			max, cycles, (double)cycles/bytes, sum2);
	}
	return 0;
}


#endif

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jeffrey Walton @ 2016-12-17 14:55 UTC (permalink / raw)
  To: Jason A. Donenfeld
  Cc: Netdev, kernel-hardening, LKML, linux-crypto, David Laight,
	Ted Tso, Hannes Frederic Sowa, Linus Torvalds, Eric Biggers,
	Tom Herbert, George Spelvin, Vegard Nossum, ak, davem, luto,
	Jean-Philippe Aumasson, Daniel J . Bernstein
In-Reply-To: <20161215203003.31989-2-Jason@zx2c4.com>

> diff --git a/lib/test_siphash.c b/lib/test_siphash.c
> new file mode 100644
> index 000000000000..93549e4e22c5
> --- /dev/null
> +++ b/lib/test_siphash.c
> @@ -0,0 +1,83 @@
> +/* Test cases for siphash.c
> + *
> + * Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>. All Rights Reserved.
> + *
> + * This file is provided under a dual BSD/GPLv2 license.
> + *
> + * SipHash: a fast short-input PRF
> + * https://131002.net/siphash/
> + *
> + * This implementation is specifically for SipHash2-4.
> + */
> +
> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
> +
> +#include <linux/siphash.h>
> +#include <linux/kernel.h>
> +#include <linux/string.h>
> +#include <linux/errno.h>
> +#include <linux/module.h>
> +
> +/* Test vectors taken from official reference source available at:
> + *     https://131002.net/siphash/siphash24.c
> + */
> +static const u64 test_vectors[64] = {
> +       0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
> +       0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
> +       0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
> +       0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
> +       0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
> +       0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
> +       0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
> +       0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
> +       0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
> +       0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
> +       0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
> +       0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
> +       0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
> +       0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
> +       0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
> +       0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
> +       0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
> +       0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
> +       0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
> +       0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
> +       0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
> +       0x958a324ceb064572ULL
> +};
> +static const siphash_key_t test_key =
> +       { 0x0706050403020100ULL , 0x0f0e0d0c0b0a0908ULL };
> +
> +static int __init siphash_test_init(void)
> +{
> +       u8 in[64] __aligned(SIPHASH_ALIGNMENT);
> +       u8 in_unaligned[65];
> +       u8 i;
> +       int ret = 0;
> +
> +       for (i = 0; i < 64; ++i) {
> +               in[i] = i;
> +               in_unaligned[i + 1] = i;
> +               if (siphash(in, i, test_key) != test_vectors[i]) {
> +                       pr_info("self-test aligned %u: FAIL\n", i + 1);
> +                       ret = -EINVAL;
> +               }
> +               if (siphash_unaligned(in_unaligned + 1, i, test_key) != test_vectors[i]) {
> +                       pr_info("self-test unaligned %u: FAIL\n", i + 1);
> +                       ret = -EINVAL;
> +               }
> +       }
> +       if (!ret)
> +               pr_info("self-tests: pass\n");
> +       return ret;
> +}
> +
> +static void __exit siphash_test_exit(void)
> +{
> +}
> +
> +module_init(siphash_test_init);
> +module_exit(siphash_test_exit);
> +
> +MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
> +MODULE_LICENSE("Dual BSD/GPL");
> --
> 2.11.0
>

I believe the output of SipHash depends upon endianness. Folks who
request a digest through the af_alg interface will likely expect a
byte array.

I think that means on little endian machines, values like element 0
must be reversed byte reversed:

    0x726fdb47dd0e0e31ULL => 31,0e,0e,dd,47,db,6f,72

If I am not mistaken, that value (and other tv's) are returned here:

    return (v0 ^ v1) ^ (v2 ^ v3);

It may be prudent to include the endian reversal in the test to ensure
big endian machines produce expected results. Some closely related
testing on an old Apple PowerMac G5 revealed that result needed to be
reversed before returning it to a caller.

Jeff

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-17 12:42 UTC (permalink / raw)
  To: Jason
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
	vegard.nossum
In-Reply-To: <CAHmME9rxCYfwyF6EADWqpAEt+yqCPgCLUVH0FPdAy7r-oPnrRg@mail.gmail.com>

BTW, here's some SipHash code I wrote for Linux a while ago.

My target application was ext4 directory hashing, resulting in different
implementation choices, although I still think that a rolled-up
implementation like this is reasonable.  Reducing I-cache impact speeds
up the calling code.

One thing I'd like to suggest you steal is the way it handles the
fetch of the final partial word.  It's a lot smaller and faster than
an 8-way case statement.


#include <linux/bitops.h>	/* For rol64 */
#include <linux/cryptohash.h>
#include <asm/byteorder.h>
#include <asm/unaligned.h>

/* The basic ARX mixing function, taken from Skein */
#define SIP_MIX(a, b, s) ((a) += (b), (b) = rol64(b, s), (b) ^= (a))

/*
 * The complete SipRound.  Note that, when unrolled twice like below,
 * the 32-bit rotates drop out on 32-bit machines.
 */
#define SIP_ROUND(a, b, c, d) \
	(SIP_MIX(a, b, 13), SIP_MIX(c, d, 16), (a) = rol64(a, 32), \
	 SIP_MIX(c, b, 17), SIP_MIX(a, d, 21), (c) = rol64(c, 32))

/*
 * This is rolled up more than most implementations, resulting in about
 * 55% the code size.  Speed is a few precent slower.  A crude benchmark
 * (for (i=1; i <= max; i++) for (j = 0; j < 4096-i; j++) hash(buf+j, i);)
 * produces the following timings (in usec):
 *
 *		i386	i386	i386	x86_64	x86_64	x86_64	x86_64
 * Length	small	unroll  halfmd4 small	unroll	halfmd4 teahash
 * 1..4		   1069	   1029	   1608	    195	    160	    399	    690
 * 1..8		   2483	   2381	   3851	    410	    360	    988	   1659
 * 1..12           4303	   4152	   6207	    690	    618	   1642	   2690
 * 1..16           6122	   5931	   8668	    968	    876	   2363	   3786
 * 1..20           8348	   8137	  11245	   1323	   1185	   3162	   5567
 * 1..24          10580	  10327	  13935	   1657	   1504	   4066	   7635
 * 1..28          13211	  12956	  16803	   2069	   1871	   5028	   9759
 * 1..32          15843	  15572	  19725	   2470	   2260	   6084	  11932
 * 1..36          18864	  18609	  24259	   2934	   2678	   7566	  14794
 * 1..1024      5890194	6130242	10264816 881933	 881244	3617392	7589036
 *
 * The performance penalty is quite minor, decreasing for long strings,
 * and it's significantly faster than half_md4, so I'm going for the
 * I-cache win.
 */
uint64_t
siphash24(char const *in, size_t len, uint32_t const seed[4])
{
	uint64_t a = 0x736f6d6570736575;	/* somepseu */
	uint64_t b = 0x646f72616e646f6d;	/* dorandom */
	uint64_t c = 0x6c7967656e657261;	/* lygenera */
	uint64_t d = 0x7465646279746573;	/* tedbytes */
	uint64_t m = 0;
	uint8_t padbyte = len;

	/*
	 * Mix in the 128-bit hash seed.  This is in a format convenient
	 * to the ext3/ext4 code.  Please feel free to adapt the
	 * */
	if (seed) {
		m = seed[2] | (uint64_t)seed[3] << 32;
		b ^= m;
		d ^= m;
		m = seed[0] | (uint64_t)seed[1] << 32;
		/* a ^= m; is done in loop below */
		c ^= m;
	}

	/*
	 * By using the same SipRound code for all iterations, we
	 * save space, at the expense of some branch prediction.  But
	 * branch prediction is hard because of variable length anyway.
	 */
	len = len/8 + 3;	/* Now number of rounds to perform */
	do {
		a ^= m;

		switch (--len) {
			unsigned bytes;

		default:	/* Full words */
			d ^= m = get_unaligned_le64(in);
			in += 8;
			break;
		case 2:		/* Final partial word */
			/*
			 * We'd like to do one 64-bit fetch rather than
			 * mess around with bytes, but reading past the end
			 * might hit a protection boundary.  Fortunately,
			 * we know that protection boundaries are aligned,
			 * so we can consider only three cases:
			 * - The remainder occupies zero words
			 * - The remainder fits into one word
			 * - The remainder straddles two words
			 */
			bytes = padbyte & 7;

			if (bytes == 0) {
				m = 0;
			} else {
				unsigned offset = (unsigned)(uintptr_t)in & 7;

				if (offset + bytes <= 8) {
					m = le64_to_cpup((uint64_t const *)
								(in - offset));
					m >>= 8*offset;
				} else {
					m = get_unaligned_le64(in);
				}
				m &= ((uint64_t)1 << 8*bytes) - 1;
			}
			/* Could use | or +, but ^ allows associativity */
			d ^= m ^= (uint64_t)padbyte << 56;
			break;
		case 1:		/* Beginning of finalization */
			m = 0;
			c ^= 0xff;
			/*FALLTHROUGH*/
		case 0:		/* Second half of finalization */
			break;
		}

		SIP_ROUND(a, b, c, d);
		SIP_ROUND(a, b, c, d);
	} while (len);

	return a ^ b ^ c ^ d;
}

#undef SIP_ROUND
#undef SIP_MIX

/*
 * No objection to EXPORT_SYMBOL, but we should probably figure out
 * how the seed[] array should work first.  Homework for the first
 * person to want to call it from a module!
 */

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: George Spelvin @ 2016-12-17  2:15 UTC (permalink / raw)
  To: Jason, linux
  Cc: ak, davem, David.Laight, djb, ebiggers3, hannes,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CAHmME9r4YNqNSZ-KXAHtJN_vm+eL1tSoC-6muHaFUN6fWhkO2g@mail.gmail.com>

> I already did this. Check my branch.

Do you think it should return "u32" (as you currently have it) or
"unsigned long"?  I thought the latter, since it doesn't cost any
more and makes more 

> I wonder if this could also lead to a similar aliasing
> with arch_get_random_int, since I'm pretty sure all rdrand-like
> instructions return native word size anyway.

Well, Intel's can return 16, 32 or 64 bits, and it makes a
small difference with reseed scheduling.

>> - Ted, Andy Lutorminski and I will try to figure out a construction of
>>   get_random_long() that we all like.

> And me, I hope... No need to make this exclusive.

Gaah, engage brain before fingers.  That was so obvious I didn't say
it, and the result came out sounding extremely rude.

A better (but longer) way to write it would be "I'm sorry that I, Ted,
and Andy are all arguing with you and each other about how to do this
and we can't finalize this part yet".

^ permalink raw reply

* Re: [PATCH v5 1/4] siphash: add cryptographically secure PRF
From: Jason A. Donenfeld @ 2016-12-17  1:39 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
	Eric Biggers, Hannes Frederic Sowa, Jean-Philippe Aumasson,
	kernel-hardening, Linux Crypto Mailing List, LKML,
	Andy Lutomirski, Netdev, Tom Herbert, Linus Torvalds,
	Theodore Ts'o, Vegard Nossum
In-Reply-To: <20161216234408.30174.qmail@ns.sciencehorizons.net>

On Sat, Dec 17, 2016 at 12:44 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Ths advice I'd give now is:
> - Implement
> unsigned long hsiphash(const void *data, size_t len, const unsigned long key[2])
>   .. as SipHash on 64-bit (maybe SipHash-1-3, still being discussed) and
>   HalfSipHash on 32-bit.

I already did this. Check my branch.

> - Document when it may or may not be used carefully.

Good idea. I'll write up some extensive documentation about all of
this, detailing use cases and our various conclusions.

> - #define get_random_int (unsigned)get_random_long

That's a good idea, since ultimately the other just casts in the
return value. I wonder if this could also lead to a similar aliasing
with arch_get_random_int, since I'm pretty sure all rdrand-like
instructions return native word size anyway.

> - Ted, Andy Lutorminski and I will try to figure out a construction of
>   get_random_long() that we all like.

And me, I hope... No need to make this exclusive.

Jason

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox