Linux cryptographic layer development
 help / color / mirror / Atom feed
* Crypto Fixes for 4.9
From: Herbert Xu @ 2016-12-10  6:01 UTC (permalink / raw)
  To: Linus Torvalds, David S. Miller, Linux Kernel Mailing List,
	Linux Crypto Mailing List
In-Reply-To: <20161205063754.GA9408@gondor.apana.org.au>

Hi Linus:

This push fixes the following issues:

- Fix pointer size when caam is used with AArch64 boot loader on
  AArch32 kernel.
- Fix ahash state corruption in marvell driver.
- Fix buggy algif_aed tag handling.
- Prevent mcryptd from being used with incompatible algorithms
  which can cause crashes.


Please pull from

git://git.kernel.org/pub/scm/linux/kernel/git/herbert/crypto-2.6.git linus


Horia Geantă (1):
      crypto: caam - fix pointer size for AArch64 boot loader, AArch32 kernel

Romain Perier (2):
      crypto: marvell - Don't copy hash operation twice into the SRAM
      crypto: marvell - Don't corrupt state of an STD req for re-stepped ahash

Stephan Mueller (2):
      crypto: algif_aead - fix AEAD tag memory handling
      crypto: algif_aead - fix uninitialized variable warning

tim (1):
      crypto: mcryptd - Check mcryptd algorithm compatibility

 crypto/algif_aead.c           |   59 ++++++++++++++++++++++++++---------------
 crypto/mcryptd.c              |   19 ++++++++-----
 drivers/crypto/caam/ctrl.c    |    5 ++--
 drivers/crypto/marvell/hash.c |   11 ++++----
 4 files changed, 57 insertions(+), 37 deletions(-)

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Eric Biggers @ 2016-12-10  5:55 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: linux-crypto, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com, Herbert Xu,
	Andrew Lutomirski, Stephan Mueller
In-Reply-To: <CALCETrW=+3u3P8Xva+0ck9=fr-mD6azPtTkOQ3uQO+GoOA6FcQ@mail.gmail.com>

On Fri, Dec 09, 2016 at 09:25:38PM -0800, Andy Lutomirski wrote:
> > The following crypto drivers initialize a scatterlist to point into an
> > ahash_request, which may have been allocated on the stack with
> > AHASH_REQUEST_ON_STACK():
> >
> >         drivers/crypto/bfin_crc.c:351
> >         drivers/crypto/qce/sha.c:299
> >         drivers/crypto/sahara.c:973,988
> >         drivers/crypto/talitos.c:1910
> 
> This are impossible or highly unlikely on x86.
> 
> >         drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
> >         drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124
> 
> These
> 
> >         drivers/crypto/qce/sha.c:325
> 
> This is impossible on x86.
> 

Thanks for looking into these.  I didn't investigate who/what is likely to be
using each driver.

Of course I would not be surprised to see people want to start supporting
virtually mapped stacks on other architectures too.

> >
> > The "good" news with these bugs is that on x86_64 without CONFIG_DEBUG_SG=y or
> > CONFIG_DEBUG_VIRTUAL=y, you can still do virt_to_page() and then page_address()
> > on a vmalloc address and get back the same address, even though you aren't
> > *supposed* to be able to do this.  This will make things still work for most
> > people.  The bad news is that if you happen to have consumed just about 1 page
> > (or N pages) of your stack at the time you call the crypto API, your stack
> > buffer may actually span physically non-contiguous pages, so the crypto
> > algorithm will scribble over some unrelated page.
> 
> Are you sure?  If it round-trips to the same virtual address, it
> doesn't matter if the buffer is contiguous.

You may be right, I didn't test this.  The hash_walk and blkcipher_walk code do
go page by page, but I suppose on x86_64 it would just step from one bogus
"struct page" to the adjacent one and still map it to the original virtual
address.

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Herbert Xu @ 2016-12-10  5:37 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric Biggers, linux-crypto, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kernel-hardening@lists.openwall.com,
	Andrew Lutomirski, Stephan Mueller
In-Reply-To: <CALCETrW=+3u3P8Xva+0ck9=fr-mD6azPtTkOQ3uQO+GoOA6FcQ@mail.gmail.com>

On Fri, Dec 09, 2016 at 09:25:38PM -0800, Andy Lutomirski wrote:
>
> Herbert, how hard would it be to teach the crypto code to use a more
> sensible data structure than scatterlist and to use coccinelle fix
> this stuff for real?

First of all we already have a sync non-SG hash interface, it's
called shash.

If we had enough sync-only users of skcipher then I'll consider
adding an interface for it.  However, at this point in time it
appears to more sense to convert such users over to the async
interface rather than the other way around.

As for AEAD we never had a sync interface to begin with and I
don't think I'm going to add one.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Herbert Xu @ 2016-12-10  5:32 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Eric Biggers, linux-crypto, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org, kernel-hardening@lists.openwall.com,
	Andrew Lutomirski, Stephan Mueller
In-Reply-To: <CALCETrW=+3u3P8Xva+0ck9=fr-mD6azPtTkOQ3uQO+GoOA6FcQ@mail.gmail.com>

On Fri, Dec 09, 2016 at 09:25:38PM -0800, Andy Lutomirski wrote:
>
> > The following crypto drivers initialize a scatterlist to point into an
> > ablkcipher_request, which may have been allocated on the stack with
> > SKCIPHER_REQUEST_ON_STACK():
> >
> >         drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
> >         drivers/crypto/ccp/ccp-crypto-aes.c:94
> 
> These are real, and I wish I'd known about them sooner.

Are you sure? Any instance of *_ON_STACK must only be used with
sync algorithms and most drivers under drivers/crypto declare
themselves as async.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH 7/7] hwrng: core: Remove two unused include
From: kbuild test robot @ 2016-12-10  5:31 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: kbuild-all, mpm, herbert, arnd, gregkh, linux-crypto,
	linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-7-git-send-email-clabbe.montjoie@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 5431 bytes --]

Hi Corentin,

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on v4.9-rc8 next-20161209]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Corentin-Labbe/hwrng-core-do-not-use-multiple-blank-lines/20161210-072632
config: i386-randconfig-i0-201649 (attached as .config)
compiler: gcc-4.8 (Debian 4.8.4-1) 4.8.4
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All errors (new ones prefixed by >>):

   drivers/char/hw_random/core.c: In function 'rng_dev_open':
>> drivers/char/hw_random/core.c:169:11: error: dereferencing pointer to incomplete type
     if ((filp->f_mode & FMODE_READ) == 0)
              ^
   drivers/char/hw_random/core.c:169:22: error: 'FMODE_READ' undeclared (first use in this function)
     if ((filp->f_mode & FMODE_READ) == 0)
                         ^
   drivers/char/hw_random/core.c:169:22: note: each undeclared identifier is reported only once for each function it appears in
   drivers/char/hw_random/core.c:171:10: error: dereferencing pointer to incomplete type
     if (filp->f_mode & FMODE_WRITE)
             ^
   drivers/char/hw_random/core.c:171:21: error: 'FMODE_WRITE' undeclared (first use in this function)
     if (filp->f_mode & FMODE_WRITE)
                        ^
   drivers/char/hw_random/core.c: In function 'rng_dev_read':
   drivers/char/hw_random/core.c:221:11: error: dereferencing pointer to incomplete type
        !(filp->f_flags & O_NONBLOCK));
              ^
   drivers/char/hw_random/core.c:221:23: error: 'O_NONBLOCK' undeclared (first use in this function)
        !(filp->f_flags & O_NONBLOCK));
                          ^
   drivers/char/hw_random/core.c:230:12: error: dereferencing pointer to incomplete type
       if (filp->f_flags & O_NONBLOCK) {
               ^
   drivers/char/hw_random/core.c: At top level:
   drivers/char/hw_random/core.c:272:21: error: variable 'rng_chrdev_ops' has initializer but incomplete type
    static const struct file_operations rng_chrdev_ops = {
                        ^
   drivers/char/hw_random/core.c:273:2: error: unknown field 'owner' specified in initializer
     .owner  = THIS_MODULE,
     ^
   In file included from include/linux/linkage.h:6:0,
                    from include/linux/kernel.h:6,
                    from include/linux/delay.h:10,
                    from drivers/char/hw_random/core.c:13:
   include/linux/export.h:37:30: warning: excess elements in struct initializer [enabled by default]
    #define THIS_MODULE ((struct module *)0)
                                 ^
   drivers/char/hw_random/core.c:273:12: note: in expansion of macro 'THIS_MODULE'
     .owner  = THIS_MODULE,
               ^
   include/linux/export.h:37:30: warning: (near initialization for 'rng_chrdev_ops') [enabled by default]
    #define THIS_MODULE ((struct module *)0)
                                 ^
   drivers/char/hw_random/core.c:273:12: note: in expansion of macro 'THIS_MODULE'
     .owner  = THIS_MODULE,
               ^
   drivers/char/hw_random/core.c:274:2: error: unknown field 'open' specified in initializer
     .open  = rng_dev_open,
     ^
   drivers/char/hw_random/core.c:274:2: warning: excess elements in struct initializer [enabled by default]
   drivers/char/hw_random/core.c:274:2: warning: (near initialization for 'rng_chrdev_ops') [enabled by default]
   drivers/char/hw_random/core.c:275:2: error: unknown field 'read' specified in initializer
     .read  = rng_dev_read,
     ^
   drivers/char/hw_random/core.c:275:2: warning: excess elements in struct initializer [enabled by default]
   drivers/char/hw_random/core.c:275:2: warning: (near initialization for 'rng_chrdev_ops') [enabled by default]
   drivers/char/hw_random/core.c:276:2: error: unknown field 'llseek' specified in initializer
     .llseek  = noop_llseek,
     ^
   drivers/char/hw_random/core.c:276:13: error: 'noop_llseek' undeclared here (not in a function)
     .llseek  = noop_llseek,
                ^
   drivers/char/hw_random/core.c:276:2: warning: excess elements in struct initializer [enabled by default]
     .llseek  = noop_llseek,
     ^
   drivers/char/hw_random/core.c:276:2: warning: (near initialization for 'rng_chrdev_ops') [enabled by default]

vim +169 drivers/char/hw_random/core.c

844dd05f Michael Buesch 2006-06-26  163  	return 0;
844dd05f Michael Buesch 2006-06-26  164  }
844dd05f Michael Buesch 2006-06-26  165  
844dd05f Michael Buesch 2006-06-26  166  static int rng_dev_open(struct inode *inode, struct file *filp)
844dd05f Michael Buesch 2006-06-26  167  {
844dd05f Michael Buesch 2006-06-26  168  	/* enforce read-only access to this chrdev */
844dd05f Michael Buesch 2006-06-26 @169  	if ((filp->f_mode & FMODE_READ) == 0)
844dd05f Michael Buesch 2006-06-26  170  		return -EINVAL;
844dd05f Michael Buesch 2006-06-26  171  	if (filp->f_mode & FMODE_WRITE)
844dd05f Michael Buesch 2006-06-26  172  		return -EINVAL;

:::::: The code at line 169 was first introduced by commit
:::::: 844dd05fec172d98b0dacecd9b9e9f6595204c13 [PATCH] Add new generic HW RNG core

:::::: TO: Michael Buesch <mb@bu3sch.de>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 26603 bytes --]

^ permalink raw reply

* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Andy Lutomirski @ 2016-12-10  5:25 UTC (permalink / raw)
  To: Eric Biggers
  Cc: linux-crypto, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
	kernel-hardening@lists.openwall.com, Herbert Xu,
	Andrew Lutomirski, Stephan Mueller
In-Reply-To: <20161209230851.GB64048@google.com>

On Fri, Dec 9, 2016 at 3:08 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
> default on x86_64.  This has been exposing a number of problems in which
> on-stack buffers are being passed into the crypto API, which to support crypto
> accelerators operates on 'struct page' rather than on virtual memory.
>
> Some of these problems have already been fixed, but I was wondering how many
> problems remain, so I briefly looked through all the callers of sg_set_buf() and
> sg_init_one().  Overall I found quite a few remaining problems, detailed below.
>
> The following crypto drivers initialize a scatterlist to point into an
> ahash_request, which may have been allocated on the stack with
> AHASH_REQUEST_ON_STACK():
>
>         drivers/crypto/bfin_crc.c:351
>         drivers/crypto/qce/sha.c:299
>         drivers/crypto/sahara.c:973,988
>         drivers/crypto/talitos.c:1910

This are impossible or highly unlikely on x86.

>         drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
>         drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124

These

>         drivers/crypto/qce/sha.c:325

This is impossible on x86.

>
> The following crypto drivers initialize a scatterlist to point into an
> ablkcipher_request, which may have been allocated on the stack with
> SKCIPHER_REQUEST_ON_STACK():
>
>         drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
>         drivers/crypto/ccp/ccp-crypto-aes.c:94

These are real, and I wish I'd known about them sooner.

>
> And these other places do crypto operations on buffers clearly on the stack:
>
>         drivers/net/wireless/intersil/orinoco/mic.c:72

Ick.

>         drivers/usb/wusbcore/crypto.c:264

Well, crud.  I thought I had fixed this driver but I missed one case.
Will send a fix tomorrow.  But I'm still unconvinced that this
hardware ever shipped.

>         net/ceph/crypto.c:182

Ick.

>         net/rxrpc/rxkad.c:737,1000

Well, crud.  This was supposed to have been fixed in:

commit a263629da519b2064588377416e067727e2cbdf9
Author: Herbert Xu <herbert@gondor.apana.org.au>
Date:   Sun Jun 26 14:55:24 2016 -0700

    rxrpc: Avoid using stack memory in SG lists in rxkad


>         security/keys/encrypted-keys/encrypted.c:500

That's a trivial one-liner.  Patch coming tomorrow.

>         fs/cifs/smbencrypt.c:96

Ick.

>
> Note: I almost certainly missed some, since I excluded places where the use of a
> stack buffer was not obvious to me.  I also excluded AEAD algorithms since there
> isn't an AEAD_REQUEST_ON_STACK() macro (yet).
>
> The "good" news with these bugs is that on x86_64 without CONFIG_DEBUG_SG=y or
> CONFIG_DEBUG_VIRTUAL=y, you can still do virt_to_page() and then page_address()
> on a vmalloc address and get back the same address, even though you aren't
> *supposed* to be able to do this.  This will make things still work for most
> people.  The bad news is that if you happen to have consumed just about 1 page
> (or N pages) of your stack at the time you call the crypto API, your stack
> buffer may actually span physically non-contiguous pages, so the crypto
> algorithm will scribble over some unrelated page.

Are you sure?  If it round-trips to the same virtual address, it
doesn't matter if the buffer is contiguous.

>  Also, hardware crypto drivers
> which actually do operate on physical memory will break too.

Those were already broken.  DMA has been illegal on the stack for
years and DMA debugging would have caught it.

>
> So I am wondering: is the best solution really to make all these crypto API
> algorithms and users use heap buffers, as opposed to something like maintaining
> a lowmem alias for the stack, or introducing a more general function to convert
> buffers (possibly in the vmalloc space) into scatterlists?  And if the current
> solution is desired, who is going to fix all of these bugs and when?

The *right* solution IMO is to fix crypto to stop using scatterlists.
Scatterlists are for DMA using physical addresses, and they're
inappropriate almost every user of them that's using them for crypto.
kiov would be much better -- it would make sense and it would be
faster.

I have a hack to make scatterlists pointing to the stack work (as long
as they're only one element), but that's seriously gross.

Herbert, how hard would it be to teach the crypto code to use a more
sensible data structure than scatterlist and to use coccinelle fix
this stuff for real?

In the mean time, we should patch the handful of drivers that matter.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH 7/7] hwrng: core: Remove two unused include
From: kbuild test robot @ 2016-12-10  1:27 UTC (permalink / raw)
  To: Corentin Labbe
  Cc: kbuild-all, mpm, herbert, arnd, gregkh, linux-crypto,
	linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-7-git-send-email-clabbe.montjoie@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 23999 bytes --]

Hi Corentin,

[auto build test ERROR on char-misc/char-misc-testing]
[also build test ERROR on v4.9-rc8 next-20161209]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Corentin-Labbe/hwrng-core-do-not-use-multiple-blank-lines/20161210-072632
config: i386-randconfig-x007-201649 (attached as .config)
compiler: gcc-6 (Debian 6.2.0-3) 6.2.0 20160901
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386 

All error/warnings (new ones prefixed by >>):

   In file included from include/linux/linkage.h:4:0,
                    from include/linux/kernel.h:6,
                    from include/linux/delay.h:10,
                    from drivers/char/hw_random/core.c:13:
   drivers/char/hw_random/core.c: In function 'rng_dev_open':
>> drivers/char/hw_random/core.c:169:11: error: dereferencing pointer to incomplete type 'struct file'
     if ((filp->f_mode & FMODE_READ) == 0)
              ^
   include/linux/compiler.h:149:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/char/hw_random/core.c:169:2: note: in expansion of macro 'if'
     if ((filp->f_mode & FMODE_READ) == 0)
     ^~
>> drivers/char/hw_random/core.c:169:22: error: 'FMODE_READ' undeclared (first use in this function)
     if ((filp->f_mode & FMODE_READ) == 0)
                         ^
   include/linux/compiler.h:149:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/char/hw_random/core.c:169:2: note: in expansion of macro 'if'
     if ((filp->f_mode & FMODE_READ) == 0)
     ^~
   drivers/char/hw_random/core.c:169:22: note: each undeclared identifier is reported only once for each function it appears in
     if ((filp->f_mode & FMODE_READ) == 0)
                         ^
   include/linux/compiler.h:149:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
>> drivers/char/hw_random/core.c:169:2: note: in expansion of macro 'if'
     if ((filp->f_mode & FMODE_READ) == 0)
     ^~
>> drivers/char/hw_random/core.c:171:21: error: 'FMODE_WRITE' undeclared (first use in this function)
     if (filp->f_mode & FMODE_WRITE)
                        ^
   include/linux/compiler.h:149:30: note: in definition of macro '__trace_if'
     if (__builtin_constant_p(!!(cond)) ? !!(cond) :   \
                                 ^~~~
   drivers/char/hw_random/core.c:171:2: note: in expansion of macro 'if'
     if (filp->f_mode & FMODE_WRITE)
     ^~
   drivers/char/hw_random/core.c: In function 'rng_dev_read':
>> drivers/char/hw_random/core.c:221:23: error: 'O_NONBLOCK' undeclared (first use in this function)
        !(filp->f_flags & O_NONBLOCK));
                          ^~~~~~~~~~
   drivers/char/hw_random/core.c: At top level:
>> drivers/char/hw_random/core.c:272:21: error: variable 'rng_chrdev_ops' has initializer but incomplete type
    static const struct file_operations rng_chrdev_ops = {
                        ^~~~~~~~~~~~~~~
>> drivers/char/hw_random/core.c:273:2: error: unknown field 'owner' specified in initializer
     .owner  = THIS_MODULE,
     ^
   In file included from include/linux/linkage.h:6:0,
                    from include/linux/kernel.h:6,
                    from include/linux/delay.h:10,
                    from drivers/char/hw_random/core.c:13:
   include/linux/export.h:37:21: warning: excess elements in struct initializer
    #define THIS_MODULE ((struct module *)0)
                        ^
>> drivers/char/hw_random/core.c:273:12: note: in expansion of macro 'THIS_MODULE'
     .owner  = THIS_MODULE,
               ^~~~~~~~~~~
   include/linux/export.h:37:21: note: (near initialization for 'rng_chrdev_ops')
    #define THIS_MODULE ((struct module *)0)
                        ^
>> drivers/char/hw_random/core.c:273:12: note: in expansion of macro 'THIS_MODULE'
     .owner  = THIS_MODULE,
               ^~~~~~~~~~~
>> drivers/char/hw_random/core.c:274:2: error: unknown field 'open' specified in initializer
     .open  = rng_dev_open,
     ^
>> drivers/char/hw_random/core.c:274:11: warning: excess elements in struct initializer
     .open  = rng_dev_open,
              ^~~~~~~~~~~~
   drivers/char/hw_random/core.c:274:11: note: (near initialization for 'rng_chrdev_ops')
>> drivers/char/hw_random/core.c:275:2: error: unknown field 'read' specified in initializer
     .read  = rng_dev_read,
     ^
   drivers/char/hw_random/core.c:275:11: warning: excess elements in struct initializer
     .read  = rng_dev_read,
              ^~~~~~~~~~~~
   drivers/char/hw_random/core.c:275:11: note: (near initialization for 'rng_chrdev_ops')
>> drivers/char/hw_random/core.c:276:2: error: unknown field 'llseek' specified in initializer
     .llseek  = noop_llseek,
     ^
>> drivers/char/hw_random/core.c:276:13: error: 'noop_llseek' undeclared here (not in a function)
     .llseek  = noop_llseek,
                ^~~~~~~~~~~
   drivers/char/hw_random/core.c:276:13: warning: excess elements in struct initializer
   drivers/char/hw_random/core.c:276:13: note: (near initialization for 'rng_chrdev_ops')
>> drivers/char/hw_random/core.c:272:37: error: storage size of 'rng_chrdev_ops' isn't known
    static const struct file_operations rng_chrdev_ops = {
                                        ^~~~~~~~~~~~~~

vim +169 drivers/char/hw_random/core.c

92c7987e7 Corentin Labbe    2016-12-09    7   * Please read Documentation/hw_random.txt for details on use.
92c7987e7 Corentin Labbe    2016-12-09    8   *
92c7987e7 Corentin Labbe    2016-12-09    9   * This software may be used and distributed according to the terms
92c7987e7 Corentin Labbe    2016-12-09   10   * of the GNU General Public License, incorporated herein by reference.
844dd05fe Michael Buesch    2006-06-26   11   */
844dd05fe Michael Buesch    2006-06-26   12  
b70f09c75 Corentin Labbe    2016-12-09  @13  #include <linux/delay.h>
844dd05fe Michael Buesch    2006-06-26   14  #include <linux/device.h>
b70f09c75 Corentin Labbe    2016-12-09   15  #include <linux/err.h>
844dd05fe Michael Buesch    2006-06-26   16  #include <linux/hw_random.h>
844dd05fe Michael Buesch    2006-06-26   17  #include <linux/kernel.h>
be4000bc4 Torsten Duwe      2014-06-14   18  #include <linux/kthread.h>
b70f09c75 Corentin Labbe    2016-12-09   19  #include <linux/miscdevice.h>
b70f09c75 Corentin Labbe    2016-12-09   20  #include <linux/module.h>
d9e797261 Kees Cook         2014-03-03   21  #include <linux/random.h>
b70f09c75 Corentin Labbe    2016-12-09   22  #include <linux/slab.h>
b70f09c75 Corentin Labbe    2016-12-09   23  #include <linux/uaccess.h>
844dd05fe Michael Buesch    2006-06-26   24  
844dd05fe Michael Buesch    2006-06-26   25  #define RNG_MODULE_NAME		"hw_random"
844dd05fe Michael Buesch    2006-06-26   26  
844dd05fe Michael Buesch    2006-06-26   27  static struct hwrng *current_rng;
be4000bc4 Torsten Duwe      2014-06-14   28  static struct task_struct *hwrng_fill;
844dd05fe Michael Buesch    2006-06-26   29  static LIST_HEAD(rng_list);
9372b35e1 Rusty Russell     2014-12-08   30  /* Protects rng_list and current_rng */
844dd05fe Michael Buesch    2006-06-26   31  static DEFINE_MUTEX(rng_mutex);
9372b35e1 Rusty Russell     2014-12-08   32  /* Protects rng read functions, data_avail, rng_buffer and rng_fillbuf */
9372b35e1 Rusty Russell     2014-12-08   33  static DEFINE_MUTEX(reading_mutex);
9996508b3 Ian Molton        2009-12-01   34  static int data_avail;
be4000bc4 Torsten Duwe      2014-06-14   35  static u8 *rng_buffer, *rng_fillbuf;
0f734e6e7 Torsten Duwe      2014-06-14   36  static unsigned short current_quality;
0f734e6e7 Torsten Duwe      2014-06-14   37  static unsigned short default_quality; /* = 0; default to "off" */
be4000bc4 Torsten Duwe      2014-06-14   38  
be4000bc4 Torsten Duwe      2014-06-14   39  module_param(current_quality, ushort, 0644);
be4000bc4 Torsten Duwe      2014-06-14   40  MODULE_PARM_DESC(current_quality,
be4000bc4 Torsten Duwe      2014-06-14   41  		 "current hwrng entropy estimation per mill");
0f734e6e7 Torsten Duwe      2014-06-14   42  module_param(default_quality, ushort, 0644);
0f734e6e7 Torsten Duwe      2014-06-14   43  MODULE_PARM_DESC(default_quality,
0f734e6e7 Torsten Duwe      2014-06-14   44  		 "default entropy content of hwrng per mill");
be4000bc4 Torsten Duwe      2014-06-14   45  
ff77c150f Herbert Xu        2014-12-23   46  static void drop_current_rng(void);
90ac41bd4 Herbert Xu        2014-12-23   47  static int hwrng_init(struct hwrng *rng);
be4000bc4 Torsten Duwe      2014-06-14   48  static void start_khwrngd(void);
f7f154f12 Rusty Russell     2013-03-05   49  
d3cc79964 Amit Shah         2014-07-10   50  static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
d3cc79964 Amit Shah         2014-07-10   51  			       int wait);
d3cc79964 Amit Shah         2014-07-10   52  
f7f154f12 Rusty Russell     2013-03-05   53  static size_t rng_buffer_size(void)
f7f154f12 Rusty Russell     2013-03-05   54  {
f7f154f12 Rusty Russell     2013-03-05   55  	return SMP_CACHE_BYTES < 32 ? 32 : SMP_CACHE_BYTES;
f7f154f12 Rusty Russell     2013-03-05   56  }
844dd05fe Michael Buesch    2006-06-26   57  
d3cc79964 Amit Shah         2014-07-10   58  static void add_early_randomness(struct hwrng *rng)
d3cc79964 Amit Shah         2014-07-10   59  {
d3cc79964 Amit Shah         2014-07-10   60  	int bytes_read;
6d4952d9d Andrew Lutomirski 2016-10-17   61  	size_t size = min_t(size_t, 16, rng_buffer_size());
d3cc79964 Amit Shah         2014-07-10   62  
9372b35e1 Rusty Russell     2014-12-08   63  	mutex_lock(&reading_mutex);
6d4952d9d Andrew Lutomirski 2016-10-17   64  	bytes_read = rng_get_data(rng, rng_buffer, size, 1);
9372b35e1 Rusty Russell     2014-12-08   65  	mutex_unlock(&reading_mutex);
d3cc79964 Amit Shah         2014-07-10   66  	if (bytes_read > 0)
6d4952d9d Andrew Lutomirski 2016-10-17   67  		add_device_randomness(rng_buffer, bytes_read);
d3cc79964 Amit Shah         2014-07-10   68  }
d3cc79964 Amit Shah         2014-07-10   69  
3a2c0ba5a Rusty Russell     2014-12-08   70  static inline void cleanup_rng(struct kref *kref)
3a2c0ba5a Rusty Russell     2014-12-08   71  {
3a2c0ba5a Rusty Russell     2014-12-08   72  	struct hwrng *rng = container_of(kref, struct hwrng, ref);
3a2c0ba5a Rusty Russell     2014-12-08   73  
3a2c0ba5a Rusty Russell     2014-12-08   74  	if (rng->cleanup)
3a2c0ba5a Rusty Russell     2014-12-08   75  		rng->cleanup(rng);
a027f30d7 Rusty Russell     2014-12-08   76  
77584ee57 Herbert Xu        2014-12-23   77  	complete(&rng->cleanup_done);
3a2c0ba5a Rusty Russell     2014-12-08   78  }
3a2c0ba5a Rusty Russell     2014-12-08   79  
90ac41bd4 Herbert Xu        2014-12-23   80  static int set_current_rng(struct hwrng *rng)
3a2c0ba5a Rusty Russell     2014-12-08   81  {
90ac41bd4 Herbert Xu        2014-12-23   82  	int err;
90ac41bd4 Herbert Xu        2014-12-23   83  
3a2c0ba5a Rusty Russell     2014-12-08   84  	BUG_ON(!mutex_is_locked(&rng_mutex));
90ac41bd4 Herbert Xu        2014-12-23   85  
90ac41bd4 Herbert Xu        2014-12-23   86  	err = hwrng_init(rng);
90ac41bd4 Herbert Xu        2014-12-23   87  	if (err)
90ac41bd4 Herbert Xu        2014-12-23   88  		return err;
90ac41bd4 Herbert Xu        2014-12-23   89  
ff77c150f Herbert Xu        2014-12-23   90  	drop_current_rng();
3a2c0ba5a Rusty Russell     2014-12-08   91  	current_rng = rng;
90ac41bd4 Herbert Xu        2014-12-23   92  
90ac41bd4 Herbert Xu        2014-12-23   93  	return 0;
3a2c0ba5a Rusty Russell     2014-12-08   94  }
3a2c0ba5a Rusty Russell     2014-12-08   95  
3a2c0ba5a Rusty Russell     2014-12-08   96  static void drop_current_rng(void)
3a2c0ba5a Rusty Russell     2014-12-08   97  {
3a2c0ba5a Rusty Russell     2014-12-08   98  	BUG_ON(!mutex_is_locked(&rng_mutex));
3a2c0ba5a Rusty Russell     2014-12-08   99  	if (!current_rng)
3a2c0ba5a Rusty Russell     2014-12-08  100  		return;
3a2c0ba5a Rusty Russell     2014-12-08  101  
3a2c0ba5a Rusty Russell     2014-12-08  102  	/* decrease last reference for triggering the cleanup */
3a2c0ba5a Rusty Russell     2014-12-08  103  	kref_put(&current_rng->ref, cleanup_rng);
3a2c0ba5a Rusty Russell     2014-12-08  104  	current_rng = NULL;
3a2c0ba5a Rusty Russell     2014-12-08  105  }
3a2c0ba5a Rusty Russell     2014-12-08  106  
3a2c0ba5a Rusty Russell     2014-12-08  107  /* Returns ERR_PTR(), NULL or refcounted hwrng */
3a2c0ba5a Rusty Russell     2014-12-08  108  static struct hwrng *get_current_rng(void)
3a2c0ba5a Rusty Russell     2014-12-08  109  {
3a2c0ba5a Rusty Russell     2014-12-08  110  	struct hwrng *rng;
3a2c0ba5a Rusty Russell     2014-12-08  111  
3a2c0ba5a Rusty Russell     2014-12-08  112  	if (mutex_lock_interruptible(&rng_mutex))
3a2c0ba5a Rusty Russell     2014-12-08  113  		return ERR_PTR(-ERESTARTSYS);
3a2c0ba5a Rusty Russell     2014-12-08  114  
3a2c0ba5a Rusty Russell     2014-12-08  115  	rng = current_rng;
3a2c0ba5a Rusty Russell     2014-12-08  116  	if (rng)
3a2c0ba5a Rusty Russell     2014-12-08  117  		kref_get(&rng->ref);
3a2c0ba5a Rusty Russell     2014-12-08  118  
3a2c0ba5a Rusty Russell     2014-12-08  119  	mutex_unlock(&rng_mutex);
3a2c0ba5a Rusty Russell     2014-12-08  120  	return rng;
3a2c0ba5a Rusty Russell     2014-12-08  121  }
3a2c0ba5a Rusty Russell     2014-12-08  122  
3a2c0ba5a Rusty Russell     2014-12-08  123  static void put_rng(struct hwrng *rng)
3a2c0ba5a Rusty Russell     2014-12-08  124  {
3a2c0ba5a Rusty Russell     2014-12-08  125  	/*
3a2c0ba5a Rusty Russell     2014-12-08  126  	 * Hold rng_mutex here so we serialize in case they set_current_rng
3a2c0ba5a Rusty Russell     2014-12-08  127  	 * on rng again immediately.
3a2c0ba5a Rusty Russell     2014-12-08  128  	 */
3a2c0ba5a Rusty Russell     2014-12-08  129  	mutex_lock(&rng_mutex);
3a2c0ba5a Rusty Russell     2014-12-08  130  	if (rng)
3a2c0ba5a Rusty Russell     2014-12-08  131  		kref_put(&rng->ref, cleanup_rng);
3a2c0ba5a Rusty Russell     2014-12-08  132  	mutex_unlock(&rng_mutex);
3a2c0ba5a Rusty Russell     2014-12-08  133  }
3a2c0ba5a Rusty Russell     2014-12-08  134  
90ac41bd4 Herbert Xu        2014-12-23  135  static int hwrng_init(struct hwrng *rng)
844dd05fe Michael Buesch    2006-06-26  136  {
15b66cd54 Herbert Xu        2014-12-23  137  	if (kref_get_unless_zero(&rng->ref))
15b66cd54 Herbert Xu        2014-12-23  138  		goto skip_init;
15b66cd54 Herbert Xu        2014-12-23  139  
d3cc79964 Amit Shah         2014-07-10  140  	if (rng->init) {
d3cc79964 Amit Shah         2014-07-10  141  		int ret;
d3cc79964 Amit Shah         2014-07-10  142  
d3cc79964 Amit Shah         2014-07-10  143  		ret =  rng->init(rng);
d3cc79964 Amit Shah         2014-07-10  144  		if (ret)
d3cc79964 Amit Shah         2014-07-10  145  			return ret;
d3cc79964 Amit Shah         2014-07-10  146  	}
15b66cd54 Herbert Xu        2014-12-23  147  
15b66cd54 Herbert Xu        2014-12-23  148  	kref_init(&rng->ref);
15b66cd54 Herbert Xu        2014-12-23  149  	reinit_completion(&rng->cleanup_done);
15b66cd54 Herbert Xu        2014-12-23  150  
15b66cd54 Herbert Xu        2014-12-23  151  skip_init:
d3cc79964 Amit Shah         2014-07-10  152  	add_early_randomness(rng);
be4000bc4 Torsten Duwe      2014-06-14  153  
0f734e6e7 Torsten Duwe      2014-06-14  154  	current_quality = rng->quality ? : default_quality;
506bf0c04 Keith Packard     2015-03-18  155  	if (current_quality > 1024)
506bf0c04 Keith Packard     2015-03-18  156  		current_quality = 1024;
0f734e6e7 Torsten Duwe      2014-06-14  157  
0f734e6e7 Torsten Duwe      2014-06-14  158  	if (current_quality == 0 && hwrng_fill)
0f734e6e7 Torsten Duwe      2014-06-14  159  		kthread_stop(hwrng_fill);
be4000bc4 Torsten Duwe      2014-06-14  160  	if (current_quality > 0 && !hwrng_fill)
be4000bc4 Torsten Duwe      2014-06-14  161  		start_khwrngd();
be4000bc4 Torsten Duwe      2014-06-14  162  
844dd05fe Michael Buesch    2006-06-26  163  	return 0;
844dd05fe Michael Buesch    2006-06-26  164  }
844dd05fe Michael Buesch    2006-06-26  165  
844dd05fe Michael Buesch    2006-06-26  166  static int rng_dev_open(struct inode *inode, struct file *filp)
844dd05fe Michael Buesch    2006-06-26  167  {
844dd05fe Michael Buesch    2006-06-26  168  	/* enforce read-only access to this chrdev */
844dd05fe Michael Buesch    2006-06-26 @169  	if ((filp->f_mode & FMODE_READ) == 0)
844dd05fe Michael Buesch    2006-06-26  170  		return -EINVAL;
844dd05fe Michael Buesch    2006-06-26 @171  	if (filp->f_mode & FMODE_WRITE)
844dd05fe Michael Buesch    2006-06-26  172  		return -EINVAL;
844dd05fe Michael Buesch    2006-06-26  173  	return 0;
844dd05fe Michael Buesch    2006-06-26  174  }
844dd05fe Michael Buesch    2006-06-26  175  
9996508b3 Ian Molton        2009-12-01  176  static inline int rng_get_data(struct hwrng *rng, u8 *buffer, size_t size,
9996508b3 Ian Molton        2009-12-01  177  			int wait) {
9996508b3 Ian Molton        2009-12-01  178  	int present;
9996508b3 Ian Molton        2009-12-01  179  
9372b35e1 Rusty Russell     2014-12-08  180  	BUG_ON(!mutex_is_locked(&reading_mutex));
9996508b3 Ian Molton        2009-12-01  181  	if (rng->read)
9996508b3 Ian Molton        2009-12-01  182  		return rng->read(rng, (void *)buffer, size, wait);
9996508b3 Ian Molton        2009-12-01  183  
9996508b3 Ian Molton        2009-12-01  184  	if (rng->data_present)
9996508b3 Ian Molton        2009-12-01  185  		present = rng->data_present(rng, wait);
9996508b3 Ian Molton        2009-12-01  186  	else
9996508b3 Ian Molton        2009-12-01  187  		present = 1;
9996508b3 Ian Molton        2009-12-01  188  
9996508b3 Ian Molton        2009-12-01  189  	if (present)
9996508b3 Ian Molton        2009-12-01  190  		return rng->data_read(rng, (u32 *)buffer);
9996508b3 Ian Molton        2009-12-01  191  
9996508b3 Ian Molton        2009-12-01  192  	return 0;
9996508b3 Ian Molton        2009-12-01  193  }
9996508b3 Ian Molton        2009-12-01  194  
844dd05fe Michael Buesch    2006-06-26  195  static ssize_t rng_dev_read(struct file *filp, char __user *buf,
844dd05fe Michael Buesch    2006-06-26  196  			    size_t size, loff_t *offp)
844dd05fe Michael Buesch    2006-06-26  197  {
844dd05fe Michael Buesch    2006-06-26  198  	ssize_t ret = 0;
984e976f5 Patrick McHardy   2007-11-21  199  	int err = 0;
9996508b3 Ian Molton        2009-12-01  200  	int bytes_read, len;
3a2c0ba5a Rusty Russell     2014-12-08  201  	struct hwrng *rng;
844dd05fe Michael Buesch    2006-06-26  202  
844dd05fe Michael Buesch    2006-06-26  203  	while (size) {
3a2c0ba5a Rusty Russell     2014-12-08  204  		rng = get_current_rng();
3a2c0ba5a Rusty Russell     2014-12-08  205  		if (IS_ERR(rng)) {
3a2c0ba5a Rusty Russell     2014-12-08  206  			err = PTR_ERR(rng);
844dd05fe Michael Buesch    2006-06-26  207  			goto out;
9996508b3 Ian Molton        2009-12-01  208  		}
3a2c0ba5a Rusty Russell     2014-12-08  209  		if (!rng) {
844dd05fe Michael Buesch    2006-06-26  210  			err = -ENODEV;
3a2c0ba5a Rusty Russell     2014-12-08  211  			goto out;
844dd05fe Michael Buesch    2006-06-26  212  		}
984e976f5 Patrick McHardy   2007-11-21  213  
1ab87298c Jiri Slaby        2015-11-27  214  		if (mutex_lock_interruptible(&reading_mutex)) {
1ab87298c Jiri Slaby        2015-11-27  215  			err = -ERESTARTSYS;
1ab87298c Jiri Slaby        2015-11-27  216  			goto out_put;
1ab87298c Jiri Slaby        2015-11-27  217  		}
9996508b3 Ian Molton        2009-12-01  218  		if (!data_avail) {
3a2c0ba5a Rusty Russell     2014-12-08  219  			bytes_read = rng_get_data(rng, rng_buffer,
f7f154f12 Rusty Russell     2013-03-05  220  				rng_buffer_size(),
9996508b3 Ian Molton        2009-12-01 @221  				!(filp->f_flags & O_NONBLOCK));
893f11286 Ralph Wuerthner   2008-04-17  222  			if (bytes_read < 0) {
893f11286 Ralph Wuerthner   2008-04-17  223  				err = bytes_read;
9372b35e1 Rusty Russell     2014-12-08  224  				goto out_unlock_reading;
9996508b3 Ian Molton        2009-12-01  225  			}
9996508b3 Ian Molton        2009-12-01  226  			data_avail = bytes_read;
893f11286 Ralph Wuerthner   2008-04-17  227  		}
844dd05fe Michael Buesch    2006-06-26  228  
9996508b3 Ian Molton        2009-12-01  229  		if (!data_avail) {
9996508b3 Ian Molton        2009-12-01  230  			if (filp->f_flags & O_NONBLOCK) {
9996508b3 Ian Molton        2009-12-01  231  				err = -EAGAIN;
9372b35e1 Rusty Russell     2014-12-08  232  				goto out_unlock_reading;
9996508b3 Ian Molton        2009-12-01  233  			}
9996508b3 Ian Molton        2009-12-01  234  		} else {
9996508b3 Ian Molton        2009-12-01  235  			len = data_avail;
9996508b3 Ian Molton        2009-12-01  236  			if (len > size)
9996508b3 Ian Molton        2009-12-01  237  				len = size;
9996508b3 Ian Molton        2009-12-01  238  
9996508b3 Ian Molton        2009-12-01  239  			data_avail -= len;
9996508b3 Ian Molton        2009-12-01  240  
9996508b3 Ian Molton        2009-12-01  241  			if (copy_to_user(buf + ret, rng_buffer + data_avail,
9996508b3 Ian Molton        2009-12-01  242  								len)) {
844dd05fe Michael Buesch    2006-06-26  243  				err = -EFAULT;
9372b35e1 Rusty Russell     2014-12-08  244  				goto out_unlock_reading;
9996508b3 Ian Molton        2009-12-01  245  			}
9996508b3 Ian Molton        2009-12-01  246  
9996508b3 Ian Molton        2009-12-01  247  			size -= len;
9996508b3 Ian Molton        2009-12-01  248  			ret += len;
844dd05fe Michael Buesch    2006-06-26  249  		}
844dd05fe Michael Buesch    2006-06-26  250  
9372b35e1 Rusty Russell     2014-12-08  251  		mutex_unlock(&reading_mutex);
3a2c0ba5a Rusty Russell     2014-12-08  252  		put_rng(rng);
9996508b3 Ian Molton        2009-12-01  253  
844dd05fe Michael Buesch    2006-06-26  254  		if (need_resched())
844dd05fe Michael Buesch    2006-06-26  255  			schedule_timeout_interruptible(1);
9996508b3 Ian Molton        2009-12-01  256  
9996508b3 Ian Molton        2009-12-01  257  		if (signal_pending(current)) {
844dd05fe Michael Buesch    2006-06-26  258  			err = -ERESTARTSYS;
844dd05fe Michael Buesch    2006-06-26  259  			goto out;
844dd05fe Michael Buesch    2006-06-26  260  		}
9996508b3 Ian Molton        2009-12-01  261  	}
844dd05fe Michael Buesch    2006-06-26  262  out:
844dd05fe Michael Buesch    2006-06-26  263  	return ret ? : err;
3a2c0ba5a Rusty Russell     2014-12-08  264  
9372b35e1 Rusty Russell     2014-12-08  265  out_unlock_reading:
9372b35e1 Rusty Russell     2014-12-08  266  	mutex_unlock(&reading_mutex);
1ab87298c Jiri Slaby        2015-11-27  267  out_put:
3a2c0ba5a Rusty Russell     2014-12-08  268  	put_rng(rng);
3a2c0ba5a Rusty Russell     2014-12-08  269  	goto out;
844dd05fe Michael Buesch    2006-06-26  270  }
844dd05fe Michael Buesch    2006-06-26  271  
62322d255 Arjan van de Ven  2006-07-03 @272  static const struct file_operations rng_chrdev_ops = {
844dd05fe Michael Buesch    2006-06-26 @273  	.owner		= THIS_MODULE,
844dd05fe Michael Buesch    2006-06-26 @274  	.open		= rng_dev_open,
844dd05fe Michael Buesch    2006-06-26 @275  	.read		= rng_dev_read,
6038f373a Arnd Bergmann     2010-08-15 @276  	.llseek		= noop_llseek,
844dd05fe Michael Buesch    2006-06-26  277  };
844dd05fe Michael Buesch    2006-06-26  278  
0daa7a0af Takashi Iwai      2015-02-02  279  static const struct attribute_group *rng_dev_groups[];

:::::: The code at line 169 was first introduced by commit
:::::: 844dd05fec172d98b0dacecd9b9e9f6595204c13 [PATCH] Add new generic HW RNG core

:::::: TO: Michael Buesch <mb@bu3sch.de>
:::::: CC: Linus Torvalds <torvalds@g5.osdl.org>

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

[-- Attachment #2: .config.gz --]
[-- Type: application/gzip, Size: 17496 bytes --]

^ permalink raw reply

* Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Eric Biggers @ 2016-12-09 23:08 UTC (permalink / raw)
  To: linux-crypto
  Cc: linux-kernel, linux-mm, kernel-hardening, Herbert Xu,
	Andrew Lutomirski, Stephan Mueller

In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
default on x86_64.  This has been exposing a number of problems in which
on-stack buffers are being passed into the crypto API, which to support crypto
accelerators operates on 'struct page' rather than on virtual memory.

Some of these problems have already been fixed, but I was wondering how many
problems remain, so I briefly looked through all the callers of sg_set_buf() and
sg_init_one().  Overall I found quite a few remaining problems, detailed below.

The following crypto drivers initialize a scatterlist to point into an
ahash_request, which may have been allocated on the stack with
AHASH_REQUEST_ON_STACK():

	drivers/crypto/bfin_crc.c:351
	drivers/crypto/qce/sha.c:299
	drivers/crypto/sahara.c:973,988
	drivers/crypto/talitos.c:1910
	drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
	drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124
	drivers/crypto/qce/sha.c:325

The following crypto drivers initialize a scatterlist to point into an
ablkcipher_request, which may have been allocated on the stack with
SKCIPHER_REQUEST_ON_STACK():

	drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
	drivers/crypto/ccp/ccp-crypto-aes.c:94

And these other places do crypto operations on buffers clearly on the stack:

	drivers/net/wireless/intersil/orinoco/mic.c:72
	drivers/usb/wusbcore/crypto.c:264
	net/ceph/crypto.c:182
	net/rxrpc/rxkad.c:737,1000
	security/keys/encrypted-keys/encrypted.c:500
	fs/cifs/smbencrypt.c:96

Note: I almost certainly missed some, since I excluded places where the use of a
stack buffer was not obvious to me.  I also excluded AEAD algorithms since there
isn't an AEAD_REQUEST_ON_STACK() macro (yet).

The "good" news with these bugs is that on x86_64 without CONFIG_DEBUG_SG=y or
CONFIG_DEBUG_VIRTUAL=y, you can still do virt_to_page() and then page_address()
on a vmalloc address and get back the same address, even though you aren't
*supposed* to be able to do this.  This will make things still work for most
people.  The bad news is that if you happen to have consumed just about 1 page
(or N pages) of your stack at the time you call the crypto API, your stack
buffer may actually span physically non-contiguous pages, so the crypto
algorithm will scribble over some unrelated page.  Also, hardware crypto drivers
which actually do operate on physical memory will break too.

So I am wondering: is the best solution really to make all these crypto API
algorithms and users use heap buffers, as opposed to something like maintaining
a lowmem alias for the stack, or introducing a more general function to convert
buffers (possibly in the vmalloc space) into scatterlists?  And if the current
solution is desired, who is going to fix all of these bugs and when?

Eric

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [PATCH v6 1/2] sparc: fix a building error reported by kbuild
From: Sam Ravnborg @ 2016-12-09 21:58 UTC (permalink / raw)
  To: Gonglei
  Cc: linux-kernel, qemu-devel, virtio-dev, virtualization,
	linux-crypto, luonengjun, mst, stefanha, weidong.huang, wu.wubin,
	xin.zeng, claudio.fontana, herbert, pasic, davem, jianjay.zhou,
	hanweidong, arei.gonglei, cornelia.huck, xuquan8, longpeng2,
	wanzongshun, sparclinux
In-Reply-To: <1481171829-116496-2-git-send-email-arei.gonglei@huawei.com>

Hi Gonglei.

On Thu, Dec 08, 2016 at 12:37:08PM +0800, Gonglei wrote:
> >> arch/sparc/include/asm/topology_64.h:44:44:
> error: implicit declaration of function 'cpu_data'
> [-Werror=implicit-function-declaration]
> 
>  #define topology_physical_package_id(cpu) (cpu_data(cpu).proc_id)
>                                                ^
> Let's include cpudata.h in topology_64.h.
> 
> Cc: Sam Ravnborg <sam@ravnborg.org>
> Cc: David S. Miller <davem@davemloft.net>
> Cc: sparclinux@vger.kernel.org
> Suggested-by: Sam Ravnborg <sam@ravnborg.org>
> Signed-off-by: Gonglei <arei.gonglei@huawei.com>
Acked-by: Sam Ravnborg <sam@ravnborg.org>

> ---
>  arch/sparc/include/asm/topology_64.h | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/arch/sparc/include/asm/topology_64.h b/arch/sparc/include/asm/topology_64.h
> index 7b4898a..2255430 100644
> --- a/arch/sparc/include/asm/topology_64.h
> +++ b/arch/sparc/include/asm/topology_64.h
> @@ -4,6 +4,7 @@
>  #ifdef CONFIG_NUMA
>  
>  #include <asm/mmzone.h>
> +#include <asm/cpudata.h>

Nitpick - if you are going to resend this patch, then please
order the two includes in alphabetic order.

For two includes this looks like bikeshedding, but when we add
more having them in a defined arder prevents merge conflicts.
And makes it readable too.

We also sometimes order the includes with the longest lines topmost,
and lines with the ame length are ordered alphabetically.
But this is not seen so often.

	Sam

^ permalink raw reply

* Re: [PATCH v2 1/3] crypto: brcm: DT documentation for Broadcom SPU driver
From: Rob Herring @ 2016-12-09 21:26 UTC (permalink / raw)
  To: Rob Rice
  Cc: Herbert Xu, David S. Miller, Mark Rutland,
	linux-crypto-u79uwXL29TY76Z2rM5mHXA,
	devicetree-u79uwXL29TY76Z2rM5mHXA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA, Ray Jui, Scott Branden,
	Jon Mason, bcm-kernel-feedback-list-dY08KVG/lbpWk0Htik3J/w,
	Catalin Marinas, Will Deacon,
	linux-arm-kernel-IAPFreCvJWM7uuMidbF8XUB+6BGkLq7r, Steve Lin
In-Reply-To: <1480714499-1476-2-git-send-email-rob.rice-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>

On Fri, Dec 02, 2016 at 04:34:57PM -0500, Rob Rice wrote:
> Device tree documentation for Broadcom Secure Processing Unit
> (SPU) crypto driver.
> 
> Signed-off-by: Steve Lin <steven.lin1-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> Signed-off-by: Rob Rice <rob.rice-dY08KVG/lbpWk0Htik3J/w@public.gmane.org>
> ---
>  .../devicetree/bindings/crypto/brcm,spu-crypto.txt | 25 ++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/crypto/brcm,spu-crypto.txt
> 
> diff --git a/Documentation/devicetree/bindings/crypto/brcm,spu-crypto.txt b/Documentation/devicetree/bindings/crypto/brcm,spu-crypto.txt
> new file mode 100644
> index 0000000..e5fe942
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/crypto/brcm,spu-crypto.txt
> @@ -0,0 +1,25 @@
> +The Broadcom Secure Processing Unit (SPU) driver supports symmetric

Bindings describe h/w, not drivers.

> +cryptographic offload for Broadcom SoCs with SPU hardware. A SoC may have
> +multiple SPU hardware blocks.
> +
> +Required properties:
> +- compatible : Should be "brcm,spum-crypto" for devices with SPU-M hardware

Additionally, you should have SoC specific compatible here.

> +  (e.g., Northstar2) or "brcm,spum-nsp-crypto" for the Northstar Plus variant
> +  of the SPU-M hardware.
> +
> +- reg: Should contain SPU registers location and length.
> +- mboxes: A list of mailbox channels to be used by the kernel driver. Mailbox
> +channels correspond to DMA rings on the device.

What determines the mbox assignment?

Needs to specify how many.

> +
> +Example:
> +	spu-crypto@612d0000 {

Just crypto@...

Rob
--
To unsubscribe from this list: send the line "unsubscribe devicetree" in
the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply

* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Michael S. Tsirkin @ 2016-12-09 20:45 UTC (permalink / raw)
  To: Bart Van Assche
  Cc: Madhani, Himanshu, kvm@vger.kernel.org, Neil Armstrong,
	David Airlie, linux-remoteproc@vger.kernel.org,
	dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
	linux-scsi@vger.kernel.org, Christoph Hellwig,
	v9fs-developer@lists.sourceforge.net, Asias He,
	Arnd Bergmann <ar
In-Reply-To: <BLUPR02MB168371E06EFA9AB34AA2C58181870@BLUPR02MB1683.namprd02.prod.outlook.com>

On Fri, Dec 09, 2016 at 03:18:02PM +0000, Bart Van Assche wrote:
> On 12/08/16 22:40, Madhani, Himanshu wrote:
> > We’ll take a look and send patches to resolve these warnings.
> 
> Thanks!
> 
> Bart.
> 

Sounds good. I posted what I have so far so that you can
start from that.

-- 
MST

^ permalink raw reply

* [PATCH] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-09 18:36 UTC (permalink / raw)
  To: linux-kernel, kernel-hardening, linux-crypto, rusty, torvalds
  Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Daniel J . Bernstein

SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.

SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?

There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then preceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.

Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.

(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist the modern cryptanalysis.)

There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.

Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch-up.

Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
---
 include/linux/siphash.h |  18 ++++++
 lib/Makefile            |   3 +-
 lib/siphash.c           | 163 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 183 insertions(+), 1 deletion(-)
 create mode 100644 include/linux/siphash.h
 create mode 100644 lib/siphash.c

diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..485c2101cc7d
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,18 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash24_lengths {
+	SIPHASH24_KEY_LEN = 16
+};
+
+uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..d224337b0d01 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
 	 sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
 	 flex_proportions.o ratelimit.o show_mem.o \
 	 is_single_threaded.o plist.o decompress.o kobject_uevent.o \
-	 earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+	 earlycpio.o seq_buf.o siphash.o \
+	 nmi_backtrace.o nodemask.o win_minmax.o
 
 lib-$(CONFIG_MMU) += ioremap.o
 lib-$(CONFIG_SMP) += cpumask.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..022d86f04b9b
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,163 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+
+#define ROTL(x,b) (uint64_t)(((x) << (b)) | ((x) >> (64 - (b))))
+#define U8TO64(p) le64_to_cpu(*(__le64 *)(p))
+
+#define SIPROUND \
+	do { \
+	v0 += v1; v1 = ROTL(v1, 13); v1 ^= v0; v0 = ROTL(v0, 32); \
+	v2 += v3; v3 = ROTL(v3, 16); v3 ^= v2; \
+	v0 += v3; v3 = ROTL(v3, 21); v3 ^= v0; \
+	v2 += v1; v1 = ROTL(v1, 17); v1 ^= v2; v2 = ROTL(v2, 32); \
+	} while(0)
+
+__attribute__((optimize("unroll-loops")))
+uint64_t siphash24(const uint8_t *data, size_t len, const uint8_t key[SIPHASH24_KEY_LEN])
+{
+	uint64_t v0 = 0x736f6d6570736575ULL;
+	uint64_t v1 = 0x646f72616e646f6dULL;
+	uint64_t v2 = 0x6c7967656e657261ULL;
+	uint64_t v3 = 0x7465646279746573ULL;
+	uint64_t b;
+	uint64_t k0 = U8TO64(key);
+	uint64_t k1 = U8TO64(key + sizeof(uint64_t));
+	uint64_t m;
+	const uint8_t *end = data + len - (len % sizeof(uint64_t));
+	const uint8_t left = len & (sizeof(uint64_t) - 1);
+	b = ((uint64_t)len) << 56;
+	v3 ^= k1;
+	v2 ^= k0;
+	v1 ^= k1;
+	v0 ^= k0;
+	for (; data != end; data += sizeof(uint64_t)) {
+		m = U8TO64(data);
+		v3 ^= m;
+		SIPROUND;
+		SIPROUND;
+		v0 ^= m;
+	}
+	switch (left) {
+		case 7: b |= ((uint64_t)data[6]) << 48;
+		case 6: b |= ((uint64_t)data[5]) << 40;
+		case 5: b |= ((uint64_t)data[4]) << 32;
+		case 4: b |= ((uint64_t)data[3]) << 24;
+		case 3: b |= ((uint64_t)data[2]) << 16;
+		case 2: b |= ((uint64_t)data[1]) <<  8;
+		case 1: b |= ((uint64_t)data[0]); break;
+		case 0: break;
+	}
+	v3 ^= b;
+	SIPROUND;
+	SIPROUND;
+	v0 ^= b;
+	v2 ^= 0xff;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	SIPROUND;
+	b = (v0 ^ v1) ^ (v2 ^ v3);
+	return (__force uint64_t)cpu_to_le64(b);
+}
+EXPORT_SYMBOL(siphash24);
+
+#ifdef DEBUG
+static const uint8_t test_vectors[64][8] = {
+	{ 0x31, 0x0e, 0x0e, 0xdd, 0x47, 0xdb, 0x6f, 0x72 },
+	{ 0xfd, 0x67, 0xdc, 0x93, 0xc5, 0x39, 0xf8, 0x74 },
+	{ 0x5a, 0x4f, 0xa9, 0xd9, 0x09, 0x80, 0x6c, 0x0d },
+	{ 0x2d, 0x7e, 0xfb, 0xd7, 0x96, 0x66, 0x67, 0x85 },
+	{ 0xb7, 0x87, 0x71, 0x27, 0xe0, 0x94, 0x27, 0xcf },
+	{ 0x8d, 0xa6, 0x99, 0xcd, 0x64, 0x55, 0x76, 0x18 },
+	{ 0xce, 0xe3, 0xfe, 0x58, 0x6e, 0x46, 0xc9, 0xcb },
+	{ 0x37, 0xd1, 0x01, 0x8b, 0xf5, 0x00, 0x02, 0xab },
+	{ 0x62, 0x24, 0x93, 0x9a, 0x79, 0xf5, 0xf5, 0x93 },
+	{ 0xb0, 0xe4, 0xa9, 0x0b, 0xdf, 0x82, 0x00, 0x9e },
+	{ 0xf3, 0xb9, 0xdd, 0x94, 0xc5, 0xbb, 0x5d, 0x7a },
+	{ 0xa7, 0xad, 0x6b, 0x22, 0x46, 0x2f, 0xb3, 0xf4 },
+	{ 0xfb, 0xe5, 0x0e, 0x86, 0xbc, 0x8f, 0x1e, 0x75 },
+	{ 0x90, 0x3d, 0x84, 0xc0, 0x27, 0x56, 0xea, 0x14 },
+	{ 0xee, 0xf2, 0x7a, 0x8e, 0x90, 0xca, 0x23, 0xf7 },
+	{ 0xe5, 0x45, 0xbe, 0x49, 0x61, 0xca, 0x29, 0xa1 },
+	{ 0xdb, 0x9b, 0xc2, 0x57, 0x7f, 0xcc, 0x2a, 0x3f },
+	{ 0x94, 0x47, 0xbe, 0x2c, 0xf5, 0xe9, 0x9a, 0x69 },
+	{ 0x9c, 0xd3, 0x8d, 0x96, 0xf0, 0xb3, 0xc1, 0x4b },
+	{ 0xbd, 0x61, 0x79, 0xa7, 0x1d, 0xc9, 0x6d, 0xbb },
+	{ 0x98, 0xee, 0xa2, 0x1a, 0xf2, 0x5c, 0xd6, 0xbe },
+	{ 0xc7, 0x67, 0x3b, 0x2e, 0xb0, 0xcb, 0xf2, 0xd0 },
+	{ 0x88, 0x3e, 0xa3, 0xe3, 0x95, 0x67, 0x53, 0x93 },
+	{ 0xc8, 0xce, 0x5c, 0xcd, 0x8c, 0x03, 0x0c, 0xa8 },
+	{ 0x94, 0xaf, 0x49, 0xf6, 0xc6, 0x50, 0xad, 0xb8 },
+	{ 0xea, 0xb8, 0x85, 0x8a, 0xde, 0x92, 0xe1, 0xbc },
+	{ 0xf3, 0x15, 0xbb, 0x5b, 0xb8, 0x35, 0xd8, 0x17 },
+	{ 0xad, 0xcf, 0x6b, 0x07, 0x63, 0x61, 0x2e, 0x2f },
+	{ 0xa5, 0xc9, 0x1d, 0xa7, 0xac, 0xaa, 0x4d, 0xde },
+	{ 0x71, 0x65, 0x95, 0x87, 0x66, 0x50, 0xa2, 0xa6 },
+	{ 0x28, 0xef, 0x49, 0x5c, 0x53, 0xa3, 0x87, 0xad },
+	{ 0x42, 0xc3, 0x41, 0xd8, 0xfa, 0x92, 0xd8, 0x32 },
+	{ 0xce, 0x7c, 0xf2, 0x72, 0x2f, 0x51, 0x27, 0x71 },
+	{ 0xe3, 0x78, 0x59, 0xf9, 0x46, 0x23, 0xf3, 0xa7 },
+	{ 0x38, 0x12, 0x05, 0xbb, 0x1a, 0xb0, 0xe0, 0x12 },
+	{ 0xae, 0x97, 0xa1, 0x0f, 0xd4, 0x34, 0xe0, 0x15 },
+	{ 0xb4, 0xa3, 0x15, 0x08, 0xbe, 0xff, 0x4d, 0x31 },
+	{ 0x81, 0x39, 0x62, 0x29, 0xf0, 0x90, 0x79, 0x02 },
+	{ 0x4d, 0x0c, 0xf4, 0x9e, 0xe5, 0xd4, 0xdc, 0xca },
+	{ 0x5c, 0x73, 0x33, 0x6a, 0x76, 0xd8, 0xbf, 0x9a },
+	{ 0xd0, 0xa7, 0x04, 0x53, 0x6b, 0xa9, 0x3e, 0x0e },
+	{ 0x92, 0x59, 0x58, 0xfc, 0xd6, 0x42, 0x0c, 0xad },
+	{ 0xa9, 0x15, 0xc2, 0x9b, 0xc8, 0x06, 0x73, 0x18 },
+	{ 0x95, 0x2b, 0x79, 0xf3, 0xbc, 0x0a, 0xa6, 0xd4 },
+	{ 0xf2, 0x1d, 0xf2, 0xe4, 0x1d, 0x45, 0x35, 0xf9 },
+	{ 0x87, 0x57, 0x75, 0x19, 0x04, 0x8f, 0x53, 0xa9 },
+	{ 0x10, 0xa5, 0x6c, 0xf5, 0xdf, 0xcd, 0x9a, 0xdb },
+	{ 0xeb, 0x75, 0x09, 0x5c, 0xcd, 0x98, 0x6c, 0xd0 },
+	{ 0x51, 0xa9, 0xcb, 0x9e, 0xcb, 0xa3, 0x12, 0xe6 },
+	{ 0x96, 0xaf, 0xad, 0xfc, 0x2c, 0xe6, 0x66, 0xc7 },
+	{ 0x72, 0xfe, 0x52, 0x97, 0x5a, 0x43, 0x64, 0xee },
+	{ 0x5a, 0x16, 0x45, 0xb2, 0x76, 0xd5, 0x92, 0xa1 },
+	{ 0xb2, 0x74, 0xcb, 0x8e, 0xbf, 0x87, 0x87, 0x0a },
+	{ 0x6f, 0x9b, 0xb4, 0x20, 0x3d, 0xe7, 0xb3, 0x81 },
+	{ 0xea, 0xec, 0xb2, 0xa3, 0x0b, 0x22, 0xa8, 0x7f },
+	{ 0x99, 0x24, 0xa4, 0x3c, 0xc1, 0x31, 0x57, 0x24 },
+	{ 0xbd, 0x83, 0x8d, 0x3a, 0xaf, 0xbf, 0x8d, 0xb7 },
+	{ 0x0b, 0x1a, 0x2a, 0x32, 0x65, 0xd5, 0x1a, 0xea },
+	{ 0x13, 0x50, 0x79, 0xa3, 0x23, 0x1c, 0xe6, 0x60 },
+	{ 0x93, 0x2b, 0x28, 0x46, 0xe4, 0xd7, 0x06, 0x66 },
+	{ 0xe1, 0x91, 0x5f, 0x5c, 0xb1, 0xec, 0xa4, 0x6c },
+	{ 0xf3, 0x25, 0x96, 0x5c, 0xa1, 0x6d, 0x62, 0x9f },
+	{ 0x57, 0x5f, 0xf2, 0x8e, 0x60, 0x38, 0x1b, 0xe5 },
+	{ 0x72, 0x45, 0x06, 0xeb, 0x4c, 0x32, 0x8a, 0x95 }
+};
+
+static int siphash24_selftest(void)
+{
+	uint8_t in[64], k[16], i;
+	uint64_t out;
+	int ret = 0;
+
+	for (i = 0; i < 16; ++i)
+		k[i] = i;
+
+	for (i = 0; i < 64; ++i) {
+		in[i] = i;
+		out = siphash24(in, i, k);
+		if (memcmp(&out, test_vectors[i], 8)) {
+			printk(KERN_INFO "siphash24: self-test %u: FAIL\n", i + 1);
+			ret = -1;
+		}
+	}
+	if (!ret)
+		printk(KERN_INFO "siphash24: self-tests: pass\n");
+	return ret;
+}
+__initcall(siphash24_selftest);
+#endif
-- 
2.11.0

^ permalink raw reply related

* Re: [PATCH 7/7] hwrng: core: Remove two unused include
From: Corentin Labbe @ 2016-12-09 18:24 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel
In-Reply-To: <1481293299-21697-7-git-send-email-clabbe.montjoie@gmail.com>

On Fri, Dec 09, 2016 at 03:21:39PM +0100, Corentin Labbe wrote:
> linux/fs.h and linux/sched.h are useless for hw_random/core.c.
> This patch remove them.
> 
> Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
> ---
>  drivers/char/hw_random/core.c | 2 --
>  1 file changed, 2 deletions(-)
> 
> diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
> index 5c654b5..85c9ab3 100644
> --- a/drivers/char/hw_random/core.c
> +++ b/drivers/char/hw_random/core.c
> @@ -13,14 +13,12 @@
>  #include <linux/delay.h>
>  #include <linux/device.h>
>  #include <linux/err.h>
> -#include <linux/fs.h>
>  #include <linux/hw_random.h>
>  #include <linux/kernel.h>
>  #include <linux/kthread.h>
>  #include <linux/miscdevice.h>
>  #include <linux/module.h>
>  #include <linux/random.h>
> -#include <linux/sched.h>
>  #include <linux/slab.h>
>  #include <linux/uaccess.h>
>  
> -- 
> 2.7.3
> 

Sorry forget this patch, it is buggy.
linux/fs.h is needed.

Regards

^ permalink raw reply

* Fri, 09 Dec 2016 16:11:33 -0000.Dear linux-crypto Your Electricity Bill.7512902343099$$$!!!16141866278
From: barbara.polaszek @ 2016-12-09 16:11 UTC (permalink / raw)
  To: linux-crypto

[-- Attachment #1: URGENT_805395950816022_linux-crypto.zip --]
[-- Type: application/zip, Size: 4735 bytes --]

^ permalink raw reply

* Re: [PATCH] linux/types.h: enable endian checks for all sparse builds
From: Bart Van Assche @ 2016-12-09 15:18 UTC (permalink / raw)
  To: Madhani, Himanshu, Michael S. Tsirkin
  Cc: kvm@vger.kernel.org, Neil Armstrong, David Airlie,
	linux-remoteproc@vger.kernel.org, dri-devel@lists.freedesktop.org,
	virtualization@lists.linux-foundation.org,
	linux-s390@vger.kernel.org, James E.J. Bottomley, Herbert Xu,
	linux-scsi@vger.kernel.org, Christoph Hellwig,
	v9fs-developer@lists.sourceforge.net, Asias He, Arnd Bergmann,
	linux-kbuild@vger.kernel.org, Jens Axboe, Michal Marek,
	Stefan Hajnoczi <stef
In-Reply-To: <6199215E-2AA4-4705-9552-5D61FE03F866@cavium.com>

On 12/08/16 22:40, Madhani, Himanshu wrote:
> We’ll take a look and send patches to resolve these warnings.

Thanks!

Bart.

^ permalink raw reply

* [PATCH v2 3/3] crypto: arm/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-09 14:33 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
In-Reply-To: <1481294033-23508-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight port to ARM/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm/crypto/Kconfig              |   6 +
 arch/arm/crypto/Makefile             |   2 +
 arch/arm/crypto/chacha20-neon-core.S | 524 ++++++++++++++++++++
 arch/arm/crypto/chacha20-neon-glue.c | 127 +++++
 4 files changed, 659 insertions(+)

diff --git a/arch/arm/crypto/Kconfig b/arch/arm/crypto/Kconfig
index 13f1b4c289d4..2f3339f015d3 100644
--- a/arch/arm/crypto/Kconfig
+++ b/arch/arm/crypto/Kconfig
@@ -130,4 +130,10 @@ config CRYPTO_CRC32_ARM_CE
 	depends on KERNEL_MODE_NEON && CRC32
 	select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+	tristate "NEON accelerated ChaCha20 symmetric cipher"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index b578a1820ab1..8d74e55eacd4 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -8,6 +8,7 @@ obj-$(CONFIG_CRYPTO_SHA1_ARM) += sha1-arm.o
 obj-$(CONFIG_CRYPTO_SHA1_ARM_NEON) += sha1-arm-neon.o
 obj-$(CONFIG_CRYPTO_SHA256_ARM) += sha256-arm.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM) += sha512-arm.o
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
 
 ce-obj-$(CONFIG_CRYPTO_AES_ARM_CE) += aes-arm-ce.o
 ce-obj-$(CONFIG_CRYPTO_SHA1_ARM_CE) += sha1-arm-ce.o
@@ -40,6 +41,7 @@ aes-arm-ce-y	:= aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y	:= ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y	:= crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 
 quiet_cmd_perl = PERL    $@
       cmd_perl = $(PERL) $(<) > $(@)
diff --git a/arch/arm/crypto/chacha20-neon-core.S b/arch/arm/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..ff1d337bdb4a
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-core.S
@@ -0,0 +1,524 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.fpu		neon
+	.align		5
+
+ENTRY(chacha20_block_xor_neon)
+	// r0: Input state matrix, s
+	// r1: 1 data block output, o
+	// r2: 1 data block input, i
+
+	//
+	// This function encrypts one ChaCha20 block by loading the state matrix
+	// in four NEON registers. It performs matrix operation on four words in
+	// parallel, but requireds shuffling to rearrange the words after each
+	// round.
+	//
+
+	// x0..3 = s0..3
+	add		ip, r0, #0x20
+	vld1.32		{q0-q1}, [r0]
+	vld1.32		{q2-q3}, [ip]
+
+	vmov		q8, q0
+	vmov		q9, q1
+	vmov		q10, q2
+	vmov		q11, q3
+
+	mov		r3, #10
+
+.Ldoubleround:
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #16
+	vsri.u32	q3, q4, #16
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #12
+	vsri.u32	q1, q4, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #8
+	vsri.u32	q3, q4, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #7
+	vsri.u32	q1, q4, #25
+
+	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	vext.8		q1, q1, q1, #4
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vext.8		q2, q2, q2, #8
+	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	vext.8		q3, q3, q3, #12
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #16
+	vsri.u32	q3, q4, #16
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #12
+	vsri.u32	q1, q4, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	vadd.i32	q0, q0, q1
+	veor		q4, q3, q0
+	vshl.u32	q3, q4, #8
+	vsri.u32	q3, q4, #24
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	vadd.i32	q2, q2, q3
+	veor		q4, q1, q2
+	vshl.u32	q1, q4, #7
+	vsri.u32	q1, q4, #25
+
+	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+	vext.8		q1, q1, q1, #12
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	vext.8		q2, q2, q2, #8
+	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+	vext.8		q3, q3, q3, #4
+
+	subs		r3, r3, #1
+	bne		.Ldoubleround
+
+	add		ip, r2, #0x20
+	vld1.8		{q4-q5}, [r2]
+	vld1.8		{q6-q7}, [ip]
+
+	// o0 = i0 ^ (x0 + s0)
+	vadd.i32	q0, q0, q8
+	veor		q0, q0, q4
+
+	// o1 = i1 ^ (x1 + s1)
+	vadd.i32	q1, q1, q9
+	veor		q1, q1, q5
+
+	// o2 = i2 ^ (x2 + s2)
+	vadd.i32	q2, q2, q10
+	veor		q2, q2, q6
+
+	// o3 = i3 ^ (x3 + s3)
+	vadd.i32	q3, q3, q11
+	veor		q3, q3, q7
+
+	add		ip, r1, #0x20
+	vst1.8		{q0-q1}, [r1]
+	vst1.8		{q2-q3}, [ip]
+
+	bx		lr
+ENDPROC(chacha20_block_xor_neon)
+
+	.align		5
+ENTRY(chacha20_4block_xor_neon)
+	push		{r4-r6, lr}
+	mov		ip, sp			// preserve the stack pointer
+	sub		r3, sp, #0x20		// allocate a 32 byte buffer
+	bic		r3, r3, #0x1f		// aligned to 32 bytes
+	mov		sp, r3
+
+	// r0: Input state matrix, s
+	// r1: 4 data blocks output, o
+	// r2: 4 data blocks input, i
+
+	//
+	// This function encrypts four consecutive ChaCha20 blocks by loading
+	// the state matrix in NEON registers four times. The algorithm performs
+	// each operation on the corresponding word of each state matrix, hence
+	// requires no word shuffling. For final XORing step we transpose the
+	// matrix by interleaving 32- and then 64-bit words, which allows us to
+	// do XOR in NEON registers.
+	//
+
+	// x0..15[0-3] = s0..3[0..3]
+	add		r3, r0, #0x20
+	vld1.32		{q0-q1}, [r0]
+	vld1.32		{q2-q3}, [r3]
+
+	adr		r3, CTRINC
+	vdup.32		q15, d7[1]
+	vdup.32		q14, d7[0]
+	vld1.32		{q11}, [r3, :128]
+	vdup.32		q13, d6[1]
+	vdup.32		q12, d6[0]
+	vadd.i32	q12, q12, q11		// x12 += counter values 0-3
+	vdup.32		q11, d5[1]
+	vdup.32		q10, d5[0]
+	vdup.32		q9, d4[1]
+	vdup.32		q8, d4[0]
+	vdup.32		q7, d3[1]
+	vdup.32		q6, d3[0]
+	vdup.32		q5, d2[1]
+	vdup.32		q4, d2[0]
+	vdup.32		q3, d1[1]
+	vdup.32		q2, d1[0]
+	vdup.32		q1, d0[1]
+	vdup.32		q0, d0[0]
+
+	mov		r3, #10
+
+.Ldoubleround4:
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+	vadd.i32	q0, q0, q4
+	vadd.i32	q1, q1, q5
+	vadd.i32	q2, q2, q6
+	vadd.i32	q3, q3, q7
+
+	veor		q12, q12, q0
+	veor		q13, q13, q1
+	veor		q14, q14, q2
+	veor		q15, q15, q3
+
+	vrev32.16	q12, q12
+	vrev32.16	q13, q13
+	vrev32.16	q14, q14
+	vrev32.16	q15, q15
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+	vadd.i32	q8, q8, q12
+	vadd.i32	q9, q9, q13
+	vadd.i32	q10, q10, q14
+	vadd.i32	q11, q11, q15
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q4, q8
+	veor		q9, q5, q9
+	vshl.u32	q4, q8, #12
+	vshl.u32	q5, q9, #12
+	vsri.u32	q4, q8, #20
+	vsri.u32	q5, q9, #20
+
+	veor		q8, q6, q10
+	veor		q9, q7, q11
+	vshl.u32	q6, q8, #12
+	vshl.u32	q7, q9, #12
+	vsri.u32	q6, q8, #20
+	vsri.u32	q7, q9, #20
+
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+	vadd.i32	q0, q0, q4
+	vadd.i32	q1, q1, q5
+	vadd.i32	q2, q2, q6
+	vadd.i32	q3, q3, q7
+
+	veor		q8, q12, q0
+	veor		q9, q13, q1
+	vshl.u32	q12, q8, #8
+	vshl.u32	q13, q9, #8
+	vsri.u32	q12, q8, #24
+	vsri.u32	q13, q9, #24
+
+	veor		q8, q14, q2
+	veor		q9, q15, q3
+	vshl.u32	q14, q8, #8
+	vshl.u32	q15, q9, #8
+	vsri.u32	q14, q8, #24
+	vsri.u32	q15, q9, #24
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+	vadd.i32	q8, q8, q12
+	vadd.i32	q9, q9, q13
+	vadd.i32	q10, q10, q14
+	vadd.i32	q11, q11, q15
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q4, q8
+	veor		q9, q5, q9
+	vshl.u32	q4, q8, #7
+	vshl.u32	q5, q9, #7
+	vsri.u32	q4, q8, #25
+	vsri.u32	q5, q9, #25
+
+	veor		q8, q6, q10
+	veor		q9, q7, q11
+	vshl.u32	q6, q8, #7
+	vshl.u32	q7, q9, #7
+	vsri.u32	q6, q8, #25
+	vsri.u32	q7, q9, #25
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+	vadd.i32	q0, q0, q5
+	vadd.i32	q1, q1, q6
+	vadd.i32	q2, q2, q7
+	vadd.i32	q3, q3, q4
+
+	veor		q15, q15, q0
+	veor		q12, q12, q1
+	veor		q13, q13, q2
+	veor		q14, q14, q3
+
+	vrev32.16	q15, q15
+	vrev32.16	q12, q12
+	vrev32.16	q13, q13
+	vrev32.16	q14, q14
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+	vadd.i32	q10, q10, q15
+	vadd.i32	q11, q11, q12
+	vadd.i32	q8, q8, q13
+	vadd.i32	q9, q9, q14
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q7, q8
+	veor		q9, q4, q9
+	vshl.u32	q7, q8, #12
+	vshl.u32	q4, q9, #12
+	vsri.u32	q7, q8, #20
+	vsri.u32	q4, q9, #20
+
+	veor		q8, q5, q10
+	veor		q9, q6, q11
+	vshl.u32	q5, q8, #12
+	vshl.u32	q6, q9, #12
+	vsri.u32	q5, q8, #20
+	vsri.u32	q6, q9, #20
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+	vadd.i32	q0, q0, q5
+	vadd.i32	q1, q1, q6
+	vadd.i32	q2, q2, q7
+	vadd.i32	q3, q3, q4
+
+	veor		q8, q15, q0
+	veor		q9, q12, q1
+	vshl.u32	q15, q8, #8
+	vshl.u32	q12, q9, #8
+	vsri.u32	q15, q8, #24
+	vsri.u32	q12, q9, #24
+
+	veor		q8, q13, q2
+	veor		q9, q14, q3
+	vshl.u32	q13, q8, #8
+	vshl.u32	q14, q9, #8
+	vsri.u32	q13, q8, #24
+	vsri.u32	q14, q9, #24
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+	vadd.i32	q10, q10, q15
+	vadd.i32	q11, q11, q12
+	vadd.i32	q8, q8, q13
+	vadd.i32	q9, q9, q14
+
+	vst1.32		{q8-q9}, [sp, :256]
+
+	veor		q8, q7, q8
+	veor		q9, q4, q9
+	vshl.u32	q7, q8, #7
+	vshl.u32	q4, q9, #7
+	vsri.u32	q7, q8, #25
+	vsri.u32	q4, q9, #25
+
+	veor		q8, q5, q10
+	veor		q9, q6, q11
+	vshl.u32	q5, q8, #7
+	vshl.u32	q6, q9, #7
+	vsri.u32	q5, q8, #25
+	vsri.u32	q6, q9, #25
+
+	subs		r3, r3, #1
+	beq		0f
+
+	vld1.32		{q8-q9}, [sp, :256]
+	b		.Ldoubleround4
+
+	// x0[0-3] += s0[0]
+	// x1[0-3] += s0[1]
+	// x2[0-3] += s0[2]
+	// x3[0-3] += s0[3]
+0:	ldmia		r0!, {r3-r6}
+	vdup.32		q8, r3
+	vdup.32		q9, r4
+	vadd.i32	q0, q0, q8
+	vadd.i32	q1, q1, q9
+	vdup.32		q8, r5
+	vdup.32		q9, r6
+	vadd.i32	q2, q2, q8
+	vadd.i32	q3, q3, q9
+
+	// x4[0-3] += s1[0]
+	// x5[0-3] += s1[1]
+	// x6[0-3] += s1[2]
+	// x7[0-3] += s1[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q8, r3
+	vdup.32		q9, r4
+	vadd.i32	q4, q4, q8
+	vadd.i32	q5, q5, q9
+	vdup.32		q8, r5
+	vdup.32		q9, r6
+	vadd.i32	q6, q6, q8
+	vadd.i32	q7, q7, q9
+
+	// interleave 32-bit words in state n, n+1
+	vzip.32		q0, q1
+	vzip.32		q2, q3
+	vzip.32		q4, q5
+	vzip.32		q6, q7
+
+	// interleave 64-bit words in state n, n+2
+	vswp		d1, d4
+	vswp		d3, d6
+	vswp		d9, d12
+	vswp		d11, d14
+
+	// xor with corresponding input, write to output
+	vld1.8		{q8-q9}, [r2]!
+	veor		q8, q8, q0
+	veor		q9, q9, q4
+	vst1.8		{q8-q9}, [r1]!
+
+	vld1.32		{q8-q9}, [sp, :256]
+
+	// x8[0-3] += s2[0]
+	// x9[0-3] += s2[1]
+	// x10[0-3] += s2[2]
+	// x11[0-3] += s2[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q0, r3
+	vdup.32		q4, r4
+	vadd.i32	q8, q8, q0
+	vadd.i32	q9, q9, q4
+	vdup.32		q0, r5
+	vdup.32		q4, r6
+	vadd.i32	q10, q10, q0
+	vadd.i32	q11, q11, q4
+
+	// x12[0-3] += s3[0]
+	// x13[0-3] += s3[1]
+	// x14[0-3] += s3[2]
+	// x15[0-3] += s3[3]
+	ldmia		r0!, {r3-r6}
+	vdup.32		q0, r3
+	vdup.32		q4, r4
+	adr		r3, CTRINC
+	vadd.i32	q12, q12, q0
+	vld1.32		{q0}, [r3, :128]
+	vadd.i32	q13, q13, q4
+	vadd.i32	q12, q12, q0		// x12 += counter values 0-3
+
+	vdup.32		q0, r5
+	vdup.32		q4, r6
+	vadd.i32	q14, q14, q0
+	vadd.i32	q15, q15, q4
+
+	// interleave 32-bit words in state n, n+1
+	vzip.32		q8, q9
+	vzip.32		q10, q11
+	vzip.32		q12, q13
+	vzip.32		q14, q15
+
+	// interleave 64-bit words in state n, n+2
+	vswp		d17, d20
+	vswp		d19, d22
+	vswp		d25, d28
+	vswp		d27, d30
+
+	vmov		q4, q1
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q8
+	veor		q1, q1, q12
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q2
+	veor		q1, q1, q6
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q10
+	veor		q1, q1, q14
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q4
+	veor		q1, q1, q5
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q9
+	veor		q1, q1, q13
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]!
+	veor		q0, q0, q3
+	veor		q1, q1, q7
+	vst1.8		{q0-q1}, [r1]!
+
+	vld1.8		{q0-q1}, [r2]
+	veor		q0, q0, q11
+	veor		q1, q1, q15
+	vst1.8		{q0-q1}, [r1]
+
+	mov		sp, ip
+	pop		{r4-r6, pc}
+ENDPROC(chacha20_4block_xor_neon)
+
+	.align		4
+CTRINC:	.word		0, 1, 2, 3
+
diff --git a/arch/arm/crypto/chacha20-neon-glue.c b/arch/arm/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..20623237e2bc
--- /dev/null
+++ b/arch/arm/crypto/chacha20-neon-glue.c
@@ -0,0 +1,127 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, ARM NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/hwcap.h>
+#include <asm/neon.h>
+#include <asm/simd.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
+			    unsigned int bytes)
+{
+	u8 buf[CHACHA20_BLOCK_SIZE];
+
+	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		chacha20_4block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE * 4;
+		src += CHACHA20_BLOCK_SIZE * 4;
+		dst += CHACHA20_BLOCK_SIZE * 4;
+		state[12] += 4;
+	}
+	while (bytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE;
+		src += CHACHA20_BLOCK_SIZE;
+		dst += CHACHA20_BLOCK_SIZE;
+		state[12]++;
+	}
+	if (bytes) {
+		memcpy(buf, src, bytes);
+		chacha20_block_xor_neon(state, buf, buf);
+		memcpy(dst, buf, bytes);
+	}
+}
+
+static int chacha20_neon(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	u32 state[16];
+	int err;
+
+	if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
+		return crypto_chacha20_crypt(req);
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_chacha20_init(state, ctx, walk.iv);
+
+	kernel_neon_begin();
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.chunksize);
+
+		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
+				nbytes);
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+	}
+	kernel_neon_end();
+
+	return err;
+}
+
+static struct skcipher_alg alg = {
+	.base.cra_name		= "chacha20",
+	.base.cra_driver_name	= "chacha20-neon",
+	.base.cra_priority	= 300,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
+	.base.cra_alignmask	= 1,
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= CHACHA20_KEY_SIZE,
+	.max_keysize		= CHACHA20_KEY_SIZE,
+	.ivsize			= CHACHA20_IV_SIZE,
+	.chunksize		= 4 * CHACHA20_BLOCK_SIZE,
+	.setkey			= crypto_chacha20_setkey,
+	.encrypt		= chacha20_neon,
+	.decrypt		= chacha20_neon,
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+	if (!(elf_hwcap & HWCAP_NEON))
+		return -ENODEV;
+
+	return crypto_register_skcipher(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+	crypto_unregister_skcipher(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 2/3] crypto: arm64/chacha20 - implement NEON version based on SSE3 code
From: Ard Biesheuvel @ 2016-12-09 14:33 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
In-Reply-To: <1481294033-23508-1-git-send-email-ard.biesheuvel@linaro.org>

This is a straight port to arm64/NEON of the x86 SSE3 implementation
of the ChaCha20 stream cipher.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig              |   6 +
 arch/arm64/crypto/Makefile             |   3 +
 arch/arm64/crypto/chacha20-neon-core.S | 450 ++++++++++++++++++++
 arch/arm64/crypto/chacha20-neon-glue.c | 122 ++++++
 4 files changed, 581 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..0bf0f531f539 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
 	depends on ARM64
 	select CRYPTO_HASH
 
+config CRYPTO_CHACHA20_NEON
+	tristate "NEON accelerated ChaCha20 symmetric cipher"
+	depends on KERNEL_MODE_NEON
+	select CRYPTO_BLKCIPHER
+	select CRYPTO_CHACHA20
+
 endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aa8888d7b744..9d2826c5fccf 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
 obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
 sha512-arm64-y := sha512-glue.o sha512-core.o
 
+obj-$(CONFIG_CRYPTO_CHACHA20_NEON) += chacha20-neon.o
+chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
+
 AFLAGS_aes-ce.o		:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o	:= -DINTERLEAVE=4
 
diff --git a/arch/arm64/crypto/chacha20-neon-core.S b/arch/arm64/crypto/chacha20-neon-core.S
new file mode 100644
index 000000000000..3cbbf23dc4d2
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-core.S
@@ -0,0 +1,450 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, x64 SSSE3 functions
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <linux/linkage.h>
+
+	.text
+	.align		6
+
+ENTRY(chacha20_block_xor_neon)
+	// x0: Input state matrix, s
+	// x1: 1 data block output, o
+	// x2: 1 data block input, i
+
+	//
+	// This function encrypts one ChaCha20 block by loading the state matrix
+	// in four NEON registers. It performs matrix operation on four words in
+	// parallel, but requires shuffling to rearrange the words after each
+	// round.
+	//
+
+	// x0..3 = s0..3
+	adr		x3, ROT8
+	ld1		{v0.4s-v3.4s}, [x0]
+	ld1		{v8.4s-v11.4s}, [x0]
+	ld1		{v12.4s}, [x3]
+
+	mov		x3, #10
+
+.Ldoubleround:
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	rev32		v3.8h, v3.8h
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #12
+	sri		v1.4s, v4.4s, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	tbl		v3.16b, {v3.16b}, v12.16b
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #7
+	sri		v1.4s, v4.4s, #25
+
+	// x1 = shuffle32(x1, MASK(0, 3, 2, 1))
+	ext		v1.16b, v1.16b, v1.16b, #4
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	ext		v2.16b, v2.16b, v2.16b, #8
+	// x3 = shuffle32(x3, MASK(2, 1, 0, 3))
+	ext		v3.16b, v3.16b, v3.16b, #12
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 16)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	rev32		v3.8h, v3.8h
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 12)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #12
+	sri		v1.4s, v4.4s, #20
+
+	// x0 += x1, x3 = rotl32(x3 ^ x0, 8)
+	add		v0.4s, v0.4s, v1.4s
+	eor		v3.16b, v3.16b, v0.16b
+	tbl		v3.16b, {v3.16b}, v12.16b
+
+	// x2 += x3, x1 = rotl32(x1 ^ x2, 7)
+	add		v2.4s, v2.4s, v3.4s
+	eor		v4.16b, v1.16b, v2.16b
+	shl		v1.4s, v4.4s, #7
+	sri		v1.4s, v4.4s, #25
+
+	// x1 = shuffle32(x1, MASK(2, 1, 0, 3))
+	ext		v1.16b, v1.16b, v1.16b, #12
+	// x2 = shuffle32(x2, MASK(1, 0, 3, 2))
+	ext		v2.16b, v2.16b, v2.16b, #8
+	// x3 = shuffle32(x3, MASK(0, 3, 2, 1))
+	ext		v3.16b, v3.16b, v3.16b, #4
+
+	subs		x3, x3, #1
+	b.ne		.Ldoubleround
+
+	ld1		{v4.16b-v7.16b}, [x2]
+
+	// o0 = i0 ^ (x0 + s0)
+	add		v0.4s, v0.4s, v8.4s
+	eor		v0.16b, v0.16b, v4.16b
+
+	// o1 = i1 ^ (x1 + s1)
+	add		v1.4s, v1.4s, v9.4s
+	eor		v1.16b, v1.16b, v5.16b
+
+	// o2 = i2 ^ (x2 + s2)
+	add		v2.4s, v2.4s, v10.4s
+	eor		v2.16b, v2.16b, v6.16b
+
+	// o3 = i3 ^ (x3 + s3)
+	add		v3.4s, v3.4s, v11.4s
+	eor		v3.16b, v3.16b, v7.16b
+
+	st1		{v0.16b-v3.16b}, [x1]
+
+	ret
+ENDPROC(chacha20_block_xor_neon)
+
+	.align		6
+ENTRY(chacha20_4block_xor_neon)
+	// x0: Input state matrix, s
+	// x1: 4 data blocks output, o
+	// x2: 4 data blocks input, i
+
+	//
+	// This function encrypts four consecutive ChaCha20 blocks by loading
+	// the state matrix in NEON registers four times. The algorithm performs
+	// each operation on the corresponding word of each state matrix, hence
+	// requires no word shuffling. For final XORing step we transpose the
+	// matrix by interleaving 32- and then 64-bit words, which allows us to
+	// do XOR in NEON registers.
+	//
+	adr		x3, CTRINC		// ... and ROT8
+	ld1		{v30.4s-v31.4s}, [x3]
+
+	// x0..15[0-3] = s0..3[0..3]
+	mov		x4, x0
+	ld4r		{ v0.4s- v3.4s}, [x4], #16
+	ld4r		{ v4.4s- v7.4s}, [x4], #16
+	ld4r		{ v8.4s-v11.4s}, [x4], #16
+	ld4r		{v12.4s-v15.4s}, [x4]
+
+	// x12 += counter values 0-3
+	add		v12.4s, v12.4s, v30.4s
+
+	mov		x3, #10
+
+.Ldoubleround4:
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 16)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 16)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 16)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 16)
+	add		v0.4s, v0.4s, v4.4s
+	add		v1.4s, v1.4s, v5.4s
+	add		v2.4s, v2.4s, v6.4s
+	add		v3.4s, v3.4s, v7.4s
+
+	eor		v12.16b, v12.16b, v0.16b
+	eor		v13.16b, v13.16b, v1.16b
+	eor		v14.16b, v14.16b, v2.16b
+	eor		v15.16b, v15.16b, v3.16b
+
+	rev32		v12.8h, v12.8h
+	rev32		v13.8h, v13.8h
+	rev32		v14.8h, v14.8h
+	rev32		v15.8h, v15.8h
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 12)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 12)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 12)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 12)
+	add		v8.4s, v8.4s, v12.4s
+	add		v9.4s, v9.4s, v13.4s
+	add		v10.4s, v10.4s, v14.4s
+	add		v11.4s, v11.4s, v15.4s
+
+	eor		v18.16b, v4.16b, v8.16b
+	eor		v19.16b, v5.16b, v9.16b
+	eor		v20.16b, v6.16b, v10.16b
+	eor		v21.16b, v7.16b, v11.16b
+
+	shl		v4.4s, v18.4s, #12
+	shl		v5.4s, v19.4s, #12
+	shl		v6.4s, v20.4s, #12
+	shl		v7.4s, v21.4s, #12
+
+	sri		v4.4s, v18.4s, #20
+	sri		v5.4s, v19.4s, #20
+	sri		v6.4s, v20.4s, #20
+	sri		v7.4s, v21.4s, #20
+
+	// x0 += x4, x12 = rotl32(x12 ^ x0, 8)
+	// x1 += x5, x13 = rotl32(x13 ^ x1, 8)
+	// x2 += x6, x14 = rotl32(x14 ^ x2, 8)
+	// x3 += x7, x15 = rotl32(x15 ^ x3, 8)
+	add		v0.4s, v0.4s, v4.4s
+	add		v1.4s, v1.4s, v5.4s
+	add		v2.4s, v2.4s, v6.4s
+	add		v3.4s, v3.4s, v7.4s
+
+	eor		v12.16b, v12.16b, v0.16b
+	eor		v13.16b, v13.16b, v1.16b
+	eor		v14.16b, v14.16b, v2.16b
+	eor		v15.16b, v15.16b, v3.16b
+
+	tbl		v12.16b, {v12.16b}, v31.16b
+	tbl		v13.16b, {v13.16b}, v31.16b
+	tbl		v14.16b, {v14.16b}, v31.16b
+	tbl		v15.16b, {v15.16b}, v31.16b
+
+	// x8 += x12, x4 = rotl32(x4 ^ x8, 7)
+	// x9 += x13, x5 = rotl32(x5 ^ x9, 7)
+	// x10 += x14, x6 = rotl32(x6 ^ x10, 7)
+	// x11 += x15, x7 = rotl32(x7 ^ x11, 7)
+	add		v8.4s, v8.4s, v12.4s
+	add		v9.4s, v9.4s, v13.4s
+	add		v10.4s, v10.4s, v14.4s
+	add		v11.4s, v11.4s, v15.4s
+
+	eor		v18.16b, v4.16b, v8.16b
+	eor		v19.16b, v5.16b, v9.16b
+	eor		v20.16b, v6.16b, v10.16b
+	eor		v21.16b, v7.16b, v11.16b
+
+	shl		v4.4s, v18.4s, #7
+	shl		v5.4s, v19.4s, #7
+	shl		v6.4s, v20.4s, #7
+	shl		v7.4s, v21.4s, #7
+
+	sri		v4.4s, v18.4s, #25
+	sri		v5.4s, v19.4s, #25
+	sri		v6.4s, v20.4s, #25
+	sri		v7.4s, v21.4s, #25
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 16)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 16)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 16)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 16)
+	add		v0.4s, v0.4s, v5.4s
+	add		v1.4s, v1.4s, v6.4s
+	add		v2.4s, v2.4s, v7.4s
+	add		v3.4s, v3.4s, v4.4s
+
+	eor		v15.16b, v15.16b, v0.16b
+	eor		v12.16b, v12.16b, v1.16b
+	eor		v13.16b, v13.16b, v2.16b
+	eor		v14.16b, v14.16b, v3.16b
+
+	rev32		v15.8h, v15.8h
+	rev32		v12.8h, v12.8h
+	rev32		v13.8h, v13.8h
+	rev32		v14.8h, v14.8h
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 12)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 12)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 12)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 12)
+	add		v10.4s, v10.4s, v15.4s
+	add		v11.4s, v11.4s, v12.4s
+	add		v8.4s, v8.4s, v13.4s
+	add		v9.4s, v9.4s, v14.4s
+
+	eor		v18.16b, v5.16b, v10.16b
+	eor		v19.16b, v6.16b, v11.16b
+	eor		v20.16b, v7.16b, v8.16b
+	eor		v21.16b, v4.16b, v9.16b
+
+	shl		v5.4s, v18.4s, #12
+	shl		v6.4s, v19.4s, #12
+	shl		v7.4s, v20.4s, #12
+	shl		v4.4s, v21.4s, #12
+
+	sri		v5.4s, v18.4s, #20
+	sri		v6.4s, v19.4s, #20
+	sri		v7.4s, v20.4s, #20
+	sri		v4.4s, v21.4s, #20
+
+	// x0 += x5, x15 = rotl32(x15 ^ x0, 8)
+	// x1 += x6, x12 = rotl32(x12 ^ x1, 8)
+	// x2 += x7, x13 = rotl32(x13 ^ x2, 8)
+	// x3 += x4, x14 = rotl32(x14 ^ x3, 8)
+	add		v0.4s, v0.4s, v5.4s
+	add		v1.4s, v1.4s, v6.4s
+	add		v2.4s, v2.4s, v7.4s
+	add		v3.4s, v3.4s, v4.4s
+
+	eor		v15.16b, v15.16b, v0.16b
+	eor		v12.16b, v12.16b, v1.16b
+	eor		v13.16b, v13.16b, v2.16b
+	eor		v14.16b, v14.16b, v3.16b
+
+	tbl		v15.16b, {v15.16b}, v31.16b
+	tbl		v12.16b, {v12.16b}, v31.16b
+	tbl		v13.16b, {v13.16b}, v31.16b
+	tbl		v14.16b, {v14.16b}, v31.16b
+
+	// x10 += x15, x5 = rotl32(x5 ^ x10, 7)
+	// x11 += x12, x6 = rotl32(x6 ^ x11, 7)
+	// x8 += x13, x7 = rotl32(x7 ^ x8, 7)
+	// x9 += x14, x4 = rotl32(x4 ^ x9, 7)
+	add		v10.4s, v10.4s, v15.4s
+	add		v11.4s, v11.4s, v12.4s
+	add		v8.4s, v8.4s, v13.4s
+	add		v9.4s, v9.4s, v14.4s
+
+	eor		v18.16b, v5.16b, v10.16b
+	eor		v19.16b, v6.16b, v11.16b
+	eor		v20.16b, v7.16b, v8.16b
+	eor		v21.16b, v4.16b, v9.16b
+
+	shl		v5.4s, v18.4s, #7
+	shl		v6.4s, v19.4s, #7
+	shl		v7.4s, v20.4s, #7
+	shl		v4.4s, v21.4s, #7
+
+	sri		v5.4s, v18.4s, #25
+	sri		v6.4s, v19.4s, #25
+	sri		v7.4s, v20.4s, #25
+	sri		v4.4s, v21.4s, #25
+
+	subs		x3, x3, #1
+	b.ne		.Ldoubleround4
+
+	ld4r		{v16.4s-v19.4s}, [x0], #16
+	ld4r		{v20.4s-v23.4s}, [x0], #16
+
+	// x12 += counter values 0-3
+	add		v12.4s, v12.4s, v30.4s
+
+	// x0[0-3] += s0[0]
+	// x1[0-3] += s0[1]
+	// x2[0-3] += s0[2]
+	// x3[0-3] += s0[3]
+	add		v0.4s, v0.4s, v16.4s
+	add		v1.4s, v1.4s, v17.4s
+	add		v2.4s, v2.4s, v18.4s
+	add		v3.4s, v3.4s, v19.4s
+
+	ld4r		{v24.4s-v27.4s}, [x0], #16
+	ld4r		{v28.4s-v31.4s}, [x0]
+
+	// x4[0-3] += s1[0]
+	// x5[0-3] += s1[1]
+	// x6[0-3] += s1[2]
+	// x7[0-3] += s1[3]
+	add		v4.4s, v4.4s, v20.4s
+	add		v5.4s, v5.4s, v21.4s
+	add		v6.4s, v6.4s, v22.4s
+	add		v7.4s, v7.4s, v23.4s
+
+	// x8[0-3] += s2[0]
+	// x9[0-3] += s2[1]
+	// x10[0-3] += s2[2]
+	// x11[0-3] += s2[3]
+	add		v8.4s, v8.4s, v24.4s
+	add		v9.4s, v9.4s, v25.4s
+	add		v10.4s, v10.4s, v26.4s
+	add		v11.4s, v11.4s, v27.4s
+
+	// x12[0-3] += s3[0]
+	// x13[0-3] += s3[1]
+	// x14[0-3] += s3[2]
+	// x15[0-3] += s3[3]
+	add		v12.4s, v12.4s, v28.4s
+	add		v13.4s, v13.4s, v29.4s
+	add		v14.4s, v14.4s, v30.4s
+	add		v15.4s, v15.4s, v31.4s
+
+	// interleave 32-bit words in state n, n+1
+	zip1		v16.4s, v0.4s, v1.4s
+	zip2		v17.4s, v0.4s, v1.4s
+	zip1		v18.4s, v2.4s, v3.4s
+	zip2		v19.4s, v2.4s, v3.4s
+	zip1		v20.4s, v4.4s, v5.4s
+	zip2		v21.4s, v4.4s, v5.4s
+	zip1		v22.4s, v6.4s, v7.4s
+	zip2		v23.4s, v6.4s, v7.4s
+	zip1		v24.4s, v8.4s, v9.4s
+	zip2		v25.4s, v8.4s, v9.4s
+	zip1		v26.4s, v10.4s, v11.4s
+	zip2		v27.4s, v10.4s, v11.4s
+	zip1		v28.4s, v12.4s, v13.4s
+	zip2		v29.4s, v12.4s, v13.4s
+	zip1		v30.4s, v14.4s, v15.4s
+	zip2		v31.4s, v14.4s, v15.4s
+
+	// interleave 64-bit words in state n, n+2
+	zip1		v0.2d, v16.2d, v18.2d
+	zip2		v4.2d, v16.2d, v18.2d
+	zip1		v8.2d, v17.2d, v19.2d
+	zip2		v12.2d, v17.2d, v19.2d
+	ld1		{v16.16b-v19.16b}, [x2], #64
+
+	zip1		v1.2d, v20.2d, v22.2d
+	zip2		v5.2d, v20.2d, v22.2d
+	zip1		v9.2d, v21.2d, v23.2d
+	zip2		v13.2d, v21.2d, v23.2d
+	ld1		{v20.16b-v23.16b}, [x2], #64
+
+	zip1		v2.2d, v24.2d, v26.2d
+	zip2		v6.2d, v24.2d, v26.2d
+	zip1		v10.2d, v25.2d, v27.2d
+	zip2		v14.2d, v25.2d, v27.2d
+	ld1		{v24.16b-v27.16b}, [x2], #64
+
+	zip1		v3.2d, v28.2d, v30.2d
+	zip2		v7.2d, v28.2d, v30.2d
+	zip1		v11.2d, v29.2d, v31.2d
+	zip2		v15.2d, v29.2d, v31.2d
+	ld1		{v28.16b-v31.16b}, [x2]
+
+	// xor with corresponding input, write to output
+	eor		v16.16b, v16.16b, v0.16b
+	eor		v17.16b, v17.16b, v1.16b
+	eor		v18.16b, v18.16b, v2.16b
+	eor		v19.16b, v19.16b, v3.16b
+	eor		v20.16b, v20.16b, v4.16b
+	eor		v21.16b, v21.16b, v5.16b
+	st1		{v16.16b-v19.16b}, [x1], #64
+	eor		v22.16b, v22.16b, v6.16b
+	eor		v23.16b, v23.16b, v7.16b
+	eor		v24.16b, v24.16b, v8.16b
+	eor		v25.16b, v25.16b, v9.16b
+	st1		{v20.16b-v23.16b}, [x1], #64
+	eor		v26.16b, v26.16b, v10.16b
+	eor		v27.16b, v27.16b, v11.16b
+	eor		v28.16b, v28.16b, v12.16b
+	st1		{v24.16b-v27.16b}, [x1], #64
+	eor		v29.16b, v29.16b, v13.16b
+	eor		v30.16b, v30.16b, v14.16b
+	eor		v31.16b, v31.16b, v15.16b
+	st1		{v28.16b-v31.16b}, [x1]
+
+	ret
+ENDPROC(chacha20_4block_xor_neon)
+
+CTRINC:	.word		0, 1, 2, 3
+ROT8:	.word		0x02010003, 0x06050407, 0x0a09080b, 0x0e0d0c0f
diff --git a/arch/arm64/crypto/chacha20-neon-glue.c b/arch/arm64/crypto/chacha20-neon-glue.c
new file mode 100644
index 000000000000..06fdb8468ac6
--- /dev/null
+++ b/arch/arm64/crypto/chacha20-neon-glue.c
@@ -0,0 +1,122 @@
+/*
+ * ChaCha20 256-bit cipher algorithm, RFC7539, arm64 NEON functions
+ *
+ * Copyright (C) 2016 Linaro, Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * Based on:
+ * ChaCha20 256-bit cipher algorithm, RFC7539, SIMD glue code
+ *
+ * Copyright (C) 2015 Martin Willi
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ */
+
+#include <crypto/algapi.h>
+#include <crypto/chacha20.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+
+#include <asm/neon.h>
+
+asmlinkage void chacha20_block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+asmlinkage void chacha20_4block_xor_neon(u32 *state, u8 *dst, const u8 *src);
+
+static void chacha20_doneon(u32 *state, u8 *dst, const u8 *src,
+			    unsigned int bytes)
+{
+	u8 buf[CHACHA20_BLOCK_SIZE];
+
+	while (bytes >= CHACHA20_BLOCK_SIZE * 4) {
+		chacha20_4block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE * 4;
+		src += CHACHA20_BLOCK_SIZE * 4;
+		dst += CHACHA20_BLOCK_SIZE * 4;
+		state[12] += 4;
+	}
+	while (bytes >= CHACHA20_BLOCK_SIZE) {
+		chacha20_block_xor_neon(state, dst, src);
+		bytes -= CHACHA20_BLOCK_SIZE;
+		src += CHACHA20_BLOCK_SIZE;
+		dst += CHACHA20_BLOCK_SIZE;
+		state[12]++;
+	}
+	if (bytes) {
+		memcpy(buf, src, bytes);
+		chacha20_block_xor_neon(state, buf, buf);
+		memcpy(dst, buf, bytes);
+	}
+}
+
+static int chacha20_neon(struct skcipher_request *req)
+{
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
+	u32 state[16];
+	int err;
+
+	if (req->cryptlen <= CHACHA20_BLOCK_SIZE)
+		return crypto_chacha20_crypt(req);
+
+	err = skcipher_walk_virt(&walk, req, true);
+
+	crypto_chacha20_init(state, ctx, walk.iv);
+
+	kernel_neon_begin();
+	while (walk.nbytes > 0) {
+		unsigned int nbytes = walk.nbytes;
+
+		if (nbytes < walk.total)
+			nbytes = round_down(nbytes, walk.chunksize);
+
+		chacha20_doneon(state, walk.dst.virt.addr, walk.src.virt.addr,
+				nbytes);
+		err = skcipher_walk_done(&walk, walk.nbytes - nbytes);
+	}
+	kernel_neon_end();
+
+	return err;
+}
+
+static struct skcipher_alg alg = {
+	.base.cra_name		= "chacha20",
+	.base.cra_driver_name	= "chacha20-neon",
+	.base.cra_priority	= 300,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
+	.base.cra_alignmask	= 1,
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= CHACHA20_KEY_SIZE,
+	.max_keysize		= CHACHA20_KEY_SIZE,
+	.ivsize			= CHACHA20_IV_SIZE,
+	.chunksize		= 4 * CHACHA20_BLOCK_SIZE,
+	.setkey			= crypto_chacha20_setkey,
+	.encrypt		= chacha20_neon,
+	.decrypt		= chacha20_neon,
+};
+
+static int __init chacha20_simd_mod_init(void)
+{
+	return crypto_register_skcipher(&alg);
+}
+
+static void __exit chacha20_simd_mod_fini(void)
+{
+	crypto_unregister_skcipher(&alg);
+}
+
+module_init(chacha20_simd_mod_init);
+module_exit(chacha20_simd_mod_fini);
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+MODULE_ALIAS_CRYPTO("chacha20");
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 1/3] crypto: chacha20 - convert generic and x86 versions to skcipher
From: Ard Biesheuvel @ 2016-12-09 14:33 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel
In-Reply-To: <1481294033-23508-1-git-send-email-ard.biesheuvel@linaro.org>

This converts the ChaCha20 code from a blkcipher to a skcipher, which
is now the preferred way to implement symmetric block and stream ciphers.

This ports the generic and x86 versions at the same time because the
latter reuses routines of the former.

Note that the skcipher_walk() API guarantees that all presented blocks
except the final one are a multiple of the chunk size, so we can simplify
the encrypt() routine somewhat.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/x86/crypto/chacha20_glue.c | 69 +++++++++---------
 crypto/chacha20_generic.c       | 73 ++++++++------------
 include/crypto/chacha20.h       |  6 +-
 3 files changed, 64 insertions(+), 84 deletions(-)

diff --git a/arch/x86/crypto/chacha20_glue.c b/arch/x86/crypto/chacha20_glue.c
index f910d1d449f0..78f75b07dc25 100644
--- a/arch/x86/crypto/chacha20_glue.c
+++ b/arch/x86/crypto/chacha20_glue.c
@@ -11,7 +11,7 @@
 
 #include <crypto/algapi.h>
 #include <crypto/chacha20.h>
-#include <linux/crypto.h>
+#include <crypto/internal/skcipher.h>
 #include <linux/kernel.h>
 #include <linux/module.h>
 #include <asm/fpu/api.h>
@@ -63,36 +63,34 @@ static void chacha20_dosimd(u32 *state, u8 *dst, const u8 *src,
 	}
 }
 
-static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
-			 struct scatterlist *src, unsigned int nbytes)
+static int chacha20_simd(struct skcipher_request *req)
 {
-	u32 *state, state_buf[16 + (CHACHA20_STATE_ALIGN / sizeof(u32)) - 1];
-	struct blkcipher_walk walk;
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+	u32 state[16] __aligned(CHACHA20_STATE_ALIGN);
+	struct skcipher_walk walk;
 	int err;
 
-	if (nbytes <= CHACHA20_BLOCK_SIZE || !may_use_simd())
-		return crypto_chacha20_crypt(desc, dst, src, nbytes);
+	if (req->cryptlen <= CHACHA20_BLOCK_SIZE || !may_use_simd())
+		return crypto_chacha20_crypt(req);
 
-	state = (u32 *)roundup((uintptr_t)state_buf, CHACHA20_STATE_ALIGN);
+	err = skcipher_walk_virt(&walk, req, true);
 
-	blkcipher_walk_init(&walk, dst, src, nbytes);
-	err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
-
-	crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+	crypto_chacha20_init(state, ctx, walk.iv);
 
 	kernel_fpu_begin();
 
 	while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
 		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
 				rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
-		err = blkcipher_walk_done(desc, &walk,
-					  walk.nbytes % CHACHA20_BLOCK_SIZE);
+		err = skcipher_walk_done(&walk,
+					 walk.nbytes % CHACHA20_BLOCK_SIZE);
 	}
 
 	if (walk.nbytes) {
 		chacha20_dosimd(state, walk.dst.virt.addr, walk.src.virt.addr,
 				walk.nbytes);
-		err = blkcipher_walk_done(desc, &walk, 0);
+		err = skcipher_walk_done(&walk, 0);
 	}
 
 	kernel_fpu_end();
@@ -100,27 +98,22 @@ static int chacha20_simd(struct blkcipher_desc *desc, struct scatterlist *dst,
 	return err;
 }
 
-static struct crypto_alg alg = {
-	.cra_name		= "chacha20",
-	.cra_driver_name	= "chacha20-simd",
-	.cra_priority		= 300,
-	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
-	.cra_blocksize		= 1,
-	.cra_type		= &crypto_blkcipher_type,
-	.cra_ctxsize		= sizeof(struct chacha20_ctx),
-	.cra_alignmask		= sizeof(u32) - 1,
-	.cra_module		= THIS_MODULE,
-	.cra_u			= {
-		.blkcipher = {
-			.min_keysize	= CHACHA20_KEY_SIZE,
-			.max_keysize	= CHACHA20_KEY_SIZE,
-			.ivsize		= CHACHA20_IV_SIZE,
-			.geniv		= "seqiv",
-			.setkey		= crypto_chacha20_setkey,
-			.encrypt	= chacha20_simd,
-			.decrypt	= chacha20_simd,
-		},
-	},
+static struct skcipher_alg alg = {
+	.base.cra_name		= "chacha20",
+	.base.cra_driver_name	= "chacha20-simd",
+	.base.cra_priority	= 300,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
+	.base.cra_alignmask	= sizeof(u32) - 1,
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= CHACHA20_KEY_SIZE,
+	.max_keysize		= CHACHA20_KEY_SIZE,
+	.ivsize			= CHACHA20_IV_SIZE,
+	.chunksize		= CHACHA20_BLOCK_SIZE,
+	.setkey			= crypto_chacha20_setkey,
+	.encrypt		= chacha20_simd,
+	.decrypt		= chacha20_simd,
 };
 
 static int __init chacha20_simd_mod_init(void)
@@ -133,12 +126,12 @@ static int __init chacha20_simd_mod_init(void)
 			    boot_cpu_has(X86_FEATURE_AVX2) &&
 			    cpu_has_xfeatures(XFEATURE_MASK_SSE | XFEATURE_MASK_YMM, NULL);
 #endif
-	return crypto_register_alg(&alg);
+	return crypto_register_skcipher(&alg);
 }
 
 static void __exit chacha20_simd_mod_fini(void)
 {
-	crypto_unregister_alg(&alg);
+	crypto_unregister_skcipher(&alg);
 }
 
 module_init(chacha20_simd_mod_init);
diff --git a/crypto/chacha20_generic.c b/crypto/chacha20_generic.c
index 1cab83146e33..8b3c04d625c3 100644
--- a/crypto/chacha20_generic.c
+++ b/crypto/chacha20_generic.c
@@ -10,10 +10,9 @@
  */
 
 #include <crypto/algapi.h>
-#include <linux/crypto.h>
-#include <linux/kernel.h>
-#include <linux/module.h>
 #include <crypto/chacha20.h>
+#include <crypto/internal/skcipher.h>
+#include <linux/module.h>
 
 static inline u32 le32_to_cpuvp(const void *p)
 {
@@ -63,10 +62,10 @@ void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv)
 }
 EXPORT_SYMBOL_GPL(crypto_chacha20_init);
 
-int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
+int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			   unsigned int keysize)
 {
-	struct chacha20_ctx *ctx = crypto_tfm_ctx(tfm);
+	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
 	int i;
 
 	if (keysize != CHACHA20_KEY_SIZE)
@@ -79,66 +78,54 @@ int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
 }
 EXPORT_SYMBOL_GPL(crypto_chacha20_setkey);
 
-int crypto_chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
-			  struct scatterlist *src, unsigned int nbytes)
+int crypto_chacha20_crypt(struct skcipher_request *req)
 {
-	struct blkcipher_walk walk;
+	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+	struct chacha20_ctx *ctx = crypto_skcipher_ctx(tfm);
+	struct skcipher_walk walk;
 	u32 state[16];
 	int err;
 
-	blkcipher_walk_init(&walk, dst, src, nbytes);
-	err = blkcipher_walk_virt_block(desc, &walk, CHACHA20_BLOCK_SIZE);
-
-	crypto_chacha20_init(state, crypto_blkcipher_ctx(desc->tfm), walk.iv);
+	err = skcipher_walk_virt(&walk, req, true);
 
-	while (walk.nbytes >= CHACHA20_BLOCK_SIZE) {
-		chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
-				 rounddown(walk.nbytes, CHACHA20_BLOCK_SIZE));
-		err = blkcipher_walk_done(desc, &walk,
-					  walk.nbytes % CHACHA20_BLOCK_SIZE);
-	}
+	crypto_chacha20_init(state, ctx, walk.iv);
 
-	if (walk.nbytes) {
+	while (walk.nbytes > 0) {
 		chacha20_docrypt(state, walk.dst.virt.addr, walk.src.virt.addr,
 				 walk.nbytes);
-		err = blkcipher_walk_done(desc, &walk, 0);
+		err = skcipher_walk_done(&walk, 0);
 	}
 
 	return err;
 }
 EXPORT_SYMBOL_GPL(crypto_chacha20_crypt);
 
-static struct crypto_alg alg = {
-	.cra_name		= "chacha20",
-	.cra_driver_name	= "chacha20-generic",
-	.cra_priority		= 100,
-	.cra_flags		= CRYPTO_ALG_TYPE_BLKCIPHER,
-	.cra_blocksize		= 1,
-	.cra_type		= &crypto_blkcipher_type,
-	.cra_ctxsize		= sizeof(struct chacha20_ctx),
-	.cra_alignmask		= sizeof(u32) - 1,
-	.cra_module		= THIS_MODULE,
-	.cra_u			= {
-		.blkcipher = {
-			.min_keysize	= CHACHA20_KEY_SIZE,
-			.max_keysize	= CHACHA20_KEY_SIZE,
-			.ivsize		= CHACHA20_IV_SIZE,
-			.geniv		= "seqiv",
-			.setkey		= crypto_chacha20_setkey,
-			.encrypt	= crypto_chacha20_crypt,
-			.decrypt	= crypto_chacha20_crypt,
-		},
-	},
+static struct skcipher_alg alg = {
+	.base.cra_name		= "chacha20",
+	.base.cra_driver_name	= "chacha20-generic",
+	.base.cra_priority	= 100,
+	.base.cra_blocksize	= 1,
+	.base.cra_ctxsize	= sizeof(struct chacha20_ctx),
+	.base.cra_alignmask	= sizeof(u32) - 1,
+	.base.cra_module	= THIS_MODULE,
+
+	.min_keysize		= CHACHA20_KEY_SIZE,
+	.max_keysize		= CHACHA20_KEY_SIZE,
+	.ivsize			= CHACHA20_IV_SIZE,
+	.chunksize		= CHACHA20_BLOCK_SIZE,
+	.setkey			= crypto_chacha20_setkey,
+	.encrypt		= crypto_chacha20_crypt,
+	.decrypt		= crypto_chacha20_crypt,
 };
 
 static int __init chacha20_generic_mod_init(void)
 {
-	return crypto_register_alg(&alg);
+	return crypto_register_skcipher(&alg);
 }
 
 static void __exit chacha20_generic_mod_fini(void)
 {
-	crypto_unregister_alg(&alg);
+	crypto_unregister_skcipher(&alg);
 }
 
 module_init(chacha20_generic_mod_init);
diff --git a/include/crypto/chacha20.h b/include/crypto/chacha20.h
index 20d20f681a72..445fc45f4b5b 100644
--- a/include/crypto/chacha20.h
+++ b/include/crypto/chacha20.h
@@ -5,6 +5,7 @@
 #ifndef _CRYPTO_CHACHA20_H
 #define _CRYPTO_CHACHA20_H
 
+#include <crypto/skcipher.h>
 #include <linux/types.h>
 #include <linux/crypto.h>
 
@@ -18,9 +19,8 @@ struct chacha20_ctx {
 
 void chacha20_block(u32 *state, void *stream);
 void crypto_chacha20_init(u32 *state, struct chacha20_ctx *ctx, u8 *iv);
-int crypto_chacha20_setkey(struct crypto_tfm *tfm, const u8 *key,
+int crypto_chacha20_setkey(struct crypto_skcipher *tfm, const u8 *key,
 			   unsigned int keysize);
-int crypto_chacha20_crypt(struct blkcipher_desc *desc, struct scatterlist *dst,
-			  struct scatterlist *src, unsigned int nbytes);
+int crypto_chacha20_crypt(struct skcipher_request *req);
 
 #endif
-- 
2.7.4

^ permalink raw reply related

* [PATCH v2 0/3] crypto: arm64/ARM: NEON accelerated ChaCha20 *skcipher*
From: Ard Biesheuvel @ 2016-12-09 14:33 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: linux-arm-kernel, Ard Biesheuvel

Another port of existing x86 SSE code to NEON, again both for arm64 and ARM.

ChaCha20 is a stream cipher described in RFC 7539, and is intended to be
an efficient software implementable 'standby cipher', in case AES cannot
be used.

This NEON implementation is almost 2x as fast as the generic C code
(measured on Cortex-A57 using the arm64 version)

Changes in v2:
- add patch to convert the generic and x86 to skciphers first
- tweaked the arm64 version for some additional performance
- use chunksize == 4x blocksize for optimal speed

Ard Biesheuvel (3):
  crypto: chacha20 - convert generic and x86 versions to skcipher
  crypto: arm64/chacha20 - implement NEON version based on SSE3 code
  crypto: arm/chacha20 - implement NEON version based on SSE3 code

 arch/arm/crypto/Kconfig                |   6 +
 arch/arm/crypto/Makefile               |   2 +
 arch/arm/crypto/chacha20-neon-core.S   | 524 ++++++++++++++++++++
 arch/arm/crypto/chacha20-neon-glue.c   | 127 +++++
 arch/arm64/crypto/Kconfig              |   6 +
 arch/arm64/crypto/Makefile             |   3 +
 arch/arm64/crypto/chacha20-neon-core.S | 450 +++++++++++++++++
 arch/arm64/crypto/chacha20-neon-glue.c | 122 +++++
 arch/x86/crypto/chacha20_glue.c        |  69 ++-
 crypto/chacha20_generic.c              |  73 ++-
 include/crypto/chacha20.h              |   6 +-
 11 files changed, 1304 insertions(+), 84 deletions(-)
 create mode 100644 arch/arm/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm/crypto/chacha20-neon-glue.c
 create mode 100644 arch/arm64/crypto/chacha20-neon-core.S
 create mode 100644 arch/arm64/crypto/chacha20-neon-glue.c

-- 
2.7.4

^ permalink raw reply

* [PATCH 4/7] hwrng: core: Replace asm/uaccess.h by linux/uaccess.h
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>

This patch fix the checkpatch warning about asm/uaccess.h.
In the same time, we sort the headers in alphabetical order.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index a8e63ae..7a2e496 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -10,19 +10,19 @@
  * of the GNU General Public License, incorporated herein by reference.
  */
 
+#include <linux/delay.h>
 #include <linux/device.h>
+#include <linux/err.h>
+#include <linux/fs.h>
 #include <linux/hw_random.h>
-#include <linux/module.h>
 #include <linux/kernel.h>
-#include <linux/fs.h>
-#include <linux/sched.h>
-#include <linux/miscdevice.h>
 #include <linux/kthread.h>
-#include <linux/delay.h>
-#include <linux/slab.h>
+#include <linux/miscdevice.h>
+#include <linux/module.h>
 #include <linux/random.h>
-#include <linux/err.h>
-#include <asm/uaccess.h>
+#include <linux/sched.h>
+#include <linux/slab.h>
+#include <linux/uaccess.h>
 
 #define RNG_MODULE_NAME		"hw_random"
 #define PFX			RNG_MODULE_NAME ": "
-- 
2.7.3

^ permalink raw reply related

* [PATCH 2/7] hwrng: core: rewrite better comparison to NULL
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>

This patch fix the checkpatch warning "Comparison to NULL could be written "!ptr"

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 00cbb81..7029246 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -439,8 +439,7 @@ int hwrng_register(struct hwrng *rng)
 	int err = -EINVAL;
 	struct hwrng *old_rng, *tmp;
 
-	if (rng->name == NULL ||
-	    (rng->data_read == NULL && rng->read == NULL))
+	if (!rng->name || (!rng->data_read && !rng->read))
 		goto out;
 
 	mutex_lock(&rng_mutex);
-- 
2.7.3

^ permalink raw reply related

* [PATCH 1/7] hwrng: core: do not use multiple blank lines
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe

This patch fix the checkpatch warning "Please don't use multiple blank lines"

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 5 -----
 1 file changed, 5 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index d2d2c89..00cbb81 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -30,7 +30,6 @@
 
  */
 
-
 #include <linux/device.h>
 #include <linux/hw_random.h>
 #include <linux/module.h>
@@ -45,12 +44,10 @@
 #include <linux/err.h>
 #include <asm/uaccess.h>
 
-
 #define RNG_MODULE_NAME		"hw_random"
 #define PFX			RNG_MODULE_NAME ": "
 #define RNG_MISCDEV_MINOR	183 /* official */
 
-
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
 static LIST_HEAD(rng_list);
@@ -296,7 +293,6 @@ static ssize_t rng_dev_read(struct file *filp, char __user *buf,
 	goto out;
 }
 
-
 static const struct file_operations rng_chrdev_ops = {
 	.owner		= THIS_MODULE,
 	.open		= rng_dev_open,
@@ -314,7 +310,6 @@ static struct miscdevice rng_miscdev = {
 	.groups		= rng_dev_groups,
 };
 
-
 static ssize_t hwrng_attr_current_store(struct device *dev,
 					struct device_attribute *attr,
 					const char *buf, size_t len)
-- 
2.7.3

^ permalink raw reply related

* [PATCH 7/7] hwrng: core: Remove two unused include
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>

linux/fs.h and linux/sched.h are useless for hw_random/core.c.
This patch remove them.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 5c654b5..85c9ab3 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -13,14 +13,12 @@
 #include <linux/delay.h>
 #include <linux/device.h>
 #include <linux/err.h>
-#include <linux/fs.h>
 #include <linux/hw_random.h>
 #include <linux/kernel.h>
 #include <linux/kthread.h>
 #include <linux/miscdevice.h>
 #include <linux/module.h>
 #include <linux/random.h>
-#include <linux/sched.h>
 #include <linux/slab.h>
 #include <linux/uaccess.h>
 
-- 
2.7.3

^ permalink raw reply related

* [PATCH 6/7] hwrng: core: remove unused PFX macro
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>

This patch remove the unused PFX macro.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 1e1e385..5c654b5 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -25,7 +25,6 @@
 #include <linux/uaccess.h>
 
 #define RNG_MODULE_NAME		"hw_random"
-#define PFX			RNG_MODULE_NAME ": "
 
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
-- 
2.7.3

^ permalink raw reply related

* [PATCH 5/7] hwrng: core: Move hwrng miscdev minor number to include/linux/miscdevice.h
From: Corentin Labbe @ 2016-12-09 14:21 UTC (permalink / raw)
  To: mpm, herbert, arnd, gregkh; +Cc: linux-crypto, linux-kernel, Corentin Labbe
In-Reply-To: <1481293299-21697-1-git-send-email-clabbe.montjoie@gmail.com>

This patch move the define for hwrng's miscdev minor number to
include/linux/miscdevice.h.
It's better that all minor number are in the same place.
Rename it to HWRNG_MINOR (from RNG_MISCDEV_MINOR) in he process since
no other miscdev define have MISCDEV in their name.

Signed-off-by: Corentin Labbe <clabbe.montjoie@gmail.com>
---
 drivers/char/hw_random/core.c | 3 +--
 include/linux/miscdevice.h    | 1 +
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/core.c b/drivers/char/hw_random/core.c
index 7a2e496..1e1e385 100644
--- a/drivers/char/hw_random/core.c
+++ b/drivers/char/hw_random/core.c
@@ -26,7 +26,6 @@
 
 #define RNG_MODULE_NAME		"hw_random"
 #define PFX			RNG_MODULE_NAME ": "
-#define RNG_MISCDEV_MINOR	183 /* official */
 
 static struct hwrng *current_rng;
 static struct task_struct *hwrng_fill;
@@ -283,7 +282,7 @@ static const struct file_operations rng_chrdev_ops = {
 static const struct attribute_group *rng_dev_groups[];
 
 static struct miscdevice rng_miscdev = {
-	.minor		= RNG_MISCDEV_MINOR,
+	.minor		= HWRNG_MINOR,
 	.name		= RNG_MODULE_NAME,
 	.nodename	= "hwrng",
 	.fops		= &rng_chrdev_ops,
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 722698a..659f586 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -31,6 +31,7 @@
 #define SGI_MMTIMER		153
 #define STORE_QUEUE_MINOR	155	/* unused */
 #define I2O_MINOR		166
+#define HWRNG_MINOR		183
 #define MICROCODE_MINOR		184
 #define VFIO_MINOR		196
 #define TUN_MINOR		200
-- 
2.7.3

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox