Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: sha1_mb broken
From: Megha Dey @ 2016-09-28 17:58 UTC (permalink / raw)
  To: Stephan Mueller; +Cc: linux-crypto
In-Reply-To: <3264980.mGNQ3Qi0fl@positron.chronox.de>

Hi Stephan,

Could you give me more info on how I could reproduce this issue on my
end?

Also was this issue there all along? Which is the first kernel version
where you see this?

Thanks,
Megha

On Mon, 2016-09-26 at 19:32 +0200, Stephan Mueller wrote:
> Am Freitag, 26. August 2016, 03:15:06 CEST schrieb Stephan Mueller:
> 
> Hi Megha,
> 
> > Hi,
> > 
> > I tried to execute tests with sha1_mb.
> 
> Have you had a chance to look into this one?
> > 
> > The execution simply stalls when invoking a digest operation -- i.e. the
> > digest operation does not finish. After some time after invoking the hashing
> > operation, the following log appears (note, the kccavs_* functions are my
> > test code; that test code works perfectly well with all other hash
> > implementations):
> > 
> > [  140.426026] INFO: rcu_sched detected stalls on CPUs/tasks:
> > [  140.426719] 	2-...: (1 GPs behind) idle=9c3/140000000000000/0
> > softirq=2680/2707 fqs=14762
> > [  140.427024] 	(detected by 0, t=60002 jiffies, g=655, c=654, q=35)
> > [  140.427024] Task dump for CPU 2:
> > [  140.427024] cavs_driver     R  running task        0   945    862
> > 0x00000008
> > [  140.427024]  ffffffff9b78d965 ffffa527b8bfa640 ffffa527bb505940
> > ffffa52775c20c50
> > [  140.427024]  ffffa527bc300000 ffffa527b93d2a80 ffffa527bb857e00
> > ffffa527bc2ffd78
> > [  140.427024]  ffffa527b93d2ac8 ffffa527bc2ffcc0 ffffffff9b78dfc8
> > ffffa527bc2ffd70
> > [  140.427024] Call Trace:
> > [  140.427024]  [<ffffffff9b78d965>] ? __schedule+0x245/0x690
> > [  140.427024]  [<ffffffff9b78dfc8>] ? preempt_schedule_common+0x18/0x30
> > [  140.427024]  [<ffffffff9b791a68>] ? _raw_spin_lock_irq+0x28/0x30
> > [  140.427024]  [<ffffffff9b78ed98>] ? wait_for_completion_interruptible
> > +0x28/0x180
> > [  140.427024]  [<ffffffffc028541c>] ? sha1_mb_async_digest+0x6c/0x70
> > [sha1_mb]
> > [  140.427024]  [<ffffffff9b3b1129>] ? crypto_ahash_op+0x29/0x70
> > [  140.427024]  [<ffffffffc0270148>] ? kccavs_test_ahash+0x198/0x2b0
> > [kcapi_cavs]
> > [  140.427024]  [<ffffffffc026e20a>] ? kccavs_data_read+0xda/0x160
> > [kcapi_cavs]
> > [  140.427024]  [<ffffffff9b370964>] ? full_proxy_read+0x54/0x90
> > [  140.427024]  [<ffffffff9b208088>] ? __vfs_read+0x28/0x110
> > [  140.427024]  [<ffffffff9b38a2c0>] ? security_file_permission+0xa0/0xc0
> > [  140.427024]  [<ffffffff9b20850e>] ? rw_verify_area+0x4e/0xb0
> > [  140.427024]  [<ffffffff9b208606>] ? vfs_read+0x96/0x130
> > [  140.427024]  [<ffffffff9b2099f6>] ? SyS_read+0x46/0xa0
> > [  140.427024]  [<ffffffff9b791cb6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
> > 
> > 
> > 
> > Ciao
> > Stephan
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> > the body of a message to majordomo@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
> Ciao
> Stephan

^ permalink raw reply

* Re: sha1_mb broken
From: Megha Dey @ 2016-09-28 18:25 UTC (permalink / raw)
  To: Stephan Mueller; +Cc: linux-crypto, tim.c.chen
In-Reply-To: <1475085499.2490.1.camel@megha-Z97X-UD7-TH>

Hi Stephan,

There was a bug fix: Commit ID : 0851561d (introduced in 4.6-rc5).

Assuming that you are using an older kernel than this one, maybe we are
issuing the complete with the wrong pointer, so the original issuer of
the request never gets the complete back.

If you are using an older kernel, can you please use the lastest and let
me know if you still see this issue?

Also can you give more info on the test case? Does it issue single
request or multiple requests?

Thanks,
Megha

On Wed, 2016-09-28 at 10:58 -0700, Megha Dey wrote:
> Hi Stephan,
> 
> Could you give me more info on how I could reproduce this issue on my
> end?
> 
> Also was this issue there all along? Which is the first kernel version
> where you see this?
> 
> Thanks,
> Megha
> 
> On Mon, 2016-09-26 at 19:32 +0200, Stephan Mueller wrote:
> > Am Freitag, 26. August 2016, 03:15:06 CEST schrieb Stephan Mueller:
> > 
> > Hi Megha,
> > 
> > > Hi,
> > > 
> > > I tried to execute tests with sha1_mb.
> > 
> > Have you had a chance to look into this one?
> > > 
> > > The execution simply stalls when invoking a digest operation -- i.e. the
> > > digest operation does not finish. After some time after invoking the hashing
> > > operation, the following log appears (note, the kccavs_* functions are my
> > > test code; that test code works perfectly well with all other hash
> > > implementations):
> > > 
> > > [  140.426026] INFO: rcu_sched detected stalls on CPUs/tasks:
> > > [  140.426719] 	2-...: (1 GPs behind) idle=9c3/140000000000000/0
> > > softirq=2680/2707 fqs=14762
> > > [  140.427024] 	(detected by 0, t=60002 jiffies, g=655, c=654, q=35)
> > > [  140.427024] Task dump for CPU 2:
> > > [  140.427024] cavs_driver     R  running task        0   945    862
> > > 0x00000008
> > > [  140.427024]  ffffffff9b78d965 ffffa527b8bfa640 ffffa527bb505940
> > > ffffa52775c20c50
> > > [  140.427024]  ffffa527bc300000 ffffa527b93d2a80 ffffa527bb857e00
> > > ffffa527bc2ffd78
> > > [  140.427024]  ffffa527b93d2ac8 ffffa527bc2ffcc0 ffffffff9b78dfc8
> > > ffffa527bc2ffd70
> > > [  140.427024] Call Trace:
> > > [  140.427024]  [<ffffffff9b78d965>] ? __schedule+0x245/0x690
> > > [  140.427024]  [<ffffffff9b78dfc8>] ? preempt_schedule_common+0x18/0x30
> > > [  140.427024]  [<ffffffff9b791a68>] ? _raw_spin_lock_irq+0x28/0x30
> > > [  140.427024]  [<ffffffff9b78ed98>] ? wait_for_completion_interruptible
> > > +0x28/0x180
> > > [  140.427024]  [<ffffffffc028541c>] ? sha1_mb_async_digest+0x6c/0x70
> > > [sha1_mb]
> > > [  140.427024]  [<ffffffff9b3b1129>] ? crypto_ahash_op+0x29/0x70
> > > [  140.427024]  [<ffffffffc0270148>] ? kccavs_test_ahash+0x198/0x2b0
> > > [kcapi_cavs]
> > > [  140.427024]  [<ffffffffc026e20a>] ? kccavs_data_read+0xda/0x160
> > > [kcapi_cavs]
> > > [  140.427024]  [<ffffffff9b370964>] ? full_proxy_read+0x54/0x90
> > > [  140.427024]  [<ffffffff9b208088>] ? __vfs_read+0x28/0x110
> > > [  140.427024]  [<ffffffff9b38a2c0>] ? security_file_permission+0xa0/0xc0
> > > [  140.427024]  [<ffffffff9b20850e>] ? rw_verify_area+0x4e/0xb0
> > > [  140.427024]  [<ffffffff9b208606>] ? vfs_read+0x96/0x130
> > > [  140.427024]  [<ffffffff9b2099f6>] ? SyS_read+0x46/0xa0
> > > [  140.427024]  [<ffffffff9b791cb6>] ? entry_SYSCALL_64_fastpath+0x1e/0xa8
> > > 
> > > 
> > > 
> > > Ciao
> > > Stephan
> > > --
> > > To unsubscribe from this list: send the line "unsubscribe linux-crypto" in
> > > the body of a message to majordomo@vger.kernel.org
> > > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > 
> > 
> > Ciao
> > Stephan
> 

^ permalink raw reply

* Re: sha1_mb broken
From: Stephan Mueller @ 2016-09-28 18:45 UTC (permalink / raw)
  To: Megha Dey; +Cc: linux-crypto, tim.c.chen
In-Reply-To: <1475087147.2490.6.camel@megha-Z97X-UD7-TH>

Am Mittwoch, 28. September 2016, 11:25:47 CEST schrieb Megha Dey:

Hi Megha,

> Hi Stephan,
> 
> There was a bug fix: Commit ID : 0851561d (introduced in 4.6-rc5).

I use the current cryptodev-2.6 tree.
> 
> Assuming that you are using an older kernel than this one, maybe we are
> issuing the complete with the wrong pointer, so the original issuer of
> the request never gets the complete back.
> 
> If you are using an older kernel, can you please use the lastest and let
> me know if you still see this issue?
> 
> Also can you give more info on the test case? Does it issue single
> request or multiple requests?

It is a single test with the message "8c899bba" where I expect 
"ac6d8c4851beacf61c175aed0699053d8f632df8"

The code is the following which works with any other hash:

struct kccavs_ahash_def {
	struct crypto_ahash *tfm;
	struct ahash_request *req;
	struct kccavs_tcrypt_res result;
};

/* Callback function */
static void kccavs_ahash_cb(struct crypto_async_request *req, int error)
{
	struct kccavs_tcrypt_res *result = req->data;

	if (error == -EINPROGRESS)
		return;
	result->err = error;
	complete(&result->completion);
	dbg("Encryption finished successfully\n");
}

/* Perform hash */
static unsigned int kccavs_ahash_op(struct kccavs_ahash_def *ahash)
{
	int rc = 0;

	rc = crypto_ahash_digest(ahash->req);

	switch (rc) {
	case 0:
		break;
	case -EINPROGRESS:
	case -EBUSY:
		rc = wait_for_completion_interruptible(&ahash->result.completion);
		if (!rc && !ahash->result.err) {
#ifdef OLDASYNC
			INIT_COMPLETION(aead->result.completion);
#else
			reinit_completion(&ahash->result.completion);
#endif
			break;
		}
	default:
		dbg("ahash cipher operation returned with %d result"
		    " %d\n",rc, ahash->result.err);
		break;
	}
	init_completion(&ahash->result.completion);

	return rc;
}

static int kccavs_test_ahash(size_t nbytes)
{
	int ret;
	struct crypto_ahash *tfm;
	struct ahash_request *req = NULL;
	struct kccavs_ahash_def ahash;
	struct kccavs_data *data = &kccavs_test->data;
	struct kccavs_data *key = &kccavs_test->key;
	unsigned char *digest = NULL;
	struct scatterlist sg;

	/*
	 * We explicitly do not check the input buffer as we allow
	 * an empty string.
	 */

	/* allocate synchronous hash */
	tfm = crypto_alloc_ahash(kccavs_test->name, 0, 0);
	if (IS_ERR(tfm)) {
		pr_info("could not allocate digest TFM handle for %s\n", kccavs_test-
>name);
		return PTR_ERR(tfm);
	}

	digest = kzalloc(crypto_ahash_digestsize(tfm), GFP_KERNEL);
	if (!digest) {
		ret = -ENOMEM;
		goto out;
	}

	req = ahash_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		pr_info("could not allocate request queue\n");
		ret = -ENOMEM;
		goto out;
	}
	ahash.tfm = tfm;
	ahash.req = req;

	if (key->len) {
		dbg("set key for HMAC\n");
		ret = crypto_ahash_setkey(tfm, key->data, key->len);
		if (ret < 0)
			goto out;
	}

	sg_init_one(&sg, data->data, data->len);
	ahash_request_set_crypt(req, &sg, digest, data->len);
	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   kccavs_ahash_cb, &ahash.result);

	ret = kccavs_ahash_op(&ahash);
	if (!ret) {
		data->len = crypto_ahash_digestsize(tfm);
		memcpy(data->data, digest, data->len);
	}

out:
	kzfree(digest);
	if (req)
		ahash_request_free(req);
	crypto_free_ahash(tfm);
	return ret;
}


Ciao
Stephan

^ permalink raw reply

* Re: [PATCH 0/3] Fix crypto/vmx/p8_ghash memory corruption
From: Anton Blanchard @ 2016-09-28 20:59 UTC (permalink / raw)
  To: Marcelo Cerri
  Cc: linux-crypto, Herbert Xu, Leonidas S. Barbosa, linux-kernel,
	Paul Mackerras, Paulo Flabiano Smorigo, George Wilson,
	linuxppc-dev, David S. Miller
In-Reply-To: <1475080931-7926-1-git-send-email-marcelo.cerri@canonical.com>

Hi Marcelo

> This series fixes the memory corruption found by Jan Stancek in
> 4.8-rc7. The problem however also affects previous versions of the
> driver.

If it affects previous versions, please add the lines in the sign off to
get it into the stable kernels.

Anton

^ permalink raw reply

* RE: sha1_mb broken
From: Dey, Megha @ 2016-09-28 22:52 UTC (permalink / raw)
  To: Stephan Mueller; +Cc: linux-crypto@vger.kernel.org, tim.c.chen@linux.intel.com
In-Reply-To: <2200246.hBmZe7vJh4@positron.chronox.de>



-----Original Message-----
From: Stephan Mueller [mailto:smueller@chronox.de] 
Sent: Wednesday, September 28, 2016 11:46 AM
To: Dey, Megha <megha.dey@intel.com>
Cc: linux-crypto@vger.kernel.org; tim.c.chen@linux.intel.com
Subject: Re: sha1_mb broken

Am Mittwoch, 28. September 2016, 11:25:47 CEST schrieb Megha Dey:

Hi Megha,

> Hi Stephan,
> 
> There was a bug fix: Commit ID : 0851561d (introduced in 4.6-rc5).

I use the current cryptodev-2.6 tree.
> 
> Assuming that you are using an older kernel than this one, maybe we 
> are issuing the complete with the wrong pointer, so the original 
> issuer of the request never gets the complete back.
> 
> If you are using an older kernel, can you please use the lastest and 
> let me know if you still see this issue?
> 
> Also can you give more info on the test case? Does it issue single 
> request or multiple requests?

It is a single test with the message "8c899bba" where I expect "ac6d8c4851beacf61c175aed0699053d8f632df8"

>> For the message 8c899bba, the expected hash is  not ac6d8c4851beacf61c175aed0699053d8f632df8 but 
2e2816f6241fef89d78dd7afc926adc03ca94ace. I used this hash value in the existing tcrypt test and the test passes.

I would like to duplicate the test case you are using so that I can have a better look.
Can you provide the complete code for the test case? The code you sent earlier are missing some structure definitions. I will try to incorporate this test in the tcrypt test suite and try to reproduce this issue.

The code is the following which works with any other hash:

struct kccavs_ahash_def {
	struct crypto_ahash *tfm;
	struct ahash_request *req;
	struct kccavs_tcrypt_res result;
};

/* Callback function */
static void kccavs_ahash_cb(struct crypto_async_request *req, int error) {
	struct kccavs_tcrypt_res *result = req->data;

	if (error == -EINPROGRESS)
		return;
	result->err = error;
	complete(&result->completion);
	dbg("Encryption finished successfully\n"); }

/* Perform hash */
static unsigned int kccavs_ahash_op(struct kccavs_ahash_def *ahash) {
	int rc = 0;

	rc = crypto_ahash_digest(ahash->req);

	switch (rc) {
	case 0:
		break;
	case -EINPROGRESS:
	case -EBUSY:
		rc = wait_for_completion_interruptible(&ahash->result.completion);
		if (!rc && !ahash->result.err) {
#ifdef OLDASYNC
			INIT_COMPLETION(aead->result.completion);
#else
			reinit_completion(&ahash->result.completion);
#endif
			break;
		}
	default:
		dbg("ahash cipher operation returned with %d result"
		    " %d\n",rc, ahash->result.err);
		break;
	}
	init_completion(&ahash->result.completion);

	return rc;
}

static int kccavs_test_ahash(size_t nbytes) {
	int ret;
	struct crypto_ahash *tfm;
	struct ahash_request *req = NULL;
	struct kccavs_ahash_def ahash;
	struct kccavs_data *data = &kccavs_test->data;
	struct kccavs_data *key = &kccavs_test->key;
	unsigned char *digest = NULL;
	struct scatterlist sg;

	/*
	 * We explicitly do not check the input buffer as we allow
	 * an empty string.
	 */

	/* allocate synchronous hash */
	tfm = crypto_alloc_ahash(kccavs_test->name, 0, 0);
	if (IS_ERR(tfm)) {
		pr_info("could not allocate digest TFM handle for %s\n", kccavs_test-
>name);
		return PTR_ERR(tfm);
	}

	digest = kzalloc(crypto_ahash_digestsize(tfm), GFP_KERNEL);
	if (!digest) {
		ret = -ENOMEM;
		goto out;
	}

	req = ahash_request_alloc(tfm, GFP_KERNEL);
	if (!req) {
		pr_info("could not allocate request queue\n");
		ret = -ENOMEM;
		goto out;
	}
	ahash.tfm = tfm;
	ahash.req = req;

	if (key->len) {
		dbg("set key for HMAC\n");
		ret = crypto_ahash_setkey(tfm, key->data, key->len);
		if (ret < 0)
			goto out;
	}

	sg_init_one(&sg, data->data, data->len);
	ahash_request_set_crypt(req, &sg, digest, data->len);
	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
				   kccavs_ahash_cb, &ahash.result);

	ret = kccavs_ahash_op(&ahash);
	if (!ret) {
		data->len = crypto_ahash_digestsize(tfm);
		memcpy(data->data, digest, data->len);
	}

out:
	kzfree(digest);
	if (req)
		ahash_request_free(req);
	crypto_free_ahash(tfm);
	return ret;
}


Ciao
Stephan

^ permalink raw reply

* Re: sha1_mb broken
From: Stephan Mueller @ 2016-09-29  5:30 UTC (permalink / raw)
  To: Dey, Megha; +Cc: linux-crypto@vger.kernel.org, tim.c.chen@linux.intel.com
In-Reply-To: <C440BA31B54DCD4AAC682D2365C8FEE703CEF203@ORSMSX111.amr.corp.intel.com>

[-- Attachment #1: Type: text/plain, Size: 136 bytes --]

Am Mittwoch, 28. September 2016, 22:52:46 CEST schrieb Dey, Megha:

Hi Megha,

see a self contained example code attached.

Ciao
Stephan

[-- Attachment #2: sha1_mb.tar.xz --]
[-- Type: application/x-xz-compressed-tar, Size: 2092 bytes --]

^ permalink raw reply

* [PATCH v9 0/8] crypto: asynchronous compression api
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu

The following patch set introduces acomp, a generic asynchronous
(de)compression api with support for SG lists.
We propose a new crypto type called crypto_acomp_type, a new struct acomp_alg
and struct crypto_acomp, together with number of helper functions to register
acomp type algorithms and allocate tfm instances.
This interface will allow the following operations:

    int (*compress)(struct acomp_req *req);
    int (*decompress)(struct acomp_req *req);

Together with acomp we propose a new driver-side interface, scomp, which
handles compression implementations which use linear buffers. We converted all
compression algorithms available in LKCF to use this interface so that those
algorithms will be accessible through the acomp api.

Changes in v9:
    - extended API to allow acomp layer to allocate (and free) output memory
      if not provided by the user
    - extended scomp layer to allocate (and free) output sg list if not
      provided by the user

Changes in v8:
    - centralized per-cpu scratch buffers handling in scomp layer
    - changed scomp internal API to use linear buffer (as in v6)

Changes in v7:
    - removed linearization of SG lists and per-request vmalloc allocations in
      scomp layer
    - modified scomp internal API to use SG lists
    - introduced per-cpu cache of 128K scratch buffers allocated using vmalloc
      in legacy scomp algorithms

Changes in v6:
    - changed acomp_request_alloc prototype by removing gfp parameter.
      acomp_request_alloc will always use GFP_KERNEL

Changes in v5:
    - removed qdecompress api, no longer needed
    - removed produced and consumed counters in acomp_req
    - added crypto_has_acomp function

Changes in v4:
    - added qdecompress api, a front-end for decompression algorithms which
      do not need additional vmalloc work space

Changes in v3:
    - added driver-side scomp interface
    - provided support for lzo, lz4, lz4hc, 842, deflate compression algorithms
      via the acomp api (through scomp)
    - extended testmgr to support acomp
    - removed extended acomp api for supporting deflate algorithm parameters
      (will be enhanced and re-proposed in future)
Note that (2) to (7) are a rework of Joonsoo Kim's scomp patches.

Changes in v2:
    - added compression and decompression request sizes in acomp_alg
      in order to enable noctx support
    - extended api with helpers to allocate compression and
      decompression requests

Changes from initial submit:
    - added consumed and produced fields to acomp_req
    - extended api to support configuration of deflate compressors

---
Giovanni Cabiddu (8):
  crypto: add asynchronous compression api
  crypto: add driver-side scomp interface
  crypto: acomp - add support for lzo via scomp
  crypto: acomp - add support for lz4 via scomp
  crypto: acomp - add support for lz4hc via scomp
  crypto: acomp - add support for 842 via scomp
  crypto: acomp - add support for deflate via scomp
  crypto: acomp - update testmgr with support for acomp

 crypto/842.c                        |  81 +++++++-
 crypto/Kconfig                      |  15 ++
 crypto/Makefile                     |   3 +
 crypto/acompress.c                  | 169 +++++++++++++++++
 crypto/crypto_user.c                |  19 ++
 crypto/deflate.c                    | 111 ++++++++++-
 crypto/lz4.c                        |  91 ++++++++-
 crypto/lz4hc.c                      |  92 +++++++++-
 crypto/lzo.c                        |  97 ++++++++--
 crypto/scompress.c                  | 356 ++++++++++++++++++++++++++++++++++++
 crypto/testmgr.c                    | 158 ++++++++++++++--
 include/crypto/acompress.h          | 269 +++++++++++++++++++++++++++
 include/crypto/internal/acompress.h |  81 ++++++++
 include/crypto/internal/scompress.h | 136 ++++++++++++++
 include/linux/crypto.h              |   3 +
 15 files changed, 1621 insertions(+), 60 deletions(-)
 create mode 100644 crypto/acompress.c
 create mode 100644 crypto/scompress.c
 create mode 100644 include/crypto/acompress.h
 create mode 100644 include/crypto/internal/acompress.h
 create mode 100644 include/crypto/internal/scompress.h

--
2.4.11

^ permalink raw reply

* [PATCH v9 1/8] crypto: add asynchronous compression api
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add acomp, an asynchronous compression api that uses scatterlist
buffers.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig                      |  10 ++
 crypto/Makefile                     |   2 +
 crypto/acompress.c                  | 118 +++++++++++++++
 crypto/crypto_user.c                |  19 +++
 include/crypto/acompress.h          | 281 ++++++++++++++++++++++++++++++++++++
 include/crypto/internal/acompress.h |  66 +++++++++
 include/linux/crypto.h              |   1 +
 7 files changed, 497 insertions(+)
 create mode 100644 crypto/acompress.c
 create mode 100644 include/crypto/acompress.h
 create mode 100644 include/crypto/internal/acompress.h

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 84d7148..f553f66 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -102,6 +102,15 @@ config CRYPTO_KPP
 	select CRYPTO_ALGAPI
 	select CRYPTO_KPP2
 
+config CRYPTO_ACOMP2
+	tristate
+	select CRYPTO_ALGAPI2
+
+config CRYPTO_ACOMP
+	tristate
+	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
+
 config CRYPTO_RSA
 	tristate "RSA algorithm"
 	select CRYPTO_AKCIPHER
@@ -138,6 +147,7 @@ config CRYPTO_MANAGER2
 	select CRYPTO_BLKCIPHER2
 	select CRYPTO_AKCIPHER2
 	select CRYPTO_KPP2
+	select CRYPTO_ACOMP2
 
 config CRYPTO_USER
 	tristate "Userspace cryptographic algorithm configuration"
diff --git a/crypto/Makefile b/crypto/Makefile
index 99cc64a..0933dc6 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -50,6 +50,8 @@ rsa_generic-y += rsa_helper.o
 rsa_generic-y += rsa-pkcs1pad.o
 obj-$(CONFIG_CRYPTO_RSA) += rsa_generic.o
 
+obj-$(CONFIG_CRYPTO_ACOMP2) += acompress.o
+
 cryptomgr-y := algboss.o testmgr.o
 
 obj-$(CONFIG_CRYPTO_MANAGER2) += cryptomgr.o
diff --git a/crypto/acompress.c b/crypto/acompress.c
new file mode 100644
index 0000000..f24fef3
--- /dev/null
+++ b/crypto/acompress.c
@@ -0,0 +1,118 @@
+/*
+ * Asynchronous Compression operations
+ *
+ * Copyright (c) 2016, Intel Corporation
+ * Authors: Weigang Li <weigang.li@intel.com>
+ *          Giovanni Cabiddu <giovanni.cabiddu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/crypto.h>
+#include <crypto/algapi.h>
+#include <linux/cryptouser.h>
+#include <net/netlink.h>
+#include <crypto/internal/acompress.h>
+#include "internal.h"
+
+#ifdef CONFIG_NET
+static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
+{
+	struct crypto_report_comp racomp;
+
+	strncpy(racomp.type, "acomp", sizeof(racomp.type));
+
+	if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
+		    sizeof(struct crypto_report_comp), &racomp))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+#else
+static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
+{
+	return -ENOSYS;
+}
+#endif
+
+static void crypto_acomp_show(struct seq_file *m, struct crypto_alg *alg)
+	__attribute__ ((unused));
+
+static void crypto_acomp_show(struct seq_file *m, struct crypto_alg *alg)
+{
+	seq_puts(m, "type         : acomp\n");
+}
+
+static void crypto_acomp_exit_tfm(struct crypto_tfm *tfm)
+{
+	struct crypto_acomp *acomp = __crypto_acomp_tfm(tfm);
+	struct acomp_alg *alg = crypto_acomp_alg(acomp);
+
+	alg->exit(acomp);
+}
+
+static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
+{
+	struct crypto_acomp *acomp = __crypto_acomp_tfm(tfm);
+	struct acomp_alg *alg = crypto_acomp_alg(acomp);
+
+	if (alg->exit)
+		acomp->base.exit = crypto_acomp_exit_tfm;
+
+	if (alg->init)
+		return alg->init(acomp);
+
+	return 0;
+}
+
+static const struct crypto_type crypto_acomp_type = {
+	.extsize = crypto_alg_extsize,
+	.init_tfm = crypto_acomp_init_tfm,
+#ifdef CONFIG_PROC_FS
+	.show = crypto_acomp_show,
+#endif
+	.report = crypto_acomp_report,
+	.maskclear = ~CRYPTO_ALG_TYPE_MASK,
+	.maskset = CRYPTO_ALG_TYPE_MASK,
+	.type = CRYPTO_ALG_TYPE_ACOMPRESS,
+	.tfmsize = offsetof(struct crypto_acomp, base),
+};
+
+struct crypto_acomp *crypto_alloc_acomp(const char *alg_name, u32 type,
+					u32 mask)
+{
+	return crypto_alloc_tfm(alg_name, &crypto_acomp_type, type, mask);
+}
+EXPORT_SYMBOL_GPL(crypto_alloc_acomp);
+
+int crypto_register_acomp(struct acomp_alg *alg)
+{
+	struct crypto_alg *base = &alg->base;
+
+	base->cra_type = &crypto_acomp_type;
+	base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK;
+	base->cra_flags |= CRYPTO_ALG_TYPE_ACOMPRESS;
+
+	return crypto_register_alg(base);
+}
+EXPORT_SYMBOL_GPL(crypto_register_acomp);
+
+int crypto_unregister_acomp(struct acomp_alg *alg)
+{
+	return crypto_unregister_alg(&alg->base);
+}
+EXPORT_SYMBOL_GPL(crypto_unregister_acomp);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Asynchronous compression type");
diff --git a/crypto/crypto_user.c b/crypto/crypto_user.c
index 1c57054..31b488c 100644
--- a/crypto/crypto_user.c
+++ b/crypto/crypto_user.c
@@ -112,6 +112,21 @@ nla_put_failure:
 	return -EMSGSIZE;
 }
 
+static int crypto_report_acomp(struct sk_buff *skb, struct crypto_alg *alg)
+{
+	struct crypto_report_acomp racomp;
+
+	strncpy(racomp.type, "acomp", sizeof(racomp.type));
+
+	if (nla_put(skb, CRYPTOCFGA_REPORT_ACOMPRESS,
+		    sizeof(struct crypto_report_acomp), &racomp))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+
 static int crypto_report_akcipher(struct sk_buff *skb, struct crypto_alg *alg)
 {
 	struct crypto_report_akcipher rakcipher;
@@ -186,7 +201,11 @@ static int crypto_report_one(struct crypto_alg *alg,
 			goto nla_put_failure;
 
 		break;
+	case CRYPTO_ALG_TYPE_ACOMPRESS:
+		if (crypto_report_acomp(skb, alg))
+			goto nla_put_failure;
 
+		break;
 	case CRYPTO_ALG_TYPE_AKCIPHER:
 		if (crypto_report_akcipher(skb, alg))
 			goto nla_put_failure;
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
new file mode 100644
index 0000000..14c70d8
--- /dev/null
+++ b/include/crypto/acompress.h
@@ -0,0 +1,281 @@
+/*
+ * Asynchronous Compression operations
+ *
+ * Copyright (c) 2016, Intel Corporation
+ * Authors: Weigang Li <weigang.li@intel.com>
+ *          Giovanni Cabiddu <giovanni.cabiddu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef _CRYPTO_ACOMP_H
+#define _CRYPTO_ACOMP_H
+#include <linux/crypto.h>
+
+#define CRYPTO_ACOMP_ALLOC_OUTPUT	0x00000001
+
+/**
+ * struct acomp_req - asynchronous (de)compression request
+ *
+ * @base:	Common attributes for asynchronous crypto requests
+ * @src:	Source Data
+ * @dst:	Destination data
+ * @slen:	Size of the input buffer
+ * @dlen:	Size of the output buffer and number of bytes produced
+ * @flags:	Internal flags
+ * @__ctx:	Start of private context data
+ */
+struct acomp_req {
+	struct crypto_async_request base;
+	struct scatterlist *src;
+	struct scatterlist *dst;
+	unsigned int slen;
+	unsigned int dlen;
+	u32 flags;
+	void *__ctx[] CRYPTO_MINALIGN_ATTR;
+};
+
+/**
+ * struct crypto_acomp - user-instantiated objects which encapsulate
+ * algorithms and core processing logic
+ *
+ * @base:	Common crypto API algorithm data structure
+ */
+struct crypto_acomp {
+	struct crypto_tfm base;
+};
+
+/**
+ * struct acomp_alg - asynchronous compression algorithm
+ *
+ * @compress:	Function performs a compress operation
+ * @decompress:	Function performs a de-compress operation
+ * @dst_free:	Frees destination buffer if allocated inside the algorithm
+ * @init:	Initialize the cryptographic transformation object.
+ *		This function is used to initialize the cryptographic
+ *		transformation object. This function is called only once at
+ *		the instantiation time, right after the transformation context
+ *		was allocated. In case the cryptographic hardware has some
+ *		special requirements which need to be handled by software, this
+ *		function shall check for the precise requirement of the
+ *		transformation and put any software fallbacks in place.
+ * @exit:	Deinitialize the cryptographic transformation object. This is a
+ *		counterpart to @init, used to remove various changes set in
+ *		@init.
+ *
+ * @reqsize:	Context size for (de)compression requests
+ * @base:	Common crypto API algorithm data structure
+ */
+struct acomp_alg {
+	int (*compress)(struct acomp_req *req);
+	int (*decompress)(struct acomp_req *req);
+	void (*dst_free)(struct scatterlist *dst);
+	int (*init)(struct crypto_acomp *tfm);
+	void (*exit)(struct crypto_acomp *tfm);
+	unsigned int reqsize;
+	struct crypto_alg base;
+};
+
+/**
+ * DOC: Asynchronous Compression API
+ *
+ * The Asynchronous Compression API is used with the algorithms of type
+ * CRYPTO_ALG_TYPE_ACOMPRESS (listed as type "acomp" in /proc/crypto)
+ */
+
+/**
+ * crypto_alloc_acomp() -- allocate ACOMPRESS tfm handle
+ * @alg_name:	is the cra_name / name or cra_driver_name / driver name of the
+ *		compression algorithm e.g. "deflate"
+ * @type:	specifies the type of the algorithm
+ * @mask:	specifies the mask for the algorithm
+ *
+ * Allocate a handle for a compression algorithm. The returned struct
+ * crypto_acomp is the handle that is required for any subsequent
+ * API invocation for the compression operations.
+ *
+ * Return:	allocated handle in case of success; IS_ERR() is true in case
+ *		of an error, PTR_ERR() returns the error code.
+ */
+struct crypto_acomp *crypto_alloc_acomp(const char *alg_name, u32 type,
+					u32 mask);
+
+static inline struct crypto_tfm *crypto_acomp_tfm(struct crypto_acomp *tfm)
+{
+	return &tfm->base;
+}
+
+static inline struct acomp_alg *__crypto_acomp_alg(struct crypto_alg *alg)
+{
+	return container_of(alg, struct acomp_alg, base);
+}
+
+static inline struct crypto_acomp *__crypto_acomp_tfm(struct crypto_tfm *tfm)
+{
+	return container_of(tfm, struct crypto_acomp, base);
+}
+
+static inline struct acomp_alg *crypto_acomp_alg(struct crypto_acomp *tfm)
+{
+	return __crypto_acomp_alg(crypto_acomp_tfm(tfm)->__crt_alg);
+}
+
+static inline unsigned int crypto_acomp_reqsize(struct crypto_acomp *tfm)
+{
+	return crypto_acomp_alg(tfm)->reqsize;
+}
+
+static inline void acomp_request_set_tfm(struct acomp_req *req,
+					 struct crypto_acomp *tfm)
+{
+	req->base.tfm = crypto_acomp_tfm(tfm);
+}
+
+static inline struct crypto_acomp *crypto_acomp_reqtfm(struct acomp_req *req)
+{
+	return __crypto_acomp_tfm(req->base.tfm);
+}
+
+/**
+ * crypto_free_acomp() -- free ACOMPRESS tfm handle
+ *
+ * @tfm:	ACOMPRESS tfm handle allocated with crypto_alloc_acomp()
+ */
+static inline void crypto_free_acomp(struct crypto_acomp *tfm)
+{
+	crypto_destroy_tfm(tfm, crypto_acomp_tfm(tfm));
+}
+
+static inline int crypto_has_acomp(const char *alg_name, u32 type, u32 mask)
+{
+	type &= ~CRYPTO_ALG_TYPE_MASK;
+	type |= CRYPTO_ALG_TYPE_ACOMPRESS;
+	mask |= CRYPTO_ALG_TYPE_MASK;
+
+	return crypto_has_alg(alg_name, type, mask);
+}
+
+/**
+ * acomp_request_alloc() -- allocates asynchronous (de)compression request
+ *
+ * @tfm:	ACOMPRESS tfm handle allocated with crypto_alloc_acomp()
+ *
+ * Return:	allocated handle in case of success or NULL in case of an error
+ */
+static inline struct acomp_req *acomp_request_alloc(struct crypto_acomp *tfm)
+{
+	struct acomp_req *req;
+
+	req = kzalloc(sizeof(*req) + crypto_acomp_reqsize(tfm), GFP_KERNEL);
+	if (likely(req))
+		acomp_request_set_tfm(req, tfm);
+
+	return req;
+}
+
+/**
+ * acomp_request_free() -- zeroize and free asynchronous (de)compression
+ *			   request as well as the output buffer if allocated
+ *			   inside the algorithm
+ *
+ * @req:	request to free
+ */
+static inline void acomp_request_free(struct acomp_req *req)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_alg *alg = crypto_acomp_alg(tfm);
+
+	if (req->flags & CRYPTO_ACOMP_ALLOC_OUTPUT) {
+		alg->dst_free(req->dst);
+		req->dst = NULL;
+	}
+	kzfree(req);
+}
+
+/**
+ * acomp_request_set_callback() -- Sets an asynchronous callback
+ *
+ * Callback will be called when an asynchronous operation on a given
+ * request is finished.
+ *
+ * @req:	request that the callback will be set for
+ * @flgs:	specify for instance if the operation may backlog
+ * @cmlp:	callback which will be called
+ * @data:	private data used by the caller
+ */
+static inline void acomp_request_set_callback(struct acomp_req *req,
+					      u32 flgs,
+					      crypto_completion_t cmpl,
+					      void *data)
+{
+	req->base.complete = cmpl;
+	req->base.data = data;
+	req->base.flags = flgs;
+}
+
+/**
+ * acomp_request_set_params() -- Sets request parameters
+ *
+ * Sets parameters required by an acomp operation
+ *
+ * @req:	asynchronous compress request
+ * @src:	pointer to input buffer scatterlist
+ * @dst:	pointer to output buffer scatterlist. If this is NULL, the
+ *		acomp layer will allocate the output memory
+ * @slen:	size of the input buffer
+ * @dlen:	size of the output buffer. If dst is NULL, this can be used by
+ *		the user to specify the maximum amount of memory to allocate
+ */
+static inline void acomp_request_set_params(struct acomp_req *req,
+					    struct scatterlist *src,
+					    struct scatterlist *dst,
+					    unsigned int slen,
+					    unsigned int dlen)
+{
+	req->src = src;
+	req->dst = dst;
+	req->slen = slen;
+	req->dlen = dlen;
+
+	if (!req->dst)
+		req->flags |= CRYPTO_ACOMP_ALLOC_OUTPUT;
+}
+
+/**
+ * crypto_acomp_compress() -- Invoke asynchronous compress operation
+ *
+ * Function invokes the asynchronous compress operation
+ *
+ * @req:	asynchronous compress request
+ *
+ * Return:	zero on success; error code in case of error
+ */
+static inline int crypto_acomp_compress(struct acomp_req *req)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_alg *alg = crypto_acomp_alg(tfm);
+
+	return alg->compress(req);
+}
+
+/**
+ * crypto_acomp_decompress() -- Invoke asynchronous decompress operation
+ *
+ * Function invokes the asynchronous decompress operation
+ *
+ * @req:	asynchronous compress request
+ *
+ * Return:	zero on success; error code in case of error
+ */
+static inline int crypto_acomp_decompress(struct acomp_req *req)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	struct acomp_alg *alg = crypto_acomp_alg(tfm);
+
+	return alg->decompress(req);
+}
+
+#endif
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
new file mode 100644
index 0000000..a9a9000
--- /dev/null
+++ b/include/crypto/internal/acompress.h
@@ -0,0 +1,66 @@
+/*
+ * Asynchronous Compression operations
+ *
+ * Copyright (c) 2016, Intel Corporation
+ * Authors: Weigang Li <weigang.li@intel.com>
+ *          Giovanni Cabiddu <giovanni.cabiddu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef _CRYPTO_ACOMP_INT_H
+#define _CRYPTO_ACOMP_INT_H
+#include <crypto/acompress.h>
+
+/*
+ * Transform internal helpers.
+ */
+static inline void *acomp_request_ctx(struct acomp_req *req)
+{
+	return req->__ctx;
+}
+
+static inline void *acomp_tfm_ctx(struct crypto_acomp *tfm)
+{
+	return tfm->base.__crt_ctx;
+}
+
+static inline void acomp_request_complete(struct acomp_req *req,
+					  int err)
+{
+	req->base.complete(&req->base, err);
+}
+
+static inline const char *acomp_alg_name(struct crypto_acomp *tfm)
+{
+	return crypto_acomp_tfm(tfm)->__crt_alg->cra_name;
+}
+
+/**
+ * crypto_register_acomp() -- Register asynchronous compression algorithm
+ *
+ * Function registers an implementation of an asynchronous
+ * compression algorithm
+ *
+ * @alg:	algorithm definition
+ *
+ * Return:	zero on success; error code in case of error
+ */
+int crypto_register_acomp(struct acomp_alg *alg);
+
+/**
+ * crypto_unregister_acomp() -- Unregister asynchronous compression algorithm
+ *
+ * Function unregisters an implementation of an asynchronous
+ * compression algorithm
+ *
+ * @alg:	algorithm definition
+ *
+ * Return:	zero on success; error code in case of error
+ */
+int crypto_unregister_acomp(struct acomp_alg *alg);
+
+#endif
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 7cee555..dc57a05 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -50,6 +50,7 @@
 #define CRYPTO_ALG_TYPE_SKCIPHER	0x00000005
 #define CRYPTO_ALG_TYPE_GIVCIPHER	0x00000006
 #define CRYPTO_ALG_TYPE_KPP		0x00000008
+#define CRYPTO_ALG_TYPE_ACOMPRESS	0x0000000a
 #define CRYPTO_ALG_TYPE_RNG		0x0000000c
 #define CRYPTO_ALG_TYPE_AKCIPHER	0x0000000d
 #define CRYPTO_ALG_TYPE_DIGEST		0x0000000e
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 2/8] crypto: add driver-side scomp interface
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add a synchronous back-end (scomp) to acomp. This allows to easily
expose the already present compression algorithms in LKCF via acomp.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Makefile                     |   1 +
 crypto/acompress.c                  |  55 +++++-
 crypto/scompress.c                  | 356 ++++++++++++++++++++++++++++++++++++
 include/crypto/acompress.h          |  42 ++---
 include/crypto/internal/acompress.h |  15 ++
 include/crypto/internal/scompress.h | 136 ++++++++++++++
 include/linux/crypto.h              |   2 +
 7 files changed, 578 insertions(+), 29 deletions(-)
 create mode 100644 crypto/scompress.c
 create mode 100644 include/crypto/internal/scompress.h

diff --git a/crypto/Makefile b/crypto/Makefile
index 0933dc6..5c83f3d 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -51,6 +51,7 @@ rsa_generic-y += rsa-pkcs1pad.o
 obj-$(CONFIG_CRYPTO_RSA) += rsa_generic.o
 
 obj-$(CONFIG_CRYPTO_ACOMP2) += acompress.o
+obj-$(CONFIG_CRYPTO_ACOMP2) += scompress.o
 
 cryptomgr-y := algboss.o testmgr.o
 
diff --git a/crypto/acompress.c b/crypto/acompress.c
index f24fef3..e1aa289 100644
--- a/crypto/acompress.c
+++ b/crypto/acompress.c
@@ -22,8 +22,11 @@
 #include <linux/cryptouser.h>
 #include <net/netlink.h>
 #include <crypto/internal/acompress.h>
+#include <crypto/internal/scompress.h>
 #include "internal.h"
 
+static const struct crypto_type crypto_acomp_type;
+
 #ifdef CONFIG_NET
 static int crypto_acomp_report(struct sk_buff *skb, struct crypto_alg *alg)
 {
@@ -67,6 +70,14 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 	struct crypto_acomp *acomp = __crypto_acomp_tfm(tfm);
 	struct acomp_alg *alg = crypto_acomp_alg(acomp);
 
+	if (tfm->__crt_alg->cra_type != &crypto_acomp_type)
+		return crypto_init_scomp_ops_async(tfm);
+
+	acomp->compress = alg->compress;
+	acomp->decompress = alg->decompress;
+	acomp->dst_free = alg->dst_free;
+	acomp->reqsize = alg->reqsize;
+
 	if (alg->exit)
 		acomp->base.exit = crypto_acomp_exit_tfm;
 
@@ -76,15 +87,25 @@ static int crypto_acomp_init_tfm(struct crypto_tfm *tfm)
 	return 0;
 }
 
+static unsigned int crypto_acomp_extsize(struct crypto_alg *alg)
+{
+	int extsize = crypto_alg_extsize(alg);
+
+	if (alg->cra_type != &crypto_acomp_type)
+		extsize += sizeof(struct crypto_scomp *);
+
+	return extsize;
+}
+
 static const struct crypto_type crypto_acomp_type = {
-	.extsize = crypto_alg_extsize,
+	.extsize = crypto_acomp_extsize,
 	.init_tfm = crypto_acomp_init_tfm,
 #ifdef CONFIG_PROC_FS
 	.show = crypto_acomp_show,
 #endif
 	.report = crypto_acomp_report,
 	.maskclear = ~CRYPTO_ALG_TYPE_MASK,
-	.maskset = CRYPTO_ALG_TYPE_MASK,
+	.maskset = CRYPTO_ALG_TYPE_ACOMPRESS_MASK,
 	.type = CRYPTO_ALG_TYPE_ACOMPRESS,
 	.tfmsize = offsetof(struct crypto_acomp, base),
 };
@@ -96,6 +117,36 @@ struct crypto_acomp *crypto_alloc_acomp(const char *alg_name, u32 type,
 }
 EXPORT_SYMBOL_GPL(crypto_alloc_acomp);
 
+struct acomp_req *acomp_request_alloc(struct crypto_acomp *acomp)
+{
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp);
+	struct acomp_req *req;
+
+	req = __acomp_request_alloc(acomp);
+	if (req && (tfm->__crt_alg->cra_type != &crypto_acomp_type))
+		return crypto_acomp_scomp_alloc_ctx(req);
+
+	return req;
+}
+EXPORT_SYMBOL_GPL(acomp_request_alloc);
+
+void acomp_request_free(struct acomp_req *req)
+{
+	struct crypto_acomp *acomp = crypto_acomp_reqtfm(req);
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp);
+
+	if (tfm->__crt_alg->cra_type != &crypto_acomp_type)
+		crypto_acomp_scomp_free_ctx(req);
+
+	if (req->flags & CRYPTO_ACOMP_ALLOC_OUTPUT) {
+		acomp->dst_free(req->dst);
+		req->dst = NULL;
+	}
+
+	__acomp_request_free(req);
+}
+EXPORT_SYMBOL_GPL(acomp_request_free);
+
 int crypto_register_acomp(struct acomp_alg *alg)
 {
 	struct crypto_alg *base = &alg->base;
diff --git a/crypto/scompress.c b/crypto/scompress.c
new file mode 100644
index 0000000..35e396d
--- /dev/null
+++ b/crypto/scompress.c
@@ -0,0 +1,356 @@
+/*
+ * Synchronous Compression operations
+ *
+ * Copyright 2015 LG Electronics Inc.
+ * Copyright (c) 2016, Intel Corporation
+ * Author: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/seq_file.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/crypto.h>
+#include <linux/vmalloc.h>
+#include <crypto/algapi.h>
+#include <linux/cryptouser.h>
+#include <net/netlink.h>
+#include <linux/scatterlist.h>
+#include <crypto/scatterwalk.h>
+#include <crypto/internal/acompress.h>
+#include <crypto/internal/scompress.h>
+#include "internal.h"
+
+static const struct crypto_type crypto_scomp_type;
+static void * __percpu *scomp_src_scratches;
+static void * __percpu *scomp_dst_scratches;
+static int scomp_scratch_users;
+static DEFINE_MUTEX(scomp_lock);
+
+#ifdef CONFIG_NET
+static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)
+{
+	struct crypto_report_comp rscomp;
+
+	strncpy(rscomp.type, "scomp", sizeof(rscomp.type));
+
+	if (nla_put(skb, CRYPTOCFGA_REPORT_COMPRESS,
+		    sizeof(struct crypto_report_comp), &rscomp))
+		goto nla_put_failure;
+	return 0;
+
+nla_put_failure:
+	return -EMSGSIZE;
+}
+#else
+static int crypto_scomp_report(struct sk_buff *skb, struct crypto_alg *alg)
+{
+	return -ENOSYS;
+}
+#endif
+
+static void crypto_scomp_show(struct seq_file *m, struct crypto_alg *alg)
+	__attribute__ ((unused));
+
+static void crypto_scomp_show(struct seq_file *m, struct crypto_alg *alg)
+{
+	seq_puts(m, "type         : scomp\n");
+}
+
+static int crypto_scomp_init_tfm(struct crypto_tfm *tfm)
+{
+	return 0;
+}
+
+static void crypto_scomp_free_scratches(void * __percpu *scratches)
+{
+	int i;
+
+	if (!scratches)
+		return;
+
+	for_each_possible_cpu(i)
+		vfree(*per_cpu_ptr(scratches, i));
+
+	free_percpu(scratches);
+}
+
+static void * __percpu *crypto_scomp_alloc_scratches(void)
+{
+	void * __percpu *scratches;
+	int i;
+
+	scratches = alloc_percpu(void *);
+	if (!scratches)
+		return NULL;
+
+	for_each_possible_cpu(i) {
+		void *scratch;
+
+		scratch = vmalloc_node(SCOMP_SCRATCH_SIZE, cpu_to_node(i));
+		if (!scratch)
+			goto error;
+		*per_cpu_ptr(scratches, i) = scratch;
+	}
+
+	return scratches;
+
+error:
+	crypto_scomp_free_scratches(scratches);
+	return NULL;
+}
+
+static void crypto_scomp_free_all_scratches(void)
+{
+	if (!--scomp_scratch_users) {
+		crypto_scomp_free_scratches(scomp_src_scratches);
+		crypto_scomp_free_scratches(scomp_dst_scratches);
+		scomp_src_scratches = NULL;
+		scomp_dst_scratches = NULL;
+	}
+}
+
+static int crypto_scomp_alloc_all_scratches(void)
+{
+	if (!scomp_scratch_users++) {
+		scomp_src_scratches = crypto_scomp_alloc_scratches();
+		if (!scomp_src_scratches)
+			return -ENOMEM;
+		scomp_dst_scratches = crypto_scomp_alloc_scratches();
+		if (!scomp_dst_scratches)
+			return -ENOMEM;
+	}
+	return 0;
+}
+
+static void crypto_scomp_sg_free(struct scatterlist *sgl)
+{
+	int i, n;
+	struct page *page;
+
+	if (!sgl)
+		return;
+
+	n = sg_nents(sgl);
+	for_each_sg(sgl, sgl, n, i) {
+		page = sg_page(sgl);
+		if (page)
+			__free_page(page);
+	}
+
+	kfree(sgl);
+}
+
+static struct scatterlist *crypto_scomp_sg_alloc(size_t size, gfp_t gfp)
+{
+	struct scatterlist *sgl;
+	struct page *page;
+	int i, n;
+
+	n = ((size - 1) >> PAGE_SHIFT) + 1;
+
+	sgl = kmalloc_array(n, sizeof(struct scatterlist), gfp);
+	if (!sgl)
+		return NULL;
+
+	sg_init_table(sgl, n);
+
+	for (i = 0; i < n; i++) {
+		page = alloc_page(gfp);
+		if (!page)
+			goto err;
+		sg_set_page(sgl + i, page, PAGE_SIZE, 0);
+	}
+
+	return sgl;
+
+err:
+	sg_mark_end(sgl + i);
+	crypto_scomp_sg_free(sgl);
+	return NULL;
+}
+
+static int scomp_acomp_comp_decomp(struct acomp_req *req, int dir)
+{
+	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
+	void **tfm_ctx = acomp_tfm_ctx(tfm);
+	struct crypto_scomp *scomp = *tfm_ctx;
+	void **ctx = acomp_request_ctx(req);
+	const int cpu = get_cpu();
+	u8 *scratch_src = *per_cpu_ptr(scomp_src_scratches, cpu);
+	u8 *scratch_dst = *per_cpu_ptr(scomp_dst_scratches, cpu);
+	int ret;
+
+	if (!req->src || !req->slen || req->slen > SCOMP_SCRATCH_SIZE) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (req->dst && !req->dlen) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	if (!req->dlen || req->dlen > SCOMP_SCRATCH_SIZE)
+		req->dlen = SCOMP_SCRATCH_SIZE;
+
+	scatterwalk_map_and_copy(scratch_src, req->src, 0, req->slen, 0);
+	if (dir)
+		ret = crypto_scomp_compress(scomp, scratch_src, req->slen,
+					    scratch_dst, &req->dlen, *ctx);
+	else
+		ret = crypto_scomp_decompress(scomp, scratch_src, req->slen,
+					      scratch_dst, &req->dlen, *ctx);
+	if (!ret) {
+		if (!req->dst) {
+			req->dst = crypto_scomp_sg_alloc(req->dlen,
+				   req->base.flags & CRYPTO_TFM_REQ_MAY_SLEEP ?
+				   GFP_KERNEL : GFP_ATOMIC);
+			if (!req->dst)
+				goto out;
+		}
+		scatterwalk_map_and_copy(scratch_dst, req->dst, 0, req->dlen,
+					 1);
+	}
+out:
+	put_cpu();
+	return ret;
+}
+
+static int scomp_acomp_compress(struct acomp_req *req)
+{
+	return scomp_acomp_comp_decomp(req, 1);
+}
+
+static int scomp_acomp_decompress(struct acomp_req *req)
+{
+	return scomp_acomp_comp_decomp(req, 0);
+}
+
+static void crypto_exit_scomp_ops_async(struct crypto_tfm *tfm)
+{
+	struct crypto_scomp **ctx = crypto_tfm_ctx(tfm);
+
+	crypto_free_scomp(*ctx);
+}
+
+int crypto_init_scomp_ops_async(struct crypto_tfm *tfm)
+{
+	struct crypto_alg *calg = tfm->__crt_alg;
+	struct crypto_acomp *crt = __crypto_acomp_tfm(tfm);
+	struct crypto_scomp **ctx = crypto_tfm_ctx(tfm);
+	struct crypto_scomp *scomp;
+
+	if (!crypto_mod_get(calg))
+		return -EAGAIN;
+
+	scomp = crypto_create_tfm(calg, &crypto_scomp_type);
+	if (IS_ERR(scomp)) {
+		crypto_mod_put(calg);
+		return PTR_ERR(scomp);
+	}
+
+	*ctx = scomp;
+	tfm->exit = crypto_exit_scomp_ops_async;
+
+	crt->compress = scomp_acomp_compress;
+	crt->decompress = scomp_acomp_decompress;
+	crt->dst_free = crypto_scomp_sg_free;
+	crt->reqsize = sizeof(void *);
+
+	return 0;
+}
+
+struct acomp_req *crypto_acomp_scomp_alloc_ctx(struct acomp_req *req)
+{
+	struct crypto_acomp *acomp = crypto_acomp_reqtfm(req);
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp);
+	struct crypto_scomp **tfm_ctx = crypto_tfm_ctx(tfm);
+	struct crypto_scomp *scomp = *tfm_ctx;
+	void *ctx;
+
+	ctx = crypto_scomp_alloc_ctx(scomp);
+	if (IS_ERR(ctx)) {
+		kfree(req);
+		return NULL;
+	}
+
+	*req->__ctx = ctx;
+
+	return req;
+}
+
+void crypto_acomp_scomp_free_ctx(struct acomp_req *req)
+{
+	struct crypto_acomp *acomp = crypto_acomp_reqtfm(req);
+	struct crypto_tfm *tfm = crypto_acomp_tfm(acomp);
+	struct crypto_scomp **tfm_ctx = crypto_tfm_ctx(tfm);
+	struct crypto_scomp *scomp = *tfm_ctx;
+	void *ctx = *req->__ctx;
+
+	if (ctx)
+		crypto_scomp_free_ctx(scomp, ctx);
+}
+
+static const struct crypto_type crypto_scomp_type = {
+	.extsize = crypto_alg_extsize,
+	.init_tfm = crypto_scomp_init_tfm,
+#ifdef CONFIG_PROC_FS
+	.show = crypto_scomp_show,
+#endif
+	.report = crypto_scomp_report,
+	.maskclear = ~CRYPTO_ALG_TYPE_MASK,
+	.maskset = CRYPTO_ALG_TYPE_MASK,
+	.type = CRYPTO_ALG_TYPE_SCOMPRESS,
+	.tfmsize = offsetof(struct crypto_scomp, base),
+};
+
+int crypto_register_scomp(struct scomp_alg *alg)
+{
+	struct crypto_alg *base = &alg->base;
+	int ret = -ENOMEM;
+
+	mutex_lock(&scomp_lock);
+	if (crypto_scomp_alloc_all_scratches())
+		goto error;
+
+	base->cra_type = &crypto_scomp_type;
+	base->cra_flags &= ~CRYPTO_ALG_TYPE_MASK;
+	base->cra_flags |= CRYPTO_ALG_TYPE_SCOMPRESS;
+
+	ret = crypto_register_alg(base);
+	if (ret)
+		goto error;
+
+	mutex_unlock(&scomp_lock);
+	return ret;
+
+error:
+	crypto_scomp_free_all_scratches();
+	mutex_unlock(&scomp_lock);
+	return ret;
+}
+EXPORT_SYMBOL_GPL(crypto_register_scomp);
+
+int crypto_unregister_scomp(struct scomp_alg *alg)
+{
+	int ret;
+
+	mutex_lock(&scomp_lock);
+	ret = crypto_unregister_alg(&alg->base);
+	crypto_scomp_free_all_scratches();
+	mutex_unlock(&scomp_lock);
+
+	return ret;
+}
+EXPORT_SYMBOL_GPL(crypto_unregister_scomp);
+
+MODULE_LICENSE("GPL");
+MODULE_DESCRIPTION("Synchronous compression type");
diff --git a/include/crypto/acompress.h b/include/crypto/acompress.h
index 14c70d8..e328b52 100644
--- a/include/crypto/acompress.h
+++ b/include/crypto/acompress.h
@@ -42,9 +42,18 @@ struct acomp_req {
  * struct crypto_acomp - user-instantiated objects which encapsulate
  * algorithms and core processing logic
  *
- * @base:	Common crypto API algorithm data structure
+ * @compress:		Function performs a compress operation
+ * @decompress:		Function performs a de-compress operation
+ * @dst_free:		Frees destination buffer if allocated inside the
+ *			algorithm
+ * @reqsize:		Context size for (de)compression requests
+ * @base:		Common crypto API algorithm data structure
  */
 struct crypto_acomp {
+	int (*compress)(struct acomp_req *req);
+	int (*decompress)(struct acomp_req *req);
+	void (*dst_free)(struct scatterlist *dst);
+	unsigned int reqsize;
 	struct crypto_tfm base;
 };
 
@@ -125,7 +134,7 @@ static inline struct acomp_alg *crypto_acomp_alg(struct crypto_acomp *tfm)
 
 static inline unsigned int crypto_acomp_reqsize(struct crypto_acomp *tfm)
 {
-	return crypto_acomp_alg(tfm)->reqsize;
+	return tfm->reqsize;
 }
 
 static inline void acomp_request_set_tfm(struct acomp_req *req,
@@ -165,16 +174,7 @@ static inline int crypto_has_acomp(const char *alg_name, u32 type, u32 mask)
  *
  * Return:	allocated handle in case of success or NULL in case of an error
  */
-static inline struct acomp_req *acomp_request_alloc(struct crypto_acomp *tfm)
-{
-	struct acomp_req *req;
-
-	req = kzalloc(sizeof(*req) + crypto_acomp_reqsize(tfm), GFP_KERNEL);
-	if (likely(req))
-		acomp_request_set_tfm(req, tfm);
-
-	return req;
-}
+struct acomp_req *acomp_request_alloc(struct crypto_acomp *tfm);
 
 /**
  * acomp_request_free() -- zeroize and free asynchronous (de)compression
@@ -183,17 +183,7 @@ static inline struct acomp_req *acomp_request_alloc(struct crypto_acomp *tfm)
  *
  * @req:	request to free
  */
-static inline void acomp_request_free(struct acomp_req *req)
-{
-	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
-	struct acomp_alg *alg = crypto_acomp_alg(tfm);
-
-	if (req->flags & CRYPTO_ACOMP_ALLOC_OUTPUT) {
-		alg->dst_free(req->dst);
-		req->dst = NULL;
-	}
-	kzfree(req);
-}
+void acomp_request_free(struct acomp_req *req);
 
 /**
  * acomp_request_set_callback() -- Sets an asynchronous callback
@@ -256,9 +246,8 @@ static inline void acomp_request_set_params(struct acomp_req *req,
 static inline int crypto_acomp_compress(struct acomp_req *req)
 {
 	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
-	struct acomp_alg *alg = crypto_acomp_alg(tfm);
 
-	return alg->compress(req);
+	return tfm->compress(req);
 }
 
 /**
@@ -273,9 +262,8 @@ static inline int crypto_acomp_compress(struct acomp_req *req)
 static inline int crypto_acomp_decompress(struct acomp_req *req)
 {
 	struct crypto_acomp *tfm = crypto_acomp_reqtfm(req);
-	struct acomp_alg *alg = crypto_acomp_alg(tfm);
 
-	return alg->decompress(req);
+	return tfm->decompress(req);
 }
 
 #endif
diff --git a/include/crypto/internal/acompress.h b/include/crypto/internal/acompress.h
index a9a9000..1de2b5a 100644
--- a/include/crypto/internal/acompress.h
+++ b/include/crypto/internal/acompress.h
@@ -39,6 +39,21 @@ static inline const char *acomp_alg_name(struct crypto_acomp *tfm)
 	return crypto_acomp_tfm(tfm)->__crt_alg->cra_name;
 }
 
+static inline struct acomp_req *__acomp_request_alloc(struct crypto_acomp *tfm)
+{
+	struct acomp_req *req;
+
+	req = kzalloc(sizeof(*req) + crypto_acomp_reqsize(tfm), GFP_KERNEL);
+	if (likely(req))
+		acomp_request_set_tfm(req, tfm);
+	return req;
+}
+
+static inline void __acomp_request_free(struct acomp_req *req)
+{
+	kzfree(req);
+}
+
 /**
  * crypto_register_acomp() -- Register asynchronous compression algorithm
  *
diff --git a/include/crypto/internal/scompress.h b/include/crypto/internal/scompress.h
new file mode 100644
index 0000000..3fda3c5
--- /dev/null
+++ b/include/crypto/internal/scompress.h
@@ -0,0 +1,136 @@
+/*
+ * Synchronous Compression operations
+ *
+ * Copyright 2015 LG Electronics Inc.
+ * Copyright (c) 2016, Intel Corporation
+ * Author: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+#ifndef _CRYPTO_SCOMP_INT_H
+#define _CRYPTO_SCOMP_INT_H
+#include <linux/crypto.h>
+
+#define SCOMP_SCRATCH_SIZE	131072
+
+struct crypto_scomp {
+	struct crypto_tfm base;
+};
+
+/**
+ * struct scomp_alg - synchronous compression algorithm
+ *
+ * @alloc_ctx:	Function allocates algorithm specific context
+ * @free_ctx:	Function frees context allocated with alloc_ctx
+ * @compress:	Function performs a compress operation
+ * @decompress:	Function performs a de-compress operation
+ * @init:	Initialize the cryptographic transformation object.
+ *		This function is used to initialize the cryptographic
+ *		transformation object. This function is called only once at
+ *		the instantiation time, right after the transformation context
+ *		was allocated. In case the cryptographic hardware has some
+ *		special requirements which need to be handled by software, this
+ *		function shall check for the precise requirement of the
+ *		transformation and put any software fallbacks in place.
+ * @exit:	Deinitialize the cryptographic transformation object. This is a
+ *		counterpart to @init, used to remove various changes set in
+ *		@init.
+ * @base:	Common crypto API algorithm data structure
+ */
+struct scomp_alg {
+	void *(*alloc_ctx)(struct crypto_scomp *tfm);
+	void (*free_ctx)(struct crypto_scomp *tfm, void *ctx);
+	int (*compress)(struct crypto_scomp *tfm, const u8 *src,
+			unsigned int slen, u8 *dst, unsigned int *dlen,
+			void *ctx);
+	int (*decompress)(struct crypto_scomp *tfm, const u8 *src,
+			  unsigned int slen, u8 *dst, unsigned int *dlen,
+			  void *ctx);
+	struct crypto_alg base;
+};
+
+static inline struct scomp_alg *__crypto_scomp_alg(struct crypto_alg *alg)
+{
+	return container_of(alg, struct scomp_alg, base);
+}
+
+static inline struct crypto_scomp *__crypto_scomp_tfm(struct crypto_tfm *tfm)
+{
+	return container_of(tfm, struct crypto_scomp, base);
+}
+
+static inline struct crypto_tfm *crypto_scomp_tfm(struct crypto_scomp *tfm)
+{
+	return &tfm->base;
+}
+
+static inline void crypto_free_scomp(struct crypto_scomp *tfm)
+{
+	crypto_destroy_tfm(tfm, crypto_scomp_tfm(tfm));
+}
+
+static inline struct scomp_alg *crypto_scomp_alg(struct crypto_scomp *tfm)
+{
+	return __crypto_scomp_alg(crypto_scomp_tfm(tfm)->__crt_alg);
+}
+
+static inline void *crypto_scomp_alloc_ctx(struct crypto_scomp *tfm)
+{
+	return crypto_scomp_alg(tfm)->alloc_ctx(tfm);
+}
+
+static inline void crypto_scomp_free_ctx(struct crypto_scomp *tfm,
+					 void *ctx)
+{
+	return crypto_scomp_alg(tfm)->free_ctx(tfm, ctx);
+}
+
+static inline int crypto_scomp_compress(struct crypto_scomp *tfm,
+					const u8 *src, unsigned int slen,
+					u8 *dst, unsigned int *dlen, void *ctx)
+{
+	return crypto_scomp_alg(tfm)->compress(tfm, src, slen, dst, dlen, ctx);
+}
+
+static inline int crypto_scomp_decompress(struct crypto_scomp *tfm,
+					  const u8 *src, unsigned int slen,
+					  u8 *dst, unsigned int *dlen,
+					  void *ctx)
+{
+	return crypto_scomp_alg(tfm)->decompress(tfm, src, slen, dst, dlen,
+						 ctx);
+}
+
+int crypto_init_scomp_ops_async(struct crypto_tfm *tfm);
+struct acomp_req *crypto_acomp_scomp_alloc_ctx(struct acomp_req *req);
+void crypto_acomp_scomp_free_ctx(struct acomp_req *req);
+
+/**
+ * crypto_register_scomp() -- Register synchronous compression algorithm
+ *
+ * Function registers an implementation of a synchronous
+ * compression algorithm
+ *
+ * @alg:	algorithm definition
+ *
+ * Return: zero on success; error code in case of error
+ */
+int crypto_register_scomp(struct scomp_alg *alg);
+
+/**
+ * crypto_unregister_scomp() -- Unregister synchronous compression algorithm
+ *
+ * Function unregisters an implementation of a synchronous
+ * compression algorithm
+ *
+ * @alg:	algorithm definition
+ *
+ * Return: zero on success; error code in case of error
+ */
+int crypto_unregister_scomp(struct scomp_alg *alg);
+
+#endif
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index dc57a05..8348d83 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -51,6 +51,7 @@
 #define CRYPTO_ALG_TYPE_GIVCIPHER	0x00000006
 #define CRYPTO_ALG_TYPE_KPP		0x00000008
 #define CRYPTO_ALG_TYPE_ACOMPRESS	0x0000000a
+#define CRYPTO_ALG_TYPE_SCOMPRESS	0x0000000b
 #define CRYPTO_ALG_TYPE_RNG		0x0000000c
 #define CRYPTO_ALG_TYPE_AKCIPHER	0x0000000d
 #define CRYPTO_ALG_TYPE_DIGEST		0x0000000e
@@ -61,6 +62,7 @@
 #define CRYPTO_ALG_TYPE_HASH_MASK	0x0000000e
 #define CRYPTO_ALG_TYPE_AHASH_MASK	0x0000000e
 #define CRYPTO_ALG_TYPE_BLKCIPHER_MASK	0x0000000c
+#define CRYPTO_ALG_TYPE_ACOMPRESS_MASK	0x0000000e
 
 #define CRYPTO_ALG_LARVAL		0x00000010
 #define CRYPTO_ALG_DEAD			0x00000020
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 3/8] crypto: acomp - add support for lzo via scomp
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add scomp backend for lzo compression algorithm.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig |  1 +
 crypto/lzo.c   | 97 +++++++++++++++++++++++++++++++++++++++++++++++++---------
 2 files changed, 83 insertions(+), 15 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index f553f66..d275591 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1589,6 +1589,7 @@ config CRYPTO_DEFLATE
 config CRYPTO_LZO
 	tristate "LZO compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select LZO_COMPRESS
 	select LZO_DECOMPRESS
 	help
diff --git a/crypto/lzo.c b/crypto/lzo.c
index c3f3dd9..168df78 100644
--- a/crypto/lzo.c
+++ b/crypto/lzo.c
@@ -22,40 +22,55 @@
 #include <linux/vmalloc.h>
 #include <linux/mm.h>
 #include <linux/lzo.h>
+#include <crypto/internal/scompress.h>
 
 struct lzo_ctx {
 	void *lzo_comp_mem;
 };
 
+static void *lzo_alloc_ctx(struct crypto_scomp *tfm)
+{
+	void *ctx;
+
+	ctx = kmalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL | __GFP_NOWARN);
+	if (!ctx)
+		ctx = vmalloc(LZO1X_MEM_COMPRESS);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	return ctx;
+}
+
 static int lzo_init(struct crypto_tfm *tfm)
 {
 	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	ctx->lzo_comp_mem = kmalloc(LZO1X_MEM_COMPRESS,
-				    GFP_KERNEL | __GFP_NOWARN);
-	if (!ctx->lzo_comp_mem)
-		ctx->lzo_comp_mem = vmalloc(LZO1X_MEM_COMPRESS);
-	if (!ctx->lzo_comp_mem)
+	ctx->lzo_comp_mem = lzo_alloc_ctx(NULL);
+	if (IS_ERR(ctx->lzo_comp_mem))
 		return -ENOMEM;
 
 	return 0;
 }
 
+static void lzo_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	kvfree(ctx);
+}
+
 static void lzo_exit(struct crypto_tfm *tfm)
 {
 	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	kvfree(ctx->lzo_comp_mem);
+	lzo_free_ctx(NULL, ctx->lzo_comp_mem);
 }
 
-static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static int __lzo_compress(const u8 *src, unsigned int slen,
+			  u8 *dst, unsigned int *dlen, void *ctx)
 {
-	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
 	size_t tmp_len = *dlen; /* size_t(ulong) <-> uint on 64 bit */
 	int err;
 
-	err = lzo1x_1_compress(src, slen, dst, &tmp_len, ctx->lzo_comp_mem);
+	err = lzo1x_1_compress(src, slen, dst, &tmp_len, ctx);
 
 	if (err != LZO_E_OK)
 		return -EINVAL;
@@ -64,8 +79,23 @@ static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
 	return 0;
 }
 
-static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int lzo_compress(struct crypto_tfm *tfm, const u8 *src,
+			unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct lzo_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return __lzo_compress(src, slen, dst, dlen, ctx->lzo_comp_mem);
+}
+
+static int lzo_scompress(struct crypto_scomp *tfm, const u8 *src,
+			 unsigned int slen, u8 *dst, unsigned int *dlen,
+			 void *ctx)
+{
+	return __lzo_compress(src, slen, dst, dlen, ctx);
+}
+
+static int __lzo_decompress(const u8 *src, unsigned int slen,
+			    u8 *dst, unsigned int *dlen)
 {
 	int err;
 	size_t tmp_len = *dlen; /* size_t(ulong) <-> uint on 64 bit */
@@ -77,7 +107,19 @@ static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
 
 	*dlen = tmp_len;
 	return 0;
+}
 
+static int lzo_decompress(struct crypto_tfm *tfm, const u8 *src,
+			  unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	return __lzo_decompress(src, slen, dst, dlen);
+}
+
+static int lzo_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+			   unsigned int slen, u8 *dst, unsigned int *dlen,
+			   void *ctx)
+{
+	return __lzo_decompress(src, slen, dst, dlen);
 }
 
 static struct crypto_alg alg = {
@@ -88,18 +130,43 @@ static struct crypto_alg alg = {
 	.cra_init		= lzo_init,
 	.cra_exit		= lzo_exit,
 	.cra_u			= { .compress = {
-	.coa_compress 		= lzo_compress,
-	.coa_decompress  	= lzo_decompress } }
+	.coa_compress		= lzo_compress,
+	.coa_decompress		= lzo_decompress } }
+};
+
+static struct scomp_alg scomp = {
+	.alloc_ctx		= lzo_alloc_ctx,
+	.free_ctx		= lzo_free_ctx,
+	.compress		= lzo_scompress,
+	.decompress		= lzo_sdecompress,
+	.base			= {
+		.cra_name	= "lzo",
+		.cra_driver_name = "lzo-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
 };
 
 static int __init lzo_mod_init(void)
 {
-	return crypto_register_alg(&alg);
+	int ret;
+
+	ret = crypto_register_alg(&alg);
+	if (ret)
+		return ret;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg);
+		return ret;
+	}
+
+	return ret;
 }
 
 static void __exit lzo_mod_fini(void)
 {
 	crypto_unregister_alg(&alg);
+	crypto_unregister_scomp(&scomp);
 }
 
 module_init(lzo_mod_init);
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 4/8] crypto: acomp - add support for lz4 via scomp
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add scomp backend for lz4 compression algorithm.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig |  1 +
 crypto/lz4.c   | 91 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 82 insertions(+), 10 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index d275591..e95cbbd 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1606,6 +1606,7 @@ config CRYPTO_842
 config CRYPTO_LZ4
 	tristate "LZ4 compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select LZ4_COMPRESS
 	select LZ4_DECOMPRESS
 	help
diff --git a/crypto/lz4.c b/crypto/lz4.c
index aefbcea..99c1b2c 100644
--- a/crypto/lz4.c
+++ b/crypto/lz4.c
@@ -23,36 +23,53 @@
 #include <linux/crypto.h>
 #include <linux/vmalloc.h>
 #include <linux/lz4.h>
+#include <crypto/internal/scompress.h>
 
 struct lz4_ctx {
 	void *lz4_comp_mem;
 };
 
+static void *lz4_alloc_ctx(struct crypto_scomp *tfm)
+{
+	void *ctx;
+
+	ctx = vmalloc(LZ4_MEM_COMPRESS);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	return ctx;
+}
+
 static int lz4_init(struct crypto_tfm *tfm)
 {
 	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	ctx->lz4_comp_mem = vmalloc(LZ4_MEM_COMPRESS);
-	if (!ctx->lz4_comp_mem)
+	ctx->lz4_comp_mem = lz4_alloc_ctx(NULL);
+	if (IS_ERR(ctx->lz4_comp_mem))
 		return -ENOMEM;
 
 	return 0;
 }
 
+static void lz4_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	vfree(ctx);
+}
+
 static void lz4_exit(struct crypto_tfm *tfm)
 {
 	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
-	vfree(ctx->lz4_comp_mem);
+
+	lz4_free_ctx(NULL, ctx->lz4_comp_mem);
 }
 
-static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
+				 u8 *dst, unsigned int *dlen, void *ctx)
 {
-	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
 	size_t tmp_len = *dlen;
 	int err;
 
-	err = lz4_compress(src, slen, dst, &tmp_len, ctx->lz4_comp_mem);
+	err = lz4_compress(src, slen, dst, &tmp_len, ctx);
 
 	if (err < 0)
 		return -EINVAL;
@@ -61,8 +78,23 @@ static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return 0;
 }
 
-static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int lz4_scompress(struct crypto_scomp *tfm, const u8 *src,
+			 unsigned int slen, u8 *dst, unsigned int *dlen,
+			 void *ctx)
+{
+	return __lz4_compress_crypto(src, slen, dst, dlen, ctx);
+}
+
+static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
+			       unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
+}
+
+static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
+				   u8 *dst, unsigned int *dlen, void *ctx)
 {
 	int err;
 	size_t tmp_len = *dlen;
@@ -76,6 +108,20 @@ static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return err;
 }
 
+static int lz4_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+			   unsigned int slen, u8 *dst, unsigned int *dlen,
+			   void *ctx)
+{
+	return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
+}
+
+static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
+				 unsigned int slen, u8 *dst,
+				 unsigned int *dlen)
+{
+	return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
+}
+
 static struct crypto_alg alg_lz4 = {
 	.cra_name		= "lz4",
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
@@ -89,14 +135,39 @@ static struct crypto_alg alg_lz4 = {
 	.coa_decompress		= lz4_decompress_crypto } }
 };
 
+static struct scomp_alg scomp = {
+	.alloc_ctx		= lz4_alloc_ctx,
+	.free_ctx		= lz4_free_ctx,
+	.compress		= lz4_scompress,
+	.decompress		= lz4_sdecompress,
+	.base			= {
+		.cra_name	= "lz4",
+		.cra_driver_name = "lz4-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
+};
+
 static int __init lz4_mod_init(void)
 {
-	return crypto_register_alg(&alg_lz4);
+	int ret;
+
+	ret = crypto_register_alg(&alg_lz4);
+	if (ret)
+		return ret;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg_lz4);
+		return ret;
+	}
+
+	return ret;
 }
 
 static void __exit lz4_mod_fini(void)
 {
 	crypto_unregister_alg(&alg_lz4);
+	crypto_unregister_scomp(&scomp);
 }
 
 module_init(lz4_mod_init);
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 5/8] crypto: acomp - add support for lz4hc via scomp
From: Giovanni Cabiddu @ 2016-09-29 10:18 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add scomp backend for lz4hc compression algorithm.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig |  1 +
 crypto/lz4hc.c | 92 +++++++++++++++++++++++++++++++++++++++++++++++++++-------
 2 files changed, 83 insertions(+), 10 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index e95cbbd..4258e85 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1615,6 +1615,7 @@ config CRYPTO_LZ4
 config CRYPTO_LZ4HC
 	tristate "LZ4HC compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select LZ4HC_COMPRESS
 	select LZ4_DECOMPRESS
 	help
diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
index a1d3b5b..75ffc4a 100644
--- a/crypto/lz4hc.c
+++ b/crypto/lz4hc.c
@@ -22,37 +22,53 @@
 #include <linux/crypto.h>
 #include <linux/vmalloc.h>
 #include <linux/lz4.h>
+#include <crypto/internal/scompress.h>
 
 struct lz4hc_ctx {
 	void *lz4hc_comp_mem;
 };
 
+static void *lz4hc_alloc_ctx(struct crypto_scomp *tfm)
+{
+	void *ctx;
+
+	ctx = vmalloc(LZ4HC_MEM_COMPRESS);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	return ctx;
+}
+
 static int lz4hc_init(struct crypto_tfm *tfm)
 {
 	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	ctx->lz4hc_comp_mem = vmalloc(LZ4HC_MEM_COMPRESS);
-	if (!ctx->lz4hc_comp_mem)
+	ctx->lz4hc_comp_mem = lz4hc_alloc_ctx(NULL);
+	if (IS_ERR(ctx->lz4hc_comp_mem))
 		return -ENOMEM;
 
 	return 0;
 }
 
+static void lz4hc_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	vfree(ctx);
+}
+
 static void lz4hc_exit(struct crypto_tfm *tfm)
 {
 	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
 
-	vfree(ctx->lz4hc_comp_mem);
+	lz4hc_free_ctx(NULL, ctx->lz4hc_comp_mem);
 }
 
-static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
+				   u8 *dst, unsigned int *dlen, void *ctx)
 {
-	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
 	size_t tmp_len = *dlen;
 	int err;
 
-	err = lz4hc_compress(src, slen, dst, &tmp_len, ctx->lz4hc_comp_mem);
+	err = lz4hc_compress(src, slen, dst, &tmp_len, ctx);
 
 	if (err < 0)
 		return -EINVAL;
@@ -61,8 +77,25 @@ static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return 0;
 }
 
-static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int lz4hc_scompress(struct crypto_scomp *tfm, const u8 *src,
+			   unsigned int slen, u8 *dst, unsigned int *dlen,
+			   void *ctx)
+{
+	return __lz4hc_compress_crypto(src, slen, dst, dlen, ctx);
+}
+
+static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
+				 unsigned int slen, u8 *dst,
+				 unsigned int *dlen)
+{
+	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	return __lz4hc_compress_crypto(src, slen, dst, dlen,
+					ctx->lz4hc_comp_mem);
+}
+
+static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
+				     u8 *dst, unsigned int *dlen, void *ctx)
 {
 	int err;
 	size_t tmp_len = *dlen;
@@ -76,6 +109,20 @@ static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
 	return err;
 }
 
+static int lz4hc_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+			     unsigned int slen, u8 *dst, unsigned int *dlen,
+			     void *ctx)
+{
+	return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
+}
+
+static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
+				   unsigned int slen, u8 *dst,
+				   unsigned int *dlen)
+{
+	return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
+}
+
 static struct crypto_alg alg_lz4hc = {
 	.cra_name		= "lz4hc",
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
@@ -89,14 +136,39 @@ static struct crypto_alg alg_lz4hc = {
 	.coa_decompress		= lz4hc_decompress_crypto } }
 };
 
+static struct scomp_alg scomp = {
+	.alloc_ctx		= lz4hc_alloc_ctx,
+	.free_ctx		= lz4hc_free_ctx,
+	.compress		= lz4hc_scompress,
+	.decompress		= lz4hc_sdecompress,
+	.base			= {
+		.cra_name	= "lz4hc",
+		.cra_driver_name = "lz4hc-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
+};
+
 static int __init lz4hc_mod_init(void)
 {
-	return crypto_register_alg(&alg_lz4hc);
+	int ret;
+
+	ret = crypto_register_alg(&alg_lz4hc);
+	if (ret)
+		return ret;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg_lz4hc);
+		return ret;
+	}
+
+	return ret;
 }
 
 static void __exit lz4hc_mod_fini(void)
 {
 	crypto_unregister_alg(&alg_lz4hc);
+	crypto_unregister_scomp(&scomp);
 }
 
 module_init(lz4hc_mod_init);
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 7/8] crypto: acomp - add support for deflate via scomp
From: Giovanni Cabiddu @ 2016-09-29 10:19 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add scomp backend for deflate compression algorithm.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/Kconfig   |   1 +
 crypto/deflate.c | 111 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 2 files changed, 102 insertions(+), 10 deletions(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index ac7b519..7d4808f 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1578,6 +1578,7 @@ comment "Compression"
 config CRYPTO_DEFLATE
 	tristate "Deflate compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select ZLIB_INFLATE
 	select ZLIB_DEFLATE
 	help
diff --git a/crypto/deflate.c b/crypto/deflate.c
index 95d8d37..f942cb3 100644
--- a/crypto/deflate.c
+++ b/crypto/deflate.c
@@ -32,6 +32,7 @@
 #include <linux/interrupt.h>
 #include <linux/mm.h>
 #include <linux/net.h>
+#include <crypto/internal/scompress.h>
 
 #define DEFLATE_DEF_LEVEL		Z_DEFAULT_COMPRESSION
 #define DEFLATE_DEF_WINBITS		11
@@ -101,9 +102,8 @@ static void deflate_decomp_exit(struct deflate_ctx *ctx)
 	vfree(ctx->decomp_stream.workspace);
 }
 
-static int deflate_init(struct crypto_tfm *tfm)
+static int __deflate_init(void *ctx)
 {
-	struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
 	int ret;
 
 	ret = deflate_comp_init(ctx);
@@ -116,19 +116,55 @@ out:
 	return ret;
 }
 
-static void deflate_exit(struct crypto_tfm *tfm)
+static void *deflate_alloc_ctx(struct crypto_scomp *tfm)
+{
+	struct deflate_ctx *ctx;
+	int ret;
+
+	ctx = kzalloc(sizeof(*ctx), GFP_KERNEL);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	ret = __deflate_init(ctx);
+	if (ret) {
+		kfree(ctx);
+		return ERR_PTR(ret);
+	}
+
+	return ctx;
+}
+
+static int deflate_init(struct crypto_tfm *tfm)
 {
 	struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
 
+	return __deflate_init(ctx);
+}
+
+static void __deflate_exit(void *ctx)
+{
 	deflate_comp_exit(ctx);
 	deflate_decomp_exit(ctx);
 }
 
-static int deflate_compress(struct crypto_tfm *tfm, const u8 *src,
-			    unsigned int slen, u8 *dst, unsigned int *dlen)
+static void deflate_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	__deflate_exit(ctx);
+	kzfree(ctx);
+}
+
+static void deflate_exit(struct crypto_tfm *tfm)
+{
+	struct deflate_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	__deflate_exit(ctx);
+}
+
+static int __deflate_compress(const u8 *src, unsigned int slen,
+			      u8 *dst, unsigned int *dlen, void *ctx)
 {
 	int ret = 0;
-	struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
+	struct deflate_ctx *dctx = ctx;
 	struct z_stream_s *stream = &dctx->comp_stream;
 
 	ret = zlib_deflateReset(stream);
@@ -153,12 +189,27 @@ out:
 	return ret;
 }
 
-static int deflate_decompress(struct crypto_tfm *tfm, const u8 *src,
-			      unsigned int slen, u8 *dst, unsigned int *dlen)
+static int deflate_compress(struct crypto_tfm *tfm, const u8 *src,
+			    unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
+
+	return __deflate_compress(src, slen, dst, dlen, dctx);
+}
+
+static int deflate_scompress(struct crypto_scomp *tfm, const u8 *src,
+			     unsigned int slen, u8 *dst, unsigned int *dlen,
+			     void *ctx)
+{
+	return __deflate_compress(src, slen, dst, dlen, ctx);
+}
+
+static int __deflate_decompress(const u8 *src, unsigned int slen,
+				u8 *dst, unsigned int *dlen, void *ctx)
 {
 
 	int ret = 0;
-	struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
+	struct deflate_ctx *dctx = ctx;
 	struct z_stream_s *stream = &dctx->decomp_stream;
 
 	ret = zlib_inflateReset(stream);
@@ -194,6 +245,21 @@ out:
 	return ret;
 }
 
+static int deflate_decompress(struct crypto_tfm *tfm, const u8 *src,
+			      unsigned int slen, u8 *dst, unsigned int *dlen)
+{
+	struct deflate_ctx *dctx = crypto_tfm_ctx(tfm);
+
+	return __deflate_decompress(src, slen, dst, dlen, dctx);
+}
+
+static int deflate_sdecompress(struct crypto_scomp *tfm, const u8 *src,
+			       unsigned int slen, u8 *dst, unsigned int *dlen,
+			       void *ctx)
+{
+	return __deflate_decompress(src, slen, dst, dlen, ctx);
+}
+
 static struct crypto_alg alg = {
 	.cra_name		= "deflate",
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
@@ -206,14 +272,39 @@ static struct crypto_alg alg = {
 	.coa_decompress  	= deflate_decompress } }
 };
 
+static struct scomp_alg scomp = {
+	.alloc_ctx		= deflate_alloc_ctx,
+	.free_ctx		= deflate_free_ctx,
+	.compress		= deflate_scompress,
+	.decompress		= deflate_sdecompress,
+	.base			= {
+		.cra_name	= "deflate",
+		.cra_driver_name = "deflate-scomp",
+		.cra_module	 = THIS_MODULE,
+	}
+};
+
 static int __init deflate_mod_init(void)
 {
-	return crypto_register_alg(&alg);
+	int ret;
+
+	ret = crypto_register_alg(&alg);
+	if (ret)
+		return ret;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg);
+		return ret;
+	}
+
+	return ret;
 }
 
 static void __exit deflate_mod_fini(void)
 {
 	crypto_unregister_alg(&alg);
+	crypto_unregister_scomp(&scomp);
 }
 
 module_init(deflate_mod_init);
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 6/8] crypto: acomp - add support for 842 via scomp
From: Giovanni Cabiddu @ 2016-09-29 10:19 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add scomp backend for 842 compression algorithm.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/842.c   | 81 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
 crypto/Kconfig |  1 +
 2 files changed, 80 insertions(+), 2 deletions(-)

diff --git a/crypto/842.c b/crypto/842.c
index 98e387e..bc26dc9 100644
--- a/crypto/842.c
+++ b/crypto/842.c
@@ -31,11 +31,46 @@
 #include <linux/module.h>
 #include <linux/crypto.h>
 #include <linux/sw842.h>
+#include <crypto/internal/scompress.h>
 
 struct crypto842_ctx {
-	char wmem[SW842_MEM_COMPRESS];	/* working memory for compress */
+	void *wmem;	/* working memory for compress */
 };
 
+static void *crypto842_alloc_ctx(struct crypto_scomp *tfm)
+{
+	void *ctx;
+
+	ctx = kmalloc(SW842_MEM_COMPRESS, GFP_KERNEL);
+	if (!ctx)
+		return ERR_PTR(-ENOMEM);
+
+	return ctx;
+}
+
+static int crypto842_init(struct crypto_tfm *tfm)
+{
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	ctx->wmem = crypto842_alloc_ctx(NULL);
+	if (IS_ERR(ctx->wmem))
+		return -ENOMEM;
+
+	return 0;
+}
+
+static void crypto842_free_ctx(struct crypto_scomp *tfm, void *ctx)
+{
+	kfree(ctx);
+}
+
+static void crypto842_exit(struct crypto_tfm *tfm)
+{
+	struct crypto842_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	crypto842_free_ctx(NULL, ctx->wmem);
+}
+
 static int crypto842_compress(struct crypto_tfm *tfm,
 			      const u8 *src, unsigned int slen,
 			      u8 *dst, unsigned int *dlen)
@@ -45,6 +80,13 @@ static int crypto842_compress(struct crypto_tfm *tfm,
 	return sw842_compress(src, slen, dst, dlen, ctx->wmem);
 }
 
+static int crypto842_scompress(struct crypto_scomp *tfm,
+			       const u8 *src, unsigned int slen,
+			       u8 *dst, unsigned int *dlen, void *ctx)
+{
+	return sw842_compress(src, slen, dst, dlen, ctx);
+}
+
 static int crypto842_decompress(struct crypto_tfm *tfm,
 				const u8 *src, unsigned int slen,
 				u8 *dst, unsigned int *dlen)
@@ -52,6 +94,13 @@ static int crypto842_decompress(struct crypto_tfm *tfm,
 	return sw842_decompress(src, slen, dst, dlen);
 }
 
+static int crypto842_sdecompress(struct crypto_scomp *tfm,
+				 const u8 *src, unsigned int slen,
+				 u8 *dst, unsigned int *dlen, void *ctx)
+{
+	return sw842_decompress(src, slen, dst, dlen);
+}
+
 static struct crypto_alg alg = {
 	.cra_name		= "842",
 	.cra_driver_name	= "842-generic",
@@ -59,20 +108,48 @@ static struct crypto_alg alg = {
 	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
 	.cra_ctxsize		= sizeof(struct crypto842_ctx),
 	.cra_module		= THIS_MODULE,
+	.cra_init		= crypto842_init,
+	.cra_exit		= crypto842_exit,
 	.cra_u			= { .compress = {
 	.coa_compress		= crypto842_compress,
 	.coa_decompress		= crypto842_decompress } }
 };
 
+static struct scomp_alg scomp = {
+	.alloc_ctx		= crypto842_alloc_ctx,
+	.free_ctx		= crypto842_free_ctx,
+	.compress		= crypto842_scompress,
+	.decompress		= crypto842_sdecompress,
+	.base			= {
+		.cra_name	= "842",
+		.cra_driver_name = "842-scomp",
+		.cra_priority	 = 100,
+		.cra_module	 = THIS_MODULE,
+	}
+};
+
 static int __init crypto842_mod_init(void)
 {
-	return crypto_register_alg(&alg);
+	int ret;
+
+	ret = crypto_register_alg(&alg);
+	if (ret)
+		return ret;
+
+	ret = crypto_register_scomp(&scomp);
+	if (ret) {
+		crypto_unregister_alg(&alg);
+		return ret;
+	}
+
+	return ret;
 }
 module_init(crypto842_mod_init);
 
 static void __exit crypto842_mod_exit(void)
 {
 	crypto_unregister_alg(&alg);
+	crypto_unregister_scomp(&scomp);
 }
 module_exit(crypto842_mod_exit);
 
diff --git a/crypto/Kconfig b/crypto/Kconfig
index 4258e85..ac7b519 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -1598,6 +1598,7 @@ config CRYPTO_LZO
 config CRYPTO_842
 	tristate "842 compression algorithm"
 	select CRYPTO_ALGAPI
+	select CRYPTO_ACOMP2
 	select 842_COMPRESS
 	select 842_DECOMPRESS
 	help
-- 
2.4.11

^ permalink raw reply related

* [PATCH v9 8/8] crypto: acomp - update testmgr with support for acomp
From: Giovanni Cabiddu @ 2016-09-29 10:19 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, Giovanni Cabiddu
In-Reply-To: <1475144342-1157-1-git-send-email-giovanni.cabiddu@intel.com>

Add tests to the test manager for algorithms exposed through acomp.

Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 crypto/testmgr.c | 158 ++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 145 insertions(+), 13 deletions(-)

diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index 0b01c3d..65a2d3d 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -33,6 +33,7 @@
 #include <crypto/drbg.h>
 #include <crypto/akcipher.h>
 #include <crypto/kpp.h>
+#include <crypto/acompress.h>
 
 #include "internal.h"
 
@@ -1439,6 +1440,121 @@ out:
 	return ret;
 }
 
+static int test_acomp(struct crypto_acomp *tfm, struct comp_testvec *ctemplate,
+		      struct comp_testvec *dtemplate, int ctcount, int dtcount)
+{
+	const char *algo = crypto_tfm_alg_driver_name(crypto_acomp_tfm(tfm));
+	unsigned int i;
+	char output[COMP_BUF_SIZE];
+	int ret;
+	struct scatterlist src, dst;
+	struct acomp_req *req;
+	struct tcrypt_result result;
+
+	for (i = 0; i < ctcount; i++) {
+		unsigned int dlen = COMP_BUF_SIZE;
+		int ilen = ctemplate[i].inlen;
+
+		memset(output, 0, sizeof(output));
+		init_completion(&result.completion);
+		sg_init_one(&src, ctemplate[i].input, ilen);
+		sg_init_one(&dst, output, dlen);
+
+		req = acomp_request_alloc(tfm);
+		if (!req) {
+			pr_err("alg: acomp: request alloc failed for %s\n",
+			       algo);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		acomp_request_set_params(req, &src, &dst, ilen, dlen);
+		acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+					   tcrypt_complete, &result);
+
+		ret = wait_async_op(&result, crypto_acomp_compress(req));
+		if (ret) {
+			pr_err("alg: acomp: compression failed on test %d for %s: ret=%d\n",
+			       i + 1, algo, -ret);
+			acomp_request_free(req);
+			goto out;
+		}
+
+		if (req->dlen != ctemplate[i].outlen) {
+			pr_err("alg: acomp: Compression test %d failed for %s: output len = %d\n",
+			       i + 1, algo, req->dlen);
+			ret = -EINVAL;
+			acomp_request_free(req);
+			goto out;
+		}
+
+		if (memcmp(output, ctemplate[i].output, req->dlen)) {
+			pr_err("alg: acomp: Compression test %d failed for %s\n",
+			       i + 1, algo);
+			hexdump(output, req->dlen);
+			ret = -EINVAL;
+			acomp_request_free(req);
+			goto out;
+		}
+
+		acomp_request_free(req);
+	}
+
+	for (i = 0; i < dtcount; i++) {
+		unsigned int dlen = COMP_BUF_SIZE;
+		int ilen = dtemplate[i].inlen;
+
+		memset(output, 0, sizeof(output));
+		init_completion(&result.completion);
+		sg_init_one(&src, dtemplate[i].input, ilen);
+		sg_init_one(&dst, output, dlen);
+
+		req = acomp_request_alloc(tfm);
+		if (!req) {
+			pr_err("alg: acomp: request alloc failed for %s\n",
+			       algo);
+			ret = -ENOMEM;
+			goto out;
+		}
+
+		acomp_request_set_params(req, &src, &dst, ilen, dlen);
+		acomp_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
+					   tcrypt_complete, &result);
+
+		ret = wait_async_op(&result, crypto_acomp_decompress(req));
+		if (ret) {
+			pr_err("alg: acomp: decompression failed on test %d for %s: ret=%d\n",
+			       i + 1, algo, -ret);
+			acomp_request_free(req);
+			goto out;
+		}
+
+		if (req->dlen != dtemplate[i].outlen) {
+			pr_err("alg: acomp: Decompression test %d failed for %s: output len = %d\n",
+			       i + 1, algo, req->dlen);
+			ret = -EINVAL;
+			acomp_request_free(req);
+			goto out;
+		}
+
+		if (memcmp(output, dtemplate[i].output, req->dlen)) {
+			pr_err("alg: acomp: Decompression test %d failed for %s\n",
+			       i + 1, algo);
+			hexdump(output, req->dlen);
+			ret = -EINVAL;
+			acomp_request_free(req);
+			goto out;
+		}
+
+		acomp_request_free(req);
+	}
+
+	ret = 0;
+
+out:
+	return ret;
+}
+
 static int test_cprng(struct crypto_rng *tfm, struct cprng_testvec *template,
 		      unsigned int tcount)
 {
@@ -1590,22 +1706,38 @@ out:
 static int alg_test_comp(const struct alg_test_desc *desc, const char *driver,
 			 u32 type, u32 mask)
 {
-	struct crypto_comp *tfm;
+	struct crypto_comp *comp;
+	struct crypto_acomp *acomp;
 	int err;
+	u32 algo_type = type & CRYPTO_ALG_TYPE_ACOMPRESS_MASK;
+
+	if (algo_type == CRYPTO_ALG_TYPE_ACOMPRESS) {
+		acomp = crypto_alloc_acomp(driver, type, mask);
+		if (IS_ERR(acomp)) {
+			pr_err("alg: acomp: Failed to load transform for %s: %ld\n",
+			       driver, PTR_ERR(acomp));
+			return PTR_ERR(acomp);
+		}
+		err = test_acomp(acomp, desc->suite.comp.comp.vecs,
+				 desc->suite.comp.decomp.vecs,
+				 desc->suite.comp.comp.count,
+				 desc->suite.comp.decomp.count);
+		crypto_free_acomp(acomp);
+	} else {
+		comp = crypto_alloc_comp(driver, type, mask);
+		if (IS_ERR(comp)) {
+			pr_err("alg: comp: Failed to load transform for %s: %ld\n",
+			       driver, PTR_ERR(comp));
+			return PTR_ERR(comp);
+		}
 
-	tfm = crypto_alloc_comp(driver, type, mask);
-	if (IS_ERR(tfm)) {
-		printk(KERN_ERR "alg: comp: Failed to load transform for %s: "
-		       "%ld\n", driver, PTR_ERR(tfm));
-		return PTR_ERR(tfm);
-	}
-
-	err = test_comp(tfm, desc->suite.comp.comp.vecs,
-			desc->suite.comp.decomp.vecs,
-			desc->suite.comp.comp.count,
-			desc->suite.comp.decomp.count);
+		err = test_comp(comp, desc->suite.comp.comp.vecs,
+				desc->suite.comp.decomp.vecs,
+				desc->suite.comp.comp.count,
+				desc->suite.comp.decomp.count);
 
-	crypto_free_comp(tfm);
+		crypto_free_comp(comp);
+	}
 	return err;
 }
 
-- 
2.4.11

^ permalink raw reply related

* Observed a ecryptFS crash
From: liushuoran @ 2016-09-29 12:29 UTC (permalink / raw)
  To: tyhicks@canonical.com
  Cc: linux-crypto@vger.kernel.org, ecryptfs@vger.kernel.org,
	linux-kernel@vger.kernel.org, Yaodongdong, Wangbintian,
	yingjindong, Xiakaixu, Yezongbo, likan (A)

Hi Tyhicks,

We observed a ecryptFS crash occasionally in Linux kernel 4.1.18. The call trace is attached below. Is it a known issue？ Look forward to hearing from you. Thanks in advance!

[19314.529479s][pid:2694,cpu3,GAC_Executor[0]]Call trace:
[19314.529510s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0000f3898>] do_raw_spin_lock+0x20/0x200
[19314.529510s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc001031fb0>] _raw_spin_lock+0x28/0x34
[19314.529541s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0003908e0>] selinux_inode_free_security+0x3c/0x94
[19314.529541s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000386b04>] security_inode_free+0x2c/0x38
[19314.529541s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0001fff88>] __destroy_inode+0x2c/0x180
[19314.529571s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000201660>] destroy_inode+0x30/0xa0
[19314.529571s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0002017d8>] evict+0x108/0x1c0
[19314.529571s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000202588>] iput+0x184/0x258
[19314.529602s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0002f27d0>] ecryptfs_evict_inode+0x30/0x3c
[19314.529602s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc00020177c>] evict+0xac/0x1c0
[19314.529602s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0002018d4>] dispose_list+0x44/0x5c
[19314.529632s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000202d28>] evict_inodes+0xcc/0x12c
[19314.529632s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0001e5d04>] generic_shutdown_super+0x58/0xe4
[19314.529632s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0001e7164>] kill_anon_super+0x30/0x74
[19314.529663s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0002f16a4>] ecryptfs_kill_block_super+0x24/0x54
[19314.529663s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0001e6690>] deactivate_locked_super+0x60/0x8c
[19314.529663s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0001e6754>] deactivate_super+0x98/0xa4
[19314.529693s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000206d54>] cleanup_mnt+0x50/0xd0
[19314.529693s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000206e48>] __cleanup_mnt+0x20/0x2c
[19314.529693s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0000c1bac>] task_work_run+0xbc/0xf8
[19314.529724s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0000a3e98>] do_exit+0x2d4/0xa14
[19314.529724s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0000a56a8>] do_group_exit+0x60/0xf8
[19314.529724s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc0000b26ac>] get_signal+0x284/0x598
[19314.529754s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc00008950c>] do_signal+0x170/0x5b8
[19314.529754s][pid:2694,cpu3,GAC_Executor[0]][<ffffffc000089be0>] do_notify_resume+0x70/0x78
[19314.529785s][pid:2694,cpu3,GAC_Executor[0]]Code: aa0003f3 aa1e03e0 97fe7718 5289d5a0 (b9400661) 
[19314.529907s][pid:2694,cpu3,GAC_Executor[0]]---[ end trace 382e4b6264b035b5 ]---
[19314.529907s][pid:2694,cpu3,GAC_Executor[0]]Kernel panic - not syncing: Fatal exception

Regards,
Shuoran

^ permalink raw reply

* [PATCH 1/1] crypto: atmel-aes: fix compiler error when VERBOSE_DEBUG is defined
From: Cyrille Pitchen @ 2016-09-29 16:46 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre, levent.demir
  Cc: linux-crypto, linux-kernel, linux-arm-kernel, Cyrille Pitchen

This patch fixes a compiler error when VERBOSE_DEBUG is defined. Indeed,
in atmel_aes_write(), the 3rd argument of atmel_aes_reg_name() was
missing.

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
Reported-by: Levent Demir <levent.demir@inria.fr>
---
 drivers/crypto/atmel-aes.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/atmel-aes.c b/drivers/crypto/atmel-aes.c
index e3d40a8dfffb..1d9e7bd3f377 100644
--- a/drivers/crypto/atmel-aes.c
+++ b/drivers/crypto/atmel-aes.c
@@ -317,7 +317,7 @@ static inline void atmel_aes_write(struct atmel_aes_dev *dd,
 		char tmp[16];
 
 		dev_vdbg(dd->dev, "write 0x%08x into %s\n", value,
-			 atmel_aes_reg_name(offset, tmp));
+			 atmel_aes_reg_name(offset, tmp, sizeof(tmp)));
 	}
 #endif /* VERBOSE_DEBUG */
 
-- 
2.7.4

^ permalink raw reply related

* [PATCH 1/1] crypto: atmel-aes: add support to the XTS mode
From: Cyrille Pitchen @ 2016-09-29 16:49 UTC (permalink / raw)
  To: herbert, davem, nicolas.ferre
  Cc: linux-crypto, linux-kernel, linux-arm-kernel, levent.demir,
	Cyrille Pitchen

This patch adds the xts(aes) algorithm, which is supported from
hardware version 0x500 and above (sama5d2x).

Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
---
 drivers/crypto/atmel-aes-regs.h |   4 +
 drivers/crypto/atmel-aes.c      | 186 ++++++++++++++++++++++++++++++++++++++--
 2 files changed, 184 insertions(+), 6 deletions(-)

diff --git a/drivers/crypto/atmel-aes-regs.h b/drivers/crypto/atmel-aes-regs.h
index 6c2951bb70b1..0ec04407b533 100644
--- a/drivers/crypto/atmel-aes-regs.h
+++ b/drivers/crypto/atmel-aes-regs.h
@@ -28,6 +28,7 @@
 #define AES_MR_OPMOD_CFB		(0x3 << 12)
 #define AES_MR_OPMOD_CTR		(0x4 << 12)
 #define AES_MR_OPMOD_GCM		(0x5 << 12)
+#define AES_MR_OPMOD_XTS		(0x6 << 12)
 #define AES_MR_LOD				(0x1 << 15)
 #define AES_MR_CFBS_MASK		(0x7 << 16)
 #define AES_MR_CFBS_128b		(0x0 << 16)
@@ -67,6 +68,9 @@
 #define AES_CTRR	0x98
 #define AES_GCMHR(x)	(0x9c + ((x) * 0x04))
 
+#define AES_TWR(x)	(0xc0 + ((x) * 0x04))
+#define AES_ALPHAR(x)	(0xd0 + ((x) * 0x04))
+
 #define AES_HW_VERSION	0xFC
 
 #endif /* __ATMEL_AES_REGS_H__ */
diff --git a/drivers/crypto/atmel-aes.c b/drivers/crypto/atmel-aes.c
index 1d9e7bd3f377..b14c10e98a06 100644
--- a/drivers/crypto/atmel-aes.c
+++ b/drivers/crypto/atmel-aes.c
@@ -68,6 +68,7 @@
 #define AES_FLAGS_CFB8		(AES_MR_OPMOD_CFB | AES_MR_CFBS_8b)
 #define AES_FLAGS_CTR		AES_MR_OPMOD_CTR
 #define AES_FLAGS_GCM		AES_MR_OPMOD_GCM
+#define AES_FLAGS_XTS		AES_MR_OPMOD_XTS
 
 #define AES_FLAGS_MODE_MASK	(AES_FLAGS_OPMODE_MASK |	\
 				 AES_FLAGS_ENCRYPT |		\
@@ -89,6 +90,7 @@ struct atmel_aes_caps {
 	bool			has_cfb64;
 	bool			has_ctr32;
 	bool			has_gcm;
+	bool			has_xts;
 	u32			max_burst_size;
 };
 
@@ -135,6 +137,12 @@ struct atmel_aes_gcm_ctx {
 	atmel_aes_fn_t		ghash_resume;
 };
 
+struct atmel_aes_xts_ctx {
+	struct atmel_aes_base_ctx	base;
+
+	u32			key2[AES_KEYSIZE_256 / sizeof(u32)];
+};
+
 struct atmel_aes_reqctx {
 	unsigned long		mode;
 };
@@ -282,6 +290,20 @@ static const char *atmel_aes_reg_name(u32 offset, char *tmp, size_t sz)
 		snprintf(tmp, sz, "GCMHR[%u]", (offset - AES_GCMHR(0)) >> 2);
 		break;
 
+	case AES_TWR(0):
+	case AES_TWR(1):
+	case AES_TWR(2):
+	case AES_TWR(3):
+		snprintf(tmp, sz, "TWR[%u]", (offset - AES_TWR(0)) >> 2);
+		break;
+
+	case AES_ALPHAR(0):
+	case AES_ALPHAR(1):
+	case AES_ALPHAR(2):
+	case AES_ALPHAR(3):
+		snprintf(tmp, sz, "ALPHAR[%u]", (offset - AES_ALPHAR(0)) >> 2);
+		break;
+
 	default:
 		snprintf(tmp, sz, "0x%02x", offset);
 		break;
@@ -453,15 +475,15 @@ static inline int atmel_aes_complete(struct atmel_aes_dev *dd, int err)
 	return err;
 }
 
-static void atmel_aes_write_ctrl(struct atmel_aes_dev *dd, bool use_dma,
-				 const u32 *iv)
+static void atmel_aes_write_ctrl_key(struct atmel_aes_dev *dd, bool use_dma,
+				     const u32 *iv, const u32 *key, int keylen)
 {
 	u32 valmr = 0;
 
 	/* MR register must be set before IV registers */
-	if (dd->ctx->keylen == AES_KEYSIZE_128)
+	if (keylen == AES_KEYSIZE_128)
 		valmr |= AES_MR_KEYSIZE_128;
-	else if (dd->ctx->keylen == AES_KEYSIZE_192)
+	else if (keylen == AES_KEYSIZE_192)
 		valmr |= AES_MR_KEYSIZE_192;
 	else
 		valmr |= AES_MR_KEYSIZE_256;
@@ -478,13 +500,19 @@ static void atmel_aes_write_ctrl(struct atmel_aes_dev *dd, bool use_dma,
 
 	atmel_aes_write(dd, AES_MR, valmr);
 
-	atmel_aes_write_n(dd, AES_KEYWR(0), dd->ctx->key,
-			  SIZE_IN_WORDS(dd->ctx->keylen));
+	atmel_aes_write_n(dd, AES_KEYWR(0), key, SIZE_IN_WORDS(keylen));
 
 	if (iv && (valmr & AES_MR_OPMOD_MASK) != AES_MR_OPMOD_ECB)
 		atmel_aes_write_block(dd, AES_IVR(0), iv);
 }
 
+static inline void atmel_aes_write_ctrl(struct atmel_aes_dev *dd, bool use_dma,
+					const u32 *iv)
+
+{
+	atmel_aes_write_ctrl_key(dd, use_dma, iv,
+				 dd->ctx->key, dd->ctx->keylen);
+}
 
 /* CPU transfer */
 
@@ -1769,6 +1797,139 @@ static struct aead_alg aes_gcm_alg = {
 };
 
 
+/* xts functions */
+
+static inline struct atmel_aes_xts_ctx *
+atmel_aes_xts_ctx_cast(struct atmel_aes_base_ctx *ctx)
+{
+	return container_of(ctx, struct atmel_aes_xts_ctx, base);
+}
+
+static int atmel_aes_xts_process_data(struct atmel_aes_dev *dd);
+
+static int atmel_aes_xts_start(struct atmel_aes_dev *dd)
+{
+	struct atmel_aes_xts_ctx *ctx = atmel_aes_xts_ctx_cast(dd->ctx);
+	struct ablkcipher_request *req = ablkcipher_request_cast(dd->areq);
+	struct atmel_aes_reqctx *rctx = ablkcipher_request_ctx(req);
+	unsigned long flags;
+	int err;
+
+	atmel_aes_set_mode(dd, rctx);
+
+	err = atmel_aes_hw_init(dd);
+	if (err)
+		return atmel_aes_complete(dd, err);
+
+	/* Compute the tweak value from req->info with ecb(aes). */
+	flags = dd->flags;
+	dd->flags &= ~AES_FLAGS_MODE_MASK;
+	dd->flags |= (AES_FLAGS_ECB | AES_FLAGS_ENCRYPT);
+	atmel_aes_write_ctrl_key(dd, false, NULL,
+				 ctx->key2, ctx->base.keylen);
+	dd->flags = flags;
+
+	atmel_aes_write_block(dd, AES_IDATAR(0), req->info);
+	return atmel_aes_wait_for_data_ready(dd, atmel_aes_xts_process_data);
+}
+
+static int atmel_aes_xts_process_data(struct atmel_aes_dev *dd)
+{
+	struct ablkcipher_request *req = ablkcipher_request_cast(dd->areq);
+	bool use_dma = (req->nbytes >= ATMEL_AES_DMA_THRESHOLD);
+	u32 tweak[AES_BLOCK_SIZE / sizeof(u32)];
+	static const u32 one[AES_BLOCK_SIZE / sizeof(u32)] = {cpu_to_le32(1), };
+	u8 *tweak_bytes = (u8 *)tweak;
+	int i;
+
+	/* Read the computed ciphered tweak value. */
+	atmel_aes_read_block(dd, AES_ODATAR(0), tweak);
+	/*
+	 * Hardware quirk:
+	 * the order of the ciphered tweak bytes need to be reverted before
+	 * writing them into the ODATARx registers.
+	 */
+	for (i = 0; i < AES_BLOCK_SIZE/2; ++i) {
+		u8 tmp = tweak_bytes[AES_BLOCK_SIZE - 1 - i];
+
+		tweak_bytes[AES_BLOCK_SIZE - 1 - i] = tweak_bytes[i];
+		tweak_bytes[i] = tmp;
+	}
+
+	/* Process the data. */
+	atmel_aes_write_ctrl(dd, use_dma, NULL);
+	atmel_aes_write_block(dd, AES_TWR(0), tweak);
+	atmel_aes_write_block(dd, AES_ALPHAR(0), one);
+	if (use_dma)
+		return atmel_aes_dma_start(dd, req->src, req->dst, req->nbytes,
+					   atmel_aes_transfer_complete);
+
+	return atmel_aes_cpu_start(dd, req->src, req->dst, req->nbytes,
+				   atmel_aes_transfer_complete);
+}
+
+static int atmel_aes_xts_setkey(struct crypto_ablkcipher *tfm, const u8 *key,
+				unsigned int keylen)
+{
+	struct atmel_aes_xts_ctx *ctx = crypto_ablkcipher_ctx(tfm);
+
+	if (keylen != AES_KEYSIZE_128 * 2 &&
+	    keylen != AES_KEYSIZE_192 * 2 &&
+	    keylen != AES_KEYSIZE_256 * 2) {
+		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
+		return -EINVAL;
+	}
+
+	memcpy(ctx->base.key, key, keylen/2);
+	memcpy(ctx->key2, key + keylen/2, keylen/2);
+	ctx->base.keylen = keylen/2;
+
+	return 0;
+}
+
+static int atmel_aes_xts_encrypt(struct ablkcipher_request *req)
+{
+	return atmel_aes_crypt(req, AES_FLAGS_XTS | AES_FLAGS_ENCRYPT);
+}
+
+static int atmel_aes_xts_decrypt(struct ablkcipher_request *req)
+{
+	return atmel_aes_crypt(req, AES_FLAGS_XTS);
+}
+
+static int atmel_aes_xts_cra_init(struct crypto_tfm *tfm)
+{
+	struct atmel_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm);
+
+	tfm->crt_ablkcipher.reqsize = sizeof(struct atmel_aes_reqctx);
+	ctx->base.start = atmel_aes_xts_start;
+
+	return 0;
+}
+
+static struct crypto_alg aes_xts_alg = {
+	.cra_name		= "xts(aes)",
+	.cra_driver_name	= "atmel-xts-aes",
+	.cra_priority		= ATMEL_AES_PRIORITY,
+	.cra_flags		= CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
+	.cra_blocksize		= AES_BLOCK_SIZE,
+	.cra_ctxsize		= sizeof(struct atmel_aes_xts_ctx),
+	.cra_alignmask		= 0xf,
+	.cra_type		= &crypto_ablkcipher_type,
+	.cra_module		= THIS_MODULE,
+	.cra_init		= atmel_aes_xts_cra_init,
+	.cra_exit		= atmel_aes_cra_exit,
+	.cra_u.ablkcipher = {
+		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
+		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
+		.ivsize		= AES_BLOCK_SIZE,
+		.setkey		= atmel_aes_xts_setkey,
+		.encrypt	= atmel_aes_xts_encrypt,
+		.decrypt	= atmel_aes_xts_decrypt,
+	}
+};
+
+
 /* Probe functions */
 
 static int atmel_aes_buff_init(struct atmel_aes_dev *dd)
@@ -1877,6 +2038,9 @@ static void atmel_aes_unregister_algs(struct atmel_aes_dev *dd)
 {
 	int i;
 
+	if (dd->caps.has_xts)
+		crypto_unregister_alg(&aes_xts_alg);
+
 	if (dd->caps.has_gcm)
 		crypto_unregister_aead(&aes_gcm_alg);
 
@@ -1909,8 +2073,16 @@ static int atmel_aes_register_algs(struct atmel_aes_dev *dd)
 			goto err_aes_gcm_alg;
 	}
 
+	if (dd->caps.has_xts) {
+		err = crypto_register_alg(&aes_xts_alg);
+		if (err)
+			goto err_aes_xts_alg;
+	}
+
 	return 0;
 
+err_aes_xts_alg:
+	crypto_unregister_aead(&aes_gcm_alg);
 err_aes_gcm_alg:
 	crypto_unregister_alg(&aes_cfb64_alg);
 err_aes_cfb64_alg:
@@ -1928,6 +2100,7 @@ static void atmel_aes_get_cap(struct atmel_aes_dev *dd)
 	dd->caps.has_cfb64 = 0;
 	dd->caps.has_ctr32 = 0;
 	dd->caps.has_gcm = 0;
+	dd->caps.has_xts = 0;
 	dd->caps.max_burst_size = 1;
 
 	/* keep only major version number */
@@ -1937,6 +2110,7 @@ static void atmel_aes_get_cap(struct atmel_aes_dev *dd)
 		dd->caps.has_cfb64 = 1;
 		dd->caps.has_ctr32 = 1;
 		dd->caps.has_gcm = 1;
+		dd->caps.has_xts = 1;
 		dd->caps.max_burst_size = 4;
 		break;
 	case 0x200:
-- 
2.7.4

^ permalink raw reply related

* [PATCH] crypto: caam - treat SGT address pointer as u64
From: Tudor Ambarus @ 2016-09-29 14:17 UTC (permalink / raw)
  To: horia.geanta, herbert; +Cc: linux-crypto, fabio.estevam, Tudor Ambarus

Even for i.MX, CAAM is able to use address pointers greater than
32 bits, the address pointer field being interpreted as a double word.
Enforce u64 address pointer in the sec4_sg_entry struct.

This patch fixes the SGT address pointer endianness issue for
32bit platforms where core endianness != caam endianness.

Signed-off-by: Tudor Ambarus <tudor-dan.ambarus@nxp.com>
---
 drivers/crypto/caam/desc.h       | 6 ------
 drivers/crypto/caam/regs.h       | 8 ++++++++
 drivers/crypto/caam/sg_sw_sec4.h | 2 +-
 3 files changed, 9 insertions(+), 7 deletions(-)

diff --git a/drivers/crypto/caam/desc.h b/drivers/crypto/caam/desc.h
index 26427c1..513b664 100644
--- a/drivers/crypto/caam/desc.h
+++ b/drivers/crypto/caam/desc.h
@@ -23,13 +23,7 @@
 #define SEC4_SG_OFFSET_MASK	0x00001fff
 
 struct sec4_sg_entry {
-#if !defined(CONFIG_ARCH_DMA_ADDR_T_64BIT) && \
-	defined(CONFIG_CRYPTO_DEV_FSL_CAAM_IMX)
-	u32 rsvd1;
-	dma_addr_t ptr;
-#else
 	u64 ptr;
-#endif /* CONFIG_CRYPTO_DEV_FSL_CAAM_IMX */
 	u32 len;
 	u32 bpid_offset;
 };
diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
index b3c5016..effbdd8 100644
--- a/drivers/crypto/caam/regs.h
+++ b/drivers/crypto/caam/regs.h
@@ -196,6 +196,14 @@ static inline u64 rd_reg64(void __iomem *reg)
 #define caam_dma_to_cpu(value) caam32_to_cpu(value)
 #endif /* CONFIG_ARCH_DMA_ADDR_T_64BIT  */
 
+#ifdef CONFIG_SOC_IMX7D
+#define cpu_to_caam_dma64(value) \
+		(((u64)cpu_to_caam32(lower_32_bits(value)) << 32) | \
+		 (u64)cpu_to_caam32(upper_32_bits(value)))
+#else
+#define cpu_to_caam_dma64(value) cpu_to_caam64(value)
+#endif
+
 /*
  * jr_outentry
  * Represents each entry in a JobR output ring
diff --git a/drivers/crypto/caam/sg_sw_sec4.h b/drivers/crypto/caam/sg_sw_sec4.h
index 19dc64f..41cd5a3 100644
--- a/drivers/crypto/caam/sg_sw_sec4.h
+++ b/drivers/crypto/caam/sg_sw_sec4.h
@@ -15,7 +15,7 @@ struct sec4_sg_entry;
 static inline void dma_to_sec4_sg_one(struct sec4_sg_entry *sec4_sg_ptr,
 				      dma_addr_t dma, u32 len, u16 offset)
 {
-	sec4_sg_ptr->ptr = cpu_to_caam_dma(dma);
+	sec4_sg_ptr->ptr = cpu_to_caam_dma64(dma);
 	sec4_sg_ptr->len = cpu_to_caam32(len);
 	sec4_sg_ptr->bpid_offset = cpu_to_caam32(offset & SEC4_SG_OFFSET_MASK);
 #ifdef DEBUG
-- 
1.8.3.1

^ permalink raw reply related

* Re: [PATCH] crypto: caam - treat SGT address pointer as u64
From: Fabio Estevam @ 2016-09-29 16:58 UTC (permalink / raw)
  To: Tudor Ambarus; +Cc: horia.geanta, Herbert Xu, linux-crypto, Fabio Estevam
In-Reply-To: <1475158647-16094-1-git-send-email-tudor-dan.ambarus@nxp.com>

Hi Tudor,

On Thu, Sep 29, 2016 at 11:17 AM, Tudor Ambarus
<tudor-dan.ambarus@nxp.com> wrote:

> diff --git a/drivers/crypto/caam/regs.h b/drivers/crypto/caam/regs.h
> index b3c5016..effbdd8 100644
> --- a/drivers/crypto/caam/regs.h
> +++ b/drivers/crypto/caam/regs.h
> @@ -196,6 +196,14 @@ static inline u64 rd_reg64(void __iomem *reg)
>  #define caam_dma_to_cpu(value) caam32_to_cpu(value)
>  #endif /* CONFIG_ARCH_DMA_ADDR_T_64BIT  */
>
> +#ifdef CONFIG_SOC_IMX7D

Why is this restricted to mx7d?

^ permalink raw reply

* Re: [PATCH 1/1] crypto: atmel-aes: add support to the XTS mode
From: Stephan Mueller @ 2016-09-29 17:44 UTC (permalink / raw)
  To: Cyrille Pitchen
  Cc: herbert, davem, nicolas.ferre, linux-crypto, linux-kernel,
	linux-arm-kernel, levent.demir
In-Reply-To: <4bf386be2805a97c59defcd24ee9fb56f190b901.1475167690.git.cyrille.pitchen@atmel.com>

Am Donnerstag, 29. September 2016, 18:49:07 CEST schrieb Cyrille Pitchen:

Hi Cyrille,

> This patch adds the xts(aes) algorithm, which is supported from
> hardware version 0x500 and above (sama5d2x).
> 
> Signed-off-by: Cyrille Pitchen <cyrille.pitchen@atmel.com>
> ---
>  drivers/crypto/atmel-aes-regs.h |   4 +
>  drivers/crypto/atmel-aes.c      | 186
> ++++++++++++++++++++++++++++++++++++++-- 2 files changed, 184
> insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/crypto/atmel-aes-regs.h
> b/drivers/crypto/atmel-aes-regs.h index 6c2951bb70b1..0ec04407b533 100644
> --- a/drivers/crypto/atmel-aes-regs.h
> +++ b/drivers/crypto/atmel-aes-regs.h
> @@ -28,6 +28,7 @@
>  #define AES_MR_OPMOD_CFB		(0x3 << 12)
>  #define AES_MR_OPMOD_CTR		(0x4 << 12)
>  #define AES_MR_OPMOD_GCM		(0x5 << 12)
> +#define AES_MR_OPMOD_XTS		(0x6 << 12)
>  #define AES_MR_LOD				(0x1 << 15)
>  #define AES_MR_CFBS_MASK		(0x7 << 16)
>  #define AES_MR_CFBS_128b		(0x0 << 16)
> @@ -67,6 +68,9 @@
>  #define AES_CTRR	0x98
>  #define AES_GCMHR(x)	(0x9c + ((x) * 0x04))
> 
> +#define AES_TWR(x)	(0xc0 + ((x) * 0x04))
> +#define AES_ALPHAR(x)	(0xd0 + ((x) * 0x04))
> +
>  #define AES_HW_VERSION	0xFC
> 
>  #endif /* __ATMEL_AES_REGS_H__ */
> diff --git a/drivers/crypto/atmel-aes.c b/drivers/crypto/atmel-aes.c
> index 1d9e7bd3f377..b14c10e98a06 100644
> --- a/drivers/crypto/atmel-aes.c
> +++ b/drivers/crypto/atmel-aes.c
> @@ -68,6 +68,7 @@
>  #define AES_FLAGS_CFB8		(AES_MR_OPMOD_CFB | AES_MR_CFBS_8b)
>  #define AES_FLAGS_CTR		AES_MR_OPMOD_CTR
>  #define AES_FLAGS_GCM		AES_MR_OPMOD_GCM
> +#define AES_FLAGS_XTS		AES_MR_OPMOD_XTS
> 
>  #define AES_FLAGS_MODE_MASK	(AES_FLAGS_OPMODE_MASK |	\
>  				 AES_FLAGS_ENCRYPT |		\
> @@ -89,6 +90,7 @@ struct atmel_aes_caps {
>  	bool			has_cfb64;
>  	bool			has_ctr32;
>  	bool			has_gcm;
> +	bool			has_xts;
>  	u32			max_burst_size;
>  };
> 
> @@ -135,6 +137,12 @@ struct atmel_aes_gcm_ctx {
>  	atmel_aes_fn_t		ghash_resume;
>  };
> 
> +struct atmel_aes_xts_ctx {
> +	struct atmel_aes_base_ctx	base;
> +
> +	u32			key2[AES_KEYSIZE_256 / sizeof(u32)];
> +};
> +
>  struct atmel_aes_reqctx {
>  	unsigned long		mode;
>  };
> @@ -282,6 +290,20 @@ static const char *atmel_aes_reg_name(u32 offset, char
> *tmp, size_t sz) snprintf(tmp, sz, "GCMHR[%u]", (offset - AES_GCMHR(0)) >>
> 2);
>  		break;
> 
> +	case AES_TWR(0):
> +	case AES_TWR(1):
> +	case AES_TWR(2):
> +	case AES_TWR(3):
> +		snprintf(tmp, sz, "TWR[%u]", (offset - AES_TWR(0)) >> 2);
> +		break;
> +
> +	case AES_ALPHAR(0):
> +	case AES_ALPHAR(1):
> +	case AES_ALPHAR(2):
> +	case AES_ALPHAR(3):
> +		snprintf(tmp, sz, "ALPHAR[%u]", (offset - AES_ALPHAR(0)) >> 2);
> +		break;
> +
>  	default:
>  		snprintf(tmp, sz, "0x%02x", offset);
>  		break;
> @@ -453,15 +475,15 @@ static inline int atmel_aes_complete(struct
> atmel_aes_dev *dd, int err) return err;
>  }
> 
> -static void atmel_aes_write_ctrl(struct atmel_aes_dev *dd, bool use_dma,
> -				 const u32 *iv)
> +static void atmel_aes_write_ctrl_key(struct atmel_aes_dev *dd, bool
> use_dma, +				     const u32 *iv, const u32 *key, int keylen)
>  {
>  	u32 valmr = 0;
> 
>  	/* MR register must be set before IV registers */
> -	if (dd->ctx->keylen == AES_KEYSIZE_128)
> +	if (keylen == AES_KEYSIZE_128)
>  		valmr |= AES_MR_KEYSIZE_128;
> -	else if (dd->ctx->keylen == AES_KEYSIZE_192)
> +	else if (keylen == AES_KEYSIZE_192)
>  		valmr |= AES_MR_KEYSIZE_192;
>  	else
>  		valmr |= AES_MR_KEYSIZE_256;
> @@ -478,13 +500,19 @@ static void atmel_aes_write_ctrl(struct atmel_aes_dev
> *dd, bool use_dma,
> 
>  	atmel_aes_write(dd, AES_MR, valmr);
> 
> -	atmel_aes_write_n(dd, AES_KEYWR(0), dd->ctx->key,
> -			  SIZE_IN_WORDS(dd->ctx->keylen));
> +	atmel_aes_write_n(dd, AES_KEYWR(0), key, SIZE_IN_WORDS(keylen));
> 
>  	if (iv && (valmr & AES_MR_OPMOD_MASK) != AES_MR_OPMOD_ECB)
>  		atmel_aes_write_block(dd, AES_IVR(0), iv);
>  }
> 
> +static inline void atmel_aes_write_ctrl(struct atmel_aes_dev *dd, bool
> use_dma, +					const u32 *iv)
> +
> +{
> +	atmel_aes_write_ctrl_key(dd, use_dma, iv,
> +				 dd->ctx->key, dd->ctx->keylen);
> +}
> 
>  /* CPU transfer */
> 
> @@ -1769,6 +1797,139 @@ static struct aead_alg aes_gcm_alg = {
>  };
> 
> 
> +/* xts functions */
> +
> +static inline struct atmel_aes_xts_ctx *
> +atmel_aes_xts_ctx_cast(struct atmel_aes_base_ctx *ctx)
> +{
> +	return container_of(ctx, struct atmel_aes_xts_ctx, base);
> +}
> +
> +static int atmel_aes_xts_process_data(struct atmel_aes_dev *dd);
> +
> +static int atmel_aes_xts_start(struct atmel_aes_dev *dd)
> +{
> +	struct atmel_aes_xts_ctx *ctx = atmel_aes_xts_ctx_cast(dd->ctx);
> +	struct ablkcipher_request *req = ablkcipher_request_cast(dd->areq);
> +	struct atmel_aes_reqctx *rctx = ablkcipher_request_ctx(req);
> +	unsigned long flags;
> +	int err;
> +
> +	atmel_aes_set_mode(dd, rctx);
> +
> +	err = atmel_aes_hw_init(dd);
> +	if (err)
> +		return atmel_aes_complete(dd, err);
> +
> +	/* Compute the tweak value from req->info with ecb(aes). */
> +	flags = dd->flags;
> +	dd->flags &= ~AES_FLAGS_MODE_MASK;
> +	dd->flags |= (AES_FLAGS_ECB | AES_FLAGS_ENCRYPT);
> +	atmel_aes_write_ctrl_key(dd, false, NULL,
> +				 ctx->key2, ctx->base.keylen);
> +	dd->flags = flags;
> +
> +	atmel_aes_write_block(dd, AES_IDATAR(0), req->info);
> +	return atmel_aes_wait_for_data_ready(dd, atmel_aes_xts_process_data);
> +}
> +
> +static int atmel_aes_xts_process_data(struct atmel_aes_dev *dd)
> +{
> +	struct ablkcipher_request *req = ablkcipher_request_cast(dd->areq);
> +	bool use_dma = (req->nbytes >= ATMEL_AES_DMA_THRESHOLD);
> +	u32 tweak[AES_BLOCK_SIZE / sizeof(u32)];
> +	static const u32 one[AES_BLOCK_SIZE / sizeof(u32)] = {cpu_to_le32(1), };
> +	u8 *tweak_bytes = (u8 *)tweak;
> +	int i;
> +
> +	/* Read the computed ciphered tweak value. */
> +	atmel_aes_read_block(dd, AES_ODATAR(0), tweak);
> +	/*
> +	 * Hardware quirk:
> +	 * the order of the ciphered tweak bytes need to be reverted before
> +	 * writing them into the ODATARx registers.
> +	 */
> +	for (i = 0; i < AES_BLOCK_SIZE/2; ++i) {
> +		u8 tmp = tweak_bytes[AES_BLOCK_SIZE - 1 - i];
> +
> +		tweak_bytes[AES_BLOCK_SIZE - 1 - i] = tweak_bytes[i];
> +		tweak_bytes[i] = tmp;
> +	}
> +
> +	/* Process the data. */
> +	atmel_aes_write_ctrl(dd, use_dma, NULL);
> +	atmel_aes_write_block(dd, AES_TWR(0), tweak);
> +	atmel_aes_write_block(dd, AES_ALPHAR(0), one);
> +	if (use_dma)
> +		return atmel_aes_dma_start(dd, req->src, req->dst, req->nbytes,
> +					   atmel_aes_transfer_complete);
> +
> +	return atmel_aes_cpu_start(dd, req->src, req->dst, req->nbytes,
> +				   atmel_aes_transfer_complete);
> +}
> +
> +static int atmel_aes_xts_setkey(struct crypto_ablkcipher *tfm, const u8
> *key, +				unsigned int keylen)
> +{
> +	struct atmel_aes_xts_ctx *ctx = crypto_ablkcipher_ctx(tfm);
> +
> +	if (keylen != AES_KEYSIZE_128 * 2 &&
> +	    keylen != AES_KEYSIZE_192 * 2 &&
> +	    keylen != AES_KEYSIZE_256 * 2) {
> +		crypto_ablkcipher_set_flags(tfm, CRYPTO_TFM_RES_BAD_KEY_LEN);
> +		return -EINVAL;
> +	}

Please use xts_check_key as a replacement for this code.
> +
> +	memcpy(ctx->base.key, key, keylen/2);
> +	memcpy(ctx->key2, key + keylen/2, keylen/2);
> +	ctx->base.keylen = keylen/2;
> +
> +	return 0;
> +}
> +
> +static int atmel_aes_xts_encrypt(struct ablkcipher_request *req)
> +{
> +	return atmel_aes_crypt(req, AES_FLAGS_XTS | AES_FLAGS_ENCRYPT);
> +}
> +
> +static int atmel_aes_xts_decrypt(struct ablkcipher_request *req)
> +{
> +	return atmel_aes_crypt(req, AES_FLAGS_XTS);
> +}
> +
> +static int atmel_aes_xts_cra_init(struct crypto_tfm *tfm)
> +{
> +	struct atmel_aes_xts_ctx *ctx = crypto_tfm_ctx(tfm);
> +
> +	tfm->crt_ablkcipher.reqsize = sizeof(struct atmel_aes_reqctx);
> +	ctx->base.start = atmel_aes_xts_start;
> +
> +	return 0;
> +}
> +
> +static struct crypto_alg aes_xts_alg = {
> +	.cra_name		= "xts(aes)",
> +	.cra_driver_name	= "atmel-xts-aes",
> +	.cra_priority		= ATMEL_AES_PRIORITY,
> +	.cra_flags		= CRYPTO_ALG_TYPE_ABLKCIPHER | CRYPTO_ALG_ASYNC,
> +	.cra_blocksize		= AES_BLOCK_SIZE,
> +	.cra_ctxsize		= sizeof(struct atmel_aes_xts_ctx),
> +	.cra_alignmask		= 0xf,
> +	.cra_type		= &crypto_ablkcipher_type,
> +	.cra_module		= THIS_MODULE,
> +	.cra_init		= atmel_aes_xts_cra_init,
> +	.cra_exit		= atmel_aes_cra_exit,
> +	.cra_u.ablkcipher = {
> +		.min_keysize	= 2 * AES_MIN_KEY_SIZE,
> +		.max_keysize	= 2 * AES_MAX_KEY_SIZE,
> +		.ivsize		= AES_BLOCK_SIZE,
> +		.setkey		= atmel_aes_xts_setkey,
> +		.encrypt	= atmel_aes_xts_encrypt,
> +		.decrypt	= atmel_aes_xts_decrypt,
> +	}
> +};
> +
> +
>  /* Probe functions */
> 
>  static int atmel_aes_buff_init(struct atmel_aes_dev *dd)
> @@ -1877,6 +2038,9 @@ static void atmel_aes_unregister_algs(struct
> atmel_aes_dev *dd) {
>  	int i;
> 
> +	if (dd->caps.has_xts)
> +		crypto_unregister_alg(&aes_xts_alg);
> +
>  	if (dd->caps.has_gcm)
>  		crypto_unregister_aead(&aes_gcm_alg);
> 
> @@ -1909,8 +2073,16 @@ static int atmel_aes_register_algs(struct
> atmel_aes_dev *dd) goto err_aes_gcm_alg;
>  	}
> 
> +	if (dd->caps.has_xts) {
> +		err = crypto_register_alg(&aes_xts_alg);
> +		if (err)
> +			goto err_aes_xts_alg;
> +	}
> +
>  	return 0;
> 
> +err_aes_xts_alg:
> +	crypto_unregister_aead(&aes_gcm_alg);
>  err_aes_gcm_alg:
>  	crypto_unregister_alg(&aes_cfb64_alg);
>  err_aes_cfb64_alg:
> @@ -1928,6 +2100,7 @@ static void atmel_aes_get_cap(struct atmel_aes_dev
> *dd) dd->caps.has_cfb64 = 0;
>  	dd->caps.has_ctr32 = 0;
>  	dd->caps.has_gcm = 0;
> +	dd->caps.has_xts = 0;
>  	dd->caps.max_burst_size = 1;
> 
>  	/* keep only major version number */
> @@ -1937,6 +2110,7 @@ static void atmel_aes_get_cap(struct atmel_aes_dev
> *dd) dd->caps.has_cfb64 = 1;
>  		dd->caps.has_ctr32 = 1;
>  		dd->caps.has_gcm = 1;
> +		dd->caps.has_xts = 1;
>  		dd->caps.max_burst_size = 4;
>  		break;
>  	case 0x200:



Ciao
Stephan

^ permalink raw reply

* [PATCH] arm64: add support for SHA256 using NEON instructions
From: Ard Biesheuvel @ 2016-09-29 22:51 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto, herbert
  Cc: appro, victor.chong, daniel.thompson, will.deacon,
	catalin.marinas, Ard Biesheuvel

This is a port of the ARMv7 implementation in arch/arm/crypto. For a Cortex-A57
(r2p1), the performance numbers are listed below. In summary, 40% - 50% speedup
where it counts, i.e., block sizes over 256 bytes with few updates.

testing speed of async sha256 (sha256-generic)
(   16 byte blocks,   16 bytes x   1 updates): 1379992 ops/s,  22079872 Bps
(   64 byte blocks,   16 bytes x   4 updates): 633455 ops/s,  40541120 Bps
(   64 byte blocks,   64 bytes x   1 updates): 738076 ops/s,  47236864 Bps
(  256 byte blocks,   16 bytes x  16 updates): 234420 ops/s,  60011520 Bps
(  256 byte blocks,   64 bytes x   4 updates): 293008 ops/s,  75010048 Bps
(  256 byte blocks,  256 bytes x   1 updates): 309600 ops/s,  79257600 Bps
( 1024 byte blocks,   16 bytes x  64 updates):  66997 ops/s,  68604928 Bps
( 1024 byte blocks,  256 bytes x   4 updates):  91912 ops/s,  94117888 Bps
( 1024 byte blocks, 1024 bytes x   1 updates):  93992 ops/s,  96247808 Bps
( 2048 byte blocks,   16 bytes x 128 updates):  34385 ops/s,  70420480 Bps
( 2048 byte blocks,  256 bytes x   8 updates):  47570 ops/s,  97423360 Bps
( 2048 byte blocks, 1024 bytes x   2 updates):  48557 ops/s,  99444736 Bps
( 2048 byte blocks, 2048 bytes x   1 updates):  48781 ops/s,  99903488 Bps
( 4096 byte blocks,   16 bytes x 256 updates):  17401 ops/s,  71274496 Bps
( 4096 byte blocks,  256 bytes x  16 updates):  24211 ops/s,  99168256 Bps
( 4096 byte blocks, 1024 bytes x   4 updates):  24720 ops/s, 101253120 Bps
( 4096 byte blocks, 4096 bytes x   1 updates):  24930 ops/s, 102113280 Bps
( 8192 byte blocks,   16 bytes x 512 updates):   8738 ops/s,  71581696 Bps
( 8192 byte blocks,  256 bytes x  32 updates):  12214 ops/s, 100057088 Bps
( 8192 byte blocks, 1024 bytes x   8 updates):  12474 ops/s, 102187008 Bps
( 8192 byte blocks, 4096 bytes x   2 updates):  12558 ops/s, 102875136 Bps
( 8192 byte blocks, 8192 bytes x   1 updates):  12555 ops/s, 102850560 Bps

testing speed of async sha256 (sha256-neon)
(   16 byte blocks,   16 bytes x   1 updates): 1802881 ops/s,  28846096 Bps
(   64 byte blocks,   16 bytes x   4 updates): 744861 ops/s,  47671104 Bps
(   64 byte blocks,   64 bytes x   1 updates): 1015413 ops/s,  64986432 Bps
(  256 byte blocks,   16 bytes x  16 updates): 281055 ops/s,  71950080 Bps
(  256 byte blocks,   64 bytes x   4 updates): 378437 ops/s,  96879872 Bps
(  256 byte blocks,  256 bytes x   1 updates): 453325 ops/s, 116051200 Bps
( 1024 byte blocks,   16 bytes x  64 updates):  79809 ops/s,  81724416 Bps
( 1024 byte blocks,  256 bytes x   4 updates): 131621 ops/s, 134779904 Bps
( 1024 byte blocks, 1024 bytes x   1 updates): 140708 ops/s, 144084992 Bps
( 2048 byte blocks,   16 bytes x 128 updates):  40900 ops/s,  83763200 Bps
( 2048 byte blocks,  256 bytes x   8 updates):  68348 ops/s, 139976704 Bps
( 2048 byte blocks, 1024 bytes x   2 updates):  72051 ops/s, 147560448 Bps
( 2048 byte blocks, 2048 bytes x   1 updates):  73358 ops/s, 150237184 Bps
( 4096 byte blocks,   16 bytes x 256 updates):  20746 ops/s,  84975616 Bps
( 4096 byte blocks,  256 bytes x  16 updates):  34842 ops/s, 142712832 Bps
( 4096 byte blocks, 1024 bytes x   4 updates):  36794 ops/s, 150708224 Bps
( 4096 byte blocks, 4096 bytes x   1 updates):  37422 ops/s, 153280512 Bps
( 8192 byte blocks,   16 bytes x 512 updates):  10428 ops/s,  85426176 Bps
( 8192 byte blocks,  256 bytes x  32 updates):  17600 ops/s, 144179200 Bps
( 8192 byte blocks, 1024 bytes x   8 updates):  18594 ops/s, 152322048 Bps
( 8192 byte blocks, 4096 bytes x   2 updates):  18858 ops/s, 154484736 Bps
( 8192 byte blocks, 8192 bytes x   1 updates):  18880 ops/s, 154664960 Bps

testing speed of async sha256 (sha256-ce)
(   16 byte blocks,   16 bytes x   1 updates): 4107417 ops/s,  65718672 Bps
(   64 byte blocks,   16 bytes x   4 updates): 1418054 ops/s,  90755456 Bps
(   64 byte blocks,   64 bytes x   1 updates): 3323045 ops/s, 212674880 Bps
(  256 byte blocks,   16 bytes x  16 updates): 450084 ops/s, 115221504 Bps
(  256 byte blocks,   64 bytes x   4 updates): 1034376 ops/s, 264800256 Bps
(  256 byte blocks,  256 bytes x   1 updates): 1798744 ops/s, 460478464 Bps
( 1024 byte blocks,   16 bytes x  64 updates): 121411 ops/s, 124324864 Bps
( 1024 byte blocks,  256 bytes x   4 updates): 506086 ops/s, 518232064 Bps
( 1024 byte blocks, 1024 bytes x   1 updates): 634485 ops/s, 649712640 Bps
( 2048 byte blocks,   16 bytes x 128 updates):  61520 ops/s, 125992960 Bps
( 2048 byte blocks,  256 bytes x   8 updates): 266787 ops/s, 546379776 Bps
( 2048 byte blocks, 1024 bytes x   2 updates): 316910 ops/s, 649031680 Bps
( 2048 byte blocks, 2048 bytes x   1 updates): 342777 ops/s, 702007296 Bps
( 4096 byte blocks,   16 bytes x 256 updates):  31003 ops/s, 126988288 Bps
( 4096 byte blocks,  256 bytes x  16 updates): 138097 ops/s, 565645312 Bps
( 4096 byte blocks, 1024 bytes x   4 updates): 164319 ops/s, 673050624 Bps
( 4096 byte blocks, 4096 bytes x   1 updates): 176310 ops/s, 722165760 Bps
( 8192 byte blocks,   16 bytes x 512 updates):  15566 ops/s, 127516672 Bps
( 8192 byte blocks,  256 bytes x  32 updates):  69608 ops/s, 570228736 Bps
( 8192 byte blocks, 1024 bytes x   8 updates):  83682 ops/s, 685522944 Bps
( 8192 byte blocks, 4096 bytes x   2 updates):  88813 ops/s, 727556096 Bps
( 8192 byte blocks, 8192 bytes x   1 updates):  88781 ops/s, 727293952 Bps

Ard Biesheuvel (1):
  crypto: arm64/sha256 - add support for SHA256 using NEON instructions

 arch/arm64/crypto/Kconfig               |   5 +
 arch/arm64/crypto/Makefile              |  11 +
 arch/arm64/crypto/sha256-armv4.pl       | 413 +++++++++
 arch/arm64/crypto/sha256-core.S_shipped | 883 ++++++++++++++++++++
 arch/arm64/crypto/sha256_neon_glue.c    | 103 +++
 5 files changed, 1415 insertions(+)
 create mode 100644 arch/arm64/crypto/sha256-armv4.pl
 create mode 100644 arch/arm64/crypto/sha256-core.S_shipped
 create mode 100644 arch/arm64/crypto/sha256_neon_glue.c

-- 
2.7.4

^ permalink raw reply

* [PATCH] crypto: arm64/sha256 - add support for SHA256 using NEON instructions
From: Ard Biesheuvel @ 2016-09-29 22:51 UTC (permalink / raw)
  To: linux-arm-kernel, linux-crypto, herbert
  Cc: appro, victor.chong, daniel.thompson, will.deacon,
	catalin.marinas, Ard Biesheuvel
In-Reply-To: <1475189503-9175-1-git-send-email-ard.biesheuvel@linaro.org>

This is a port to arm64 of the NEON implementation of SHA256 that lives
under arch/arm/crypto.

Due to the fact that the AArch64 assembler dialect deviates from the
32-bit ARM one in ways that makes sharing code problematic, and given
that this version only uses the NEON version whereas the original
implementation supports plain ALU assembler, NEON and Crypto Extensions,
this code is built from a version sha256-armv4.pl that has been
transliterated to the AArch64 NEON dialect.

Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
 arch/arm64/crypto/Kconfig               |   5 +
 arch/arm64/crypto/Makefile              |  11 +
 arch/arm64/crypto/sha256-armv4.pl       | 413 +++++++++
 arch/arm64/crypto/sha256-core.S_shipped | 883 ++++++++++++++++++++
 arch/arm64/crypto/sha256_neon_glue.c    | 103 +++
 5 files changed, 1415 insertions(+)

diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 2cf32e9887e1..d32371198474 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -18,6 +18,11 @@ config CRYPTO_SHA2_ARM64_CE
 	depends on ARM64 && KERNEL_MODE_NEON
 	select CRYPTO_HASH
 
+config CRYPTO_SHA2_ARM64_NEON
+	tristate "SHA-224/SHA-256 digest algorithm (ARMv8 NEON)"
+	depends on ARM64 && KERNEL_MODE_NEON
+	select CRYPTO_HASH
+
 config CRYPTO_GHASH_ARM64_CE
 	tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
 	depends on ARM64 && KERNEL_MODE_NEON
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index abb79b3cfcfe..5156ebee0488 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -29,6 +29,9 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
 obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
 aes-neon-blk-y := aes-glue-neon.o aes-neon.o
 
+obj-$(CONFIG_CRYPTO_SHA2_ARM64_NEON) := sha256-neon.o
+sha256-neon-y := sha256_neon_glue.o sha256-core.o
+
 AFLAGS_aes-ce.o		:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o	:= -DINTERLEAVE=4
 
@@ -40,3 +43,11 @@ CFLAGS_crc32-arm64.o	:= -mcpu=generic+crc
 
 $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
 	$(call if_changed_rule,cc_o_c)
+
+quiet_cmd_perl = PERL    $@
+      cmd_perl = $(PERL) $(<) > $(@)
+
+$(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
+	$(call cmd,perl)
+
+.PRECIOUS: $(obj)/sha256-core.S
diff --git a/arch/arm64/crypto/sha256-armv4.pl b/arch/arm64/crypto/sha256-armv4.pl
new file mode 100644
index 000000000000..9ff788339b1c
--- /dev/null
+++ b/arch/arm64/crypto/sha256-armv4.pl
@@ -0,0 +1,413 @@
+#!/usr/bin/env perl
+
+#
+# AArch64 port of the OpenSSL SHA256 implementation for ARM NEON
+#
+# Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License version 2 as
+# published by the Free Software Foundation.
+#
+
+# ====================================================================
+# Written by Andy Polyakov <appro@openssl.org> for the OpenSSL
+# project. The module is, however, dual licensed under OpenSSL and
+# CRYPTOGAMS licenses depending on where you obtain it. For further
+# details see http://www.openssl.org/~appro/cryptogams/.
+#
+# Permission to use under GPL terms is granted.
+# ====================================================================
+
+# SHA256 block procedure for ARMv4. May 2007.
+
+# Performance is ~2x better than gcc 3.4 generated code and in "abso-
+# lute" terms is ~2250 cycles per 64-byte block or ~35 cycles per
+# byte [on single-issue Xscale PXA250 core].
+
+# July 2010.
+#
+# Rescheduling for dual-issue pipeline resulted in 22% improvement on
+# Cortex A8 core and ~20 cycles per processed byte.
+
+# February 2011.
+#
+# Profiler-assisted and platform-specific optimization resulted in 16%
+# improvement on Cortex A8 core and ~15.4 cycles per processed byte.
+
+# September 2013.
+#
+# Add NEON implementation. On Cortex A8 it was measured to process one
+# byte in 12.5 cycles or 23% faster than integer-only code. Snapdragon
+# S4 does it in 12.5 cycles too, but it's 50% faster than integer-only
+# code (meaning that latter performs sub-optimally, nothing was done
+# about it).
+
+# May 2014.
+#
+# Add ARMv8 code path performing at 2.0 cpb on Apple A7.
+
+while (($output=shift) && ($output!~/^\w[\w\-]*\.\w+$/)) {}
+open STDOUT,">$output";
+
+$ctx="x0";	$t0="w0";	$xt0="x0";
+$inp="x1";	$t4="w1";	$xt4="x1";
+$len="x2";	$t1="w2";	$xt1="x2";
+		$t3="w3";
+$A="w4";
+$B="w5";
+$C="w6";
+$D="w7";
+$E="w8";
+$F="w9";
+$G="w10";
+$H="w11";
+@V=($A,$B,$C,$D,$E,$F,$G,$H);
+$t2="w12";
+$xt2="x12";
+$Ktbl="x14";
+
+@Sigma0=( 2,13,22);
+@Sigma1=( 6,11,25);
+@sigma0=( 7,18, 3);
+@sigma1=(17,19,10);
+
+######################################################################
+# NEON stuff
+#
+{{{
+my @VB=map("v$_.16b",(0..3));
+my @VS=map("v$_.4s",(0..3));
+
+my ($TS0,$TS1,$TS2,$TS3,$TS4,$TS5,$TS6,$TS7)=("v4.4s","v5.4s","v6.4s","v7.4s","v8.4s","v9.4s","v10.4s","v11.4s");
+my ($TB0,$TB1,$TB2,$TB3,$TB4,$TB5,$TB6,$TB7)=("v4.16b","v5.16b","v6.16b","v7.16b","v8.16b","v9.16b","v10.16b","v11.16b");
+my ($TD5HI,$TD5LO,$TD7LO)=("v9.d[1]", "d9", "v11.d[0]");
+my $Xfer=$xt4;
+my $j=0;
+
+sub AUTOLOAD()          # thunk [simplified] x86-style perlasm
+{ my $opcode = $AUTOLOAD; $opcode =~ s/.*:://; $opcode =~ s/_/\./;
+  my $arg = pop;
+    $arg = "#$arg" if ($arg*1 eq $arg);
+    $code .= "\t$opcode\t".join(',',@_,$arg)."\n";
+}
+
+sub Xupdate()
+{ use integer;
+  my $body = shift;
+  my @insns = (&$body,&$body,&$body,&$body);
+  my ($a,$b,$c,$d,$e,$f,$g,$h);
+
+	&ext		($TB0,@VB[0],@VB[1],4);	# X[1..4]
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ext		($TB1,@VB[2],@VB[3],4);	# X[9..12]
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ushr		($TS2,$TS0,$sigma0[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add 		(@VS[0],@VS[0],$TS1);	# X[0..3] += X[9..12]
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ushr		($TS1,$TS0,$sigma0[2]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&sli		($TS2,$TS0,32-$sigma0[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ushr		($TS3,$TS0,$sigma0[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&eor		($TB1,$TB1,$TB2);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&sli		($TS3,$TS0,32-$sigma0[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS4,@VS[3],$sigma1[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&eor		($TB1,$TB1,$TB3);	# sigma0(X[1..4])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &sli		($TS4,@VS[3],32-$sigma1[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS5,@VS[3],$sigma1[2]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add		(@VS[0],@VS[0],$TS1);	# X[0..3] += sigma0(X[1..4])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &eor		($TB5,$TB5,$TB4);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS4,@VS[3],$sigma1[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &sli		($TS4,@VS[3],32-$sigma1[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &eor		($TB5,$TB5,$TB4);	# sigma1(X[14..15])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&mov		($TD5LO, $TD5HI);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add		(@VS[0],@VS[0],$TS5);	# X[0..1] += sigma1(X[14..15])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS6,@VS[0],$sigma1[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &sli		($TS6,@VS[0],32-$sigma1[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS7,@VS[0],$sigma1[2]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &eor		($TB7,$TB7,$TB6);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &ushr		($TS6,@VS[0],$sigma1[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ld1		("{$TS0}","[$Ktbl], #16");
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &sli		($TS6,@VS[0],32-$sigma1[1]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	  &eor		($TB7,$TB7,$TB6);	# sigma1(X[16..17])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&eor		($TB5,$TB5,$TB5);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&mov		($TD5HI, $TD7LO);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add		(@VS[0],@VS[0],$TS5);	# X[0..3] += sigma1(X[14..17])
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add		($TS0,$TS0,@VS[0]);
+	 while($#insns>=2) { eval(shift(@insns)); }
+	&st1		("{$TS0}","[$Xfer], #16");
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+
+	push(@VB,shift(@VB));		# "rotate" X[]
+	push(@VS,shift(@VS));		# "rotate" X[]
+}
+
+sub Xpreload()
+{ use integer;
+  my $body = shift;
+  my @insns = (&$body,&$body,&$body,&$body);
+  my ($a,$b,$c,$d,$e,$f,$g,$h);
+
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&ld1		("{$TS0}","[$Ktbl], #16");
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&rev32		(@VB[0],@VB[0]);
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	 eval(shift(@insns));
+	&add		($TS0,$TS0,@VS[0]);
+	 foreach (@insns) { eval; }	# remaining instructions
+	&st1		("{$TS0}","[$Xfer], #16");
+
+	push(@VB,shift(@VB));		# "rotate" X[]
+	push(@VS,shift(@VS));		# "rotate" X[]
+}
+
+sub body_00_15 () {
+	(
+	'($a,$b,$c,$d,$e,$f,$g,$h)=@V;'.
+	'&add	($h,$h,$t1)',			# h+=X[i]+K[i]
+	'&eor	($t1,$f,$g)',
+	'&eor	($t0,$e,$e,"ror#".($Sigma1[1]-$Sigma1[0]))',
+	'&add	($a,$a,$t2)',			# h+=Maj(a,b,c) from the past
+	'&and	($t1,$t1,$e)',
+	'&eor	($t2,$t0,$e,"ror#".($Sigma1[2]-$Sigma1[0]))',	# Sigma1(e)
+	'&eor	($t0,$a,$a,"ror#".($Sigma0[1]-$Sigma0[0]))',
+	'&ror	($t2,$t2,"#$Sigma1[0]")',
+	'&eor	($t1,$t1,$g)',			# Ch(e,f,g)
+	'&add	($h,$h,$t2)',			# h+=Sigma1(e)
+	'&eor	($t2,$a,$b)',			# a^b, b^c in next round
+	'&eor	($t0,$t0,$a,"ror#".($Sigma0[2]-$Sigma0[0]))',	# Sigma0(a)
+	'&add	($h,$h,$t1)',			# h+=Ch(e,f,g)
+	'&ldr	($t1,sprintf "[sp,#%d]",4*(($j+1)&15))	if (($j&15)!=15);'.
+	'&ldr	($t1,"[$Ktbl]")				if ($j==15);'.
+	'&ldr	($xt1,"[sp,#64]")			if ($j==31)',
+	'&and	($t3,$t3,$t2)',			# (b^c)&=(a^b)
+	'&ror	($t0,$t0,"#$Sigma0[0]")',
+	'&add	($d,$d,$h)',			# d+=h
+	'&add	($h,$h,$t0);'.			# h+=Sigma0(a)
+	'&eor	($t3,$t3,$b)',			# Maj(a,b,c)
+	'$j++;	unshift(@V,pop(@V)); ($t2,$t3)=($t3,$t2);'
+	)
+}
+
+$code.=<<___;
+
+.text
+.type	K256,%object
+.align	5
+K256:
+.word	0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+.word	0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+.word	0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+.word	0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+.word	0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+.word	0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+.word	0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+.word	0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+.word	0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+.word	0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+.word	0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+.word	0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+.word	0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+.word	0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+.word	0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+.word	0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+.size	K256,.-K256
+.word	0				// terminator
+
+.global	sha256_block_data_order_neon
+.type	sha256_block_data_order_neon,%function
+.align	4
+sha256_block_data_order_neon:
+.LNEON:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+	sub	sp,sp,#16*4+32
+	adr	$Ktbl,K256
+	bic	x15,x15,#15		// align for 128-bit stores
+	add	$len,$inp,$len,lsl#6	// len to point at the end of inp
+
+	ld1		{@VB[0]},[$inp], #16
+	ld1		{@VB[1]},[$inp], #16
+	ld1		{@VB[2]},[$inp], #16
+	ld1		{@VB[3]},[$inp], #16
+	ld1		{$TS0},[$Ktbl], #16
+	ld1		{$TS1},[$Ktbl], #16
+	ld1		{$TS2},[$Ktbl], #16
+	ld1		{$TS3},[$Ktbl], #16
+	rev32		@VB[0],@VB[0]		// yes, even on
+	str		$ctx,[sp,#64]
+	rev32		@VB[1],@VB[1]		// big-endian
+	str		$inp,[sp,#72]
+	mov		$Xfer,sp
+	rev32		@VB[2],@VB[2]
+	str		$len,[sp,#80]
+	rev32		@VB[3],@VB[3]
+	add		$TS0,$TS0,@VS[0]
+	add		$TS1,$TS1,@VS[1]
+	st1		{$TS0},[$Xfer], #16
+	add		$TS2,$TS2,@VS[2]
+	st1		{$TS1},[$Xfer], #16
+	add		$TS3,$TS3,@VS[3]
+	st1		{$TS2-$TS3},[$Xfer], #32
+
+	ldp		$A, $B, [$ctx]
+	ldp		$C, $D, [$ctx, #8]
+	ldp		$E, $F, [$ctx, #16]
+	ldp		$G, $H, [$ctx, #24]
+	sub		$Xfer,$Xfer,#64
+	ldr		$t1,[sp,#0]
+	mov		$xt2,xzr
+	eor		$t3,$B,$C
+	b		.L_00_48
+
+.align	4
+.L_00_48:
+___
+	&Xupdate(\&body_00_15);
+	&Xupdate(\&body_00_15);
+	&Xupdate(\&body_00_15);
+	&Xupdate(\&body_00_15);
+$code.=<<___;
+	cmp	$t1,#0				// check for K256 terminator
+	ldr	$t1,[sp,#0]
+	sub	$Xfer,$Xfer,#64
+	bne	.L_00_48
+
+	ldr		$inp,[sp,#72]
+	ldr		$xt0,[sp,#80]
+	sub		$Ktbl,$Ktbl,#256	// rewind $Ktbl
+	cmp		$inp,$xt0
+	mov		$xt0, #64
+	csel		$xt0, $xt0, xzr, eq
+	sub		$inp,$inp,$xt0		// avoid SEGV
+	ld1		{@VS[0]},[$inp], #16	// load next input block
+	ld1		{@VS[1]},[$inp], #16
+	ld1		{@VS[2]},[$inp], #16
+	ld1		{@VS[3]},[$inp], #16
+	str		$inp,[sp,#72]
+	mov		$Xfer,sp
+___
+	&Xpreload(\&body_00_15);
+	&Xpreload(\&body_00_15);
+	&Xpreload(\&body_00_15);
+	&Xpreload(\&body_00_15);
+$code.=<<___;
+	ldr	$t0,[$xt1,#0]
+	add	$A,$A,$t2			// h+=Maj(a,b,c) from the past
+	ldr	$t2,[$xt1,#4]
+	ldr	$t3,[$xt1,#8]
+	ldr	$t4,[$xt1,#12]
+	add	$A,$A,$t0			// accumulate
+	ldr	$t0,[$xt1,#16]
+	add	$B,$B,$t2
+	ldr	$t2,[$xt1,#20]
+	add	$C,$C,$t3
+	ldr	$t3,[$xt1,#24]
+	add	$D,$D,$t4
+	ldr	$t4,[$xt1,#28]
+	add	$E,$E,$t0
+	str	$A,[$xt1],#4
+	add	$F,$F,$t2
+	str	$B,[$xt1],#4
+	add	$G,$G,$t3
+	str	$C,[$xt1],#4
+	add	$H,$H,$t4
+	str	$D,[$xt1],#4
+
+	stp	$E, $F, [$xt1]
+	stp	$G, $H, [$xt1, #8]
+
+	b.eq	0f
+	mov	$Xfer,sp
+	ldr	$t1,[sp,#0]
+	eor	$t2,$t2,$t2
+	eor	$t3,$B,$C
+	b	.L_00_48
+
+0:	add	sp,sp,#16*4+32
+	ldp	x29, x30, [sp], #16
+	ret
+
+.size	sha256_block_data_order_neon,.-sha256_block_data_order_neon
+___
+}}}
+
+foreach (split($/,$code)) {
+
+	s/\`([^\`]*)\`/eval $1/geo;
+
+	print $_,"\n";
+}
+
+close STDOUT; # enforce flush
+	
diff --git a/arch/arm64/crypto/sha256-core.S_shipped b/arch/arm64/crypto/sha256-core.S_shipped
new file mode 100644
index 000000000000..1d9b55367ee0
--- /dev/null
+++ b/arch/arm64/crypto/sha256-core.S_shipped
@@ -0,0 +1,883 @@
+
+.text
+.type	K256,%object
+.align	5
+K256:
+.word	0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+.word	0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+.word	0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+.word	0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+.word	0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+.word	0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+.word	0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+.word	0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+.word	0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+.word	0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+.word	0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+.word	0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+.word	0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+.word	0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+.word	0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+.word	0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
+.size	K256,.-K256
+.word	0				// terminator
+
+.global	sha256_block_data_order_neon
+.type	sha256_block_data_order_neon,%function
+.align	4
+sha256_block_data_order_neon:
+.LNEON:
+	stp	x29, x30, [sp, #-16]!
+	mov	x29, sp
+	sub	sp,sp,#16*4+32
+	adr	x14,K256
+	bic	x15,x15,#15		// align for 128-bit stores
+	add	x2,x1,x2,lsl#6	// len to point at the end of inp
+
+	ld1		{v0.16b},[x1], #16
+	ld1		{v1.16b},[x1], #16
+	ld1		{v2.16b},[x1], #16
+	ld1		{v3.16b},[x1], #16
+	ld1		{v4.4s},[x14], #16
+	ld1		{v5.4s},[x14], #16
+	ld1		{v6.4s},[x14], #16
+	ld1		{v7.4s},[x14], #16
+	rev32		v0.16b,v0.16b		// yes, even on
+	str		x0,[sp,#64]
+	rev32		v1.16b,v1.16b		// big-endian
+	str		x1,[sp,#72]
+	mov		x1,sp
+	rev32		v2.16b,v2.16b
+	str		x2,[sp,#80]
+	rev32		v3.16b,v3.16b
+	add		v4.4s,v4.4s,v0.4s
+	add		v5.4s,v5.4s,v1.4s
+	st1		{v4.4s},[x1], #16
+	add		v6.4s,v6.4s,v2.4s
+	st1		{v5.4s},[x1], #16
+	add		v7.4s,v7.4s,v3.4s
+	st1		{v6.4s-v7.4s},[x1], #32
+
+	ldp		w4, w5, [x0]
+	ldp		w6, w7, [x0, #8]
+	ldp		w8, w9, [x0, #16]
+	ldp		w10, w11, [x0, #24]
+	sub		x1,x1,#64
+	ldr		w2,[sp,#0]
+	mov		x12,xzr
+	eor		w3,w5,w6
+	b		.L_00_48
+
+.align	4
+.L_00_48:
+	ext	v4.16b,v0.16b,v1.16b,#4
+	add	w11,w11,w2
+	eor	w2,w9,w10
+	eor	w0,w8,w8,ror#5
+	ext	v5.16b,v2.16b,v3.16b,#4
+	add	w4,w4,w12
+	and	w2,w2,w8
+	eor	w12,w0,w8,ror#19
+	ushr	v6.4s,v4.4s,#7
+	eor	w0,w4,w4,ror#11
+	ror	w12,w12,#6
+	add	v0.4s,v0.4s,v5.4s
+	eor	w2,w2,w10
+	add	w11,w11,w12
+	ushr	v5.4s,v4.4s,#3
+	eor	w12,w4,w5
+	eor	w0,w0,w4,ror#20
+	sli	v6.4s,v4.4s,#25
+	add	w11,w11,w2
+	ldr	w2,[sp,#4]
+	ushr	v7.4s,v4.4s,#18
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	eor	v5.16b,v5.16b,v6.16b
+	add	w7,w7,w11
+	add	w11,w11,w0
+	eor	w3,w3,w5
+	sli	v7.4s,v4.4s,#14
+	add	w10,w10,w2
+	ushr	v8.4s,v3.4s,#17
+	eor	w2,w8,w9
+	eor	w0,w7,w7,ror#5
+	eor	v5.16b,v5.16b,v7.16b
+	add	w11,w11,w3
+	and	w2,w2,w7
+	sli	v8.4s,v3.4s,#15
+	eor	w3,w0,w7,ror#19
+	eor	w0,w11,w11,ror#11
+	ushr	v9.4s,v3.4s,#10
+	ror	w3,w3,#6
+	eor	w2,w2,w9
+	add	v0.4s,v0.4s,v5.4s
+	add	w10,w10,w3
+	eor	w3,w11,w4
+	eor	v9.16b,v9.16b,v8.16b
+	eor	w0,w0,w11,ror#20
+	add	w10,w10,w2
+	ushr	v8.4s,v3.4s,#19
+	ldr	w2,[sp,#8]
+	and	w12,w12,w3
+	sli	v8.4s,v3.4s,#13
+	ror	w0,w0,#2
+	add	w6,w6,w10
+	eor	v9.16b,v9.16b,v8.16b
+	add	w10,w10,w0
+	eor	w12,w12,w4
+	mov	d9,v9.d[1]
+	add	w9,w9,w2
+	eor	w2,w7,w8
+	add	v0.4s,v0.4s,v9.4s
+	eor	w0,w6,w6,ror#5
+	add	w10,w10,w12
+	ushr	v10.4s,v0.4s,#17
+	and	w2,w2,w6
+	eor	w12,w0,w6,ror#19
+	sli	v10.4s,v0.4s,#15
+	eor	w0,w10,w10,ror#11
+	ror	w12,w12,#6
+	ushr	v11.4s,v0.4s,#10
+	eor	w2,w2,w8
+	add	w9,w9,w12
+	eor	v11.16b,v11.16b,v10.16b
+	eor	w12,w10,w11
+	eor	w0,w0,w10,ror#20
+	ushr	v10.4s,v0.4s,#19
+	add	w9,w9,w2
+	ldr	w2,[sp,#12]
+	ld1	{v4.4s},[x14], #16
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	sli	v10.4s,v0.4s,#13
+	add	w5,w5,w9
+	add	w9,w9,w0
+	eor	w3,w3,w11
+	eor	v11.16b,v11.16b,v10.16b
+	add	w8,w8,w2
+	eor	v9.16b,v9.16b,v9.16b
+	eor	w2,w6,w7
+	eor	w0,w5,w5,ror#5
+	mov	v9.d[1],v11.d[0]
+	add	w9,w9,w3
+	and	w2,w2,w5
+	add	v0.4s,v0.4s,v9.4s
+	eor	w3,w0,w5,ror#19
+	eor	w0,w9,w9,ror#11
+	add	v4.4s,v4.4s,v0.4s
+	ror	w3,w3,#6
+	eor	w2,w2,w7
+	add	w8,w8,w3
+	eor	w3,w9,w10
+	eor	w0,w0,w9,ror#20
+	add	w8,w8,w2
+	ldr	w2,[sp,#16]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w4,w4,w8
+	st1	{v4.4s},[x1], #16
+	add	w8,w8,w0
+	eor	w12,w12,w10
+	ext	v4.16b,v1.16b,v2.16b,#4
+	add	w7,w7,w2
+	eor	w2,w5,w6
+	eor	w0,w4,w4,ror#5
+	ext	v5.16b,v3.16b,v0.16b,#4
+	add	w8,w8,w12
+	and	w2,w2,w4
+	eor	w12,w0,w4,ror#19
+	ushr	v6.4s,v4.4s,#7
+	eor	w0,w8,w8,ror#11
+	ror	w12,w12,#6
+	add	v1.4s,v1.4s,v5.4s
+	eor	w2,w2,w6
+	add	w7,w7,w12
+	ushr	v5.4s,v4.4s,#3
+	eor	w12,w8,w9
+	eor	w0,w0,w8,ror#20
+	sli	v6.4s,v4.4s,#25
+	add	w7,w7,w2
+	ldr	w2,[sp,#20]
+	ushr	v7.4s,v4.4s,#18
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	eor	v5.16b,v5.16b,v6.16b
+	add	w11,w11,w7
+	add	w7,w7,w0
+	eor	w3,w3,w9
+	sli	v7.4s,v4.4s,#14
+	add	w6,w6,w2
+	ushr	v8.4s,v0.4s,#17
+	eor	w2,w4,w5
+	eor	w0,w11,w11,ror#5
+	eor	v5.16b,v5.16b,v7.16b
+	add	w7,w7,w3
+	and	w2,w2,w11
+	sli	v8.4s,v0.4s,#15
+	eor	w3,w0,w11,ror#19
+	eor	w0,w7,w7,ror#11
+	ushr	v9.4s,v0.4s,#10
+	ror	w3,w3,#6
+	eor	w2,w2,w5
+	add	v1.4s,v1.4s,v5.4s
+	add	w6,w6,w3
+	eor	w3,w7,w8
+	eor	v9.16b,v9.16b,v8.16b
+	eor	w0,w0,w7,ror#20
+	add	w6,w6,w2
+	ushr	v8.4s,v0.4s,#19
+	ldr	w2,[sp,#24]
+	and	w12,w12,w3
+	sli	v8.4s,v0.4s,#13
+	ror	w0,w0,#2
+	add	w10,w10,w6
+	eor	v9.16b,v9.16b,v8.16b
+	add	w6,w6,w0
+	eor	w12,w12,w8
+	mov	d9,v9.d[1]
+	add	w5,w5,w2
+	eor	w2,w11,w4
+	add	v1.4s,v1.4s,v9.4s
+	eor	w0,w10,w10,ror#5
+	add	w6,w6,w12
+	ushr	v10.4s,v1.4s,#17
+	and	w2,w2,w10
+	eor	w12,w0,w10,ror#19
+	sli	v10.4s,v1.4s,#15
+	eor	w0,w6,w6,ror#11
+	ror	w12,w12,#6
+	ushr	v11.4s,v1.4s,#10
+	eor	w2,w2,w4
+	add	w5,w5,w12
+	eor	v11.16b,v11.16b,v10.16b
+	eor	w12,w6,w7
+	eor	w0,w0,w6,ror#20
+	ushr	v10.4s,v1.4s,#19
+	add	w5,w5,w2
+	ldr	w2,[sp,#28]
+	ld1	{v4.4s},[x14], #16
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	sli	v10.4s,v1.4s,#13
+	add	w9,w9,w5
+	add	w5,w5,w0
+	eor	w3,w3,w7
+	eor	v11.16b,v11.16b,v10.16b
+	add	w4,w4,w2
+	eor	v9.16b,v9.16b,v9.16b
+	eor	w2,w10,w11
+	eor	w0,w9,w9,ror#5
+	mov	v9.d[1],v11.d[0]
+	add	w5,w5,w3
+	and	w2,w2,w9
+	add	v1.4s,v1.4s,v9.4s
+	eor	w3,w0,w9,ror#19
+	eor	w0,w5,w5,ror#11
+	add	v4.4s,v4.4s,v1.4s
+	ror	w3,w3,#6
+	eor	w2,w2,w11
+	add	w4,w4,w3
+	eor	w3,w5,w6
+	eor	w0,w0,w5,ror#20
+	add	w4,w4,w2
+	ldr	w2,[sp,#32]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w8,w8,w4
+	st1	{v4.4s},[x1], #16
+	add	w4,w4,w0
+	eor	w12,w12,w6
+	ext	v4.16b,v2.16b,v3.16b,#4
+	add	w11,w11,w2
+	eor	w2,w9,w10
+	eor	w0,w8,w8,ror#5
+	ext	v5.16b,v0.16b,v1.16b,#4
+	add	w4,w4,w12
+	and	w2,w2,w8
+	eor	w12,w0,w8,ror#19
+	ushr	v6.4s,v4.4s,#7
+	eor	w0,w4,w4,ror#11
+	ror	w12,w12,#6
+	add	v2.4s,v2.4s,v5.4s
+	eor	w2,w2,w10
+	add	w11,w11,w12
+	ushr	v5.4s,v4.4s,#3
+	eor	w12,w4,w5
+	eor	w0,w0,w4,ror#20
+	sli	v6.4s,v4.4s,#25
+	add	w11,w11,w2
+	ldr	w2,[sp,#36]
+	ushr	v7.4s,v4.4s,#18
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	eor	v5.16b,v5.16b,v6.16b
+	add	w7,w7,w11
+	add	w11,w11,w0
+	eor	w3,w3,w5
+	sli	v7.4s,v4.4s,#14
+	add	w10,w10,w2
+	ushr	v8.4s,v1.4s,#17
+	eor	w2,w8,w9
+	eor	w0,w7,w7,ror#5
+	eor	v5.16b,v5.16b,v7.16b
+	add	w11,w11,w3
+	and	w2,w2,w7
+	sli	v8.4s,v1.4s,#15
+	eor	w3,w0,w7,ror#19
+	eor	w0,w11,w11,ror#11
+	ushr	v9.4s,v1.4s,#10
+	ror	w3,w3,#6
+	eor	w2,w2,w9
+	add	v2.4s,v2.4s,v5.4s
+	add	w10,w10,w3
+	eor	w3,w11,w4
+	eor	v9.16b,v9.16b,v8.16b
+	eor	w0,w0,w11,ror#20
+	add	w10,w10,w2
+	ushr	v8.4s,v1.4s,#19
+	ldr	w2,[sp,#40]
+	and	w12,w12,w3
+	sli	v8.4s,v1.4s,#13
+	ror	w0,w0,#2
+	add	w6,w6,w10
+	eor	v9.16b,v9.16b,v8.16b
+	add	w10,w10,w0
+	eor	w12,w12,w4
+	mov	d9,v9.d[1]
+	add	w9,w9,w2
+	eor	w2,w7,w8
+	add	v2.4s,v2.4s,v9.4s
+	eor	w0,w6,w6,ror#5
+	add	w10,w10,w12
+	ushr	v10.4s,v2.4s,#17
+	and	w2,w2,w6
+	eor	w12,w0,w6,ror#19
+	sli	v10.4s,v2.4s,#15
+	eor	w0,w10,w10,ror#11
+	ror	w12,w12,#6
+	ushr	v11.4s,v2.4s,#10
+	eor	w2,w2,w8
+	add	w9,w9,w12
+	eor	v11.16b,v11.16b,v10.16b
+	eor	w12,w10,w11
+	eor	w0,w0,w10,ror#20
+	ushr	v10.4s,v2.4s,#19
+	add	w9,w9,w2
+	ldr	w2,[sp,#44]
+	ld1	{v4.4s},[x14], #16
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	sli	v10.4s,v2.4s,#13
+	add	w5,w5,w9
+	add	w9,w9,w0
+	eor	w3,w3,w11
+	eor	v11.16b,v11.16b,v10.16b
+	add	w8,w8,w2
+	eor	v9.16b,v9.16b,v9.16b
+	eor	w2,w6,w7
+	eor	w0,w5,w5,ror#5
+	mov	v9.d[1],v11.d[0]
+	add	w9,w9,w3
+	and	w2,w2,w5
+	add	v2.4s,v2.4s,v9.4s
+	eor	w3,w0,w5,ror#19
+	eor	w0,w9,w9,ror#11
+	add	v4.4s,v4.4s,v2.4s
+	ror	w3,w3,#6
+	eor	w2,w2,w7
+	add	w8,w8,w3
+	eor	w3,w9,w10
+	eor	w0,w0,w9,ror#20
+	add	w8,w8,w2
+	ldr	w2,[sp,#48]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w4,w4,w8
+	st1	{v4.4s},[x1], #16
+	add	w8,w8,w0
+	eor	w12,w12,w10
+	ext	v4.16b,v3.16b,v0.16b,#4
+	add	w7,w7,w2
+	eor	w2,w5,w6
+	eor	w0,w4,w4,ror#5
+	ext	v5.16b,v1.16b,v2.16b,#4
+	add	w8,w8,w12
+	and	w2,w2,w4
+	eor	w12,w0,w4,ror#19
+	ushr	v6.4s,v4.4s,#7
+	eor	w0,w8,w8,ror#11
+	ror	w12,w12,#6
+	add	v3.4s,v3.4s,v5.4s
+	eor	w2,w2,w6
+	add	w7,w7,w12
+	ushr	v5.4s,v4.4s,#3
+	eor	w12,w8,w9
+	eor	w0,w0,w8,ror#20
+	sli	v6.4s,v4.4s,#25
+	add	w7,w7,w2
+	ldr	w2,[sp,#52]
+	ushr	v7.4s,v4.4s,#18
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	eor	v5.16b,v5.16b,v6.16b
+	add	w11,w11,w7
+	add	w7,w7,w0
+	eor	w3,w3,w9
+	sli	v7.4s,v4.4s,#14
+	add	w6,w6,w2
+	ushr	v8.4s,v2.4s,#17
+	eor	w2,w4,w5
+	eor	w0,w11,w11,ror#5
+	eor	v5.16b,v5.16b,v7.16b
+	add	w7,w7,w3
+	and	w2,w2,w11
+	sli	v8.4s,v2.4s,#15
+	eor	w3,w0,w11,ror#19
+	eor	w0,w7,w7,ror#11
+	ushr	v9.4s,v2.4s,#10
+	ror	w3,w3,#6
+	eor	w2,w2,w5
+	add	v3.4s,v3.4s,v5.4s
+	add	w6,w6,w3
+	eor	w3,w7,w8
+	eor	v9.16b,v9.16b,v8.16b
+	eor	w0,w0,w7,ror#20
+	add	w6,w6,w2
+	ushr	v8.4s,v2.4s,#19
+	ldr	w2,[sp,#56]
+	and	w12,w12,w3
+	sli	v8.4s,v2.4s,#13
+	ror	w0,w0,#2
+	add	w10,w10,w6
+	eor	v9.16b,v9.16b,v8.16b
+	add	w6,w6,w0
+	eor	w12,w12,w8
+	mov	d9,v9.d[1]
+	add	w5,w5,w2
+	eor	w2,w11,w4
+	add	v3.4s,v3.4s,v9.4s
+	eor	w0,w10,w10,ror#5
+	add	w6,w6,w12
+	ushr	v10.4s,v3.4s,#17
+	and	w2,w2,w10
+	eor	w12,w0,w10,ror#19
+	sli	v10.4s,v3.4s,#15
+	eor	w0,w6,w6,ror#11
+	ror	w12,w12,#6
+	ushr	v11.4s,v3.4s,#10
+	eor	w2,w2,w4
+	add	w5,w5,w12
+	eor	v11.16b,v11.16b,v10.16b
+	eor	w12,w6,w7
+	eor	w0,w0,w6,ror#20
+	ushr	v10.4s,v3.4s,#19
+	add	w5,w5,w2
+	ldr	w2,[sp,#60]
+	ld1	{v4.4s},[x14], #16
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	sli	v10.4s,v3.4s,#13
+	add	w9,w9,w5
+	add	w5,w5,w0
+	eor	w3,w3,w7
+	eor	v11.16b,v11.16b,v10.16b
+	add	w4,w4,w2
+	eor	v9.16b,v9.16b,v9.16b
+	eor	w2,w10,w11
+	eor	w0,w9,w9,ror#5
+	mov	v9.d[1],v11.d[0]
+	add	w5,w5,w3
+	and	w2,w2,w9
+	add	v3.4s,v3.4s,v9.4s
+	eor	w3,w0,w9,ror#19
+	eor	w0,w5,w5,ror#11
+	add	v4.4s,v4.4s,v3.4s
+	ror	w3,w3,#6
+	eor	w2,w2,w11
+	add	w4,w4,w3
+	eor	w3,w5,w6
+	eor	w0,w0,w5,ror#20
+	add	w4,w4,w2
+	ldr	w2,[x14]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w8,w8,w4
+	st1	{v4.4s},[x1], #16
+	add	w4,w4,w0
+	eor	w12,w12,w6
+	cmp	w2,#0				// check for K256 terminator
+	ldr	w2,[sp,#0]
+	sub	x1,x1,#64
+	bne	.L_00_48
+
+	ldr		x1,[sp,#72]
+	ldr		x0,[sp,#80]
+	sub		x14,x14,#256	// rewind x14
+	cmp		x1,x0
+	mov		x0, #64
+	csel		x0, x0, xzr, eq
+	sub		x1,x1,x0		// avoid SEGV
+	ld1		{v0.4s},[x1], #16	// load next input block
+	ld1		{v1.4s},[x1], #16
+	ld1		{v2.4s},[x1], #16
+	ld1		{v3.4s},[x1], #16
+	str		x1,[sp,#72]
+	mov		x1,sp
+	add	w11,w11,w2
+	eor	w2,w9,w10
+	eor	w0,w8,w8,ror#5
+	add	w4,w4,w12
+	ld1	{v4.4s},[x14], #16
+	and	w2,w2,w8
+	eor	w12,w0,w8,ror#19
+	eor	w0,w4,w4,ror#11
+	ror	w12,w12,#6
+	rev32	v0.16b,v0.16b
+	eor	w2,w2,w10
+	add	w11,w11,w12
+	eor	w12,w4,w5
+	eor	w0,w0,w4,ror#20
+	add	v4.4s,v4.4s,v0.4s
+	add	w11,w11,w2
+	ldr	w2,[sp,#4]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w7,w7,w11
+	add	w11,w11,w0
+	eor	w3,w3,w5
+	add	w10,w10,w2
+	eor	w2,w8,w9
+	eor	w0,w7,w7,ror#5
+	add	w11,w11,w3
+	and	w2,w2,w7
+	eor	w3,w0,w7,ror#19
+	eor	w0,w11,w11,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w9
+	add	w10,w10,w3
+	eor	w3,w11,w4
+	eor	w0,w0,w11,ror#20
+	add	w10,w10,w2
+	ldr	w2,[sp,#8]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w6,w6,w10
+	add	w10,w10,w0
+	eor	w12,w12,w4
+	add	w9,w9,w2
+	eor	w2,w7,w8
+	eor	w0,w6,w6,ror#5
+	add	w10,w10,w12
+	and	w2,w2,w6
+	eor	w12,w0,w6,ror#19
+	eor	w0,w10,w10,ror#11
+	ror	w12,w12,#6
+	eor	w2,w2,w8
+	add	w9,w9,w12
+	eor	w12,w10,w11
+	eor	w0,w0,w10,ror#20
+	add	w9,w9,w2
+	ldr	w2,[sp,#12]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w5,w5,w9
+	add	w9,w9,w0
+	eor	w3,w3,w11
+	add	w8,w8,w2
+	eor	w2,w6,w7
+	eor	w0,w5,w5,ror#5
+	add	w9,w9,w3
+	and	w2,w2,w5
+	eor	w3,w0,w5,ror#19
+	eor	w0,w9,w9,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w7
+	add	w8,w8,w3
+	eor	w3,w9,w10
+	eor	w0,w0,w9,ror#20
+	add	w8,w8,w2
+	ldr	w2,[sp,#16]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w4,w4,w8
+	add	w8,w8,w0
+	eor	w12,w12,w10
+	st1	{v4.4s},[x1], #16
+	add	w7,w7,w2
+	eor	w2,w5,w6
+	eor	w0,w4,w4,ror#5
+	add	w8,w8,w12
+	ld1	{v4.4s},[x14], #16
+	and	w2,w2,w4
+	eor	w12,w0,w4,ror#19
+	eor	w0,w8,w8,ror#11
+	ror	w12,w12,#6
+	rev32	v1.16b,v1.16b
+	eor	w2,w2,w6
+	add	w7,w7,w12
+	eor	w12,w8,w9
+	eor	w0,w0,w8,ror#20
+	add	v4.4s,v4.4s,v1.4s
+	add	w7,w7,w2
+	ldr	w2,[sp,#20]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w11,w11,w7
+	add	w7,w7,w0
+	eor	w3,w3,w9
+	add	w6,w6,w2
+	eor	w2,w4,w5
+	eor	w0,w11,w11,ror#5
+	add	w7,w7,w3
+	and	w2,w2,w11
+	eor	w3,w0,w11,ror#19
+	eor	w0,w7,w7,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w5
+	add	w6,w6,w3
+	eor	w3,w7,w8
+	eor	w0,w0,w7,ror#20
+	add	w6,w6,w2
+	ldr	w2,[sp,#24]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w10,w10,w6
+	add	w6,w6,w0
+	eor	w12,w12,w8
+	add	w5,w5,w2
+	eor	w2,w11,w4
+	eor	w0,w10,w10,ror#5
+	add	w6,w6,w12
+	and	w2,w2,w10
+	eor	w12,w0,w10,ror#19
+	eor	w0,w6,w6,ror#11
+	ror	w12,w12,#6
+	eor	w2,w2,w4
+	add	w5,w5,w12
+	eor	w12,w6,w7
+	eor	w0,w0,w6,ror#20
+	add	w5,w5,w2
+	ldr	w2,[sp,#28]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w9,w9,w5
+	add	w5,w5,w0
+	eor	w3,w3,w7
+	add	w4,w4,w2
+	eor	w2,w10,w11
+	eor	w0,w9,w9,ror#5
+	add	w5,w5,w3
+	and	w2,w2,w9
+	eor	w3,w0,w9,ror#19
+	eor	w0,w5,w5,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w11
+	add	w4,w4,w3
+	eor	w3,w5,w6
+	eor	w0,w0,w5,ror#20
+	add	w4,w4,w2
+	ldr	w2,[sp,#32]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w8,w8,w4
+	add	w4,w4,w0
+	eor	w12,w12,w6
+	st1	{v4.4s},[x1], #16
+	add	w11,w11,w2
+	eor	w2,w9,w10
+	eor	w0,w8,w8,ror#5
+	add	w4,w4,w12
+	ld1	{v4.4s},[x14], #16
+	and	w2,w2,w8
+	eor	w12,w0,w8,ror#19
+	eor	w0,w4,w4,ror#11
+	ror	w12,w12,#6
+	rev32	v2.16b,v2.16b
+	eor	w2,w2,w10
+	add	w11,w11,w12
+	eor	w12,w4,w5
+	eor	w0,w0,w4,ror#20
+	add	v4.4s,v4.4s,v2.4s
+	add	w11,w11,w2
+	ldr	w2,[sp,#36]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w7,w7,w11
+	add	w11,w11,w0
+	eor	w3,w3,w5
+	add	w10,w10,w2
+	eor	w2,w8,w9
+	eor	w0,w7,w7,ror#5
+	add	w11,w11,w3
+	and	w2,w2,w7
+	eor	w3,w0,w7,ror#19
+	eor	w0,w11,w11,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w9
+	add	w10,w10,w3
+	eor	w3,w11,w4
+	eor	w0,w0,w11,ror#20
+	add	w10,w10,w2
+	ldr	w2,[sp,#40]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w6,w6,w10
+	add	w10,w10,w0
+	eor	w12,w12,w4
+	add	w9,w9,w2
+	eor	w2,w7,w8
+	eor	w0,w6,w6,ror#5
+	add	w10,w10,w12
+	and	w2,w2,w6
+	eor	w12,w0,w6,ror#19
+	eor	w0,w10,w10,ror#11
+	ror	w12,w12,#6
+	eor	w2,w2,w8
+	add	w9,w9,w12
+	eor	w12,w10,w11
+	eor	w0,w0,w10,ror#20
+	add	w9,w9,w2
+	ldr	w2,[sp,#44]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w5,w5,w9
+	add	w9,w9,w0
+	eor	w3,w3,w11
+	add	w8,w8,w2
+	eor	w2,w6,w7
+	eor	w0,w5,w5,ror#5
+	add	w9,w9,w3
+	and	w2,w2,w5
+	eor	w3,w0,w5,ror#19
+	eor	w0,w9,w9,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w7
+	add	w8,w8,w3
+	eor	w3,w9,w10
+	eor	w0,w0,w9,ror#20
+	add	w8,w8,w2
+	ldr	w2,[sp,#48]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w4,w4,w8
+	add	w8,w8,w0
+	eor	w12,w12,w10
+	st1	{v4.4s},[x1], #16
+	add	w7,w7,w2
+	eor	w2,w5,w6
+	eor	w0,w4,w4,ror#5
+	add	w8,w8,w12
+	ld1	{v4.4s},[x14], #16
+	and	w2,w2,w4
+	eor	w12,w0,w4,ror#19
+	eor	w0,w8,w8,ror#11
+	ror	w12,w12,#6
+	rev32	v3.16b,v3.16b
+	eor	w2,w2,w6
+	add	w7,w7,w12
+	eor	w12,w8,w9
+	eor	w0,w0,w8,ror#20
+	add	v4.4s,v4.4s,v3.4s
+	add	w7,w7,w2
+	ldr	w2,[sp,#52]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w11,w11,w7
+	add	w7,w7,w0
+	eor	w3,w3,w9
+	add	w6,w6,w2
+	eor	w2,w4,w5
+	eor	w0,w11,w11,ror#5
+	add	w7,w7,w3
+	and	w2,w2,w11
+	eor	w3,w0,w11,ror#19
+	eor	w0,w7,w7,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w5
+	add	w6,w6,w3
+	eor	w3,w7,w8
+	eor	w0,w0,w7,ror#20
+	add	w6,w6,w2
+	ldr	w2,[sp,#56]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w10,w10,w6
+	add	w6,w6,w0
+	eor	w12,w12,w8
+	add	w5,w5,w2
+	eor	w2,w11,w4
+	eor	w0,w10,w10,ror#5
+	add	w6,w6,w12
+	and	w2,w2,w10
+	eor	w12,w0,w10,ror#19
+	eor	w0,w6,w6,ror#11
+	ror	w12,w12,#6
+	eor	w2,w2,w4
+	add	w5,w5,w12
+	eor	w12,w6,w7
+	eor	w0,w0,w6,ror#20
+	add	w5,w5,w2
+	ldr	w2,[sp,#60]
+	and	w3,w3,w12
+	ror	w0,w0,#2
+	add	w9,w9,w5
+	add	w5,w5,w0
+	eor	w3,w3,w7
+	add	w4,w4,w2
+	eor	w2,w10,w11
+	eor	w0,w9,w9,ror#5
+	add	w5,w5,w3
+	and	w2,w2,w9
+	eor	w3,w0,w9,ror#19
+	eor	w0,w5,w5,ror#11
+	ror	w3,w3,#6
+	eor	w2,w2,w11
+	add	w4,w4,w3
+	eor	w3,w5,w6
+	eor	w0,w0,w5,ror#20
+	add	w4,w4,w2
+	ldr	x2,[sp,#64]
+	and	w12,w12,w3
+	ror	w0,w0,#2
+	add	w8,w8,w4
+	add	w4,w4,w0
+	eor	w12,w12,w6
+	st1	{v4.4s},[x1], #16
+	ldr	w0,[x2,#0]
+	add	w4,w4,w12			// h+=Maj(a,b,c) from the past
+	ldr	w12,[x2,#4]
+	ldr	w3,[x2,#8]
+	ldr	w1,[x2,#12]
+	add	w4,w4,w0			// accumulate
+	ldr	w0,[x2,#16]
+	add	w5,w5,w12
+	ldr	w12,[x2,#20]
+	add	w6,w6,w3
+	ldr	w3,[x2,#24]
+	add	w7,w7,w1
+	ldr	w1,[x2,#28]
+	add	w8,w8,w0
+	str	w4,[x2],#4
+	add	w9,w9,w12
+	str	w5,[x2],#4
+	add	w10,w10,w3
+	str	w6,[x2],#4
+	add	w11,w11,w1
+	str	w7,[x2],#4
+
+	stp	w8, w9, [x2]
+	stp	w10, w11, [x2, #8]
+
+	b.eq	0f
+	mov	x1,sp
+	ldr	w2,[sp,#0]
+	eor	w12,w12,w12
+	eor	w3,w5,w6
+	b	.L_00_48
+
+0:	add	sp,sp,#16*4+32
+	ldp	x29, x30, [sp], #16
+	ret
+
+.size	sha256_block_data_order_neon,.-sha256_block_data_order_neon
diff --git a/arch/arm64/crypto/sha256_neon_glue.c b/arch/arm64/crypto/sha256_neon_glue.c
new file mode 100644
index 000000000000..149a4bb869ea
--- /dev/null
+++ b/arch/arm64/crypto/sha256_neon_glue.c
@@ -0,0 +1,103 @@
+/*
+ * AArch64 port of the OpenSSL SHA256 implementation for ARM NEON
+ *
+ * Copyright (c) 2016 Linaro Ltd. <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License as published by the Free
+ * Software Foundation; either version 2 of the License, or (at your option)
+ * any later version.
+ *
+ */
+
+#include <crypto/internal/hash.h>
+#include <linux/cryptohash.h>
+#include <linux/types.h>
+#include <linux/string.h>
+#include <crypto/sha.h>
+#include <crypto/sha256_base.h>
+#include <asm/neon.h>
+
+MODULE_DESCRIPTION("SHA-224/SHA-256 secure hash using ARMv8 NEON");
+MODULE_AUTHOR("Andy Polyakov <appro@openssl.org>");
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void sha256_block_data_order_neon(u32 *digest, const void *data,
+					     unsigned int num_blks);
+
+static int sha256_update(struct shash_desc *desc, const u8 *data,
+			 unsigned int len)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	if ((sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
+		return crypto_sha256_update(desc, data, len);
+
+	kernel_neon_begin_partial(12);
+	sha256_base_do_update(desc, data, len,
+			(sha256_block_fn *)sha256_block_data_order_neon);
+	kernel_neon_end();
+
+	return 0;
+}
+
+static int sha256_finup(struct shash_desc *desc, const u8 *data,
+			unsigned int len, u8 *out)
+{
+	kernel_neon_begin_partial(12);
+	if (len)
+		sha256_base_do_update(desc, data, len,
+			(sha256_block_fn *)sha256_block_data_order_neon);
+	sha256_base_do_finalize(desc,
+			(sha256_block_fn *)sha256_block_data_order_neon);
+	kernel_neon_end();
+
+	return sha256_base_finish(desc, out);
+}
+
+static int sha256_final(struct shash_desc *desc, u8 *out)
+{
+	return sha256_finup(desc, NULL, 0, out);
+}
+
+static struct shash_alg algs[] = { {
+	.digestsize		= SHA256_DIGEST_SIZE,
+	.init			= sha256_base_init,
+	.update			= sha256_update,
+	.final			= sha256_final,
+	.finup			= sha256_finup,
+	.descsize		= sizeof(struct sha256_state),
+	.base.cra_name		= "sha256",
+	.base.cra_driver_name	= "sha256-neon",
+	.base.cra_priority	= 150,
+	.base.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+	.base.cra_blocksize	= SHA256_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+}, {
+	.digestsize		= SHA224_DIGEST_SIZE,
+	.init			= sha224_base_init,
+	.update			= sha256_update,
+	.final			= sha256_final,
+	.finup			= sha256_finup,
+	.descsize		= sizeof(struct sha256_state),
+	.base.cra_name		= "sha224",
+	.base.cra_driver_name	= "sha224-neon",
+	.base.cra_priority	= 150,
+	.base.cra_flags		= CRYPTO_ALG_TYPE_SHASH,
+	.base.cra_blocksize	= SHA224_BLOCK_SIZE,
+	.base.cra_module	= THIS_MODULE,
+} };
+
+static int __init sha256_neon_mod_init(void)
+{
+	return crypto_register_shashes(algs, ARRAY_SIZE(algs));
+}
+
+static void __exit sha256_neon_mod_fini(void)
+{
+	crypto_unregister_shashes(algs, ARRAY_SIZE(algs));
+}
+
+module_init(sha256_neon_mod_init);
+module_exit(sha256_neon_mod_fini);
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH] crypto: arm64/sha256 - add support for SHA256 using NEON instructions
From: Ard Biesheuvel @ 2016-09-29 23:37 UTC (permalink / raw)
  To: linux-arm-kernel@lists.infradead.org,
	linux-crypto@vger.kernel.org, Herbert Xu
  Cc: Andy Polyakov, Victor Chong, Daniel Thompson, Will Deacon,
	Catalin Marinas, Ard Biesheuvel
In-Reply-To: <1475189503-9175-2-git-send-email-ard.biesheuvel@linaro.org>

On 29 September 2016 at 15:51, Ard Biesheuvel <ard.biesheuvel@linaro.org> wrote:
> This is a port to arm64 of the NEON implementation of SHA256 that lives
> under arch/arm/crypto.
>
> Due to the fact that the AArch64 assembler dialect deviates from the
> 32-bit ARM one in ways that makes sharing code problematic, and given
> that this version only uses the NEON version whereas the original
> implementation supports plain ALU assembler, NEON and Crypto Extensions,
> this code is built from a version sha256-armv4.pl that has been
> transliterated to the AArch64 NEON dialect.
>
> Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> ---
>  arch/arm64/crypto/Kconfig               |   5 +
>  arch/arm64/crypto/Makefile              |  11 +
>  arch/arm64/crypto/sha256-armv4.pl       | 413 +++++++++
>  arch/arm64/crypto/sha256-core.S_shipped | 883 ++++++++++++++++++++
>  arch/arm64/crypto/sha256_neon_glue.c    | 103 +++
>  5 files changed, 1415 insertions(+)
>
> diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
> index 2cf32e9887e1..d32371198474 100644
> --- a/arch/arm64/crypto/Kconfig
> +++ b/arch/arm64/crypto/Kconfig
> @@ -18,6 +18,11 @@ config CRYPTO_SHA2_ARM64_CE
>         depends on ARM64 && KERNEL_MODE_NEON
>         select CRYPTO_HASH
>
> +config CRYPTO_SHA2_ARM64_NEON
> +       tristate "SHA-224/SHA-256 digest algorithm (ARMv8 NEON)"
> +       depends on ARM64 && KERNEL_MODE_NEON
> +       select CRYPTO_HASH
> +
>  config CRYPTO_GHASH_ARM64_CE
>         tristate "GHASH (for GCM chaining mode) using ARMv8 Crypto Extensions"
>         depends on ARM64 && KERNEL_MODE_NEON
> diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
> index abb79b3cfcfe..5156ebee0488 100644
> --- a/arch/arm64/crypto/Makefile
> +++ b/arch/arm64/crypto/Makefile
> @@ -29,6 +29,9 @@ aes-ce-blk-y := aes-glue-ce.o aes-ce.o
>  obj-$(CONFIG_CRYPTO_AES_ARM64_NEON_BLK) += aes-neon-blk.o
>  aes-neon-blk-y := aes-glue-neon.o aes-neon.o
>
> +obj-$(CONFIG_CRYPTO_SHA2_ARM64_NEON) := sha256-neon.o

There is a typo here that I only spotted just now: this should be += not :=

Herbert, if you're picking this up, could you please fix this at merge
time? Or do you need me to resend?

Thanks,
Ard.

^ permalink raw reply

* Char.c
From: Fontaine david @ 2016-09-30  8:09 UTC (permalink / raw)
  To: linux-crypto

Hi Linus:

This push fixes a weakness in random number generation of file random.c.

The two polynomials of LFSR used in Linux/drivers/char/random.c are
P1(X) = x^128 + x^104 + x^76 + x^51 +x^25 + x + 1, for input pool
P2(X) = x^32 + x^26 + x^19 + x^14 + x^7 + x + 1 , for output pool

These polynomials Q1(X) = alpha ^3 *(P1(X)-1)+1 and Q2(X) = alpha ^3
*(P2(X)-1)+1 are not primitive over GF(2^32) where alpha is an
primitive element of GF(2^32).
It turns out that periods of LFSR corresponding to these polynomials
are not optimal, it means that the space of numbers generated by these
LFSR is not  GF(2^(32*deg(Pi))-1), i=1,2.

As mentioned in the random.c file, these polynomials come from an
article http://eprint.iacr.org/2012/251.pdf. It is stated in the
article that these polynomials have periods (2^(32*deg(Pi))-1)/3,
i=1,2, so not optimal.

We can improve these LFSR choosing these polynomials as primitive and
therefore increase the space of numbers generated by 3.

The polynomials used in the current implementation of the PRNG and the
point presented here, do not conclude a practical attack on the PRNG.

After several calculations, we propose here the following polynomials:
R1(x) = x^128 + x^106 +x^79 + x^51 +x^25 + x+ 1, as new polynomial of input pool
R2(x) = x^32 + x^27 + x^21 + x^14 + x^7 + x +1, as new polynomial of
the output pool

So polynomials S1(X) = alpha^4*(R1(X)-1)+1 and S2(X) =
alpha^4*(R2(X)-1)+1 are primitive on GF(2^32).
It is very easy to check their primitive with magma. Use the online
tool (http://magma.maths.usyd.edu.au/calc/) and the following code:

K0:=GF(2);
P<X> := PolynomialRing(K0);
K1<a> := ext<K0|X^32+X^26+X^23+X^22+X^16+X^12+X^11+X^10+X^8+X^7+X^5+X^4+X^2+X^1+1>;K1;
P<t> := PolynomialRing(K1);

R1 :=  t^128 + t^106 + t^79 + t^51 + t^25 + t + 1;
S1 := a^4*(R1-1)+1;

R2 := t^32 + t^27 + t^21 + t^14 + t^7 + t + 1;
S2 := a^4*(R2-1)+1;

S1test := Evaluate((1/(t^128))*S1,1/t);S1test;
> t^128 + a*t^127 + a*t^100 + a*t^74 + a*t^53 + a*t^25 + a

S1test := t^128 + a^4*t^127 + a^4*t^103 + a^4*t^77 + a^4*t^49 + a^4*t^22 + a^4;
IsPrimitive(S1test);
> true

S2test := Evaluate((1/(t^32))*S2,1/t);S2test;
> t^32 + a*t^31 + a*t^24 + a*t^16 + a*t^12 + a*t^6 + a

S2test := t^32 + a^4*t^31 + a^4*t^25 + a^4*t^18 + a^4*t^11 + a^4*t^5 + a^4;
IsPrimitive(S2test);
> true

To use these polynomials, the following changes in the random.c file
should be applied:

olivier@Zebulon:~/Documents/linux-4.7.4/drivers/char$ diff random.c random-new.c
371,372c371,373
<        /* x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
<        { S(128),         104,     76,       51,       25,       1 },
---
>        /* was: x^128 + x^104 + x^76 + x^51 +x^25 + x + 1 */
>        /* x^128 + x^106 + x^79 + x^51 +x^25 + x + 1 */
>        { S(128),         106,     79,       51,       25,       1 },
374,375c375,377
<        /* x^32 + x^26 + x^19 + x^14 + x^7 + x + 1 */
<        { S(32),           26,       19,       14,       7,         1 },
---
>        /* was: x^32 + x^26 + x^19 + x^14 + x^7 + x + 1 */
>        /* x^32 + x^27 + x^21 + x^14 + x^7 + x + 1 */
>        { S(32),           27,       21,       14,       7,         1 },
478a481
> /* was:
481a485,490
> */
> static __u32 const twist_table[16] = {
>        0x00000000, 0x1db71064, 0x3b6e20c8, 0x26d930ac,
>         0x76dc4190, 0x6b6b51f4, 0x4db26158, 0x5005713c,
>         0xedb88320, 0xf00f9344, 0xd6d6a3e8, 0xcb61b38c,
>         0x9b64c2b0, 0x86d3d2d4, 0xa00ae278, 0xbdbdf21c };
525c534,536
<                    r->pool[i] = (w >> 3) ^ twist_table[w & 7];
---
>                    /*
>                   was: r->pool[i] = (w >> 3) ^ twist_table[w & 7];*/
>                    r->pool[i] = (w >> 4) ^ twist_table[w & 15];

Regards,
David Fontaine & Olivier Vivolo

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox