Linux cryptographic layer development
 help / color / mirror / Atom feed
* Re: [BUG] crypto: atmel-aes - erro when compiling with VERBOSE_DEBUG enable
From: Herbert Xu @ 2016-09-22  9:26 UTC (permalink / raw)
  To: levent demir; +Cc: linux-crypto, cyrille.pitchen
In-Reply-To: <1473779378.3556.20.camel@inria.fr>

levent demir <levent.demir@inria.fr> wrote:
> Hello, 
> 
> if you enable VERBOSE_DEBUG and compile you will have the following
> error : 
> 
> drivers/crypto/atmel-aes.c:323:5: error: too few arguments to function
> 'atmel_aes_reg_name'
>     atmel_aes_reg_name(offset, tmp));
>     ^
> include/linux/device.h:1306:41: note: in definition of macro 'dev_vdbg'
>   dev_printk(KERN_DEBUG, dev, format, ##arg); \
>                                         ^
> drivers/crypto/atmel-aes.c:205:20: note: declared here
> static const char *atmel_aes_reg_name(u32 offset, char *tmp, size_t sz)
> 
> Indeed, in atmel_aes_write function the call to atmel_aes_reg_name
> contains only two arguments instead of 3 : 
> 
> atmel_aes_reg_name(offset, tmp));
> 
> To fix it, one has to only add the size of tmp as third argument : 
> 
> atmel_aes_reg_name(offset, tmp, sizeof(tmp)));

Thanks for the patch.  In order to apply it, you need to fix the
white-space damage as well as add a sign-off.  For details please
refer to Documentation/SubmittingPatches.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v7 4/9] crypto: acomp - add support for lzo via scomp
From: Herbert Xu @ 2016-09-22  9:22 UTC (permalink / raw)
  To: Giovanni Cabiddu; +Cc: linux-crypto
In-Reply-To: <20160920115140.GA12332@sivswdev01.ir.intel.com>

On Tue, Sep 20, 2016 at 12:51:40PM +0100, Giovanni Cabiddu wrote:
> Hi Herbert,
> 
> On Tue, Sep 20, 2016 at 05:26:18PM +0800, Herbert Xu wrote:
> > Rather than duplicating the scratch buffer handling in every scomp
> > algorithm, let's centralize this and put it into scomp.c.
> Are you suggesting to hide the scratch buffers from the scomp implementations 
> and allocate them inside crypto_register_scomp?

I'm suggesting that we have just one set of buffers for all scomp
algorithms.  After all, a CPU can only process one request at a
time.

> Does that mean we have to go back to a scomp_alg with a flat buffer API
> and linearize inside scomp?

Yes scomp should just be flat.  A sync algorithm capable of producing
partial output should use the acomp interface.

> If we take this direction, how do we support DMA from scomp implementation?
> Scratch buffers are allocated using vmalloc.

Huh? If you're doing DMA then you'd be using the acomp interface,
how can you even get into scomp?

I think you may have misread my earlier message from June.  What
I'd like to see is for the acomp layer to allocate the output
memory, rather than have it provided by the user as is the case
with the current interface.  The user could provide a maximum to
prevent crazy cases consuming unlimited memory.

What will happen with scomp is that it will simply use a linear
vmalloc buffer to store the output before copying it into an SG
list of individual pages allocated on the spot.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: BUG: rsa-pkcs1pad decrypt regression in 4.8
From: Herbert Xu @ 2016-09-22  9:04 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-crypto, smueller
In-Reply-To: <alpine.OSX.2.20.1609211630400.14260@mjmartin-mac01.local>

On Wed, Sep 21, 2016 at 04:39:30PM -0700, Mat Martineau wrote:
> 
> Herbert -
> 
> There was a regression in pkcs1pad signature verification, related
> to signature verification, that you fixed in commit 27710b8ea3defcb:
> 
> https://git.kernel.org/cgit/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=27710b8ea3defcbd7d340dbd0423d911b4eb7c4f
> 
> There is a very similar problem in the decrypt operation, which was
> not adjusted for the leading zero changes. See
> pkcs1pad_decrypt_complete().
> 
> I haven't had a chance to test a fix yet, but with the final 4.8
> release coming up very soon I wanted to report the issue.

Thanks.  This patch should fix the problem.

---8<---
crypto: rsa-pkcs1pad - Handle leading zero for decryption

As the software RSA implementation now produces fixed-length
output, we need to eliminate leading zeros in the calling code
instead.

This patch does just that for pkcs1pad decryption while signature
verification was fixed in an earlier patch.

Fixes: 9b45b7bba3d2 ("crypto: rsa - Generate fixed-length output")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
index 877019a..8baab43 100644
--- a/crypto/rsa-pkcs1pad.c
+++ b/crypto/rsa-pkcs1pad.c
@@ -298,41 +298,48 @@ static int pkcs1pad_decrypt_complete(struct akcipher_request *req, int err)
 	struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req);
 	struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
 	struct pkcs1pad_request *req_ctx = akcipher_request_ctx(req);
+	unsigned int dst_len;
 	unsigned int pos;
-
-	if (err == -EOVERFLOW)
-		/* Decrypted value had no leading 0 byte */
-		err = -EINVAL;
+	u8 *out_buf;
 
 	if (err)
 		goto done;
 
-	if (req_ctx->child_req.dst_len != ctx->key_size - 1) {
-		err = -EINVAL;
+	err = -EINVAL;
+	dst_len = req_ctx->child_req.dst_len;
+	if (dst_len < ctx->key_size - 1)
 		goto done;
+
+	out_buf = req_ctx->out_buf;
+	if (dst_len == ctx->key_size) {
+		if (out_buf[0] != 0x00)
+			/* Decrypted value had no leading 0 byte */
+			goto done;
+
+		dst_len--;
+		out_buf++;
 	}
 
-	if (req_ctx->out_buf[0] != 0x02) {
-		err = -EINVAL;
+	if (out_buf[0] != 0x02)
 		goto done;
-	}
-	for (pos = 1; pos < req_ctx->child_req.dst_len; pos++)
-		if (req_ctx->out_buf[pos] == 0x00)
+
+	for (pos = 1; pos < dst_len; pos++)
+		if (out_buf[pos] == 0x00)
 			break;
-	if (pos < 9 || pos == req_ctx->child_req.dst_len) {
-		err = -EINVAL;
+	if (pos < 9 || pos == dst_len)
 		goto done;
-	}
 	pos++;
 
-	if (req->dst_len < req_ctx->child_req.dst_len - pos)
+	err = 0;
+
+	if (req->dst_len < dst_len - pos)
 		err = -EOVERFLOW;
-	req->dst_len = req_ctx->child_req.dst_len - pos;
+	req->dst_len = dst_len - pos;
 
 	if (!err)
 		sg_copy_from_buffer(req->dst,
 				sg_nents_for_len(req->dst, req->dst_len),
-				req_ctx->out_buf + pos, req->dst_len);
+				out_buf + pos, req->dst_len);
 
 done:
 	kzfree(req_ctx->out_buf);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH v2] crypto: caam - fix sg dump
From: Catalin Vasile @ 2016-09-22  8:57 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-crypto-owner, horia.geanta, Catalin Vasile

Ensure scatterlists have a virtual memory mapping before dumping.

Signed-off-by: Catalin Vasile <cata.vasile@nxp.com>
---
Changes:
	V2:
		* resolved issue of sleeping in atomic contexts
---
---
 drivers/crypto/caam/caamalg.c | 79 +++++++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index f1116e7..eb97562 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -111,6 +111,42 @@
 #else
 #define debug(format, arg...)
 #endif
+
+#ifdef DEBUG
+#include <linux/highmem.h>
+
+static void dbg_dump_sg(const char *level, const char *prefix_str,
+			int prefix_type, int rowsize, int groupsize,
+			struct scatterlist *sg, size_t tlen, bool ascii,
+			bool may_sleep)
+{
+	struct scatterlist *it;
+	void *it_page;
+	size_t len;
+	void *buf;
+
+	for (it = sg; it != NULL && tlen > 0 ; it = sg_next(sg)) {
+		/*
+		 * make sure the scatterlist's page
+		 * has a valid virtual memory mapping
+		 */
+		it_page = kmap_atomic(sg_page(it));
+		if (unlikely(!it_page)) {
+			printk(KERN_ERR "dbg_dump_sg: kmap failed\n");
+			return;
+		}
+
+		buf = it_page + it->offset;
+		len = min(tlen, it->length);
+		print_hex_dump(level, prefix_str, prefix_type, rowsize,
+			       groupsize, buf, len, ascii);
+		tlen -= len;
+
+		kunmap_atomic(it_page);
+	}
+}
+#endif
+
 static struct list_head alg_list;
 
 struct caam_alg_entry {
@@ -1982,9 +2018,9 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
 	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       edesc->src_nents > 1 ? 100 : ivsize, 1);
-	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
+		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
 #endif
 
 	ablkcipher_unmap(jrdev, edesc, req);
@@ -2014,9 +2050,9 @@ static void ablkcipher_decrypt_done(struct device *jrdev, u32 *desc, u32 err,
 	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
+		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
 #endif
 
 	ablkcipher_unmap(jrdev, edesc, req);
@@ -2171,12 +2207,15 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
 	int len, sec4_sg_index = 0;
 
 #ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
 	print_hex_dump(KERN_ERR, "presciv@"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "src    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->src_nents ? 100 : req->nbytes, 1);
+	printk(KERN_ERR "asked=%d, nbytes%d\n", (int)edesc->src_nents ? 100 : req->nbytes, req->nbytes);
+	dbg_dump_sg(KERN_ERR, "src    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
 #endif
 
 	len = desc_len(sh_desc);
@@ -2228,12 +2267,14 @@ static void init_ablkcipher_giv_job(u32 *sh_desc, dma_addr_t ptr,
 	int len, sec4_sg_index = 0;
 
 #ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
 	print_hex_dump(KERN_ERR, "presciv@" __stringify(__LINE__) ": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->src_nents ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
 #endif
 
 	len = desc_len(sh_desc);
@@ -2503,18 +2544,20 @@ static int aead_decrypt(struct aead_request *req)
 	u32 *desc;
 	int ret = 0;
 
+#ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
+	dbg_dump_sg(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    req->assoclen + req->cryptlen, 1, may_sleep);
+#endif
+
 	/* allocate extended descriptor */
 	edesc = aead_edesc_alloc(req, AUTHENC_DESC_JOB_IO_LEN,
 				 &all_contig, false);
 	if (IS_ERR(edesc))
 		return PTR_ERR(edesc);
 
-#ifdef DEBUG
-	print_hex_dump(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       req->assoclen + req->cryptlen, 1);
-#endif
-
 	/* Create and submit job descriptor*/
 	init_authenc_job(req, edesc, all_contig, false);
 #ifdef DEBUG
-- 
1.8.3.1

^ permalink raw reply related

* BUG: rsa-pkcs1pad decrypt regression in 4.8
From: Mat Martineau @ 2016-09-21 23:39 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: smueller


Herbert -

There was a regression in pkcs1pad signature verification, related to 
signature verification, that you fixed in commit 27710b8ea3defcb:

https://git.kernel.org/cgit/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=27710b8ea3defcbd7d340dbd0423d911b4eb7c4f

There is a very similar problem in the decrypt operation, which was not 
adjusted for the leading zero changes. See pkcs1pad_decrypt_complete().

I haven't had a chance to test a fix yet, but with the final 4.8 release 
coming up very soon I wanted to report the issue.


Regards,

--
Mat Martineau
Intel OTC

^ permalink raw reply

* Re: [RFC PATCH v1 03/28] kvm: svm: Use the hardware provided GPA instead of page walk
From: Borislav Petkov @ 2016-09-21 17:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: simon.guinot, linux-efi, kvm, rkrcmar, matt, linus.walleij,
	linux-mm, paul.gortmaker, hpa, dan.j.williams, aarcange, sfr,
	andriy.shevchenko, herbert, bhe, xemul, joro, x86, mingo, msalter,
	ross.zwisler, dyoung, thomas.lendacky, jroedel, keescook,
	toshi.kani, mathieu.desnoyers, devel, tglx, mchehab,
	iamjoonsoo.kim, labbott, tony.luck, alexandre.bounine,
	kuleshovmail, linux-kernel, mcgrof, linux-crypto
In-Reply-To: <147190824754.9523.13923968456167130181.stgit@brijesh-build-machine>

On Mon, Aug 22, 2016 at 07:24:07PM -0400, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> When a guest causes a NPF which requires emulation, KVM sometimes walks
> the guest page tables to translate the GVA to a GPA. This is unnecessary
> most of the time on AMD hardware since the hardware provides the GPA in
> EXITINFO2.
> 
> The only exception cases involve string operations involving rep or
> operations that use two memory locations. With rep, the GPA will only be
> the value of the initial NPF and with dual memory locations we won't know
> which memory address was translated into EXITINFO2.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/kvm_emulate.h |    3 +++
>  arch/x86/include/asm/kvm_host.h    |    3 +++
>  arch/x86/kvm/svm.c                 |    2 ++
>  arch/x86/kvm/x86.c                 |   17 ++++++++++++++++-
>  4 files changed, 24 insertions(+), 1 deletion(-)

FWIW, LGTM. (Gotta love replying in acronyms :-))

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC PATCH v1 02/28] kvm: svm: Add kvm_fast_pio_in support
From: Borislav Petkov @ 2016-09-21 10:58 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: simon.guinot-jKBdWWKqtFpg9hUCZPvPmw,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, kvm-u79uwXL29TY76Z2rM5mHXA,
	rkrcmar-H+wXaHxf7aLQT0dZR+AlfA,
	matt-mF/unelCI9GS6iBeEJttW/XRex20P6io,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	paul.gortmaker-CWA4WttNNZF54TAoqtyWWQ, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA, sfr-3FnU+UHB4dNDw9hX6IcOSA,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	bhe-H+wXaHxf7aLQT0dZR+AlfA, xemul-bzQdu9zFT3WakBO8gow8eQ,
	joro-zLv9SwRftAIdnm+yROfE0A, x86-DgEjT+Ai2ygdnm+yROfE0A,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, msalter-H+wXaHxf7aLQT0dZR+AlfA,
	ross.zwisler-VuQAYsv1563Yd54FQh9/CA,
	dyoung-H+wXaHxf7aLQT0dZR+AlfA, thomas.lendacky-5C7GfCeVMHo,
	jroedel-l3A5Bk7waGM, keescook-F7+t8E8rja9g9hUCZPvPmw,
	toshi.kani-ZPxbGqLxI0U, mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	devel-tBiZLqfeLfOHmIFyCCdPziST3g8Odh+X,
	tglx-hfZtesqFncYOwBW4kG4KsQ, mchehab-DgEjT+Ai2ygdnm+yROfE0A,
	iamjoonsoo.kim-Hm3cg6mZ9cc,
	labbott-rxtnV0ftBwyoClj4AeEUq9i2O/JbrIOy,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, alexandre.bounine
In-Reply-To: <147190823395.9523.16184607551630730040.stgit@brijesh-build-machine>

On Mon, Aug 22, 2016 at 07:23:54PM -0400, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky-5C7GfCeVMHo@public.gmane.org>
> 
> Update the I/O interception support to add the kvm_fast_pio_in function
> to speed up the in instruction similar to the out instruction.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky-5C7GfCeVMHo@public.gmane.org>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/svm.c              |    5 +++--
>  arch/x86/kvm/x86.c              |   43 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 47 insertions(+), 2 deletions(-)

FWIW: Reviewed-by: Borislav Petkov <bp-l3A5Bk7waGM@public.gmane.org>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Thomas Gleixner @ 2016-09-20 23:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <20160920213219.GR1041@n2100.armlinux.org.uk>

On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> each of those places where af_alg_wait_for_completion() called, we
> end up submitting a bunch of data and then immediately waiting for
> the operation to complete... and this can be seen in the perf
> trace logs.

That'd explain it.
 
> So, unless I'm mistaken, there's no way for a crypto backend to run
> asynchronously, and there's no way for a crypto backend to batch up
> the "job" - in order to do that, I think it would have to store quite
> a lot of state.

Hmm.

> Now, I'm not entirely sure that asking perf to record irq:* and
> sched:* events was what we were after - there appears to be no trace
> events recorded for entering a threaded IRQ handler.

Indeed. We can only deduce it from the thread being woken and scheduled
in/out. ?me makes note to add a tracepoint in the thread handler
invocation.

> So 123us (260322 - 260199) to the switch to openssl via the threaded IRQ.

> 101us (667202 - 667101) between the same two events, which is 22us
> faster than the above.

So it looks like the two extra context switches are responsible for that
delta.
 
> Attached are compressed files of the perf script -G output.  If you
> want the perf.data files, I have them (I'm not sure how useful they
> are without the binaries though.)

Thanks. I'll have a look tomorrow when brain is unfried.

> > Vs. the PMU interrupt thing. What's the politics about that? Do you have
> > any pointers?
> 
> I just remember there being a discussion about how stupid FSL have been
> and "we're not going to support that" - the perf code wants the per-CPU
> performance unit interrupts delivered _on_ the CPU to which the perf
> unit is attached.  FSL decided in their stupidity to OR all the perf
> unit interrupts together and route them to a single common interrupt.

Brilliant.
 
> This means that we end up with one CPU taking the perf interrupt for
> any perf unit - and the CPUs can only access their local perf unit.
> So, if (eg) CPU1's perf unit fires an interrupt, but the common
> interrupt is routed to CPU0, CPU0 checks its perf unit, finds no
> interrupt, and returns with IRQ_NONE.
> 
> There's no mechanism in perf (or anywhere else) to hand the interrupt
> over to another CPU.
> 
> The result is that trying to run perf on the multi-core iMX SoCs ends
> up with the perf interrupt disabled, at which point perf collapses in
> a sad pile.

Not surprising.

Solving that in perf is probably the wrong place. So what we'd need is some
kind of special irq flow handler which does:

	ret = handle_irq(desc);
	if (ret == IRQ_NONE && desc->ipi_next) {
     		dest = get_next_cpu(this_cpu);
		if (dest != this_cpu)
			desc->ipi_next(dest);
			
     }

get_next_cpu() would just pick the next cpu in the online mask or the first
when this_cpu is the last one in the mask.

That shouldn't be overly complex to implement. All you'd need to do in the
PMU driver is to hook into that IPI vector.

If you're interested then I can hack the core bits.

Thanks,

	tglx




   

      	

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Russell King - ARM Linux @ 2016-09-20 21:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <alpine.DEB.2.20.1609202202520.5476@nanos>

[-- Attachment #1: Type: text/plain, Size: 8782 bytes --]

On Tue, Sep 20, 2016 at 10:10:20PM +0200, Thomas Gleixner wrote:
> On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> > which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> > tasklets are indeed faster than threaded IRQs.
> 
> Fair enough.
> 
> > I've tried to perf it, but...
> >  ....
>  
> > So, sorry, I'm not going to bother trying to get any further with this.
> > If the job was not made harder stupid hardware design and kernel
> > politics, then I might be more inclined to do deeper investigation, but
> > right now I'm finding that I'm not interested in trying to jump through
> > these stupid hoops.
> 
> I'd be very interested in a sched_switch + irq + softirq trace which does
> not involve PMU hardware for both irqthreads and tasklets, but I can
> understand if you can't be bothered to gather it.

That's involved a rebuild of perf to get it to see the trace events.
What I see probably indicates why the crypto AF_ALG way of doing
things sucks big time - the interface to the crypto backends is
entirely synchronous.

This means that we're guaranteed to get one interrupt per msghdr
entry:

crypto/algif_hash.c:
        lock_sock(sk);
        if (!ctx->more) {
                err = af_alg_wait_for_completion(crypto_ahash_init(&ctx->req),
                                                &ctx->completion);
                if (err)
                        goto unlock;
        }
...
        while (msg_data_left(msg)) {
...
                ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len);

                err = af_alg_wait_for_completion(crypto_ahash_update(&ctx->req),
                                                 &ctx->completion);
...
        }
...
        ctx->more = msg->msg_flags & MSG_MORE;
        if (!ctx->more) {
                ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
                err = af_alg_wait_for_completion(crypto_ahash_final(&ctx->req),
                                                 &ctx->completion);
        }

each of those places where af_alg_wait_for_completion() called, we
end up submitting a bunch of data and then immediately waiting for
the operation to complete... and this can be seen in the perf
trace logs.

So, unless I'm mistaken, there's no way for a crypto backend to run
asynchronously, and there's no way for a crypto backend to batch up
the "job" - in order to do that, I think it would have to store quite
a lot of state.

Now, I'm not entirely sure that asking perf to record irq:* and
sched:* events was what we were after - there appears to be no trace
events recorded for entering a threaded IRQ handler.

swapper     0 [000]  3600.260199:             irq:irq_handler_entry: irq=311 name=2101000.jr0
swapper     0 [000]  3600.260209:              irq:irq_handler_exit: irq=311 ret=handled
swapper     0 [000]  3600.260219:                sched:sched_waking: comm=irq/311-2101000 pid=2426 prio=49 target_cpu=000
swapper     0 [000]  3600.260236:                sched:sched_wakeup: comm=irq/311-2101000 pid=2426 prio=49 target_cpu=000
swapper     0 [000]  3600.260258:                sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=irq/311-2101000 next_pid=2426 next_prio=49
irq/311-2101000  2426 [000]  3600.260278:                sched:sched_waking: comm=openssl pid=8005 prio=120 target_cpu=000
irq/311-2101000  2426 [000]  3600.260296:                sched:sched_wakeup: comm=openssl pid=8005 prio=120 target_cpu=000
irq/311-2101000  2426 [000]  3600.260322:                sched:sched_switch: prev_comm=irq/311-2101000 prev_pid=2426 prev_prio=49 prev_state=S ==> next_comm=openssl next_pid=8005 next_prio=120
openssl  8005 [000]  3600.260369:                sched:sched_waking: comm=dd pid=8002 prio=120 target_cpu=000
openssl  8005 [000]  3600.260388:          sched:sched_stat_runtime: comm=openssl pid=8005 runtime=71000 [ns] vruntime=419661176835 [ns]
openssl  8005 [000]  3600.260401:                sched:sched_wakeup: comm=dd pid=8002 prio=120 target_cpu=000
openssl  8005 [000]  3600.260421:                sched:sched_switch: prev_comm=openssl prev_pid=8005 prev_prio=120 prev_state=R ==> next_comm=dd next_pid=8002 next_prio=120
dd  8002 [000]  3600.260459:          sched:sched_stat_runtime: comm=dd pid=8002 runtime=71667 [ns] vruntime=419655248502 [ns]
dd  8002 [000]  3600.260473:                sched:sched_switch: prev_comm=dd prev_pid=8002 prev_prio=120 prev_state=S ==> next_comm=openssl next_pid=8005 next_prio=120
openssl  8005 [000]  3600.260572:          sched:sched_stat_runtime: comm=openssl pid=8005 runtime=112666 [ns] vruntime=419661289501 [ns]
openssl  8005 [000]  3600.260587:                sched:sched_switch: prev_comm=openssl prev_pid=8005 prev_prio=120 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120
swapper     0 [000]  3600.260638:             irq:irq_handler_entry: irq=311 name=2101000.jr0
...

So 123us (260322 - 260199) to the switch to openssl via the threaded IRQ.

tasklet case:

swapper     0 [000]  5082.667101:             irq:irq_handler_entry: irq=311 name=2101000.jr0
swapper     0 [000]  5082.667111:                 irq:softirq_raise: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667119:              irq:irq_handler_exit: irq=311 ret=handled
swapper     0 [000]  5082.667134:                 irq:softirq_entry: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667151:                sched:sched_waking: comm=openssl pid=8251 prio=120 target_cpu=000
swapper     0 [000]  5082.667169:                sched:sched_wakeup: comm=openssl pid=8251 prio=120 target_cpu=000
swapper     0 [000]  5082.667183:                  irq:softirq_exit: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667202:                sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=openssl next_pid=8251 next_prio=120
openssl  8251 [000]  5082.667248:                sched:sched_waking: comm=dd pid=8248 prio=120 target_cpu=000
openssl  8251 [000]  5082.667267:          sched:sched_stat_runtime: comm=openssl pid=8251 runtime=39668 [ns] vruntime=444714027428 [ns]
openssl  8251 [000]  5082.667280:                sched:sched_wakeup: comm=dd pid=8248 prio=120 target_cpu=000
openssl  8251 [000]  5082.667301:                sched:sched_switch: prev_comm=openssl prev_pid=8251 prev_prio=120 prev_state=R ==> next_comm=dd next_pid=8248 next_prio=120
dd  8248 [000]  5082.667339:          sched:sched_stat_runtime: comm=dd pid=8248 runtime=70000 [ns] vruntime=444708097428 [ns]
dd  8248 [000]  5082.667354:                sched:sched_switch: prev_comm=dd prev_pid=8248 prev_prio=120 prev_state=S ==> next_comm=openssl next_pid=8251 next_prio=120
openssl  8251 [000]  5082.667451:          sched:sched_stat_runtime: comm=openssl pid=8251 runtime=113666 [ns] vruntime=444714141094 [ns]
openssl  8251 [000]  5082.667466:                sched:sched_switch: prev_comm=openssl prev_pid=8251 prev_prio=120 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120
swapper     0 [000]  5082.667517:             irq:irq_handler_entry: irq=311 name=2101000.jr0

101us (667202 - 667101) between the same two events, which is 22us
faster than the above.

In both cases, I picked out an irq_handler_entry event which was near
to a whole number of 100us.  I haven't looked any deeper to see what
the variations are in the hard IRQ->openssl schedule latency - I
expect that needs some scripts written.

Attached are compressed files of the perf script -G output.  If you
want the perf.data files, I have them (I'm not sure how useful they
are without the binaries though.)

> Vs. the PMU interrupt thing. What's the politics about that? Do you have
> any pointers?

I just remember there being a discussion about how stupid FSL have been
and "we're not going to support that" - the perf code wants the per-CPU
performance unit interrupts delivered _on_ the CPU to which the perf
unit is attached.  FSL decided in their stupidity to OR all the perf
unit interrupts together and route them to a single common interrupt.

This means that we end up with one CPU taking the perf interrupt for
any perf unit - and the CPUs can only access their local perf unit.
So, if (eg) CPU1's perf unit fires an interrupt, but the common
interrupt is routed to CPU0, CPU0 checks its perf unit, finds no
interrupt, and returns with IRQ_NONE.

There's no mechanism in perf (or anywhere else) to hand the interrupt
over to another CPU.

The result is that trying to run perf on the multi-core iMX SoCs ends
up with the perf interrupt disabled, at which point perf collapses in
a sad pile.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

[-- Attachment #2: threaded.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 23786 bytes --]

[-- Attachment #3: tasklet.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 23617 bytes --]

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Thomas Gleixner @ 2016-09-20 20:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <20160920161228.GQ1041@n2100.armlinux.org.uk>

On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> tasklets are indeed faster than threaded IRQs.

Fair enough.

> I've tried to perf it, but...
>  ....
 
> So, sorry, I'm not going to bother trying to get any further with this.
> If the job was not made harder stupid hardware design and kernel
> politics, then I might be more inclined to do deeper investigation, but
> right now I'm finding that I'm not interested in trying to jump through
> these stupid hoops.

I'd be very interested in a sched_switch + irq + softirq trace which does
not involve PMU hardware for both irqthreads and tasklets, but I can
understand if you can't be bothered to gather it.

Vs. the PMU interrupt thing. What's the politics about that? Do you have
any pointers?
 
> I think I've proven from the above that this patch needs to be reverted
> due to the performance regression, and that there _is_ most definitely
> a deterimental effect of switching from tasklets to threaded IRQs.

I agree that the revert should happen, but I'd rather see a bit more
information why this regression happens with the switch from tasklets to
threaded irqs.

Thanks,

	tglx

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Russell King - ARM Linux @ 2016-09-20 16:12 UTC (permalink / raw)
  To: Cata Vasile, Thomas Gleixner
  Cc: Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <DB5PR04MB1302AD00536D2E37E726E104EEF30@DB5PR04MB1302.eurprd04.prod.outlook.com>

Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines.  I've decided to
use this way to measure the performance:

dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5

For the threaded IRQs case gives:

0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 5.36s => 25.0MB/s

and the tasklet case:

0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 4.95 => 27.1MB/s

which corresponds to an 8% slowdown for the threaded IRQ case.  So,
tasklets are indeed faster than threaded IRQs.

I guess the reason is that tasklets are much simpler, being able to
run just before we return to userspace without involving scheduler
overheads, but that's speculation.

I've tried to perf it, but...

Samples: 31K of event 'cycles', Event count (approx.): 3552246846
  Overhead  Command          Shared Object     Symbol
+   33.22%  kworker/0:1      [kernel.vmlinux]  [k] __do_softirq
+   15.78%  irq/311-2101000  [kernel.vmlinux]  [k] __do_softirq
+    7.49%  irqbalance       [kernel.vmlinux]  [k] __do_softirq
+    7.26%  openssl          [kernel.vmlinux]  [k] __do_softirq
+    5.71%  ksoftirqd/0      [kernel.vmlinux]  [k] __do_softirq
+    3.64%  kworker/0:2      [kernel.vmlinux]  [k] __do_softirq
+    3.52%  swapper          [kernel.vmlinux]  [k] __do_softirq
+    3.14%  kworker/0:1      [kernel.vmlinux]  [k] _raw_spin_unlock_irq

I was going to try to get the threaded IRQ case, but I've ended up with
perf getting buggered because of the iMX6 SMP perf disfunctionality:

[ 3448.810416] irq 24: nobody cared (try booting with the "irqpoll" option)
...
[ 3448.824528] Disabling IRQ #24

caused by FSL's utterly brain-dead idea of routing all the perf
interrupts to single non-CPU local interrupt input, and the refusal of
kernel folk to find an acceptable solution to support this.

So, sorry, I'm not going to bother trying to get any further with this.
If the job was not made harder stupid hardware design and kernel
politics, then I might be more inclined to do deeper investigation, but
right now I'm finding that I'm not interested in trying to jump through
these stupid hoops.

I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a deterimental effect of switching from tasklets to threaded IRQs.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* [PATCH] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
From: Dave Gerlach @ 2016-09-20 15:25 UTC (permalink / raw)
  To: Deepak Saxena, Matt Mackall, Herbert Xu, Nishanth Menon
  Cc: linux-kernel, linux-crypto, linux-arm-kernel, linux-omap,
	Dave Gerlach

Currently omap-rng checks the return value of pm_runtime_get_sync and
reports failure if anything is returned, however it should be checking
if ret < 0 as pm_runtime_get_sync return 0 on success but also can return
1 if the device was already active which is not a failure case. Only
values < 0 are actual failures.

Fixes: 61dc0a446e5d ("hwrng: omap - Fix assumption that runtime_get_sync will always succeed")
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
---
 drivers/char/hw_random/omap-rng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index 01d4be2c354b..f5c26a5f6875 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -385,7 +385,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 
 	pm_runtime_enable(&pdev->dev);
 	ret = pm_runtime_get_sync(&pdev->dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(&pdev->dev);
 		goto err_ioremap;
@@ -443,7 +443,7 @@ static int __maybe_unused omap_rng_resume(struct device *dev)
 	int ret;
 
 	ret = pm_runtime_get_sync(dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(dev);
 		return ret;
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH] trusted-keys: skcipher bug info
From: Mimi Zohar @ 2016-09-20 12:59 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Howells, linux-crypto, keyrings
In-Reply-To: <20160920123555.GA22272@gondor.apana.org.au>

On Tue, 2016-09-20 at 20:35 +0800, Herbert Xu wrote:
> On Tue, Sep 20, 2016 at 08:11:51AM -0400, Mimi Zohar wrote:
> > Hi Herbert,
> > 
> > The initial random iv value, initialized in encrypted_init(), should
> > not be modified.  Commit c3917fd "KEYS: Use skcipher", which replaced
> > the blkcipher with skcipher, modifies the iv in
> > crypto_skcipher_encrypt()/decrypt().
> > 
> > The following example creates an encrypted key, writes the key to a
> > file, and then loads the key from the file.  To illustrate the problem,
> > this patch provides crypto_skcipher_encrypt()/decrypt() with a copy of
> > the iv.  With this change, the resulting test-key and test-key1 keys
> > are the same.
> 
> Sorry, I missed the subtlety.  This patch should fix the problem.

Thanks!

Mimi
> 
> ---8<---
> Subject: KEYS: Fix skcipher IV clobbering
> 
> The IV must not be modified by the skcipher operation so we need
> to duplicate it.
> 
> Fixes: c3917fd9dfbc ("KEYS: Use skcipher")
> Cc: stable@vger.kernel.org
> Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
> 
> diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
> index 5adbfc3..17a0610 100644
> --- a/security/keys/encrypted-keys/encrypted.c
> +++ b/security/keys/encrypted-keys/encrypted.c
> @@ -29,6 +29,7 @@
>  #include <linux/rcupdate.h>
>  #include <linux/scatterlist.h>
>  #include <linux/ctype.h>
> +#include <crypto/aes.h>
>  #include <crypto/hash.h>
>  #include <crypto/sha.h>
>  #include <crypto/skcipher.h>
> @@ -478,6 +479,7 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
>  	struct crypto_skcipher *tfm;
>  	struct skcipher_request *req;
>  	unsigned int encrypted_datalen;
> +	u8 iv[AES_BLOCK_SIZE];
>  	unsigned int padlen;
>  	char pad[16];
>  	int ret;
> @@ -500,8 +502,8 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
>  	sg_init_table(sg_out, 1);
>  	sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
> 
> -	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
> -				   epayload->iv);
> +	memcpy(iv, epayload->iv, sizeof(iv));
> +	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
>  	ret = crypto_skcipher_encrypt(req);
>  	tfm = crypto_skcipher_reqtfm(req);
>  	skcipher_request_free(req);
> @@ -581,6 +583,7 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
>  	struct crypto_skcipher *tfm;
>  	struct skcipher_request *req;
>  	unsigned int encrypted_datalen;
> +	u8 iv[AES_BLOCK_SIZE];
>  	char pad[16];
>  	int ret;
> 
> @@ -599,8 +602,8 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
>  		   epayload->decrypted_datalen);
>  	sg_set_buf(&sg_out[1], pad, sizeof pad);
> 
> -	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
> -				   epayload->iv);
> +	memcpy(iv, epayload->iv, sizeof(iv));
> +	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
>  	ret = crypto_skcipher_decrypt(req);
>  	tfm = crypto_skcipher_reqtfm(req);
>  	skcipher_request_free(req);
> 

^ permalink raw reply

* Re: [PATCH v5] KEYS: add SP800-56A KDF support for DH
From: Stephan Mueller @ 2016-09-20 12:51 UTC (permalink / raw)
  To: dhowells; +Cc: herbert, mathew.j.martineau, linux-crypto, keyrings
In-Reply-To: <1571629.ErTDR5PMQO@positron.chronox.de>

Am Freitag, 19. August 2016, 20:39:09 CEST schrieb Stephan Mueller:

Hi David,

> 
> SP800-56A defines the use of DH with key derivation function based on a
> counter. The input to the KDF is defined as (DH shared secret || other
> information). The value for the "other information" is to be provided by
> the caller.
> 
> The KDF is implemented using the hash support from the kernel crypto API.
> The implementation uses the symmetric hash support as the input to the
> hash operation is usually very small. The caller is allowed to specify
> the hash name that he wants to use to derive the key material allowing
> the use of all supported hashes provided with the kernel crypto API.
> 
> As the KDF implements the proper truncation of the DH shared secret to
> the requested size, this patch fills the caller buffer up to its size.
> 
> The patch is tested with a new test added to the keyutils user space
> code which uses a CAVS test vector testing the compliance with
> SP800-56A.

Is there a decision about this patch set?

Ciao
Stephan

^ permalink raw reply

* Re: [PATCH] trusted-keys: skcipher bug info
From: Herbert Xu @ 2016-09-20 12:35 UTC (permalink / raw)
  To: Mimi Zohar; +Cc: David Howells, linux-crypto, keyrings
In-Reply-To: <1474373511.14532.9.camel@linux.vnet.ibm.com>

On Tue, Sep 20, 2016 at 08:11:51AM -0400, Mimi Zohar wrote:
> Hi Herbert,
> 
> The initial random iv value, initialized in encrypted_init(), should
> not be modified.  Commit c3917fd "KEYS: Use skcipher", which replaced
> the blkcipher with skcipher, modifies the iv in
> crypto_skcipher_encrypt()/decrypt().
> 
> The following example creates an encrypted key, writes the key to a
> file, and then loads the key from the file.  To illustrate the problem,
> this patch provides crypto_skcipher_encrypt()/decrypt() with a copy of
> the iv.  With this change, the resulting test-key and test-key1 keys
> are the same.

Sorry, I missed the subtlety.  This patch should fix the problem.

---8<---
Subject: KEYS: Fix skcipher IV clobbering

The IV must not be modified by the skcipher operation so we need
to duplicate it.

Fixes: c3917fd9dfbc ("KEYS: Use skcipher")
Cc: stable@vger.kernel.org
Reported-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index 5adbfc3..17a0610 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -29,6 +29,7 @@
 #include <linux/rcupdate.h>
 #include <linux/scatterlist.h>
 #include <linux/ctype.h>
+#include <crypto/aes.h>
 #include <crypto/hash.h>
 #include <crypto/sha.h>
 #include <crypto/skcipher.h>
@@ -478,6 +479,7 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
 	struct crypto_skcipher *tfm;
 	struct skcipher_request *req;
 	unsigned int encrypted_datalen;
+	u8 iv[AES_BLOCK_SIZE];
 	unsigned int padlen;
 	char pad[16];
 	int ret;
@@ -500,8 +502,8 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
 	sg_init_table(sg_out, 1);
 	sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
 
-	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
-				   epayload->iv);
+	memcpy(iv, epayload->iv, sizeof(iv));
+	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
 	ret = crypto_skcipher_encrypt(req);
 	tfm = crypto_skcipher_reqtfm(req);
 	skcipher_request_free(req);
@@ -581,6 +583,7 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
 	struct crypto_skcipher *tfm;
 	struct skcipher_request *req;
 	unsigned int encrypted_datalen;
+	u8 iv[AES_BLOCK_SIZE];
 	char pad[16];
 	int ret;
 
@@ -599,8 +602,8 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
 		   epayload->decrypted_datalen);
 	sg_set_buf(&sg_out[1], pad, sizeof pad);
 
-	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
-				   epayload->iv);
+	memcpy(iv, epayload->iv, sizeof(iv));
+	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
 	ret = crypto_skcipher_decrypt(req);
 	tfm = crypto_skcipher_reqtfm(req);
 	skcipher_request_free(req);

-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* Re: [PATCH v7 4/9] crypto: acomp - add support for lzo via scomp
From: Giovanni Cabiddu @ 2016-09-20 12:23 UTC (permalink / raw)
  To: Herbert Xu; +Cc: linux-crypto, giovanni.cabiddu
In-Reply-To: <20160920092618.GA20808@gondor.apana.org.au>

Hi Herbert,

apologies for the duplicate. The previous email didn't get delivered to
the ML.

On Tue, Sep 20, 2016 at 05:26:18PM +0800, Herbert Xu wrote:
> Rather than duplicating the scratch buffer handling in every scomp
> algorithm, let's centralize this and put it into scomp.c.
Are you suggesting to hide the scratch buffers from the scomp
implementations and allocate them inside crypto_register_scomp?
Does that mean we have to go back to a scomp_alg with a flat buffer API
and linearize inside scomp?
If we take this direction, how do we support DMA from scomp
implementation? Scratch buffers are allocated using vmalloc.

Thanks,

--
Giovanni

^ permalink raw reply

* [PATCH] trusted-keys: skcipher bug info
From: Mimi Zohar @ 2016-09-20 12:11 UTC (permalink / raw)
  To: Herbert Xu; +Cc: David Howells, linux-crypto, keyrings

Hi Herbert,

The initial random iv value, initialized in encrypted_init(), should
not be modified.  Commit c3917fd "KEYS: Use skcipher", which replaced
the blkcipher with skcipher, modifies the iv in
crypto_skcipher_encrypt()/decrypt().

The following example creates an encrypted key, writes the key to a
file, and then loads the key from the file.  To illustrate the problem,
this patch provides crypto_skcipher_encrypt()/decrypt() with a copy of
the iv.  With this change, the resulting test-key and test-key1 keys
are the same.

keyctl add user kmk "`dd if=/dev/urandom bs=1 count=32 2>/dev/null`" @u
keyctl pipe `keyctl search @u user kmk` > ~/tmp/kmk
keyctl add encrypted test-key "new user:kmk 64" @u
keyctl pipe `keyctl search @u encrypted test-key` > /tmp/test-key1
keyctl add encrypted test-key1 "load `cat /tmp/test-key1`" @u

keyctl print `keyctl search @u encrypted test-key`
keyctl print `keyctl search @u encrypted test-key1`

Either the underlying problem should be fixed or commit c3917fd 
"KEYS: Use skcipher" should be reverted.

thanks,

Mimi

---
 security/keys/encrypted-keys/encrypted.c | 22 +++++++++++++++++++---
 1 file changed, 19 insertions(+), 3 deletions(-)

diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index 5adbfc3..e94f586 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -480,6 +480,7 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
 	unsigned int encrypted_datalen;
 	unsigned int padlen;
 	char pad[16];
+	u8 iv[16];
 	int ret;
 
 	encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -500,9 +501,19 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
 	sg_init_table(sg_out, 1);
 	sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
 
+	memcpy(iv, epayload->iv, 16);	/* iv is modified */
 	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
-				   epayload->iv);
+				   iv);
+print_hex_dump(KERN_INFO, "original iv: ", DUMP_PREFIX_NONE, 32, 1,
+	       epayload->iv, ivsize, 0);
+print_hex_dump(KERN_INFO, "copied iv: ", DUMP_PREFIX_NONE, 32, 1,
+	       iv, ivsize, 0);
 	ret = crypto_skcipher_encrypt(req);
+print_hex_dump(KERN_INFO, "original iv: ", DUMP_PREFIX_NONE, 32, 1,
+	       epayload->iv, ivsize, 0);
+print_hex_dump(KERN_INFO, "modified iv: ", DUMP_PREFIX_NONE, 32, 1,
+	       iv, ivsize, 0);
+
 	tfm = crypto_skcipher_reqtfm(req);
 	skcipher_request_free(req);
 	crypto_free_skcipher(tfm);
@@ -582,6 +593,7 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
 	struct skcipher_request *req;
 	unsigned int encrypted_datalen;
 	char pad[16];
+	u8 iv[16];
 	int ret;
 
 	encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -599,8 +611,9 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
 		   epayload->decrypted_datalen);
 	sg_set_buf(&sg_out[1], pad, sizeof pad);
 
+	memcpy(iv, epayload->iv, 16);	/* iv is modified */
 	skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen,
-				   epayload->iv);
+				   iv);
 	ret = crypto_skcipher_decrypt(req);
 	tfm = crypto_skcipher_reqtfm(req);
 	skcipher_request_free(req);
@@ -778,8 +791,11 @@ static int encrypted_init(struct encrypted_key_payload *epayload,
 
 		get_random_bytes(epayload->decrypted_data,
 				 epayload->decrypted_datalen);
-	} else
+	} else {
 		ret = encrypted_key_decrypt(epayload, format, hex_encoded_iv);
+print_hex_dump(KERN_INFO, "init iv: ", DUMP_PREFIX_NONE, 32, 1,
+	       epayload->iv, ivsize, 0);
+	}
 	return ret;
 }
 
-- 
2.7.4

^ permalink raw reply related

* Re: [PATCH v7 4/9] crypto: acomp - add support for lzo via scomp
From: Herbert Xu @ 2016-09-20  9:26 UTC (permalink / raw)
  To: Giovanni Cabiddu; +Cc: linux-crypto
In-Reply-To: <1473770981-9869-5-git-send-email-giovanni.cabiddu@intel.com>

On Tue, Sep 13, 2016 at 01:49:36PM +0100, Giovanni Cabiddu wrote:
>
> +	lzo_src_scratches = crypto_scomp_alloc_scratches(LZO_SCRATCH_SIZE);
> +	if (!lzo_src_scratches)
> +		return -ENOMEM;

Rather than duplicating the scratch buffer handling in every scomp
algorithm, let's centralize this and put it into scomp.c.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: ccp - Fix return value check in ccp_dmaengine_register()
From: Gary R Hook @ 2016-09-19 15:09 UTC (permalink / raw)
  To: Wei Yongjun, Tom Lendacky, Gary Hook, Herbert Xu
  Cc: Wei Yongjun, linux-crypto
In-Reply-To: <1474128082-14996-1-git-send-email-weiyj.lk@gmail.com>

On 09/17/2016 11:01 AM, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Fix the retrn value check which testing the wrong variable
> in ccp_dmaengine_register().
>
> Fixes: 58ea8abf4904 ("crypto: ccp - Register the CCP as a DMA resource")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/crypto/ccp/ccp-dmaengine.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/crypto/ccp/ccp-dmaengine.c b/drivers/crypto/ccp/ccp-dmaengine.c
> index 94f77b0..32f645e 100644
> --- a/drivers/crypto/ccp/ccp-dmaengine.c
> +++ b/drivers/crypto/ccp/ccp-dmaengine.c
> @@ -650,7 +650,7 @@ int ccp_dmaengine_register(struct ccp_device *ccp)
>  	dma_desc_cache_name = devm_kasprintf(ccp->dev, GFP_KERNEL,
>  					     "%s-dmaengine-desc-cache",
>  					     ccp->name);
> -	if (!dma_cmd_cache_name)
> +	if (!dma_desc_cache_name)
>  		return -ENOMEM;
>  	ccp->dma_desc_cache = kmem_cache_create(dma_desc_cache_name,
>  						sizeof(struct ccp_dma_desc),
>

Acked-by: Gary R Hook <gary.hook@amd.com>

^ permalink raw reply

* Re: [PATCH -next] crypto: ccp - use kmem_cache_zalloc instead of kmem_cache_alloc/memset
From: Gary R Hook @ 2016-09-19 15:10 UTC (permalink / raw)
  To: Wei Yongjun, Tom Lendacky, Gary Hook, Herbert Xu
  Cc: Wei Yongjun, linux-crypto
In-Reply-To: <1473910084-31739-1-git-send-email-weiyj.lk@gmail.com>

On 09/14/2016 10:28 PM, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
>
> Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().
>
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>
> ---
>  drivers/crypto/ccp/ccp-dmaengine.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/drivers/crypto/ccp/ccp-dmaengine.c b/drivers/crypto/ccp/ccp-dmaengine.c
> index ded26f4..2e5a05c 100644
> --- a/drivers/crypto/ccp/ccp-dmaengine.c
> +++ b/drivers/crypto/ccp/ccp-dmaengine.c
> @@ -299,12 +299,10 @@ static struct ccp_dma_desc *ccp_alloc_dma_desc(struct ccp_dma_chan *chan,
>  {
>  	struct ccp_dma_desc *desc;
>
> -	desc = kmem_cache_alloc(chan->ccp->dma_desc_cache, GFP_NOWAIT);
> +	desc = kmem_cache_zalloc(chan->ccp->dma_desc_cache, GFP_NOWAIT);
>  	if (!desc)
>  		return NULL;
>
> -	memset(desc, 0, sizeof(*desc));
> -
>  	dma_async_tx_descriptor_init(&desc->tx_desc, &chan->dma_chan);
>  	desc->tx_desc.flags = flags;
>  	desc->tx_desc.tx_submit = ccp_tx_submit;
>

Acked-by: Gary R Hook <gary.hook@amd.com>

^ permalink raw reply

* [PATCH 8/8] crypto: omap-sham: shrink the internal buffer size
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

The current internal buffer size is way too large for crypto core, so
shrink it to be smaller. This makes the buffer to fit into the space
reserved for the export/import buffers also.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 8eefd79..d0b16e5 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -135,7 +135,7 @@
 #define OMAP_ALIGN_MASK		(sizeof(u32)-1)
 #define OMAP_ALIGNED		__attribute__((aligned(sizeof(u32))))
 
-#define BUFLEN			PAGE_SIZE
+#define BUFLEN			SHA512_BLOCK_SIZE
 #define OMAP_SHA_DMA_THRESHOLD	256
 
 struct omap_sham_dev;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 7/8] crypto: omap-sham: add support for export/import
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Now that the driver has been converted to use scatterlists for data
handling, add proper implementation for the export/import stubs also.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 16 +++++++++++++---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 412559e..8eefd79 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -1418,12 +1418,21 @@ static void omap_sham_cra_exit(struct crypto_tfm *tfm)
 
 static int omap_sham_export(struct ahash_request *req, void *out)
 {
-	return -ENOTSUPP;
+	struct omap_sham_reqctx *rctx = ahash_request_ctx(req);
+
+	memcpy(out, rctx, sizeof(*rctx) + rctx->bufcnt);
+
+	return 0;
 }
 
 static int omap_sham_import(struct ahash_request *req, const void *in)
 {
-	return -ENOTSUPP;
+	struct omap_sham_reqctx *rctx = ahash_request_ctx(req);
+	const struct omap_sham_reqctx *ctx_in = in;
+
+	memcpy(rctx, in, sizeof(*rctx) + ctx_in->bufcnt);
+
+	return 0;
 }
 
 static struct ahash_alg algs_sha1_md5[] = {
@@ -2083,7 +2092,8 @@ static int omap_sham_probe(struct platform_device *pdev)
 			alg = &dd->pdata->algs_info[i].algs_list[j];
 			alg->export = omap_sham_export;
 			alg->import = omap_sham_import;
-			alg->halg.statesize = sizeof(struct omap_sham_reqctx);
+			alg->halg.statesize = sizeof(struct omap_sham_reqctx) +
+					      BUFLEN;
 			err = crypto_register_ahash(alg);
 			if (err)
 				goto err_algs;
-- 
1.9.1

^ permalink raw reply related

* [PATCH 6/8] crypto: omap-sham: convert driver logic to use sgs for data xmit
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Currently, the internal buffer has been used for data transmission. Change
this so that scatterlists are used instead, and change the driver to
actually use the previously introduced helper functions for scatterlist
preparation.

This patch also removes the old buffer handling code which is no longer
needed.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 344 ++++++++++-----------------------------------
 1 file changed, 74 insertions(+), 270 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 5c95bf9..412559e 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -116,7 +116,6 @@
 #define FLAGS_SGS_ALLOCED	10
 /* context flags */
 #define FLAGS_FINUP		16
-#define FLAGS_SG		17
 
 #define FLAGS_MODE_SHIFT	18
 #define FLAGS_MODE_MASK		(SHA_REG_MODE_ALGO_MASK	<< FLAGS_MODE_SHIFT)
@@ -150,13 +149,11 @@ struct omap_sham_reqctx {
 	size_t			digcnt;
 	size_t			bufcnt;
 	size_t			buflen;
-	dma_addr_t		dma_addr;
 
 	/* walk state */
 	struct scatterlist	*sg;
 	struct scatterlist	sgl[2];
-	struct scatterlist	sgl_tmp;
-	unsigned int		offset;	/* offset in current sg */
+	int			offset;	/* offset in current sg */
 	int			sg_len;
 	unsigned int		total;	/* total request */
 
@@ -516,12 +513,14 @@ static int omap_sham_poll_irq_omap4(struct omap_sham_dev *dd)
 			      SHA_REG_IRQSTATUS_INPUT_RDY);
 }
 
-static int omap_sham_xmit_cpu(struct omap_sham_dev *dd, const u8 *buf,
-			      size_t length, int final)
+static int omap_sham_xmit_cpu(struct omap_sham_dev *dd, size_t length,
+			      int final)
 {
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
 	int count, len32, bs32, offset = 0;
-	const u32 *buffer = (const u32 *)buf;
+	const u32 *buffer;
+	int mlen;
+	struct sg_mapping_iter mi;
 
 	dev_dbg(dd->dev, "xmit_cpu: digcnt: %d, length: %d, final: %d\n",
 						ctx->digcnt, length, final);
@@ -531,6 +530,7 @@ static int omap_sham_xmit_cpu(struct omap_sham_dev *dd, const u8 *buf,
 
 	/* should be non-zero before next lines to disable clocks later */
 	ctx->digcnt += length;
+	ctx->total -= length;
 
 	if (final)
 		set_bit(FLAGS_FINAL, &dd->flags); /* catch last interrupt */
@@ -540,16 +540,35 @@ static int omap_sham_xmit_cpu(struct omap_sham_dev *dd, const u8 *buf,
 	len32 = DIV_ROUND_UP(length, sizeof(u32));
 	bs32 = get_block_size(ctx) / sizeof(u32);
 
+	sg_miter_start(&mi, ctx->sg, ctx->sg_len,
+		       SG_MITER_FROM_SG | SG_MITER_ATOMIC);
+
+	mlen = 0;
+
 	while (len32) {
 		if (dd->pdata->poll_irq(dd))
 			return -ETIMEDOUT;
 
-		for (count = 0; count < min(len32, bs32); count++, offset++)
+		for (count = 0; count < min(len32, bs32); count++, offset++) {
+			if (!mlen) {
+				sg_miter_next(&mi);
+				mlen = mi.length;
+				if (!mlen) {
+					pr_err("sg miter failure.\n");
+					return -EINVAL;
+				}
+				offset = 0;
+				buffer = mi.addr;
+			}
 			omap_sham_write(dd, SHA_REG_DIN(dd, count),
 					buffer[offset]);
+			mlen -= 4;
+		}
 		len32 -= min(len32, bs32);
 	}
 
+	sg_miter_stop(&mi);
+
 	return -EINPROGRESS;
 }
 
@@ -561,22 +580,27 @@ static void omap_sham_dma_callback(void *param)
 	tasklet_schedule(&dd->done_task);
 }
 
-static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
-			      size_t length, int final, int is_sg)
+static int omap_sham_xmit_dma(struct omap_sham_dev *dd, size_t length,
+			      int final)
 {
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
 	struct dma_async_tx_descriptor *tx;
 	struct dma_slave_config cfg;
-	int len32, ret, dma_min = get_block_size(ctx);
+	int ret;
 
 	dev_dbg(dd->dev, "xmit_dma: digcnt: %d, length: %d, final: %d\n",
 						ctx->digcnt, length, final);
 
+	if (!dma_map_sg(dd->dev, ctx->sg, ctx->sg_len, DMA_TO_DEVICE)) {
+		dev_err(dd->dev, "dma_map_sg error\n");
+		return -EINVAL;
+	}
+
 	memset(&cfg, 0, sizeof(cfg));
 
 	cfg.dst_addr = dd->phys_base + SHA_REG_DIN(dd, 0);
 	cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
-	cfg.dst_maxburst = dma_min / DMA_SLAVE_BUSWIDTH_4_BYTES;
+	cfg.dst_maxburst = get_block_size(ctx) / DMA_SLAVE_BUSWIDTH_4_BYTES;
 
 	ret = dmaengine_slave_config(dd->dma_lch, &cfg);
 	if (ret) {
@@ -584,31 +608,12 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
 		return ret;
 	}
 
-	len32 = DIV_ROUND_UP(length, dma_min) * dma_min;
-
-	if (is_sg) {
-		/*
-		 * The SG entry passed in may not have the 'length' member
-		 * set correctly so use a local SG entry (sgl_tmp) with the
-		 * proper value for 'length' instead.  If this is not done,
-		 * the dmaengine may try to DMA the incorrect amount of data.
-		 */
-		sg_init_table(&ctx->sgl_tmp, 1);
-		sg_assign_page(&ctx->sgl_tmp, sg_page(ctx->sg));
-		ctx->sgl_tmp.offset = ctx->sg->offset;
-		sg_dma_len(&ctx->sgl_tmp) = len32;
-		sg_dma_address(&ctx->sgl_tmp) = sg_dma_address(ctx->sg);
-
-		tx = dmaengine_prep_slave_sg(dd->dma_lch, &ctx->sgl_tmp, 1,
-					     DMA_MEM_TO_DEV,
-					     DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	} else {
-		tx = dmaengine_prep_slave_single(dd->dma_lch, dma_addr, len32,
-			DMA_MEM_TO_DEV, DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
-	}
+	tx = dmaengine_prep_slave_sg(dd->dma_lch, ctx->sg, ctx->sg_len,
+				     DMA_MEM_TO_DEV,
+				     DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
 
 	if (!tx) {
-		dev_err(dd->dev, "prep_slave_sg/single() failed\n");
+		dev_err(dd->dev, "prep_slave_sg failed\n");
 		return -EINVAL;
 	}
 
@@ -618,6 +623,7 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
 	dd->pdata->write_ctrl(dd, length, final, 1);
 
 	ctx->digcnt += length;
+	ctx->total -= length;
 
 	if (final)
 		set_bit(FLAGS_FINAL, &dd->flags); /* catch last interrupt */
@@ -886,208 +892,13 @@ static int omap_sham_prepare_request(struct ahash_request *req, bool update)
 	return 0;
 }
 
-static size_t omap_sham_append_buffer(struct omap_sham_reqctx *ctx,
-				const u8 *data, size_t length)
-{
-	size_t count = min(length, ctx->buflen - ctx->bufcnt);
-
-	count = min(count, ctx->total);
-	if (count <= 0)
-		return 0;
-	memcpy(ctx->buffer + ctx->bufcnt, data, count);
-	ctx->bufcnt += count;
-
-	return count;
-}
-
-static size_t omap_sham_append_sg(struct omap_sham_reqctx *ctx)
-{
-	size_t count;
-	const u8 *vaddr;
-
-	while (ctx->sg) {
-		vaddr = kmap_atomic(sg_page(ctx->sg));
-		vaddr += ctx->sg->offset;
-
-		count = omap_sham_append_buffer(ctx,
-				vaddr + ctx->offset,
-				ctx->sg->length - ctx->offset);
-
-		kunmap_atomic((void *)vaddr);
-
-		if (!count)
-			break;
-		ctx->offset += count;
-		ctx->total -= count;
-		if (ctx->offset == ctx->sg->length) {
-			ctx->sg = sg_next(ctx->sg);
-			if (ctx->sg)
-				ctx->offset = 0;
-			else
-				ctx->total = 0;
-		}
-	}
-
-	return 0;
-}
-
-static int omap_sham_xmit_dma_map(struct omap_sham_dev *dd,
-					struct omap_sham_reqctx *ctx,
-					size_t length, int final)
-{
-	int ret;
-
-	ctx->dma_addr = dma_map_single(dd->dev, ctx->buffer, ctx->buflen,
-				       DMA_TO_DEVICE);
-	if (dma_mapping_error(dd->dev, ctx->dma_addr)) {
-		dev_err(dd->dev, "dma %u bytes error\n", ctx->buflen);
-		return -EINVAL;
-	}
-
-	ctx->flags &= ~BIT(FLAGS_SG);
-
-	ret = omap_sham_xmit_dma(dd, ctx->dma_addr, length, final, 0);
-	if (ret != -EINPROGRESS)
-		dma_unmap_single(dd->dev, ctx->dma_addr, ctx->buflen,
-				 DMA_TO_DEVICE);
-
-	return ret;
-}
-
-static int omap_sham_update_dma_slow(struct omap_sham_dev *dd)
-{
-	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
-	unsigned int final;
-	size_t count;
-
-	omap_sham_append_sg(ctx);
-
-	final = (ctx->flags & BIT(FLAGS_FINUP)) && !ctx->total;
-
-	dev_dbg(dd->dev, "slow: bufcnt: %u, digcnt: %d, final: %d\n",
-					 ctx->bufcnt, ctx->digcnt, final);
-
-	if (final || (ctx->bufcnt == ctx->buflen && ctx->total)) {
-		count = ctx->bufcnt;
-		ctx->bufcnt = 0;
-		return omap_sham_xmit_dma_map(dd, ctx, count, final);
-	}
-
-	return 0;
-}
-
-/* Start address alignment */
-#define SG_AA(sg)	(IS_ALIGNED(sg->offset, sizeof(u32)))
-/* SHA1 block size alignment */
-#define SG_SA(sg, bs)	(IS_ALIGNED(sg->length, bs))
-
-static int omap_sham_update_dma_start(struct omap_sham_dev *dd)
-{
-	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
-	unsigned int length, final, tail;
-	struct scatterlist *sg;
-	int ret, bs;
-
-	if (!ctx->total)
-		return 0;
-
-	if (ctx->bufcnt || ctx->offset)
-		return omap_sham_update_dma_slow(dd);
-
-	/*
-	 * Don't use the sg interface when the transfer size is less
-	 * than the number of elements in a DMA frame.  Otherwise,
-	 * the dmaengine infrastructure will calculate that it needs
-	 * to transfer 0 frames which ultimately fails.
-	 */
-	if (ctx->total < get_block_size(ctx))
-		return omap_sham_update_dma_slow(dd);
-
-	dev_dbg(dd->dev, "fast: digcnt: %d, bufcnt: %u, total: %u\n",
-			ctx->digcnt, ctx->bufcnt, ctx->total);
-
-	sg = ctx->sg;
-	bs = get_block_size(ctx);
-
-	if (!SG_AA(sg))
-		return omap_sham_update_dma_slow(dd);
-
-	if (!sg_is_last(sg) && !SG_SA(sg, bs))
-		/* size is not BLOCK_SIZE aligned */
-		return omap_sham_update_dma_slow(dd);
-
-	length = min(ctx->total, sg->length);
-
-	if (sg_is_last(sg)) {
-		if (!(ctx->flags & BIT(FLAGS_FINUP))) {
-			/* not last sg must be BLOCK_SIZE aligned */
-			tail = length & (bs - 1);
-			/* without finup() we need one block to close hash */
-			if (!tail)
-				tail = bs;
-			length -= tail;
-		}
-	}
-
-	if (!dma_map_sg(dd->dev, ctx->sg, 1, DMA_TO_DEVICE)) {
-		dev_err(dd->dev, "dma_map_sg  error\n");
-		return -EINVAL;
-	}
-
-	ctx->flags |= BIT(FLAGS_SG);
-
-	ctx->total -= length;
-	ctx->offset = length; /* offset where to start slow */
-
-	final = (ctx->flags & BIT(FLAGS_FINUP)) && !ctx->total;
-
-	ret = omap_sham_xmit_dma(dd, sg_dma_address(ctx->sg), length, final, 1);
-	if (ret != -EINPROGRESS)
-		dma_unmap_sg(dd->dev, ctx->sg, 1, DMA_TO_DEVICE);
-
-	return ret;
-}
-
-static int omap_sham_update_cpu(struct omap_sham_dev *dd)
-{
-	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
-	int bufcnt, final;
-
-	if (!ctx->total)
-		return 0;
-
-	omap_sham_append_sg(ctx);
-
-	final = (ctx->flags & BIT(FLAGS_FINUP)) && !ctx->total;
-
-	dev_dbg(dd->dev, "cpu: bufcnt: %u, digcnt: %d, final: %d\n",
-		ctx->bufcnt, ctx->digcnt, final);
-
-	if (final || (ctx->bufcnt == ctx->buflen && ctx->total)) {
-		bufcnt = ctx->bufcnt;
-		ctx->bufcnt = 0;
-		return omap_sham_xmit_cpu(dd, ctx->buffer, bufcnt, final);
-	}
-
-	return 0;
-}
-
 static int omap_sham_update_dma_stop(struct omap_sham_dev *dd)
 {
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(dd->req);
 
+	dma_unmap_sg(dd->dev, ctx->sg, ctx->sg_len, DMA_TO_DEVICE);
 
-	if (ctx->flags & BIT(FLAGS_SG)) {
-		dma_unmap_sg(dd->dev, ctx->sg, 1, DMA_TO_DEVICE);
-		if (ctx->sg->length == ctx->offset) {
-			ctx->sg = sg_next(ctx->sg);
-			if (ctx->sg)
-				ctx->offset = 0;
-		}
-	} else {
-		dma_unmap_single(dd->dev, ctx->dma_addr, ctx->buflen,
-				 DMA_TO_DEVICE);
-	}
+	clear_bit(FLAGS_DMA_ACTIVE, &dd->flags);
 
 	return 0;
 }
@@ -1148,6 +959,8 @@ static int omap_sham_init(struct ahash_request *req)
 
 	ctx->bufcnt = 0;
 	ctx->digcnt = 0;
+	ctx->total = 0;
+	ctx->offset = 0;
 	ctx->buflen = BUFLEN;
 
 	if (tctx->flags & BIT(FLAGS_HMAC)) {
@@ -1170,14 +983,19 @@ static int omap_sham_update_req(struct omap_sham_dev *dd)
 	struct ahash_request *req = dd->req;
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(req);
 	int err;
+	bool final = ctx->flags & BIT(FLAGS_FINUP);
 
 	dev_dbg(dd->dev, "update_req: total: %u, digcnt: %d, finup: %d\n",
 		 ctx->total, ctx->digcnt, (ctx->flags & BIT(FLAGS_FINUP)) != 0);
 
+	if (ctx->total < get_block_size(ctx) ||
+	    ctx->total < OMAP_SHA_DMA_THRESHOLD)
+		ctx->flags |= BIT(FLAGS_CPU);
+
 	if (ctx->flags & BIT(FLAGS_CPU))
-		err = omap_sham_update_cpu(dd);
+		err = omap_sham_xmit_cpu(dd, ctx->total, final);
 	else
-		err = omap_sham_update_dma_start(dd);
+		err = omap_sham_xmit_dma(dd, ctx->total, final);
 
 	/* wait for dma completion before can take more data */
 	dev_dbg(dd->dev, "update: err: %d, digcnt: %d\n", err, ctx->digcnt);
@@ -1191,7 +1009,7 @@ static int omap_sham_final_req(struct omap_sham_dev *dd)
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(req);
 	int err = 0, use_dma = 1;
 
-	if ((ctx->bufcnt <= get_block_size(ctx)) || dd->polling_mode)
+	if ((ctx->total <= get_block_size(ctx)) || dd->polling_mode)
 		/*
 		 * faster to handle last block with cpu or
 		 * use cpu when dma is not present.
@@ -1199,9 +1017,9 @@ static int omap_sham_final_req(struct omap_sham_dev *dd)
 		use_dma = 0;
 
 	if (use_dma)
-		err = omap_sham_xmit_dma_map(dd, ctx, ctx->bufcnt, 1);
+		err = omap_sham_xmit_dma(dd, ctx->total, 1);
 	else
-		err = omap_sham_xmit_cpu(dd, ctx->buffer, ctx->bufcnt, 1);
+		err = omap_sham_xmit_cpu(dd, ctx->total, 1);
 
 	ctx->bufcnt = 0;
 
@@ -1249,6 +1067,17 @@ static void omap_sham_finish_req(struct ahash_request *req, int err)
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(req);
 	struct omap_sham_dev *dd = ctx->dd;
 
+	if (test_bit(FLAGS_SGS_COPIED, &dd->flags))
+		free_pages((unsigned long)sg_virt(ctx->sg),
+			   get_order(ctx->sg->length));
+
+	if (test_bit(FLAGS_SGS_ALLOCED, &dd->flags))
+		kfree(ctx->sg);
+
+	ctx->sg = NULL;
+
+	dd->flags &= ~(BIT(FLAGS_SGS_ALLOCED) | BIT(FLAGS_SGS_COPIED));
+
 	if (!err) {
 		dd->pdata->copy_hash(req, 1);
 		if (test_bit(FLAGS_FINAL, &dd->flags))
@@ -1300,7 +1129,7 @@ retry:
 	dd->req = req;
 	ctx = ahash_request_ctx(req);
 
-	err = omap_sham_prepare_request(NULL, ctx->op == OP_UPDATE);
+	err = omap_sham_prepare_request(req, ctx->op == OP_UPDATE);
 	if (err)
 		goto err1;
 
@@ -1356,34 +1185,15 @@ static int omap_sham_update(struct ahash_request *req)
 {
 	struct omap_sham_reqctx *ctx = ahash_request_ctx(req);
 	struct omap_sham_dev *dd = ctx->dd;
-	int bs = get_block_size(ctx);
 
 	if (!req->nbytes)
 		return 0;
 
-	ctx->total = req->nbytes;
-	ctx->sg = req->src;
-	ctx->offset = 0;
-
-	if (ctx->flags & BIT(FLAGS_FINUP)) {
-		if ((ctx->digcnt + ctx->bufcnt + ctx->total) < 240) {
-			/*
-			* OMAP HW accel works only with buffers >= 9
-			* will switch to bypass in final()
-			* final has the same request and data
-			*/
-			omap_sham_append_sg(ctx);
-			return 0;
-		} else if ((ctx->bufcnt + ctx->total <= bs) ||
-			   dd->polling_mode) {
-			/*
-			 * faster to use CPU for short transfers or
-			 * use cpu when dma is not present.
-			 */
-			ctx->flags |= BIT(FLAGS_CPU);
-		}
-	} else if (ctx->bufcnt + ctx->total < ctx->buflen) {
-		omap_sham_append_sg(ctx);
+	if (ctx->total + req->nbytes < ctx->buflen) {
+		scatterwalk_map_and_copy(ctx->buffer + ctx->bufcnt, req->src,
+					 0, req->nbytes, 0);
+		ctx->bufcnt += req->nbytes;
+		ctx->total += req->nbytes;
 		return 0;
 	}
 
@@ -1917,12 +1727,8 @@ static void omap_sham_done_task(unsigned long data)
 	}
 
 	if (test_bit(FLAGS_CPU, &dd->flags)) {
-		if (test_and_clear_bit(FLAGS_OUTPUT_READY, &dd->flags)) {
-			/* hash or semi-hash ready */
-			err = omap_sham_update_cpu(dd);
-			if (err != -EINPROGRESS)
-				goto finish;
-		}
+		if (test_and_clear_bit(FLAGS_OUTPUT_READY, &dd->flags))
+			goto finish;
 	} else if (test_bit(FLAGS_DMA_READY, &dd->flags)) {
 		if (test_and_clear_bit(FLAGS_DMA_ACTIVE, &dd->flags)) {
 			omap_sham_update_dma_stop(dd);
@@ -1934,8 +1740,6 @@ static void omap_sham_done_task(unsigned long data)
 		if (test_and_clear_bit(FLAGS_OUTPUT_READY, &dd->flags)) {
 			/* hash or semi-hash ready */
 			clear_bit(FLAGS_DMA_READY, &dd->flags);
-			err = omap_sham_update_dma_start(dd);
-			if (err != -EINPROGRESS)
 				goto finish;
 		}
 	}
-- 
1.9.1

^ permalink raw reply related

* [PATCH 4/8] crypto: omap-sham: add support functions for sg based data handling
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Currently omap-sham uses a huge internal buffer for caching data, and
pushing this out to the DMA as large chunks. This, unfortunately,
doesn't work too well with the export/import functionality required
for ahash algorithms, and must be changed towards more scatterlist
centric approach.

This patch adds support functions for (mostly) scatterlist based data
handling. omap_sham_prepare_request() prepares a scatterlist for DMA
transfer to SHA crypto accelerator. This requires checking the data /
offset / length alignment of the data, splitting the data to SHA block
size granularity, and adding any remaining data back to the buffer.
With this patch, the code doesn't actually go live yet, the support code
will be taken properly into use with additional patches that modify the
SHA driver functionality itself.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 263 +++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 263 insertions(+)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 33bea52..8558989 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -112,6 +112,8 @@
 #define FLAGS_DMA_READY		6
 #define FLAGS_AUTO_XOR		7
 #define FLAGS_BE32_SHA1		8
+#define FLAGS_SGS_COPIED	9
+#define FLAGS_SGS_ALLOCED	10
 /* context flags */
 #define FLAGS_FINUP		16
 #define FLAGS_SG		17
@@ -151,8 +153,10 @@ struct omap_sham_reqctx {
 
 	/* walk state */
 	struct scatterlist	*sg;
+	struct scatterlist	sgl[2];
 	struct scatterlist	sgl_tmp;
 	unsigned int		offset;	/* offset in current sg */
+	int			sg_len;
 	unsigned int		total;	/* total request */
 
 	u8			buffer[0] OMAP_ALIGNED;
@@ -223,6 +227,7 @@ struct omap_sham_dev {
 	struct dma_chan		*dma_lch;
 	struct tasklet_struct	done_task;
 	u8			polling_mode;
+	u8			xmit_buf[BUFLEN];
 
 	unsigned long		flags;
 	struct crypto_queue	queue;
@@ -626,6 +631,260 @@ static int omap_sham_xmit_dma(struct omap_sham_dev *dd, dma_addr_t dma_addr,
 	return -EINPROGRESS;
 }
 
+static int omap_sham_copy_sg_lists(struct omap_sham_reqctx *ctx,
+				   struct scatterlist *sg, int bs, int new_len)
+{
+	int n = sg_nents(sg);
+	struct scatterlist *tmp;
+	int offset = ctx->offset;
+
+	if (ctx->bufcnt)
+		n++;
+
+	ctx->sg = kmalloc_array(n, sizeof(*sg), GFP_KERNEL);
+	if (!ctx->sg)
+		return -ENOMEM;
+
+	sg_init_table(ctx->sg, n);
+
+	tmp = ctx->sg;
+
+	ctx->sg_len = 0;
+
+	if (ctx->bufcnt) {
+		sg_set_buf(tmp, ctx->dd->xmit_buf, ctx->bufcnt);
+		tmp = sg_next(tmp);
+		ctx->sg_len++;
+	}
+
+	while (sg && new_len) {
+		int len = sg->length - offset;
+
+		if (offset) {
+			offset -= sg->length;
+			if (offset < 0)
+				offset = 0;
+		}
+
+		if (new_len < len)
+			len = new_len;
+
+		if (len > 0) {
+			new_len -= len;
+			sg_set_page(tmp, sg_page(sg), len, sg->offset);
+			if (new_len <= 0)
+				sg_mark_end(tmp);
+			tmp = sg_next(tmp);
+			ctx->sg_len++;
+		}
+
+		sg = sg_next(sg);
+	}
+
+	set_bit(FLAGS_SGS_ALLOCED, &ctx->dd->flags);
+
+	ctx->bufcnt = 0;
+
+	return 0;
+}
+
+static int omap_sham_copy_sgs(struct omap_sham_reqctx *ctx,
+			      struct scatterlist *sg, int bs, int new_len)
+{
+	int pages;
+	void *buf;
+	int len;
+
+	len = new_len + ctx->bufcnt;
+
+	pages = get_order(ctx->total);
+
+	buf = (void *)__get_free_pages(GFP_ATOMIC, pages);
+	if (!buf) {
+		pr_err("Couldn't allocate pages for unaligned cases.\n");
+		return -ENOMEM;
+	}
+
+	if (ctx->bufcnt)
+		memcpy(buf, ctx->dd->xmit_buf, ctx->bufcnt);
+
+	scatterwalk_map_and_copy(buf + ctx->bufcnt, sg, ctx->offset,
+				 ctx->total - ctx->bufcnt, 0);
+	sg_init_table(ctx->sgl, 1);
+	sg_set_buf(ctx->sgl, buf, len);
+	ctx->sg = ctx->sgl;
+	set_bit(FLAGS_SGS_COPIED, &ctx->dd->flags);
+	ctx->sg_len = 1;
+	ctx->bufcnt = 0;
+	ctx->offset = 0;
+
+	return 0;
+}
+
+static int omap_sham_align_sgs(struct scatterlist *sg,
+			       int nbytes, int bs, bool final,
+			       struct omap_sham_reqctx *rctx)
+{
+	int n = 0;
+	bool aligned = true;
+	bool list_ok = true;
+	struct scatterlist *sg_tmp = sg;
+	int new_len;
+	int offset = rctx->offset;
+
+	if (!sg || !sg->length || !nbytes)
+		return 0;
+
+	new_len = nbytes;
+
+	if (offset)
+		list_ok = false;
+
+	if (final)
+		new_len = DIV_ROUND_UP(new_len, bs) * bs;
+	else
+		new_len = new_len / bs * bs;
+
+	while (nbytes > 0 && sg_tmp) {
+		n++;
+
+		if (offset < sg_tmp->length) {
+			if (!IS_ALIGNED(offset + sg_tmp->offset, 4)) {
+				aligned = false;
+				break;
+			}
+
+			if (!IS_ALIGNED(sg_tmp->length - offset, bs)) {
+				aligned = false;
+				break;
+			}
+		}
+
+		if (offset) {
+			offset -= sg_tmp->length;
+			if (offset < 0) {
+				nbytes += offset;
+				offset = 0;
+			}
+		} else {
+			nbytes -= sg_tmp->length;
+		}
+
+		sg_tmp = sg_next(sg_tmp);
+
+		if (nbytes < 0) {
+			list_ok = false;
+			break;
+		}
+	}
+
+	if (!aligned)
+		return omap_sham_copy_sgs(rctx, sg, bs, new_len);
+	else if (!list_ok)
+		return omap_sham_copy_sg_lists(rctx, sg, bs, new_len);
+
+	rctx->sg_len = n;
+	rctx->sg = sg;
+
+	return 0;
+}
+
+static int omap_sham_prepare_request(struct ahash_request *req, bool update)
+{
+	struct omap_sham_reqctx *rctx = ahash_request_ctx(req);
+	int bs;
+	int ret;
+	int nbytes;
+	bool final = rctx->flags & BIT(FLAGS_FINUP);
+	int xmit_len, hash_later;
+
+	if (!req)
+		return 0;
+
+	bs = get_block_size(rctx);
+
+	if (update)
+		nbytes = req->nbytes;
+	else
+		nbytes = 0;
+
+	rctx->total = nbytes + rctx->bufcnt;
+
+	if (!rctx->total)
+		return 0;
+
+	if (nbytes && (!IS_ALIGNED(rctx->bufcnt, bs))) {
+		int len = bs - rctx->bufcnt % bs;
+
+		if (len > nbytes)
+			len = nbytes;
+		scatterwalk_map_and_copy(rctx->buffer + rctx->bufcnt, req->src,
+					 0, len, 0);
+		rctx->bufcnt += len;
+		nbytes -= len;
+		rctx->offset = len;
+	}
+
+	if (rctx->bufcnt)
+		memcpy(rctx->dd->xmit_buf, rctx->buffer, rctx->bufcnt);
+
+	ret = omap_sham_align_sgs(req->src, nbytes, bs, final, rctx);
+	if (ret)
+		return ret;
+
+	xmit_len = rctx->total;
+
+	if (!IS_ALIGNED(xmit_len, bs)) {
+		if (final)
+			xmit_len = DIV_ROUND_UP(xmit_len, bs) * bs;
+		else
+			xmit_len = xmit_len / bs * bs;
+	}
+
+	hash_later = rctx->total - xmit_len;
+	if (hash_later < 0)
+		hash_later = 0;
+
+	if (rctx->bufcnt && nbytes) {
+		/* have data from previous operation and current */
+		sg_init_table(rctx->sgl, 2);
+		sg_set_buf(rctx->sgl, rctx->dd->xmit_buf, rctx->bufcnt);
+
+		sg_chain(rctx->sgl, 2, req->src);
+
+		rctx->sg = rctx->sgl;
+
+		rctx->sg_len++;
+	} else if (rctx->bufcnt) {
+		/* have buffered data only */
+		sg_init_table(rctx->sgl, 1);
+		sg_set_buf(rctx->sgl, rctx->dd->xmit_buf, xmit_len);
+
+		rctx->sg = rctx->sgl;
+
+		rctx->sg_len = 1;
+	}
+
+	if (hash_later) {
+		if (req->nbytes) {
+			scatterwalk_map_and_copy(rctx->buffer, req->src,
+						 req->nbytes - hash_later,
+						 hash_later, 0);
+		} else {
+			memcpy(rctx->buffer, rctx->buffer + xmit_len,
+			       hash_later);
+		}
+		rctx->bufcnt = hash_later;
+	} else {
+		rctx->bufcnt = 0;
+	}
+
+	if (!final)
+		rctx->total = xmit_len;
+
+	return 0;
+}
+
 static size_t omap_sham_append_buffer(struct omap_sham_reqctx *ctx,
 				const u8 *data, size_t length)
 {
@@ -1040,6 +1299,10 @@ retry:
 	dd->req = req;
 	ctx = ahash_request_ctx(req);
 
+	err = omap_sham_prepare_request(NULL, ctx->op == OP_UPDATE);
+	if (err)
+		goto err1;
+
 	dev_dbg(dd->dev, "handling new req, op: %lu, nbytes: %d\n",
 						ctx->op, req->nbytes);
 
-- 
1.9.1

^ permalink raw reply related

* [PATCH 5/8] crypto: omap-sham: change the DMA threshold value to a define
From: Tero Kristo @ 2016-09-19 15:22 UTC (permalink / raw)
  To: herbert, lokeshvutla, davem, linux-crypto, linux-omap; +Cc: linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

Currently the threshold value was hardcoded in the driver. Having a define
for it makes it easier to configure.

Signed-off-by: Tero Kristo <t-kristo@ti.com>
---
 drivers/crypto/omap-sham.c | 8 +++++---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/crypto/omap-sham.c b/drivers/crypto/omap-sham.c
index 8558989..5c95bf9 100644
--- a/drivers/crypto/omap-sham.c
+++ b/drivers/crypto/omap-sham.c
@@ -137,6 +137,7 @@
 #define OMAP_ALIGNED		__attribute__((aligned(sizeof(u32))))
 
 #define BUFLEN			PAGE_SIZE
+#define OMAP_SHA_DMA_THRESHOLD	256
 
 struct omap_sham_dev;
 
@@ -1435,10 +1436,11 @@ static int omap_sham_final(struct ahash_request *req)
 	/*
 	 * OMAP HW accel works only with buffers >= 9.
 	 * HMAC is always >= 9 because ipad == block size.
-	 * If buffersize is less than 240, we use fallback SW encoding,
-	 * as using DMA + HW in this case doesn't provide any benefit.
+	 * If buffersize is less than DMA_THRESHOLD, we use fallback
+	 * SW encoding, as using DMA + HW in this case doesn't provide
+	 * any benefit.
 	 */
-	if (!ctx->digcnt && ctx->bufcnt < 240)
+	if (!ctx->digcnt && ctx->bufcnt < OMAP_SHA_DMA_THRESHOLD)
 		return omap_sham_final_shash(req);
 	else if (ctx->bufcnt)
 		return omap_sham_enqueue(req, OP_FINAL);
-- 
1.9.1

^ permalink raw reply related


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox