Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* Re: [PATCH 0/8] crypto: omap-sham: convert to sg based data engine
From: Herbert Xu @ 2016-09-22 10:46 UTC (permalink / raw)
  To: Tero Kristo
  Cc: lokeshvutla, davem, linux-crypto, linux-omap, linux-arm-kernel
In-Reply-To: <1474298539-23897-1-git-send-email-t-kristo@ti.com>

On Mon, Sep 19, 2016 at 06:22:11PM +0300, Tero Kristo wrote:
> Hi,
> 
> This series converts the omap-sham buffer handling towards a scatterlist
> approach. This avoids the need to have a huge internal buffer within the
> driver, and also allows us to properly implement export/import for the
> driver. I tried splitting up the changes to some sane patches, but this
> was rather difficult due to the fact that this is largely a complete
> rewrite of portions of the driver. Patch #6 is a prime example of this
> being pretty large, but splitting this up would break bisectability.
> Hopefully the patch is still understandable though.
> 
> Crypto manager tests work fine at least on omap3/am3/am4/dra7 SoC:s.
> (Something is broken with test farm again and could not try omap2/omap4.)
> 
> Also tested tcrypt SHA performance on DRA7 and it seems to be working
> fine with different buffer sizes.
> 
> My test branch is also available here for interested parties:
> tree: https://github.com/t-kristo/linux-pm.git
> breanch: 4.8-rc1-crypto

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: sun4i-ss: mark sun4i_hash() static
From: Herbert Xu @ 2016-09-22 10:45 UTC (permalink / raw)
  To: Baoyou Xie
  Cc: arnd, xie.baoyou, linux-kernel, wens, clabbe.montjoie,
	linux-crypto, maxime.ripard, davem, linux-arm-kernel
In-Reply-To: <1474203164-14884-1-git-send-email-baoyou.xie@linaro.org>

On Sun, Sep 18, 2016 at 08:52:44PM +0800, Baoyou Xie wrote:
> We get 1 warning when building kernel with W=1:
> drivers/crypto/sunxi-ss/sun4i-ss-hash.c:168:5: warning: no previous prototype for 'sun4i_hash' [-Wmissing-prototypes]
> 
> In fact, this function is only used in the file in which it is
> declared and don't need a declaration, but can be made static.
> So this patch marks it 'static'.
> 
> Signed-off-by: Baoyou Xie <baoyou.xie@linaro.org>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] crypto: ccp - Fix return value check in ccp_dmaengine_register()
From: Herbert Xu @ 2016-09-22 10:45 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Tom Lendacky, Gary Hook, Wei Yongjun, linux-crypto
In-Reply-To: <1474128082-14996-1-git-send-email-weiyj.lk@gmail.com>

On Sat, Sep 17, 2016 at 04:01:22PM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> Fix the retrn value check which testing the wrong variable
> in ccp_dmaengine_register().
> 
> Fixes: 58ea8abf4904 ("crypto: ccp - Register the CCP as a DMA resource")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -next] hwrng: geode - fix return value check in mod_init()
From: Herbert Xu @ 2016-09-22 10:45 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Matt Mackall, PrasannaKumar Muralidharan, Wei Yongjun,
	linux-geode, linux-crypto
In-Reply-To: <1473990601-18721-1-git-send-email-weiyj.lk@gmail.com>

On Fri, Sep 16, 2016 at 01:50:01AM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> In case of error, the function devm_ioremap() returns NULL pointer
> not ERR_PTR(). The IS_ERR() test in the return value check should
> be replaced with NULL test.
> 
> Fixes: 6e9b5e76882c ("hwrng: geode - Migrate to managed API")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -next] hwrng: amd - Fix return value check in mod_init()
From: Herbert Xu @ 2016-09-22 10:44 UTC (permalink / raw)
  To: Wei Yongjun
  Cc: Matt Mackall, Corentin LABBE, PrasannaKumar Muralidharan,
	Wei Yongjun, linux-crypto
In-Reply-To: <1473990581-18602-1-git-send-email-weiyj.lk@gmail.com>

On Fri, Sep 16, 2016 at 01:49:41AM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> In case of error, the function devm_kzalloc() or devm_ioport_map()
> return NULL pointer not ERR_PTR(). The IS_ERR() test in the return
> value check should be replaced with NULL test.
> 
> Fixes: 31b2a73c9c5f ("hwrng: amd - Migrate to managed API")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: crypto-caamhash: Fine-tuning for several function implementations
From: Herbert Xu @ 2016-09-22 10:44 UTC (permalink / raw)
  To: SF Markus Elfring
  Cc: linux-crypto, David S. Miller, Labbe Corentin, Russell King, LKML,
	kernel-janitors, Julia Lawall
In-Reply-To: <970e9e1b-c1dc-eb28-b380-92c15e9b1961@users.sourceforge.net>

On Thu, Sep 15, 2016 at 04:36:35PM +0200, SF Markus Elfring wrote:
> From: Markus Elfring <elfring@users.sourceforge.net>
> Date: Thu, 15 Sep 2016 16:27:23 +0200
> 
> Some update suggestions were taken into account
> from static source code analysis.
> 
> Markus Elfring (6):
>   Use kmalloc_array() in ahash_setkey()
>   Rename jump labels in ahash_setkey()
>   Rename a jump label in five functions
>   Return a value directly in caam_hash_cra_init()
>   Delete an unnecessary initialisation in seven functions
>   Move common error handling code in two functions
> 
>  drivers/crypto/caam/caamhash.c | 111 +++++++++++++++++++----------------------
>  1 file changed, 52 insertions(+), 59 deletions(-)

All applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -next] crypto: ccp - use kmem_cache_zalloc instead of kmem_cache_alloc/memset
From: Herbert Xu @ 2016-09-22 10:43 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Tom Lendacky, Gary Hook, Wei Yongjun, linux-crypto
In-Reply-To: <1473910084-31739-1-git-send-email-weiyj.lk@gmail.com>

On Thu, Sep 15, 2016 at 03:28:04AM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> Using kmem_cache_zalloc() instead of kmem_cache_alloc() and memset().
> 
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -next] crypto: omap-aes - fix error return code in omap_aes_probe()
From: Herbert Xu @ 2016-09-22 10:43 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Baolin Wang, Wei Yongjun, linux-crypto
In-Reply-To: <1473910052-31515-1-git-send-email-weiyj.lk@gmail.com>

On Thu, Sep 15, 2016 at 03:27:32AM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> Fix to return error code -ENOMEM from the crypto_engine_alloc_init()
> error handling case instead of 0, as done elsewhere in this function.
> 
> Fixes: 0529900a01cb ("crypto: omap-aes - Support crypto engine framework")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH -next] crypto: omap-des - fix error return code in omap_des_probe()
From: Herbert Xu @ 2016-09-22 10:43 UTC (permalink / raw)
  To: Wei Yongjun; +Cc: Baolin, Wei Yongjun, linux-crypto
In-Reply-To: <1473910035-31283-1-git-send-email-weiyj.lk@gmail.com>

On Thu, Sep 15, 2016 at 03:27:15AM +0000, Wei Yongjun wrote:
> From: Wei Yongjun <weiyongjun1@huawei.com>
> 
> Fix to return error code -ENOMEM from the crypto_engine_alloc_init()
> error handling case instead of 0, as done elsewhere in this function.
> 
> Fixes: f1b77aaca85a ("crypto: omap-des - Integrate with the crypto
> engine framework")
> Signed-off-by: Wei Yongjun <weiyongjun1@huawei.com>

Patch applied.  Thanks.
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v2] crypto: caam - fix sg dump
From: Horia Geanta Neag @ 2016-09-22 10:23 UTC (permalink / raw)
  To: Cata Vasile, linux-crypto@vger.kernel.org
  Cc: linux-crypto-owner@vger.kernel.org
In-Reply-To: <1474534678-10118-1-git-send-email-cata.vasile@nxp.com>

On 9/22/2016 11:58 AM, Catalin Vasile wrote:
> Ensure scatterlists have a virtual memory mapping before dumping.
> 
> Signed-off-by: Catalin Vasile <cata.vasile@nxp.com>
> ---
> Changes:
> 	V2:
> 		* resolved issue of sleeping in atomic contexts
> ---
> ---
>  drivers/crypto/caam/caamalg.c | 79 +++++++++++++++++++++++++++++++++----------
>  1 file changed, 61 insertions(+), 18 deletions(-)
> 
> diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
> index f1116e7..eb97562 100644
> --- a/drivers/crypto/caam/caamalg.c
> +++ b/drivers/crypto/caam/caamalg.c
> @@ -111,6 +111,42 @@
>  #else
>  #define debug(format, arg...)
>  #endif
> +
> +#ifdef DEBUG
> +#include <linux/highmem.h>
> +
> +static void dbg_dump_sg(const char *level, const char *prefix_str,
> +			int prefix_type, int rowsize, int groupsize,
> +			struct scatterlist *sg, size_t tlen, bool ascii,
> +			bool may_sleep)
may_sleep is no longer needed.

> +{
> +	struct scatterlist *it;
> +	void *it_page;
> +	size_t len;
> +	void *buf;
> +
> +	for (it = sg; it != NULL && tlen > 0 ; it = sg_next(sg)) {
> +		/*
> +		 * make sure the scatterlist's page
> +		 * has a valid virtual memory mapping
> +		 */
> +		it_page = kmap_atomic(sg_page(it));
> +		if (unlikely(!it_page)) {
> +			printk(KERN_ERR "dbg_dump_sg: kmap failed\n");
pr_err is preferred.

> +			return;
> +		}
> +
> +		buf = it_page + it->offset;
> +		len = min(tlen, it->length);
> +		print_hex_dump(level, prefix_str, prefix_type, rowsize,
> +			       groupsize, buf, len, ascii);
> +		tlen -= len;
> +
> +		kunmap_atomic(it_page);
> +	}
> +}
> +#endif
I see that the function currently shows up only in places where DEBUG
define is already used.
However, it would be nice to have a no-op implementation in case DEBUG
is not set, to avoid ifdeffery as much as possible.

> +
>  static struct list_head alg_list;
>  
>  struct caam_alg_entry {
> @@ -1982,9 +2018,9 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
>  	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
>  		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
>  		       edesc->src_nents > 1 ? 100 : ivsize, 1);
> -	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
> -		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
> -		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
> +	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
> +		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
> +		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
>  #endif
>  
>  	ablkcipher_unmap(jrdev, edesc, req);
> @@ -2014,9 +2050,9 @@ static void ablkcipher_decrypt_done(struct device *jrdev, u32 *desc, u32 err,
>  	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
>  		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
>  		       ivsize, 1);
> -	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
> -		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
> -		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
> +	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
> +		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
> +		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
>  #endif
>  
>  	ablkcipher_unmap(jrdev, edesc, req);
> @@ -2171,12 +2207,15 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
>  	int len, sec4_sg_index = 0;
>  
>  #ifdef DEBUG
> +	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
> +					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
>  	print_hex_dump(KERN_ERR, "presciv@"__stringify(__LINE__)": ",
>  		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
>  		       ivsize, 1);
> -	print_hex_dump(KERN_ERR, "src    @"__stringify(__LINE__)": ",
> -		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
> -		       edesc->src_nents ? 100 : req->nbytes, 1);
> +	printk(KERN_ERR "asked=%d, nbytes%d\n", (int)edesc->src_nents ? 100 : req->nbytes, req->nbytes);
> +	dbg_dump_sg(KERN_ERR, "src    @"__stringify(__LINE__)": ",
> +		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
> +		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
>  #endif
>  
>  	len = desc_len(sh_desc);
> @@ -2228,12 +2267,14 @@ static void init_ablkcipher_giv_job(u32 *sh_desc, dma_addr_t ptr,
>  	int len, sec4_sg_index = 0;
>  
>  #ifdef DEBUG
> +	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
> +					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
>  	print_hex_dump(KERN_ERR, "presciv@" __stringify(__LINE__) ": ",
>  		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
>  		       ivsize, 1);
> -	print_hex_dump(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
> -		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
> -		       edesc->src_nents ? 100 : req->nbytes, 1);
> +	dbg_dump_sg(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
> +		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
> +		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
>  #endif
>  
>  	len = desc_len(sh_desc);
> @@ -2503,18 +2544,20 @@ static int aead_decrypt(struct aead_request *req)
>  	u32 *desc;
>  	int ret = 0;
>  
> +#ifdef DEBUG
> +	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
> +					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
> +	dbg_dump_sg(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
> +		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
> +		    req->assoclen + req->cryptlen, 1, may_sleep);
> +#endif
> +
The logic / flow shouldn't change (in this patch at least), leave the
print in the same place.

>  	/* allocate extended descriptor */
>  	edesc = aead_edesc_alloc(req, AUTHENC_DESC_JOB_IO_LEN,
>  				 &all_contig, false);
>  	if (IS_ERR(edesc))
>  		return PTR_ERR(edesc);
>  
> -#ifdef DEBUG
> -	print_hex_dump(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
> -		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
> -		       req->assoclen + req->cryptlen, 1);
> -#endif
> -
>  	/* Create and submit job descriptor*/
>  	init_authenc_job(req, edesc, all_contig, false);
>  #ifdef DEBUG
> 


^ permalink raw reply

* Re: [PATCH] crypto: gcm - Fix IV buffer size in crypto_gcm_setkey
From: Herbert Xu @ 2016-09-22 10:32 UTC (permalink / raw)
  To: Ondrej Mosnáček; +Cc: linux-crypto
In-Reply-To: <CAAUqJDt=wXHbU1z5f_=r9tuJxUKDZKR1d9AURoDrye+iwnVrKQ@mail.gmail.com>

On Fri, Sep 16, 2016 at 02:07:40PM +0200, Ondrej Mosnáček wrote:
> The cipher block size for GCM is 16 bytes, and thus the CTR transform
> used in crypto_gcm_setkey() will also expect a 16-byte IV. However,
> the code currently reserves only 8 bytes for the IV, causing
> an out-of-bounds access in the CTR transform. This patch fixes
> the issue by setting the size of the IV buffer to 16 bytes.
> 
> Fixes: 84c911523020 ("[CRYPTO] gcm: Add support for async ciphers")
> Signed-off-by: Ondrej Mosnacek <omosnacek@gmail.com>
> ---
> I randomly noticed this while going over igcm.c for an unrelated
> reason. It seems the wrong buffer size never caused any noticeable
> problems (it's been there since 2007), but it should be corrected
> nonetheless...

Sorry, but your patch has been line-wrapped and doesn't apply.
Please resubmit.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH] Fix Kconfig dependencies for FIPS
From: Herbert Xu @ 2016-09-22  9:32 UTC (permalink / raw)
  To: NTU; +Cc: linux-crypto
In-Reply-To: <CAM5Ud7Ph1LR_8F7Pm9x55B9bozuQL+=6NZfVMjPtTPiDswfUHw@mail.gmail.com>

NTU <neotheuser@gmail.com> wrote:
> Currently FIPS depends on MODULE_SIG, even if MODULES is disabled.
> This change allows the enabling of FIPS without support for modules.
> 
> If module loading support is enabled, only then does
> FIPS require MODULE_SIG.
> 
> Signed-off-by: Alec Ari <neotheuser@gmail.com>
> ---
> crypto/Kconfig | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/crypto/Kconfig b/crypto/Kconfig
> index 84d7148..fd28805 100644
> --- a/crypto/Kconfig
> +++ b/crypto/Kconfig
> @@ -24,7 +24,7 @@ comment "Crypto core or helper"
> config CRYPTO_FIPS
>     bool "FIPS 200 compliance"
>     depends on (CRYPTO_ANSI_CPRNG || CRYPTO_DRBG) &&
> !CRYPTO_MANAGER_DISABLE_TESTS
> -    depends on MODULE_SIG
> +    depends on (MODULE_SIG || !MODULES)
>     help
>       This options enables the fips boot option which is
>       required if you want to system to operate in a FIPS 200

Your email has turned tabs into white spaces.  Please fix it
and resubmit.

Thanks,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH RESEND] crypto: doc - fix documentation for bulk registration functions
From: Herbert Xu @ 2016-09-22  9:28 UTC (permalink / raw)
  To: Eric Biggers; +Cc: davem, linux-crypto, linux-doc, ebiggers
In-Reply-To: <1473879397-110630-1-git-send-email-ebiggers@google.com>

Eric Biggers <ebiggers@google.com> wrote:
> Update the documentation for crypto_register_algs() and
> crypto_unregister_algs() to match the actual behavior.
> 
> Signed-off-by: Eric Biggers <ebiggers@google.com>

Your previous submission has already been applied.

Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [BUG] crypto: atmel-aes - erro when compiling with VERBOSE_DEBUG enable
From: Herbert Xu @ 2016-09-22  9:26 UTC (permalink / raw)
  To: levent demir; +Cc: linux-crypto, cyrille.pitchen
In-Reply-To: <1473779378.3556.20.camel@inria.fr>

levent demir <levent.demir@inria.fr> wrote:
> Hello, 
> 
> if you enable VERBOSE_DEBUG and compile you will have the following
> error : 
> 
> drivers/crypto/atmel-aes.c:323:5: error: too few arguments to function
> 'atmel_aes_reg_name'
>     atmel_aes_reg_name(offset, tmp));
>     ^
> include/linux/device.h:1306:41: note: in definition of macro 'dev_vdbg'
>   dev_printk(KERN_DEBUG, dev, format, ##arg); \
>                                         ^
> drivers/crypto/atmel-aes.c:205:20: note: declared here
> static const char *atmel_aes_reg_name(u32 offset, char *tmp, size_t sz)
> 
> Indeed, in atmel_aes_write function the call to atmel_aes_reg_name
> contains only two arguments instead of 3 : 
> 
> atmel_aes_reg_name(offset, tmp));
> 
> To fix it, one has to only add the size of tmp as third argument : 
> 
> atmel_aes_reg_name(offset, tmp, sizeof(tmp)));

Thanks for the patch.  In order to apply it, you need to fix the
white-space damage as well as add a sign-off.  For details please
refer to Documentation/SubmittingPatches.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [PATCH v7 4/9] crypto: acomp - add support for lzo via scomp
From: Herbert Xu @ 2016-09-22  9:22 UTC (permalink / raw)
  To: Giovanni Cabiddu; +Cc: linux-crypto
In-Reply-To: <20160920115140.GA12332@sivswdev01.ir.intel.com>

On Tue, Sep 20, 2016 at 12:51:40PM +0100, Giovanni Cabiddu wrote:
> Hi Herbert,
> 
> On Tue, Sep 20, 2016 at 05:26:18PM +0800, Herbert Xu wrote:
> > Rather than duplicating the scratch buffer handling in every scomp
> > algorithm, let's centralize this and put it into scomp.c.
> Are you suggesting to hide the scratch buffers from the scomp implementations 
> and allocate them inside crypto_register_scomp?

I'm suggesting that we have just one set of buffers for all scomp
algorithms.  After all, a CPU can only process one request at a
time.

> Does that mean we have to go back to a scomp_alg with a flat buffer API
> and linearize inside scomp?

Yes scomp should just be flat.  A sync algorithm capable of producing
partial output should use the acomp interface.

> If we take this direction, how do we support DMA from scomp implementation?
> Scratch buffers are allocated using vmalloc.

Huh? If you're doing DMA then you'd be using the acomp interface,
how can you even get into scomp?

I think you may have misread my earlier message from June.  What
I'd like to see is for the acomp layer to allocate the output
memory, rather than have it provided by the user as is the case
with the current interface.  The user could provide a maximum to
prevent crazy cases consuming unlimited memory.

What will happen with scomp is that it will simply use a linear
vmalloc buffer to store the output before copying it into an SG
list of individual pages allocated on the spot.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: BUG: rsa-pkcs1pad decrypt regression in 4.8
From: Herbert Xu @ 2016-09-22  9:04 UTC (permalink / raw)
  To: Mat Martineau; +Cc: linux-crypto, smueller
In-Reply-To: <alpine.OSX.2.20.1609211630400.14260@mjmartin-mac01.local>

On Wed, Sep 21, 2016 at 04:39:30PM -0700, Mat Martineau wrote:
> 
> Herbert -
> 
> There was a regression in pkcs1pad signature verification, related
> to signature verification, that you fixed in commit 27710b8ea3defcb:
> 
> https://git.kernel.org/cgit/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=27710b8ea3defcbd7d340dbd0423d911b4eb7c4f
> 
> There is a very similar problem in the decrypt operation, which was
> not adjusted for the leading zero changes. See
> pkcs1pad_decrypt_complete().
> 
> I haven't had a chance to test a fix yet, but with the final 4.8
> release coming up very soon I wanted to report the issue.

Thanks.  This patch should fix the problem.

---8<---
crypto: rsa-pkcs1pad - Handle leading zero for decryption

As the software RSA implementation now produces fixed-length
output, we need to eliminate leading zeros in the calling code
instead.

This patch does just that for pkcs1pad decryption while signature
verification was fixed in an earlier patch.

Fixes: 9b45b7bba3d2 ("crypto: rsa - Generate fixed-length output")
Reported-by: Mat Martineau <mathew.j.martineau@linux.intel.com>
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>

diff --git a/crypto/rsa-pkcs1pad.c b/crypto/rsa-pkcs1pad.c
index 877019a..8baab43 100644
--- a/crypto/rsa-pkcs1pad.c
+++ b/crypto/rsa-pkcs1pad.c
@@ -298,41 +298,48 @@ static int pkcs1pad_decrypt_complete(struct akcipher_request *req, int err)
 	struct crypto_akcipher *tfm = crypto_akcipher_reqtfm(req);
 	struct pkcs1pad_ctx *ctx = akcipher_tfm_ctx(tfm);
 	struct pkcs1pad_request *req_ctx = akcipher_request_ctx(req);
+	unsigned int dst_len;
 	unsigned int pos;
-
-	if (err == -EOVERFLOW)
-		/* Decrypted value had no leading 0 byte */
-		err = -EINVAL;
+	u8 *out_buf;
 
 	if (err)
 		goto done;
 
-	if (req_ctx->child_req.dst_len != ctx->key_size - 1) {
-		err = -EINVAL;
+	err = -EINVAL;
+	dst_len = req_ctx->child_req.dst_len;
+	if (dst_len < ctx->key_size - 1)
 		goto done;
+
+	out_buf = req_ctx->out_buf;
+	if (dst_len == ctx->key_size) {
+		if (out_buf[0] != 0x00)
+			/* Decrypted value had no leading 0 byte */
+			goto done;
+
+		dst_len--;
+		out_buf++;
 	}
 
-	if (req_ctx->out_buf[0] != 0x02) {
-		err = -EINVAL;
+	if (out_buf[0] != 0x02)
 		goto done;
-	}
-	for (pos = 1; pos < req_ctx->child_req.dst_len; pos++)
-		if (req_ctx->out_buf[pos] == 0x00)
+
+	for (pos = 1; pos < dst_len; pos++)
+		if (out_buf[pos] == 0x00)
 			break;
-	if (pos < 9 || pos == req_ctx->child_req.dst_len) {
-		err = -EINVAL;
+	if (pos < 9 || pos == dst_len)
 		goto done;
-	}
 	pos++;
 
-	if (req->dst_len < req_ctx->child_req.dst_len - pos)
+	err = 0;
+
+	if (req->dst_len < dst_len - pos)
 		err = -EOVERFLOW;
-	req->dst_len = req_ctx->child_req.dst_len - pos;
+	req->dst_len = dst_len - pos;
 
 	if (!err)
 		sg_copy_from_buffer(req->dst,
 				sg_nents_for_len(req->dst, req->dst_len),
-				req_ctx->out_buf + pos, req->dst_len);
+				out_buf + pos, req->dst_len);
 
 done:
 	kzfree(req_ctx->out_buf);
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply related

* [PATCH v2] crypto: caam - fix sg dump
From: Catalin Vasile @ 2016-09-22  8:57 UTC (permalink / raw)
  To: linux-crypto; +Cc: linux-crypto-owner, horia.geanta, Catalin Vasile

Ensure scatterlists have a virtual memory mapping before dumping.

Signed-off-by: Catalin Vasile <cata.vasile@nxp.com>
---
Changes:
	V2:
		* resolved issue of sleeping in atomic contexts
---
---
 drivers/crypto/caam/caamalg.c | 79 +++++++++++++++++++++++++++++++++----------
 1 file changed, 61 insertions(+), 18 deletions(-)

diff --git a/drivers/crypto/caam/caamalg.c b/drivers/crypto/caam/caamalg.c
index f1116e7..eb97562 100644
--- a/drivers/crypto/caam/caamalg.c
+++ b/drivers/crypto/caam/caamalg.c
@@ -111,6 +111,42 @@
 #else
 #define debug(format, arg...)
 #endif
+
+#ifdef DEBUG
+#include <linux/highmem.h>
+
+static void dbg_dump_sg(const char *level, const char *prefix_str,
+			int prefix_type, int rowsize, int groupsize,
+			struct scatterlist *sg, size_t tlen, bool ascii,
+			bool may_sleep)
+{
+	struct scatterlist *it;
+	void *it_page;
+	size_t len;
+	void *buf;
+
+	for (it = sg; it != NULL && tlen > 0 ; it = sg_next(sg)) {
+		/*
+		 * make sure the scatterlist's page
+		 * has a valid virtual memory mapping
+		 */
+		it_page = kmap_atomic(sg_page(it));
+		if (unlikely(!it_page)) {
+			printk(KERN_ERR "dbg_dump_sg: kmap failed\n");
+			return;
+		}
+
+		buf = it_page + it->offset;
+		len = min(tlen, it->length);
+		print_hex_dump(level, prefix_str, prefix_type, rowsize,
+			       groupsize, buf, len, ascii);
+		tlen -= len;
+
+		kunmap_atomic(it_page);
+	}
+}
+#endif
+
 static struct list_head alg_list;
 
 struct caam_alg_entry {
@@ -1982,9 +2018,9 @@ static void ablkcipher_encrypt_done(struct device *jrdev, u32 *desc, u32 err,
 	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       edesc->src_nents > 1 ? 100 : ivsize, 1);
-	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
+		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
 #endif
 
 	ablkcipher_unmap(jrdev, edesc, req);
@@ -2014,9 +2050,9 @@ static void ablkcipher_decrypt_done(struct device *jrdev, u32 *desc, u32 err,
 	print_hex_dump(KERN_ERR, "dstiv  @"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->dst_nents > 1 ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "dst    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->dst,
+		    edesc->dst_nents > 1 ? 100 : req->nbytes, 1, true);
 #endif
 
 	ablkcipher_unmap(jrdev, edesc, req);
@@ -2171,12 +2207,15 @@ static void init_ablkcipher_job(u32 *sh_desc, dma_addr_t ptr,
 	int len, sec4_sg_index = 0;
 
 #ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
 	print_hex_dump(KERN_ERR, "presciv@"__stringify(__LINE__)": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "src    @"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->src_nents ? 100 : req->nbytes, 1);
+	printk(KERN_ERR "asked=%d, nbytes%d\n", (int)edesc->src_nents ? 100 : req->nbytes, req->nbytes);
+	dbg_dump_sg(KERN_ERR, "src    @"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
 #endif
 
 	len = desc_len(sh_desc);
@@ -2228,12 +2267,14 @@ static void init_ablkcipher_giv_job(u32 *sh_desc, dma_addr_t ptr,
 	int len, sec4_sg_index = 0;
 
 #ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
 	print_hex_dump(KERN_ERR, "presciv@" __stringify(__LINE__) ": ",
 		       DUMP_PREFIX_ADDRESS, 16, 4, req->info,
 		       ivsize, 1);
-	print_hex_dump(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       edesc->src_nents ? 100 : req->nbytes, 1);
+	dbg_dump_sg(KERN_ERR, "src    @" __stringify(__LINE__) ": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    edesc->src_nents ? 100 : req->nbytes, 1, may_sleep);
 #endif
 
 	len = desc_len(sh_desc);
@@ -2503,18 +2544,20 @@ static int aead_decrypt(struct aead_request *req)
 	u32 *desc;
 	int ret = 0;
 
+#ifdef DEBUG
+	bool may_sleep = ((req->base.flags & (CRYPTO_TFM_REQ_MAY_BACKLOG |
+					      CRYPTO_TFM_REQ_MAY_SLEEP)) != 0);
+	dbg_dump_sg(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
+		    DUMP_PREFIX_ADDRESS, 16, 4, req->src,
+		    req->assoclen + req->cryptlen, 1, may_sleep);
+#endif
+
 	/* allocate extended descriptor */
 	edesc = aead_edesc_alloc(req, AUTHENC_DESC_JOB_IO_LEN,
 				 &all_contig, false);
 	if (IS_ERR(edesc))
 		return PTR_ERR(edesc);
 
-#ifdef DEBUG
-	print_hex_dump(KERN_ERR, "dec src@"__stringify(__LINE__)": ",
-		       DUMP_PREFIX_ADDRESS, 16, 4, sg_virt(req->src),
-		       req->assoclen + req->cryptlen, 1);
-#endif
-
 	/* Create and submit job descriptor*/
 	init_authenc_job(req, edesc, all_contig, false);
 #ifdef DEBUG
-- 
1.8.3.1

^ permalink raw reply related

* BUG: rsa-pkcs1pad decrypt regression in 4.8
From: Mat Martineau @ 2016-09-21 23:39 UTC (permalink / raw)
  To: linux-crypto, herbert; +Cc: smueller

Herbert -

There was a regression in pkcs1pad signature verification, related to 
signature verification, that you fixed in commit 27710b8ea3defcb:

https://git.kernel.org/cgit/linux/kernel/git/herbert/crypto-2.6.git/commit/?id=27710b8ea3defcbd7d340dbd0423d911b4eb7c4f

There is a very similar problem in the decrypt operation, which was not 
adjusted for the leading zero changes. See pkcs1pad_decrypt_complete().

I haven't had a chance to test a fix yet, but with the final 4.8 release 
coming up very soon I wanted to report the issue.

Regards,

--
Mat Martineau
Intel OTC

^ permalink raw reply

* Re: [RFC PATCH v1 03/28] kvm: svm: Use the hardware provided GPA instead of page walk
From: Borislav Petkov @ 2016-09-21 17:16 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: simon.guinot, linux-efi, kvm, rkrcmar, matt, linus.walleij,
	linux-mm, paul.gortmaker, hpa, dan.j.williams, aarcange, sfr,
	andriy.shevchenko, herbert, bhe, xemul, joro, x86, mingo, msalter,
	ross.zwisler, dyoung, thomas.lendacky, jroedel, keescook,
	toshi.kani, mathieu.desnoyers, devel, tglx, mchehab,
	iamjoonsoo.kim, labbott, tony.luck, alexandre.bounine,
	kuleshovmail, linux-kernel, mcgrof, linux-crypto
In-Reply-To: <147190824754.9523.13923968456167130181.stgit@brijesh-build-machine>

On Mon, Aug 22, 2016 at 07:24:07PM -0400, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky@amd.com>
> 
> When a guest causes a NPF which requires emulation, KVM sometimes walks
> the guest page tables to translate the GVA to a GPA. This is unnecessary
> most of the time on AMD hardware since the hardware provides the GPA in
> EXITINFO2.
> 
> The only exception cases involve string operations involving rep or
> operations that use two memory locations. With rep, the GPA will only be
> the value of the initial NPF and with dual memory locations we won't know
> which memory address was translated into EXITINFO2.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky@amd.com>
> ---
>  arch/x86/include/asm/kvm_emulate.h |    3 +++
>  arch/x86/include/asm/kvm_host.h    |    3 +++
>  arch/x86/kvm/svm.c                 |    2 ++
>  arch/x86/kvm/x86.c                 |   17 ++++++++++++++++-
>  4 files changed, 24 insertions(+), 1 deletion(-)

FWIW, LGTM. (Gotta love replying in acronyms :-))

Reviewed-by: Borislav Petkov <bp@suse.de>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply

* Re: [RFC PATCH v1 02/28] kvm: svm: Add kvm_fast_pio_in support
From: Borislav Petkov @ 2016-09-21 10:58 UTC (permalink / raw)
  To: Brijesh Singh
  Cc: simon.guinot-jKBdWWKqtFpg9hUCZPvPmw,
	linux-efi-u79uwXL29TY76Z2rM5mHXA, kvm-u79uwXL29TY76Z2rM5mHXA,
	rkrcmar-H+wXaHxf7aLQT0dZR+AlfA,
	matt-mF/unelCI9GS6iBeEJttW/XRex20P6io,
	linus.walleij-QSEj5FYQhm4dnm+yROfE0A,
	linux-mm-Bw31MaZKKs3YtjvyW6yDsg,
	paul.gortmaker-CWA4WttNNZF54TAoqtyWWQ, hpa-YMNOUZJC4hwAvxtiuMwx3w,
	dan.j.williams-ral2JQCrhuEAvxtiuMwx3w,
	aarcange-H+wXaHxf7aLQT0dZR+AlfA, sfr-3FnU+UHB4dNDw9hX6IcOSA,
	andriy.shevchenko-VuQAYsv1563Yd54FQh9/CA,
	herbert-lOAM2aK0SrRLBo1qDEOMRrpzq4S04n8Q,
	bhe-H+wXaHxf7aLQT0dZR+AlfA, xemul-bzQdu9zFT3WakBO8gow8eQ,
	joro-zLv9SwRftAIdnm+yROfE0A, x86-DgEjT+Ai2ygdnm+yROfE0A,
	mingo-H+wXaHxf7aLQT0dZR+AlfA, msalter-H+wXaHxf7aLQT0dZR+AlfA,
	ross.zwisler-VuQAYsv1563Yd54FQh9/CA,
	dyoung-H+wXaHxf7aLQT0dZR+AlfA, thomas.lendacky-5C7GfCeVMHo,
	jroedel-l3A5Bk7waGM, keescook-F7+t8E8rja9g9hUCZPvPmw,
	toshi.kani-ZPxbGqLxI0U, mathieu.desnoyers-vg+e7yoeK/dWk0Htik3J/w,
	devel-tBiZLqfeLfOHmIFyCCdPziST3g8Odh+X,
	tglx-hfZtesqFncYOwBW4kG4KsQ, mchehab-DgEjT+Ai2ygdnm+yROfE0A,
	iamjoonsoo.kim-Hm3cg6mZ9cc,
	labbott-rxtnV0ftBwyoClj4AeEUq9i2O/JbrIOy,
	tony.luck-ral2JQCrhuEAvxtiuMwx3w, alexandre.bounine
In-Reply-To: <147190823395.9523.16184607551630730040.stgit@brijesh-build-machine>

On Mon, Aug 22, 2016 at 07:23:54PM -0400, Brijesh Singh wrote:
> From: Tom Lendacky <thomas.lendacky-5C7GfCeVMHo@public.gmane.org>
> 
> Update the I/O interception support to add the kvm_fast_pio_in function
> to speed up the in instruction similar to the out instruction.
> 
> Signed-off-by: Tom Lendacky <thomas.lendacky-5C7GfCeVMHo@public.gmane.org>
> ---
>  arch/x86/include/asm/kvm_host.h |    1 +
>  arch/x86/kvm/svm.c              |    5 +++--
>  arch/x86/kvm/x86.c              |   43 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 47 insertions(+), 2 deletions(-)

FWIW: Reviewed-by: Borislav Petkov <bp-l3A5Bk7waGM@public.gmane.org>

-- 
Regards/Gruss,
    Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 (AG Nürnberg)
-- 

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Thomas Gleixner @ 2016-09-20 23:31 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <20160920213219.GR1041@n2100.armlinux.org.uk>

On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> each of those places where af_alg_wait_for_completion() called, we
> end up submitting a bunch of data and then immediately waiting for
> the operation to complete... and this can be seen in the perf
> trace logs.

That'd explain it.
 
> So, unless I'm mistaken, there's no way for a crypto backend to run
> asynchronously, and there's no way for a crypto backend to batch up
> the "job" - in order to do that, I think it would have to store quite
> a lot of state.

Hmm.

> Now, I'm not entirely sure that asking perf to record irq:* and
> sched:* events was what we were after - there appears to be no trace
> events recorded for entering a threaded IRQ handler.

Indeed. We can only deduce it from the thread being woken and scheduled
in/out. ?me makes note to add a tracepoint in the thread handler
invocation.

> So 123us (260322 - 260199) to the switch to openssl via the threaded IRQ.

> 101us (667202 - 667101) between the same two events, which is 22us
> faster than the above.

So it looks like the two extra context switches are responsible for that
delta.
 
> Attached are compressed files of the perf script -G output.  If you
> want the perf.data files, I have them (I'm not sure how useful they
> are without the binaries though.)

Thanks. I'll have a look tomorrow when brain is unfried.

> > Vs. the PMU interrupt thing. What's the politics about that? Do you have
> > any pointers?
> 
> I just remember there being a discussion about how stupid FSL have been
> and "we're not going to support that" - the perf code wants the per-CPU
> performance unit interrupts delivered _on_ the CPU to which the perf
> unit is attached.  FSL decided in their stupidity to OR all the perf
> unit interrupts together and route them to a single common interrupt.

Brilliant.
 
> This means that we end up with one CPU taking the perf interrupt for
> any perf unit - and the CPUs can only access their local perf unit.
> So, if (eg) CPU1's perf unit fires an interrupt, but the common
> interrupt is routed to CPU0, CPU0 checks its perf unit, finds no
> interrupt, and returns with IRQ_NONE.
> 
> There's no mechanism in perf (or anywhere else) to hand the interrupt
> over to another CPU.
> 
> The result is that trying to run perf on the multi-core iMX SoCs ends
> up with the perf interrupt disabled, at which point perf collapses in
> a sad pile.

Not surprising.

Solving that in perf is probably the wrong place. So what we'd need is some
kind of special irq flow handler which does:

	ret = handle_irq(desc);
	if (ret == IRQ_NONE && desc->ipi_next) {
     		dest = get_next_cpu(this_cpu);
		if (dest != this_cpu)
			desc->ipi_next(dest);
			
     }

get_next_cpu() would just pick the next cpu in the online mask or the first
when this_cpu is the last one in the mask.

That shouldn't be overly complex to implement. All you'd need to do in the
PMU driver is to hook into that IPI vector.

If you're interested then I can hack the core bits.

Thanks,

	tglx




   

      	

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Russell King - ARM Linux @ 2016-09-20 21:32 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <alpine.DEB.2.20.1609202202520.5476@nanos>

[-- Attachment #1: Type: text/plain, Size: 8782 bytes --]

On Tue, Sep 20, 2016 at 10:10:20PM +0200, Thomas Gleixner wrote:
> On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> > which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> > tasklets are indeed faster than threaded IRQs.
> 
> Fair enough.
> 
> > I've tried to perf it, but...
> >  ....
>  
> > So, sorry, I'm not going to bother trying to get any further with this.
> > If the job was not made harder stupid hardware design and kernel
> > politics, then I might be more inclined to do deeper investigation, but
> > right now I'm finding that I'm not interested in trying to jump through
> > these stupid hoops.
> 
> I'd be very interested in a sched_switch + irq + softirq trace which does
> not involve PMU hardware for both irqthreads and tasklets, but I can
> understand if you can't be bothered to gather it.

That's involved a rebuild of perf to get it to see the trace events.
What I see probably indicates why the crypto AF_ALG way of doing
things sucks big time - the interface to the crypto backends is
entirely synchronous.

This means that we're guaranteed to get one interrupt per msghdr
entry:

crypto/algif_hash.c:
        lock_sock(sk);
        if (!ctx->more) {
                err = af_alg_wait_for_completion(crypto_ahash_init(&ctx->req),
                                                &ctx->completion);
                if (err)
                        goto unlock;
        }
...
        while (msg_data_left(msg)) {
...
                ahash_request_set_crypt(&ctx->req, ctx->sgl.sg, NULL, len);

                err = af_alg_wait_for_completion(crypto_ahash_update(&ctx->req),
                                                 &ctx->completion);
...
        }
...
        ctx->more = msg->msg_flags & MSG_MORE;
        if (!ctx->more) {
                ahash_request_set_crypt(&ctx->req, NULL, ctx->result, 0);
                err = af_alg_wait_for_completion(crypto_ahash_final(&ctx->req),
                                                 &ctx->completion);
        }

each of those places where af_alg_wait_for_completion() called, we
end up submitting a bunch of data and then immediately waiting for
the operation to complete... and this can be seen in the perf
trace logs.

So, unless I'm mistaken, there's no way for a crypto backend to run
asynchronously, and there's no way for a crypto backend to batch up
the "job" - in order to do that, I think it would have to store quite
a lot of state.

Now, I'm not entirely sure that asking perf to record irq:* and
sched:* events was what we were after - there appears to be no trace
events recorded for entering a threaded IRQ handler.

swapper     0 [000]  3600.260199:             irq:irq_handler_entry: irq=311 name=2101000.jr0
swapper     0 [000]  3600.260209:              irq:irq_handler_exit: irq=311 ret=handled
swapper     0 [000]  3600.260219:                sched:sched_waking: comm=irq/311-2101000 pid=2426 prio=49 target_cpu=000
swapper     0 [000]  3600.260236:                sched:sched_wakeup: comm=irq/311-2101000 pid=2426 prio=49 target_cpu=000
swapper     0 [000]  3600.260258:                sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=irq/311-2101000 next_pid=2426 next_prio=49
irq/311-2101000  2426 [000]  3600.260278:                sched:sched_waking: comm=openssl pid=8005 prio=120 target_cpu=000
irq/311-2101000  2426 [000]  3600.260296:                sched:sched_wakeup: comm=openssl pid=8005 prio=120 target_cpu=000
irq/311-2101000  2426 [000]  3600.260322:                sched:sched_switch: prev_comm=irq/311-2101000 prev_pid=2426 prev_prio=49 prev_state=S ==> next_comm=openssl next_pid=8005 next_prio=120
openssl  8005 [000]  3600.260369:                sched:sched_waking: comm=dd pid=8002 prio=120 target_cpu=000
openssl  8005 [000]  3600.260388:          sched:sched_stat_runtime: comm=openssl pid=8005 runtime=71000 [ns] vruntime=419661176835 [ns]
openssl  8005 [000]  3600.260401:                sched:sched_wakeup: comm=dd pid=8002 prio=120 target_cpu=000
openssl  8005 [000]  3600.260421:                sched:sched_switch: prev_comm=openssl prev_pid=8005 prev_prio=120 prev_state=R ==> next_comm=dd next_pid=8002 next_prio=120
dd  8002 [000]  3600.260459:          sched:sched_stat_runtime: comm=dd pid=8002 runtime=71667 [ns] vruntime=419655248502 [ns]
dd  8002 [000]  3600.260473:                sched:sched_switch: prev_comm=dd prev_pid=8002 prev_prio=120 prev_state=S ==> next_comm=openssl next_pid=8005 next_prio=120
openssl  8005 [000]  3600.260572:          sched:sched_stat_runtime: comm=openssl pid=8005 runtime=112666 [ns] vruntime=419661289501 [ns]
openssl  8005 [000]  3600.260587:                sched:sched_switch: prev_comm=openssl prev_pid=8005 prev_prio=120 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120
swapper     0 [000]  3600.260638:             irq:irq_handler_entry: irq=311 name=2101000.jr0
...

So 123us (260322 - 260199) to the switch to openssl via the threaded IRQ.

tasklet case:

swapper     0 [000]  5082.667101:             irq:irq_handler_entry: irq=311 name=2101000.jr0
swapper     0 [000]  5082.667111:                 irq:softirq_raise: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667119:              irq:irq_handler_exit: irq=311 ret=handled
swapper     0 [000]  5082.667134:                 irq:softirq_entry: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667151:                sched:sched_waking: comm=openssl pid=8251 prio=120 target_cpu=000
swapper     0 [000]  5082.667169:                sched:sched_wakeup: comm=openssl pid=8251 prio=120 target_cpu=000
swapper     0 [000]  5082.667183:                  irq:softirq_exit: vec=6 [action=TASKLET]
swapper     0 [000]  5082.667202:                sched:sched_switch: prev_comm=swapper/0 prev_pid=0 prev_prio=120 prev_state=R ==> next_comm=openssl next_pid=8251 next_prio=120
openssl  8251 [000]  5082.667248:                sched:sched_waking: comm=dd pid=8248 prio=120 target_cpu=000
openssl  8251 [000]  5082.667267:          sched:sched_stat_runtime: comm=openssl pid=8251 runtime=39668 [ns] vruntime=444714027428 [ns]
openssl  8251 [000]  5082.667280:                sched:sched_wakeup: comm=dd pid=8248 prio=120 target_cpu=000
openssl  8251 [000]  5082.667301:                sched:sched_switch: prev_comm=openssl prev_pid=8251 prev_prio=120 prev_state=R ==> next_comm=dd next_pid=8248 next_prio=120
dd  8248 [000]  5082.667339:          sched:sched_stat_runtime: comm=dd pid=8248 runtime=70000 [ns] vruntime=444708097428 [ns]
dd  8248 [000]  5082.667354:                sched:sched_switch: prev_comm=dd prev_pid=8248 prev_prio=120 prev_state=S ==> next_comm=openssl next_pid=8251 next_prio=120
openssl  8251 [000]  5082.667451:          sched:sched_stat_runtime: comm=openssl pid=8251 runtime=113666 [ns] vruntime=444714141094 [ns]
openssl  8251 [000]  5082.667466:                sched:sched_switch: prev_comm=openssl prev_pid=8251 prev_prio=120 prev_state=D ==> next_comm=swapper/0 next_pid=0 next_prio=120
swapper     0 [000]  5082.667517:             irq:irq_handler_entry: irq=311 name=2101000.jr0

101us (667202 - 667101) between the same two events, which is 22us
faster than the above.

In both cases, I picked out an irq_handler_entry event which was near
to a whole number of 100us.  I haven't looked any deeper to see what
the variations are in the hard IRQ->openssl schedule latency - I
expect that needs some scripts written.

Attached are compressed files of the perf script -G output.  If you
want the perf.data files, I have them (I'm not sure how useful they
are without the binaries though.)

> Vs. the PMU interrupt thing. What's the politics about that? Do you have
> any pointers?

I just remember there being a discussion about how stupid FSL have been
and "we're not going to support that" - the perf code wants the per-CPU
performance unit interrupts delivered _on_ the CPU to which the perf
unit is attached.  FSL decided in their stupidity to OR all the perf
unit interrupts together and route them to a single common interrupt.

This means that we end up with one CPU taking the perf interrupt for
any perf unit - and the CPUs can only access their local perf unit.
So, if (eg) CPU1's perf unit fires an interrupt, but the common
interrupt is routed to CPU0, CPU0 checks its perf unit, finds no
interrupt, and returns with IRQ_NONE.

There's no mechanism in perf (or anywhere else) to hand the interrupt
over to another CPU.

The result is that trying to run perf on the multi-core iMX SoCs ends
up with the perf interrupt disabled, at which point perf collapses in
a sad pile.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

[-- Attachment #2: threaded.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 23786 bytes --]

[-- Attachment #3: tasklet.txt.bz2 --]
[-- Type: application/x-bzip2, Size: 23617 bytes --]

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Thomas Gleixner @ 2016-09-20 20:10 UTC (permalink / raw)
  To: Russell King - ARM Linux
  Cc: Cata Vasile, Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <20160920161228.GQ1041@n2100.armlinux.org.uk>

On Tue, 20 Sep 2016, Russell King - ARM Linux wrote:
> which corresponds to an 8% slowdown for the threaded IRQ case.  So,
> tasklets are indeed faster than threaded IRQs.

Fair enough.

> I've tried to perf it, but...
>  ....
 
> So, sorry, I'm not going to bother trying to get any further with this.
> If the job was not made harder stupid hardware design and kernel
> politics, then I might be more inclined to do deeper investigation, but
> right now I'm finding that I'm not interested in trying to jump through
> these stupid hoops.

I'd be very interested in a sched_switch + irq + softirq trace which does
not involve PMU hardware for both irqthreads and tasklets, but I can
understand if you can't be bothered to gather it.

Vs. the PMU interrupt thing. What's the politics about that? Do you have
any pointers?
 
> I think I've proven from the above that this patch needs to be reverted
> due to the performance regression, and that there _is_ most definitely
> a deterimental effect of switching from tasklets to threaded IRQs.

I agree that the revert should happen, but I'd rather see a bit more
information why this regression happens with the switch from tasklets to
threaded irqs.

Thanks,

	tglx

^ permalink raw reply

* Re: crypto: caam from tasklet to threadirq
From: Russell King - ARM Linux @ 2016-09-20 16:12 UTC (permalink / raw)
  To: Cata Vasile, Thomas Gleixner
  Cc: Horia Geanta Neag, linux-crypto@vger.kernel.org
In-Reply-To: <DB5PR04MB1302AD00536D2E37E726E104EEF30@DB5PR04MB1302.eurprd04.prod.outlook.com>

Okay, I've re-tested, using a different way of measuring, because using
openssl speed is impractical for off-loaded engines.  I've decided to
use this way to measure the performance:

dd if=/dev/zero bs=1048576 count=128 | /usr/bin/time openssl dgst -md5

For the threaded IRQs case gives:

0.05user 2.74system 0:05.30elapsed 52%CPU (0avgtext+0avgdata 2400maxresident)k
0.06user 2.52system 0:05.18elapsed 49%CPU (0avgtext+0avgdata 2404maxresident)k
0.12user 2.60system 0:05.61elapsed 48%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 5.36s => 25.0MB/s

and the tasklet case:

0.08user 2.53system 0:04.83elapsed 54%CPU (0avgtext+0avgdata 2468maxresident)k
0.09user 2.47system 0:05.16elapsed 49%CPU (0avgtext+0avgdata 2368maxresident)k
0.10user 2.51system 0:04.87elapsed 53%CPU (0avgtext+0avgdata 2460maxresident)k
	=> 4.95 => 27.1MB/s

which corresponds to an 8% slowdown for the threaded IRQ case.  So,
tasklets are indeed faster than threaded IRQs.

I guess the reason is that tasklets are much simpler, being able to
run just before we return to userspace without involving scheduler
overheads, but that's speculation.

I've tried to perf it, but...

Samples: 31K of event 'cycles', Event count (approx.): 3552246846
  Overhead  Command          Shared Object     Symbol
+   33.22%  kworker/0:1      [kernel.vmlinux]  [k] __do_softirq
+   15.78%  irq/311-2101000  [kernel.vmlinux]  [k] __do_softirq
+    7.49%  irqbalance       [kernel.vmlinux]  [k] __do_softirq
+    7.26%  openssl          [kernel.vmlinux]  [k] __do_softirq
+    5.71%  ksoftirqd/0      [kernel.vmlinux]  [k] __do_softirq
+    3.64%  kworker/0:2      [kernel.vmlinux]  [k] __do_softirq
+    3.52%  swapper          [kernel.vmlinux]  [k] __do_softirq
+    3.14%  kworker/0:1      [kernel.vmlinux]  [k] _raw_spin_unlock_irq

I was going to try to get the threaded IRQ case, but I've ended up with
perf getting buggered because of the iMX6 SMP perf disfunctionality:

[ 3448.810416] irq 24: nobody cared (try booting with the "irqpoll" option)
...
[ 3448.824528] Disabling IRQ #24

caused by FSL's utterly brain-dead idea of routing all the perf
interrupts to single non-CPU local interrupt input, and the refusal of
kernel folk to find an acceptable solution to support this.

So, sorry, I'm not going to bother trying to get any further with this.
If the job was not made harder stupid hardware design and kernel
politics, then I might be more inclined to do deeper investigation, but
right now I'm finding that I'm not interested in trying to jump through
these stupid hoops.

I think I've proven from the above that this patch needs to be reverted
due to the performance regression, and that there _is_ most definitely
a deterimental effect of switching from tasklets to threaded IRQs.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line: currently at 9.6Mbps down 400kbps up
according to speedtest.net.

^ permalink raw reply

* [PATCH] hwrng: omap - Only fail if pm_runtime_get_sync returns < 0
From: Dave Gerlach @ 2016-09-20 15:25 UTC (permalink / raw)
  To: Deepak Saxena, Matt Mackall, Herbert Xu, Nishanth Menon
  Cc: linux-kernel, linux-crypto, linux-arm-kernel, linux-omap,
	Dave Gerlach

Currently omap-rng checks the return value of pm_runtime_get_sync and
reports failure if anything is returned, however it should be checking
if ret < 0 as pm_runtime_get_sync return 0 on success but also can return
1 if the device was already active which is not a failure case. Only
values < 0 are actual failures.

Fixes: 61dc0a446e5d ("hwrng: omap - Fix assumption that runtime_get_sync will always succeed")
Signed-off-by: Dave Gerlach <d-gerlach@ti.com>
---
 drivers/char/hw_random/omap-rng.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/char/hw_random/omap-rng.c b/drivers/char/hw_random/omap-rng.c
index 01d4be2c354b..f5c26a5f6875 100644
--- a/drivers/char/hw_random/omap-rng.c
+++ b/drivers/char/hw_random/omap-rng.c
@@ -385,7 +385,7 @@ static int omap_rng_probe(struct platform_device *pdev)
 
 	pm_runtime_enable(&pdev->dev);
 	ret = pm_runtime_get_sync(&pdev->dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(&pdev->dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(&pdev->dev);
 		goto err_ioremap;
@@ -443,7 +443,7 @@ static int __maybe_unused omap_rng_resume(struct device *dev)
 	int ret;
 
 	ret = pm_runtime_get_sync(dev);
-	if (ret) {
+	if (ret < 0) {
 		dev_err(dev, "Failed to runtime_get device: %d\n", ret);
 		pm_runtime_put_noidle(dev);
 		return ret;
-- 
2.9.3

^ permalink raw reply related

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox