Linux cryptographic layer development

Linux cryptographic layer development
 help / color / mirror / Atom feed

* [PATCH] crypto: qat - fix comments describing adf_disable_sriov()
From: Giovanni Cabiddu @ 2016-12-22 14:59 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, giovanni.cabiddu, Ahsan Atta, Giovanni Cabiddu

From: Ahsan Atta <ahsan.atta@intel.com>

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/adf_sriov.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qat/qat_common/adf_sriov.c b/drivers/crypto/qat/qat_common/adf_sriov.c
index 9320ae1..b36d865 100644
--- a/drivers/crypto/qat/qat_common/adf_sriov.c
+++ b/drivers/crypto/qat/qat_common/adf_sriov.c
@@ -162,9 +162,9 @@ static int adf_enable_sriov(struct adf_accel_dev *accel_dev)
 
 /**
  * adf_disable_sriov() - Disable SRIOV for the device
- * @pdev:  Pointer to pci device.
+ * @accel_dev:  Pointer to accel device.
  *
- * Function disables SRIOV for the pci device.
+ * Function disables SRIOV for the accel device.
  *
  * Return: 0 on success, error code otherwise.
  */
-- 
2.9.3

^ permalink raw reply related

* [PATCH] crypto: qat - fix indentation
From: Giovanni Cabiddu @ 2016-12-22 14:58 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto, giovanni.cabiddu, Ahsan Atta, Giovanni Cabiddu

From: Ahsan Atta <ahsan.atta@intel.com>

Signed-off-by: Ahsan Atta <ahsan.atta@intel.com>
Signed-off-by: Giovanni Cabiddu <giovanni.cabiddu@intel.com>
---
 drivers/crypto/qat/qat_common/adf_dev_mgr.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/crypto/qat/qat_common/adf_dev_mgr.c b/drivers/crypto/qat/qat_common/adf_dev_mgr.c
index b3ebb25..8afac52 100644
--- a/drivers/crypto/qat/qat_common/adf_dev_mgr.c
+++ b/drivers/crypto/qat/qat_common/adf_dev_mgr.c
@@ -152,7 +152,7 @@ void adf_devmgr_update_class_index(struct adf_hw_device_data *hw_data)
 			ptr->hw_device->instance_id = i++;
 
 		if (i == class->instances)
-				break;
+			break;
 	}
 }
 EXPORT_SYMBOL_GPL(adf_devmgr_update_class_index);
-- 
2.9.3

^ permalink raw reply related

* Re: [PATCH v6 1/2] sparc: fix a building error reported by kbuild
From: Anatoly Pugachev @ 2016-12-22 14:58 UTC (permalink / raw)
  To: sparclinux@vger.kernel.org
  Cc: Sam Ravnborg, linux-crypto@vger.kernel.org, Gonglei (Arei),
	virtio-dev
In-Reply-To: <33183CC9F5247A488A2544077AF19020DA159386@DGGEMA505-MBX.china.huawei.com>

Guys,

so was it fixed or not?

I can not compile current git kernel:

linux-2.6$ git desc
v4.9-11937-g52bce91165e5

linux-2.6$ git status
On branch master
Your branch is up-to-date with 'origin/master'.
nothing to commit, working tree clean

linux-2.6$ git log --oneline arch/sparc/include/asm/topology_64.h | head -1
541cc39433a8 sparc: fix a building error reported by kbuild

linux-2.6$ make
  CHK     include/config/kernel.release
  CHK     include/generated/uapi/linux/version.h
  CHK     include/generated/utsrelease.h
  CHK     include/generated/bounds.h
  CHK     include/generated/timeconst.h
  CHK     include/generated/asm-offsets.h
  CALL    scripts/checksyscalls.sh
<stdin>:1316:2: warning: #warning syscall pkey_mprotect not implemented [-Wcpp]
<stdin>:1319:2: warning: #warning syscall pkey_alloc not implemented [-Wcpp]
<stdin>:1322:2: warning: #warning syscall pkey_free not implemented [-Wcpp]
  CHK     include/generated/compile.h
  GEN     usr/initramfs_data.cpio".gz"
  AS      usr/initramfs_data.o
  LD      usr/built-in.o
  CHK     kernel/config_data.h
  CC [M]  drivers/crypto/virtio/virtio_crypto_mgr.o
In file included from ./arch/sparc/include/asm/topology.h:4:0,
                 from ./include/linux/topology.h:35,
                 from ./include/linux/gfp.h:8,
                 from ./include/linux/kmod.h:22,
                 from ./include/linux/module.h:13,
                 from drivers/crypto/virtio/virtio_crypto_mgr.c:21:
drivers/crypto/virtio/virtio_crypto_common.h: In function
‘virtio_crypto_get_current_node’:
./arch/sparc/include/asm/topology_64.h:45:44: error: implicit
declaration of function ‘cpu_data’
[-Werror=implicit-function-declaration]
 #define topology_physical_package_id(cpu) (cpu_data(cpu).proc_id)
                                            ^
drivers/crypto/virtio/virtio_crypto_common.h:119:9: note: in expansion
of macro ‘topology_physical_package_id’
  node = topology_physical_package_id(cpu);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
./arch/sparc/include/asm/topology_64.h:45:57: error: request for
member ‘proc_id’ in something not a structure or union
 #define topology_physical_package_id(cpu) (cpu_data(cpu).proc_id)
                                                         ^
drivers/crypto/virtio/virtio_crypto_common.h:119:9: note: in expansion
of macro ‘topology_physical_package_id’
  node = topology_physical_package_id(cpu);
         ^~~~~~~~~~~~~~~~~~~~~~~~~~~~
cc1: some warnings being treated as errors
scripts/Makefile.build:293: recipe for target
'drivers/crypto/virtio/virtio_crypto_mgr.o' failed
make[3]: *** [drivers/crypto/virtio/virtio_crypto_mgr.o] Error 1
scripts/Makefile.build:551: recipe for target 'drivers/crypto/virtio' failed
make[2]: *** [drivers/crypto/virtio] Error 2
scripts/Makefile.build:551: recipe for target 'drivers/crypto' failed
make[1]: *** [drivers/crypto] Error 2
Makefile:988: recipe for target 'drivers' failed
make: *** [drivers] Error 2
linux-2.6$

^ permalink raw reply

* Re: [PATCH 1/3] crypto: Change lz4 modules to work with new lz4 compressor module version
From: Sergey Senozhatsky @ 2016-12-22 13:25 UTC (permalink / raw)
  To: Sven Schmidt
  Cc: akpm, bongkyu.kim, sergey.senozhatsky, gregkh, linux-kernel,
	Herbert Xu, linux-crypto
In-Reply-To: <1482259992-16680-2-git-send-email-4sschmid@informatik.uni-hamburg.de>


Cc Herbert and linux-crypto on this

https://marc.info/?l=linux-kernel&m=148226084823926


On (12/20/16 19:53), Sven Schmidt wrote:
> 
> This patch updates the crypto modules using LZ4 compression to work with the
> new LZ4 kernel module version.
> 
> Signed-off-by: Sven Schmidt <4sschmid@informatik.uni-hamburg.de>
> ---
>  crypto/lz4.c   | 93 +++++++--------------------------------------------------
>  crypto/lz4hc.c | 94 +++++++---------------------------------------------------
>  2 files changed, 22 insertions(+), 165 deletions(-)
> 
> diff --git a/crypto/lz4.c b/crypto/lz4.c
> index 99c1b2c..9fa217e 100644
> --- a/crypto/lz4.c
> +++ b/crypto/lz4.c
> @@ -23,53 +23,36 @@
>  #include <linux/crypto.h>
>  #include <linux/vmalloc.h>
>  #include <linux/lz4.h>
> -#include <crypto/internal/scompress.h>
>  
>  struct lz4_ctx {
>  	void *lz4_comp_mem;
>  };
>  
> -static void *lz4_alloc_ctx(struct crypto_scomp *tfm)
> -{
> -	void *ctx;
> -
> -	ctx = vmalloc(LZ4_MEM_COMPRESS);
> -	if (!ctx)
> -		return ERR_PTR(-ENOMEM);
> -
> -	return ctx;
> -}
> -
>  static int lz4_init(struct crypto_tfm *tfm)
>  {
>  	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
>  
> -	ctx->lz4_comp_mem = lz4_alloc_ctx(NULL);
> -	if (IS_ERR(ctx->lz4_comp_mem))
> +	ctx->lz4_comp_mem = vmalloc(LZ4_MEM_COMPRESS);
> +	if (!ctx->lz4_comp_mem)
>  		return -ENOMEM;
>  
>  	return 0;
>  }
>  
> -static void lz4_free_ctx(struct crypto_scomp *tfm, void *ctx)
> -{
> -	vfree(ctx);
> -}
> -
>  static void lz4_exit(struct crypto_tfm *tfm)
>  {
>  	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
> -
> -	lz4_free_ctx(NULL, ctx->lz4_comp_mem);
> +	vfree(ctx->lz4_comp_mem);
>  }
>  
> -static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
> -				 u8 *dst, unsigned int *dlen, void *ctx)
> +static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
> +			    unsigned int slen, u8 *dst, unsigned int *dlen)
>  {
> +	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
>  	size_t tmp_len = *dlen;
>  	int err;
>  
> -	err = lz4_compress(src, slen, dst, &tmp_len, ctx);
> +	err = LZ4_compress_default(src, dst, slen, tmp_len, ctx->lz4_comp_mem);
>  
>  	if (err < 0)
>  		return -EINVAL;
> @@ -78,29 +61,14 @@ static int __lz4_compress_crypto(const u8 *src, unsigned int slen,
>  	return 0;
>  }
>  
> -static int lz4_scompress(struct crypto_scomp *tfm, const u8 *src,
> -			 unsigned int slen, u8 *dst, unsigned int *dlen,
> -			 void *ctx)
> -{
> -	return __lz4_compress_crypto(src, slen, dst, dlen, ctx);
> -}
> -
> -static int lz4_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
> -			       unsigned int slen, u8 *dst, unsigned int *dlen)
> -{
> -	struct lz4_ctx *ctx = crypto_tfm_ctx(tfm);
> -
> -	return __lz4_compress_crypto(src, slen, dst, dlen, ctx->lz4_comp_mem);
> -}
> -
> -static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
> -				   u8 *dst, unsigned int *dlen, void *ctx)
> +static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
> +			      unsigned int slen, u8 *dst, unsigned int *dlen)
>  {
>  	int err;
>  	size_t tmp_len = *dlen;
>  	size_t __slen = slen;
>  
> -	err = lz4_decompress_unknownoutputsize(src, __slen, dst, &tmp_len);
> +	err = LZ4_decompress_safe(src, dst, __slen, tmp_len);
>  	if (err < 0)
>  		return -EINVAL;
>  
> @@ -108,20 +76,6 @@ static int __lz4_decompress_crypto(const u8 *src, unsigned int slen,
>  	return err;
>  }
>  
> -static int lz4_sdecompress(struct crypto_scomp *tfm, const u8 *src,
> -			   unsigned int slen, u8 *dst, unsigned int *dlen,
> -			   void *ctx)
> -{
> -	return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
> -}
> -
> -static int lz4_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
> -				 unsigned int slen, u8 *dst,
> -				 unsigned int *dlen)
> -{
> -	return __lz4_decompress_crypto(src, slen, dst, dlen, NULL);
> -}
> -
>  static struct crypto_alg alg_lz4 = {
>  	.cra_name		= "lz4",
>  	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
> @@ -135,39 +89,14 @@ static struct crypto_alg alg_lz4 = {
>  	.coa_decompress		= lz4_decompress_crypto } }
>  };
>  
> -static struct scomp_alg scomp = {
> -	.alloc_ctx		= lz4_alloc_ctx,
> -	.free_ctx		= lz4_free_ctx,
> -	.compress		= lz4_scompress,
> -	.decompress		= lz4_sdecompress,
> -	.base			= {
> -		.cra_name	= "lz4",
> -		.cra_driver_name = "lz4-scomp",
> -		.cra_module	 = THIS_MODULE,
> -	}
> -};
> -
>  static int __init lz4_mod_init(void)
>  {
> -	int ret;
> -
> -	ret = crypto_register_alg(&alg_lz4);
> -	if (ret)
> -		return ret;
> -
> -	ret = crypto_register_scomp(&scomp);
> -	if (ret) {
> -		crypto_unregister_alg(&alg_lz4);
> -		return ret;
> -	}
> -
> -	return ret;
> +	return crypto_register_alg(&alg_lz4);
>  }
>  
>  static void __exit lz4_mod_fini(void)
>  {
>  	crypto_unregister_alg(&alg_lz4);
> -	crypto_unregister_scomp(&scomp);
>  }
>  
>  module_init(lz4_mod_init);
> diff --git a/crypto/lz4hc.c b/crypto/lz4hc.c
> index 75ffc4a..aa1265c 100644
> --- a/crypto/lz4hc.c
> +++ b/crypto/lz4hc.c
> @@ -22,53 +22,37 @@
>  #include <linux/crypto.h>
>  #include <linux/vmalloc.h>
>  #include <linux/lz4.h>
> -#include <crypto/internal/scompress.h>
>  
>  struct lz4hc_ctx {
>  	void *lz4hc_comp_mem;
>  };
>  
> -static void *lz4hc_alloc_ctx(struct crypto_scomp *tfm)
> -{
> -	void *ctx;
> -
> -	ctx = vmalloc(LZ4HC_MEM_COMPRESS);
> -	if (!ctx)
> -		return ERR_PTR(-ENOMEM);
> -
> -	return ctx;
> -}
> -
>  static int lz4hc_init(struct crypto_tfm *tfm)
>  {
>  	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
>  
> -	ctx->lz4hc_comp_mem = lz4hc_alloc_ctx(NULL);
> -	if (IS_ERR(ctx->lz4hc_comp_mem))
> +	ctx->lz4hc_comp_mem = vmalloc(LZ4HC_MEM_COMPRESS);
> +	if (!ctx->lz4hc_comp_mem)
>  		return -ENOMEM;
>  
>  	return 0;
>  }
>  
> -static void lz4hc_free_ctx(struct crypto_scomp *tfm, void *ctx)
> -{
> -	vfree(ctx);
> -}
> -
>  static void lz4hc_exit(struct crypto_tfm *tfm)
>  {
>  	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
>  
> -	lz4hc_free_ctx(NULL, ctx->lz4hc_comp_mem);
> +	vfree(ctx->lz4hc_comp_mem);
>  }
>  
> -static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
> -				   u8 *dst, unsigned int *dlen, void *ctx)
> +static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
> +			    unsigned int slen, u8 *dst, unsigned int *dlen)
>  {
> +	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
>  	size_t tmp_len = *dlen;
>  	int err;
>  
> -	err = lz4hc_compress(src, slen, dst, &tmp_len, ctx);
> +	err = LZ4_compress_HC(src, dst, slen, tmp_len, LZ4HC_DEFAULT_CLEVEL, ctx->lz4hc_comp_mem);
>  
>  	if (err < 0)
>  		return -EINVAL;
> @@ -77,31 +61,14 @@ static int __lz4hc_compress_crypto(const u8 *src, unsigned int slen,
>  	return 0;
>  }
>  
> -static int lz4hc_scompress(struct crypto_scomp *tfm, const u8 *src,
> -			   unsigned int slen, u8 *dst, unsigned int *dlen,
> -			   void *ctx)
> -{
> -	return __lz4hc_compress_crypto(src, slen, dst, dlen, ctx);
> -}
> -
> -static int lz4hc_compress_crypto(struct crypto_tfm *tfm, const u8 *src,
> -				 unsigned int slen, u8 *dst,
> -				 unsigned int *dlen)
> -{
> -	struct lz4hc_ctx *ctx = crypto_tfm_ctx(tfm);
> -
> -	return __lz4hc_compress_crypto(src, slen, dst, dlen,
> -					ctx->lz4hc_comp_mem);
> -}
> -
> -static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
> -				     u8 *dst, unsigned int *dlen, void *ctx)
> +static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
> +			      unsigned int slen, u8 *dst, unsigned int *dlen)
>  {
>  	int err;
>  	size_t tmp_len = *dlen;
>  	size_t __slen = slen;
>  
> -	err = lz4_decompress_unknownoutputsize(src, __slen, dst, &tmp_len);
> +	err = LZ4_decompress_safe(src, dst, __slen, (int)tmp_len);
>  	if (err < 0)
>  		return -EINVAL;
>  
> @@ -109,20 +76,6 @@ static int __lz4hc_decompress_crypto(const u8 *src, unsigned int slen,
>  	return err;
>  }
>  
> -static int lz4hc_sdecompress(struct crypto_scomp *tfm, const u8 *src,
> -			     unsigned int slen, u8 *dst, unsigned int *dlen,
> -			     void *ctx)
> -{
> -	return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
> -}
> -
> -static int lz4hc_decompress_crypto(struct crypto_tfm *tfm, const u8 *src,
> -				   unsigned int slen, u8 *dst,
> -				   unsigned int *dlen)
> -{
> -	return __lz4hc_decompress_crypto(src, slen, dst, dlen, NULL);
> -}
> -
>  static struct crypto_alg alg_lz4hc = {
>  	.cra_name		= "lz4hc",
>  	.cra_flags		= CRYPTO_ALG_TYPE_COMPRESS,
> @@ -136,39 +89,14 @@ static struct crypto_alg alg_lz4hc = {
>  	.coa_decompress		= lz4hc_decompress_crypto } }
>  };
>  
> -static struct scomp_alg scomp = {
> -	.alloc_ctx		= lz4hc_alloc_ctx,
> -	.free_ctx		= lz4hc_free_ctx,
> -	.compress		= lz4hc_scompress,
> -	.decompress		= lz4hc_sdecompress,
> -	.base			= {
> -		.cra_name	= "lz4hc",
> -		.cra_driver_name = "lz4hc-scomp",
> -		.cra_module	 = THIS_MODULE,
> -	}
> -};
> -
>  static int __init lz4hc_mod_init(void)
>  {
> -	int ret;
> -
> -	ret = crypto_register_alg(&alg_lz4hc);
> -	if (ret)
> -		return ret;
> -
> -	ret = crypto_register_scomp(&scomp);
> -	if (ret) {
> -		crypto_unregister_alg(&alg_lz4hc);
> -		return ret;
> -	}
> -
> -	return ret;
> +	return crypto_register_alg(&alg_lz4hc);
>  }
>  
>  static void __exit lz4hc_mod_fini(void)
>  {
>  	crypto_unregister_alg(&alg_lz4hc);
> -	crypto_unregister_scomp(&scomp);
>  }
>  
>  module_init(lz4hc_mod_init);
> -- 
> 2.1.4
> 

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22 13:10 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Theodore Ts'o, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <1482410840.2472.2.camel@stressinduktion.org>

On Thu, Dec 22, 2016 at 1:47 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> following up on what appears to be a random subject: ;)
>
> IIRC, ext4 code by default still uses half_md4 for hashing of filenames
> in the htree. siphash seems to fit this use case pretty good.

I saw this too. I'll try to address it in v8 of this series.

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22 12:47 UTC (permalink / raw)
  To: Theodore Ts'o, kernel-hardening
  Cc: Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222054125.lzxhd6ctovm3wk4p@thunk.org>

Hi Ted,

On Thu, 2016-12-22 at 00:41 -0500, Theodore Ts'o wrote:
> On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> > 
> > Funny -- while you guys were sending this back & forth, I was writing
> > my reply to Andy which essentially arrives at the same conclusion.
> > Given that we're all arriving to the same thing, and that Ted shot in
> > this direction long before we all did, I'm leaning toward abandoning
> > SipHash for the de-MD5-ification of get_random_int/long, and working
> > on polishing Ted's idea into something shiny for this patchset.
> 
> here are my numbers comparing siphash (using the first three patches
> of the v7 siphash patches) with my batched chacha20 implementation.
> The results are taken by running get_random_* 10000 times, and then
> dividing the numbers by 10000 to get the average number of cycles for
> the call.  I compiled 32-bit and 64-bit kernels, and ran the results
> using kvm:
> 
>                    siphash                        batched chacha20
>          get_random_int  get_random_long   get_random_int   get_random_long   
> 
> 32-bit    270                  278             114            146
> 64-bit     75                   75             106            186
> 
> > I did have two objections to this. The first was that my SipHash
> > construction is faster.
> 
> Well, it's faster on everything except 32-bit x86.  :-P
> 
> > The second, and the more
> > important one, was that batching entropy up like this means that 32
> > calls will be really fast, and then the 33rd will be slow, since it
> > has to do a whole ChaCha round, because get_random_bytes must be
> > called to refill the batch.
> 
> ... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
> 32-bit x86.  Which on a 2.3 GHz processor, is just under a
> microsecond.  As far as being inconsistent on process startup, I very
> much doubt a microsecond is really going to be visible to the user.  :-)
> 
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference....
> 
> I think the main argument for using the batched getrandom approach is
> that it, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus complexity of having to adding siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

following up on what appears to be a random subject: ;)

IIRC, ext4 code by default still uses half_md4 for hashing of filenames
in the htree. siphash seems to fit this use case pretty good.

xfs could also need an update, as they don't seed the directory hash
tables at all (but not sure if they are vulnerable). I should improve
[1] a bit.

[1] http://oss.sgi.com/cgi-bin/gitweb.cgi?p=xfs/cmds/xfstests.git;a=blo
b;f=src/dirhash_collide.c;h=55cec872d5061ac2ca0f56d1f11e6bf349d5bb97;hb
=HEAD

Bye,
Hannes

^ permalink raw reply

* Re: [RFC PATCH v2] crypto: Add IV generation algorithms
From: Binoy Jayan @ 2016-12-22 10:55 UTC (permalink / raw)
  To: Herbert Xu
  Cc: Milan Broz, Oded, Ofir, David S. Miller, linux-crypto, Mark Brown,
	Arnd Bergmann, Linux kernel mailing list, Alasdair Kergon,
	Mike Snitzer, dm-devel, Shaohua Li, linux-raid, Rajendra
In-Reply-To: <20161222085509.GA2160@gondor.apana.org.au>

Hi Herbert,

On 22 December 2016 at 14:25, Herbert Xu <herbert@gondor.apana.org.au> wrote:
> On Tue, Dec 13, 2016 at 11:01:08AM +0100, Milan Broz wrote:
>>
>> By the move everything to cryptoAPI we are basically introducing some strange mix
>> of IV and modes there, I wonder how this is going to be maintained.
>> Anyway, Herbert should say if it is ok...
>
> Well there is precedent in how do the IPsec IV generation.  In
> that case the IV generators too are completely specific to that
> application, i.e., IPsec.  However, the way structured it allowed
> us to have one single entry path from the IPsec stack into the
> crypto layer regardless of whether you are using AEAD or traditional
> encryption/hashing algorithms.
>
> For IPsec we make the IV generators behave like normal AEAD
> algorithms, except that they take the sequence number as the IV.
>
> The goal here are obviously different.  However, by employing
> the same method as we do in IPsec, it appears to me that you
> can effectively process multiple blocks at once instead of having
> to supply one block at a time due to the IV generation issue.

Thank you for clarifying that part.:)
So, I hope we can consider algorithms like lmk and tcw too as IV generation
algorithms, even though they manipulate encrypted data directly?

>> I really do not think the disk encryption key management should be moved
>> outside of dm-crypt. We cannot then change key structure later easily.

I agree with this too, only problem with this is when multiple keys are involved
(where the master key is split into 2 or more), and the key selection is made
with a modular division of the (512-byte) sector number by the number of keys.

> It doesn't have to live outside of dm-crypt.  You can register
> these IV generators from there if you really want.

Sorry, but I didn't understand this part.

Thanks,
Binoy

^ permalink raw reply

* RE: dm-crypt optimization
From: Ofir Drang @ 2016-12-22 10:14 UTC (permalink / raw)
  To: Herbert Xu, Binoy Jayan
  Cc: Milan Broz, Oded Golombek, Arnd Bergmann, Mark Brown,
	Alasdair Kergon, David S. Miller, private-kwg@linaro.org,
	dm-devel@redhat.com, linux-crypto@vger.kernel.org, Rajendra,
	Linux kernel mailing list, linux-raid@vger.kernel.org, Shaohua Li,
	Mike Snitzer
In-Reply-To: <20161222085927.GB2160@gondor.apana.org.au>



-----Original Message-----
From: Herbert Xu [mailto:herbert@gondor.apana.org.au]
Sent: Thursday, December 22, 2016 10:59 AM
To: Binoy Jayan
Cc: Milan Broz; Oded Golombek; Ofir Drang; Arnd Bergmann; Mark Brown; Alasdair Kergon; David S. Miller; private-kwg@linaro.org; dm-devel@redhat.com; linux-crypto@vger.kernel.org; Rajendra; Linux kernel mailing list; linux-raid@vger.kernel.org; Shaohua Li; Mike Snitzer
Subject: Re: dm-crypt optimization

On Thu, Dec 22, 2016 at 01:55:59PM +0530, Binoy Jayan wrote:
>>
>> > Support of bigger block sizes would be unsafe without additional
>> > mechanism that provides atomic writes of multiple sectors. Maybe it
>> > applies to 4k as well on some devices though...)
>>
>> Did you mean write to the crypto output buffers or the actual disk write?
>> I didn't quite understand how the block size for encryption affects
>> atomic writes as it is the block layer which handles them. As far as
>> dm-crypt is, concerned it just encrypts/decrypts a 'struct bio'
>> instance and submits the IO operation to the block layer.

>I think Milan's talking about increasing the real block size, which would obviously require the hardware to be able to write that out atomically, as otherwise it breaks the crypto.
>
>But if we can instead do the IV generation within the crypto API, then the block size won't be an issue at all.  Because you can supply as many blocks as you want and they would be processed block-by-block.
>
>Now there is a disadvantage to this approach, and that is you have to wait for the whole thing to be encrypted before you can start doing the IO.  I'm not sure how big a problem that is but if it is bad enough to affect performance, we can look into adding >some form of partial completion to the crypto API.
>
>Cheers,

But assuming we have hardware accelerator that know to handle the IV generation for each sector, it will make sense to send out to the hardware the maximum block size as this will allow us to better utilize the hardware and offload the software. So if possible we need to provide generic interface that will be able to optimize the hardware accelerates.

Thx Ofir
--
Email: Herbert Xu <herbert@gondor.apana.org.au> Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.

^ permalink raw reply

* Re: dm-crypt optimization
From: Herbert Xu @ 2016-12-22  8:59 UTC (permalink / raw)
  To: Binoy Jayan
  Cc: Milan Broz, Oded, Ofir, Arnd Bergmann, Mark Brown,
	Alasdair Kergon, David S. Miller, private-kwg, dm-devel,
	linux-crypto, Rajendra, Linux kernel mailing list, linux-raid,
	Shaohua Li, Mike Snitzer
In-Reply-To: <CAHv-k_-K9dOiM+Pm_wqVJrvmNYhjtS82-emKVZ8OjsoMHf+7hg@mail.gmail.com>

On Thu, Dec 22, 2016 at 01:55:59PM +0530, Binoy Jayan wrote:
>
> > Support of bigger block sizes would be unsafe without additional mechanism that provides
> > atomic writes of multiple sectors. Maybe it applies to 4k as well on some devices though...)
> 
> Did you mean write to the crypto output buffers or the actual disk write?
> I didn't quite understand how the block size for encryption affects atomic
> writes as it is the block layer which handles them. As far as dm-crypt is,
> concerned it just encrypts/decrypts a 'struct bio' instance and submits the IO
> operation to the block layer.

I think Milan's talking about increasing the real block size, which
would obviously require the hardware to be able to write that out
atomically, as otherwise it breaks the crypto.

But if we can instead do the IV generation within the crypto API,
then the block size won't be an issue at all.  Because you can
supply as many blocks as you want and they would be processed
block-by-block.

Now there is a disadvantage to this approach, and that is you
have to wait for the whole thing to be encrypted before you can 
start doing the IO.  I'm not sure how big a problem that is but
if it is bad enough to affect performance, we can look into adding
some form of partial completion to the crypto API.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: [RFC PATCH v2] crypto: Add IV generation algorithms
From: Herbert Xu @ 2016-12-22  8:55 UTC (permalink / raw)
  To: Milan Broz
  Cc: Binoy Jayan, Oded, Ofir, David S. Miller, linux-crypto,
	Mark Brown, Arnd Bergmann, linux-kernel, Alasdair Kergon,
	Mike Snitzer, dm-devel, Shaohua Li, linux-raid, Rajendra
In-Reply-To: <d6d92865-98fa-4d02-035f-9080bc265c35@gmail.com>

On Tue, Dec 13, 2016 at 11:01:08AM +0100, Milan Broz wrote:
>
> By the move everything to cryptoAPI we are basically introducing some strange mix
> of IV and modes there, I wonder how this is going to be maintained.
> Anyway, Herbert should say if it is ok...

Well there is precedent in how do the IPsec IV generation.  In
that case the IV generators too are completely specific to that
application, i.e., IPsec.  However, the way structured it allowed
us to have one single entry path from the IPsec stack into the
crypto layer regardless of whether you are using AEAD or traditional
encryption/hashing algorithms.

For IPsec we make the IV generators behave like normal AEAD
algorithms, except that they take the sequence number as the IV.

The goal here are obviously different.  However, by employing
the same method as we do in IPsec, it appears to me that you
can effectively process multiple blocks at once instead of having
to supply one block at a time due to the IV generation issue.

> I really do not think the disk encryption key management should be moved
> outside of dm-crypt. We cannot then change key structure later easily.

It doesn't have to live outside of dm-crypt.  You can register
these IV generators from there if you really want.

Cheers,
-- 
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt

^ permalink raw reply

* Re: dm-crypt optimization
From: Binoy Jayan @ 2016-12-22  8:25 UTC (permalink / raw)
  To: Milan Broz
  Cc: Oded, Ofir, Herbert Xu, Arnd Bergmann, Mark Brown,
	Alasdair Kergon, David S. Miller, private-kwg, dm-devel,
	linux-crypto, Rajendra, Linux kernel mailing list, linux-raid,
	Shaohua Li, Mike Snitzer
In-Reply-To: <bf5e7237-2f5c-3fc6-c7d6-38c3b13ac2c3@gmail.com>

Hi Milan,

On 21 December 2016 at 18:17, Milan Broz <gmazyland@gmail.com> wrote:

> So the core problem is that your crypto accelerator can operate efficiently only
> with bigger batch sizes.

Thank you for the reply. Yes, that would be rather an improvement when having
bigger block sizes.

> How big blocks your crypto hw need to be able to operate more efficiently?
> What about 4k blocks (no batches), could it be usable trade-off?

The benchmark results for Qualcomm Snapdragon SoC's (mentioned below) show
significant improvement with 4K blocks but in batches of all such contiguous
segments in the block layer's request queue in the form of a chained
scatterlist.
However, it uses the algorithm 'aes-xts' instead of the conventional
'essiv-cbc-aes'
used in dm-crypt. Also, it uses the device mapper dm-req-crypt instead
of dm-cypt.

http://nelenkov.blogspot.in/2015/05/hardware-accelerated-disk-encryption-in.html
Section : 'Performance'

Its reports and IO rate of 46.3MB/s compared to an IO rate of 25.1MB/s while
using a software-based FDE (based on dm-crypt).  But I am not sure how genuine
this data is or how it was tested.

Since qualcomm SoC's use hardware backed keystore for managing keys and since
there is no easy way to make dm-crypt work with qualcomm's engines, I do not
have solid benchmark data to show an improved performance when using 4k blocks.

> With some (backward incompatible) changes in LUKS format I would like to see support
> for encryption blocks equivalent to sectors size, so it basically means for 4k drive 4k
> encryption block.
> (This should decrease overhead, now is everything processed on 512 blocks only.)
>
> Support of bigger block sizes would be unsafe without additional mechanism that provides
> atomic writes of multiple sectors. Maybe it applies to 4k as well on some devices though...)

Did you mean write to the crypto output buffers or the actual disk write?
I didn't quite understand how the block size for encryption affects atomic
writes as it is the block layer which handles them. As far as dm-crypt is,
concerned it just encrypts/decrypts a 'struct bio' instance and submits the IO
operation to the block layer.

> The above is not going against your proposal, I am just curious if this is enough
> to provide better performance on your hw accelerator or not.

May be I should be able to procure an open crypto board and get back to you with
some results. Or may be show even a marginal improvement while using software
algorithm by avoiding the crypto overhead for every 512 bytes.

-Binoy

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22  8:02 UTC (permalink / raw)
  To: linux, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, luto, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrU84-EeU91AkoAZLWrhK=FSrBgV139oSSM-Vw5Mc0mdAg@mail.gmail.com>

> True, but it's called get_random_int(), and it seems like making it
> stronger, especially if the performance cost is low to zero, is a good
> thing.

If it's cheap enough, I don't mind.  But it's documented as being
marginal-quality, used where speed is more important.

In particular, it's *not* used for key material; only values that matter
only as long as they are in use.  Whule they're in use, they can't be
concealed from an attacker with kernel access, and when they're dne
being used, they're worthless.

>> If you want anti-backtracking, though, it's easy to add.  What we
>> hash is:
>>
>> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
>>
>> You mix the output word right back in to the (unfinalized) state after
>> generating it.  This is still equivalent to unmodified back-box SipHash,
>> you're just using a (conceptually independent) SipHash invocation to
>> produce some of its input.

> Ah, cute.  This could probably be sped up by doing something like:
>
> entropy_0 || secret || output_0 ^ entropy_1 || secret || ...

I'm disinclined to do that because that requires deferring the mixing
until the *next* time you generate something.  Storing the value you
don't want revealed by a state capture defeats the purpose.

I'd rather mix it in immediately, so you have anti-backtracking from
the moment of creation.

Also, I don't think it's particularly "cute" or clever; mixing output back
in is the standard way all antibacktracking is accomplished.  It's how
the Davies-Meyer hash construction makes a reversible function one-way.

(There is a second way to do it by throwing away state, but that's
expensive in seed entropy.)

> It's a little weak because the output is only 64 bits, so you could
> plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the
> old entropy is guessable.  I suspect there are sneaky ways around it
> like using output_n-1 ^ output_n-2 or similar.  I'll sleep on it.

Ah, yes, I see.  Given the final state, you guess the output word, go
back one round, then forward the finalization rounds.   Is the output
equal to the guessed output?  You'll find the true value, plus
Poisson(1 - 2^-64) additional.  (Since you have 2^64-1 chances at
something with probability 1 in 2^64.)

And this can be iterated as often as you like to get earlier output words,
as long as you can guess the entropy.  *That's* the part that hurts;
you'd like something that peters out.

You could use the double-length-output SipHash variant (which requires
a second set of finalization rounds) and mix more output back, but
that's expensive.

The challenge is coming up with more unpredictable data to mix in than one
invocation of SipHash returns.  And without storing previous output
anywhere, because that is exactly wrong.

A running sum or xor or whatever of the outputs doesn't help, because
once you've guessed the last output, you can backtrack that for no
additional effort.

State capture is incredibly difficult, our application doesn't require
resistance anyway... unless you can think of something cheap, I think
we can just live with this.

>> I'd *like* to persuade you that skipping the padding byte wouldn't
>> invalidate any security proofs, because it's true and would simplify
>> the code.  But if you want 100% stock, I'm willing to cater to that.

> I lean toward stock in the absence of a particularly good reason.  At
> the very least I'd want to read that paper carefully.

Er... adding the length is standard Merkle-Damgaard strengthening.
Why you do this is described in the original papers by Merkle and Damgaard.

The lay summary is at
https://en.wikipedia.org/wiki/Merkle-Damgard_construction

The original sources are:
http://www.merkle.com/papers/Thesis1979.pdf
http://saluc.engr.uconn.edu/refs/algorithms/hashalg/damgard89adesign.pdf

Merkle describes the construction; Damgaard proves it secure.  Basically,
appending the length is required to handle variable-length input if the
input is not itself self-delimiting.

The proof of security is theorem 3.1 in the latter.  (The first, more
detailed explanation involves the use of an extra bit, which the second
then explains how todo without.)

In particular, see the top of page 420, which notes that the security
proof only requires encoding *how much padding is added* in the final
block, not the overall length of the message, and the second remark on
p. 421 which notes that no such suffix is required if it's not necessary
to distinguish messages with different numbers of trailing null bytes.

The rules are alluded to in the "Choice of padding rule" part of the
"Rationale" section of the SipHash paper (p. 7), but the description is
very brief because it assumes the reader has the background.

That's why they say "We could have chosen a slightly simpler padding rule,
such as appending a <tt>80</tt> byte followed by zeroes."

The thing is, if the amount of the last block that is used is fixed
(within the domain of a particular key), you don't need to encode this
information at all.

^ permalink raw reply

* Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22  6:03 UTC (permalink / raw)
  To: kernel-hardening, Theodore Ts'o, Hannes Frederic Sowa,
	Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List,
	David Laight, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <20161222054125.lzxhd6ctovm3wk4p@thunk.org>

Hi Ted,

On Thu, Dec 22, 2016 at 6:41 AM, Theodore Ts'o <tytso@mit.edu> wrote:
> The bottom line is that I think we're really "pixel peeping" at this
> point --- which is what obsessed digital photographers will do when
> debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
> a thousand times, and then trying to claim that this is visible to the
> human eye.  Or people who obsessing over the frequency response curves
> of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
> likely that in a blind head-to-head comparison, most people wouldn't
> be able to tell the difference....

This is hilarious, thanks for the laugh. I believe you're right about this...

>
> I think the main argument for using the batched getrandom approach is
> that it, I would argue, simpler than introducing siphash into the
> picture.  On 64-bit platforms it is faster and more consistent, so
> it's basically that versus complexity of having to adding siphash to
> the things that people have to analyze when considering random number
> security on Linux.   But it's a close call either way, I think.

I find this compelling. We'll have one csprng for both
get_random_int/long and for /dev/urandom, and we'll be able to update
that silly warning on the comment of get_random_int/long to read
"gives output of either rdrand quality or of /dev/urandom quality",
which makes it more useful for other things. It introduces less error
prone code, and it lets the RNG analysis be spent on just one RNG, not
two.

So, with your blessing, I'm going to move ahead with implementing a
pretty version of this for v8.

Regards,
Jason

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Andy Lutomirski @ 2016-12-22  5:42 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andrew Lutomirski, Andi Kleen, David S. Miller, David Laight,
	D. J. Bernstein, Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jason A. Donenfeld, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds, Ted Ts'o
In-Reply-To: <20161222050138.12011.qmail@ns.sciencehorizons.net>

On Wed, Dec 21, 2016 at 9:01 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Andy Lutomirski wrote:
>> I don't even think it needs that.  This is just adding a
>> non-destructive final operation, right?
>
> It is, but the problem is that SipHash is intended for *small* inputs,
> so the standard implementations aren't broken into init/update/final
> functions.
>
> There's just one big function that keeps the state variables in
> registers and never stores them anywhere.
>
> If we *had* init/update/final functions, then it would be trivial.
>
>> Just to clarify, if we replace SipHash with a black box, I think this
>> effectively means, where "entropy" is random_get_entropy() || jiffies
>> || current->pid:
>
>> The first call returns H(random seed || entropy_0 || secret).  The
>> second call returns H(random seed || entropy_0 || secret || entropy_1
>> || secret).  Etc.
>
> Basically, yes.  I was skipping the padding byte and keying the
> finalization rounds on the grounds of "can't hurt and might help",
> but we could do it a more standard way.
>
>> If not, then I have a fairly strong preference to keep whatever
>> construction we come up with consistent with something that could
>> actually happen with invocations of unmodified SipHash -- then all the
>> security analysis on SipHash goes through.
>
> Okay.  I don't think it makes a difference, but it's not a *big* waste
> of time.  If we have finalization rounds, we can reduce the secret
> to 128 bits.
>
> If we include the padding byte, we can do one of two things:
> 1) Make the secret 184 bits, to fill up the final partial word as
>    much as possible, or
> 2) Make the entropy 1 byte smaller and conceptually misalign the
>    secret.  What we'd actually do is remove the last byte of
>    the secret and include it in the entropy words, but that's
>    just a rotation of the secret between storage and hashing.
>
> Also, I assume you'd like SipHash-2-4, since you want to rely
> on a security analysis.

I haven't looked, but I assume that the analysis at least thought
about reduced rounds, so maybe other variants are okay.

>> The one thing I don't like is
>> that I don't see how to prove that you can't run it backwards if you
>> manage to acquire a memory dump.  In fact, I that that there exist, at
>> least in theory, hash functions that are secure in the random oracle
>> model but that *can* be run backwards given the full state.  From
>> memory, SHA-3 has exactly that property, and it would be a bit sad for
>> a CSPRNG to be reversible.
>
> Er...  get_random_int() is specifically *not* designed to be resistant
> to state capture, and I didn't try.  Remember, what it's used for
> is ASLR, what we're worried about is somene learning the layouts
> of still-running processes, and and if you get a memory dump, you have
> the memory layout!

True, but it's called get_random_int(), and it seems like making it
stronger, especially if the performance cost is low to zero, is a good
thing.

>
> If you want anti-backtracking, though, it's easy to add.  What we
> hash is:
>
> entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...
>
> You mix the output word right back in to the (unfinalized) state after
> generating it.  This is still equivalent to unmodified back-box SipHash,
> you're just using a (conceptually independent) SipHash invocation to
> produce some of its input.

Ah, cute.  This could probably be sped up by doing something like:

entropy_0 || secret || output_0 ^ entropy_1 || secret || ...

It's a little weak because the output is only 64 bits, so you could
plausibly backtrack it on a GPU or FPGA cluster or on an ASIC if the
old entropy is guessable.  I suspect there are sneaky ways around it
like using output_n-1 ^ output_n-2 or similar.  I'll sleep on it.

>
> The only remaining issues are:
> 1) How many rounds, and
> 2) May we use HalfSipHash?

I haven't looked closely enough to have a real opinion here.  I don't
know what the security margin is believed to be.

>
> I'd *like* to persuade you that skipping the padding byte wouldn't
> invalidate any security proofs, because it's true and would simplify
> the code.  But if you want 100% stock, I'm willing to cater to that.

I lean toward stock in the absence of a particularly good reason.  At
the very least I'd want to read that paper carefully.

>
> Ted, what do you think?



-- 
Andy Lutomirski
AMA Capital Management, LLC

^ permalink raw reply

* Re: Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Theodore Ts'o @ 2016-12-22  5:41 UTC (permalink / raw)
  To: kernel-hardening
  Cc: Hannes Frederic Sowa, Andy Lutomirski, Netdev, LKML,
	Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9phg=GuhEUaMxxv_=RexffPDqrOEhmaKffy_ZSt7bfC7g@mail.gmail.com>

On Thu, Dec 22, 2016 at 03:49:39AM +0100, Jason A. Donenfeld wrote:
> 
> Funny -- while you guys were sending this back & forth, I was writing
> my reply to Andy which essentially arrives at the same conclusion.
> Given that we're all arriving to the same thing, and that Ted shot in
> this direction long before we all did, I'm leaning toward abandoning
> SipHash for the de-MD5-ification of get_random_int/long, and working
> on polishing Ted's idea into something shiny for this patchset.

here are my numbers comparing siphash (using the first three patches
of the v7 siphash patches) with my batched chacha20 implementation.
The results are taken by running get_random_* 10000 times, and then
dividing the numbers by 10000 to get the average number of cycles for
the call.  I compiled 32-bit and 64-bit kernels, and ran the results
using kvm:

                   siphash                        batched chacha20
         get_random_int  get_random_long   get_random_int   get_random_long   

32-bit    270                  278             114            146
64-bit     75                   75             106            186

> I did have two objections to this. The first was that my SipHash
> construction is faster.

Well, it's faster on everything except 32-bit x86.  :-P

> The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch.

... and this will take 2121 cycles on 64-bit x86, and 2315 cycles on a
32-bit x86.  Which on a 2.3 GHz processor, is just under a
microsecond.  As far as being inconsistent on process startup, I very
much doubt a microsecond is really going to be visible to the user.  :-)

The bottom line is that I think we're really "pixel peeping" at this
point --- which is what obsessed digital photographers will do when
debating the quality of a Canon vs Nikon DSLR by blowing up a photo by
a thousand times, and then trying to claim that this is visible to the
human eye.  Or people who obsessing over the frequency response curves
of TH-X00 headphones with Mahogony vs Purpleheart wood, when it's
likely that in a blind head-to-head comparison, most people wouldn't
be able to tell the difference....

I think the main argument for using the batched getrandom approach is
that it, I would argue, simpler than introducing siphash into the
picture.  On 64-bit platforms it is faster and more consistent, so
it's basically that versus complexity of having to adding siphash to
the things that people have to analyze when considering random number
security on Linux.   But it's a close call either way, I think.

	    	     	      	      - Ted

P.S.  My benchmarking code....

diff --git a/drivers/char/random.c b/drivers/char/random.c
index a51f0ff43f00..41860864b775 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -1682,6 +1682,55 @@ static int rand_initialize(void)
 }
 early_initcall(rand_initialize);
 
+static unsigned int get_random_int_new(void);
+static unsigned long get_random_long_new(void);
+
+#define NUM_CYCLES 10000
+#define AVG(finish, start) ((unsigned int)(finish - start + NUM_CYCLES/2) / NUM_CYCLES)
+
+static int rand_benchmark(void)
+{
+	cycles_t start,finish;
+	int i, out;
+
+	pr_crit("random benchmark!!\n");
+	start = get_cycles();
+	for (i = 0; i < NUM_CYCLES; i++) {
+		get_random_int();}
+	finish = get_cycles();
+	pr_err("get_random_int # cycles: %u\n", AVG(finish, start));
+
+	start = get_cycles();
+	for (i = 0; i < NUM_CYCLES; i++) {
+		get_random_int_new();
+	}
+	finish = get_cycles();
+	pr_err("get_random_int_new (batched chacha20) # cycles: %u\n", AVG(finish, start));
+
+	start = get_cycles();
+	for (i = 0; i < NUM_CYCLES; i++) {
+		get_random_long();
+	}
+	finish = get_cycles();
+	pr_err("get_random_long # cycles: %u\n", AVG(finish, start));
+
+	start = get_cycles();
+	for (i = 0; i < NUM_CYCLES; i++) {
+		get_random_long_new();
+	}
+	finish = get_cycles();
+	pr_err("get_random_long_new (batched chacha20) # cycles: %u\n", AVG(finish, start));
+
+	start = get_cycles();
+	for (i = 0; i < NUM_CYCLES; i++) {
+		get_random_bytes(&out, sizeof(out));
+	}
+	finish = get_cycles();
+	pr_err("get_random_bytes # cycles: %u\n", AVG(finish, start));
+	return 0;
+}
+device_initcall(rand_benchmark);
+
 #ifdef CONFIG_BLOCK
 void rand_initialize_disk(struct gendisk *disk)
 {
@@ -2064,8 +2113,10 @@ unsigned int get_random_int(void)
 	unsigned int ret;
 	u64 *chaining;
 
+#if 0	// force slow path
 	if (arch_get_random_int(&ret))
 		return ret;
+#endif
 
 	chaining = &get_cpu_var(get_random_int_chaining);
 	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
@@ -2083,8 +2134,10 @@ unsigned long get_random_long(void)
 	unsigned long ret;
 	u64 *chaining;
 
+#if 0 // force slow path
 	if (arch_get_random_long(&ret))
 		return ret;
+#endif 
 
 	chaining = &get_cpu_var(get_random_int_chaining);
 	ret = *chaining = siphash_3u64(*chaining, jiffies, random_get_entropy() +
@@ -2094,6 +2147,47 @@ unsigned long get_random_long(void)
 }
 EXPORT_SYMBOL(get_random_long);
 
+struct random_buf {
+	__u8 buf[CHACHA20_BLOCK_SIZE];
+	int ptr;
+};
+
+static DEFINE_PER_CPU(struct random_buf, batched_entropy);
+
+static void get_batched_entropy(void *buf, int n)
+{
+	struct random_buf *p;
+
+	p = &get_cpu_var(batched_entropy);
+
+	if ((p->ptr == 0) ||
+	    (p->ptr + n >= CHACHA20_BLOCK_SIZE)) {
+		extract_crng(p->buf);
+		p->ptr = 0;
+	}
+	BUG_ON(n > CHACHA20_BLOCK_SIZE);
+	memcpy(buf, p->buf, n);
+	p->ptr += n;
+	put_cpu_var(batched_entropy);
+}
+
+static unsigned int get_random_int_new(void)
+{
+	unsigned int ret;
+
+	get_batched_entropy(&ret, sizeof(ret));
+	return ret;
+}
+
+static unsigned long get_random_long_new(void)
+{
+	unsigned long ret;
+
+	get_batched_entropy(&ret, sizeof(ret));
+	return ret;
+}
+
+
 /**
  * randomize_page - Generate a random, page aligned address
  * @start:	The smallest acceptable address the caller will take.

^ permalink raw reply related

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-22  5:01 UTC (permalink / raw)
  To: linux, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <CALCETrVn1tWBQx-RCSqCQ2ZcB6hPdioaV52q8vY+Mz1fRKsUXA@mail.gmail.com>

Andy Lutomirski wrote:
> I don't even think it needs that.  This is just adding a
> non-destructive final operation, right?

It is, but the problem is that SipHash is intended for *small* inputs,
so the standard implementations aren't broken into init/update/final
functions.

There's just one big function that keeps the state variables in
registers and never stores them anywhere.

If we *had* init/update/final functions, then it would be trivial.

> Just to clarify, if we replace SipHash with a black box, I think this
> effectively means, where "entropy" is random_get_entropy() || jiffies
> || current->pid:

> The first call returns H(random seed || entropy_0 || secret).  The
> second call returns H(random seed || entropy_0 || secret || entropy_1
> || secret).  Etc.

Basically, yes.  I was skipping the padding byte and keying the
finalization rounds on the grounds of "can't hurt and might help",
but we could do it a more standard way.

> If not, then I have a fairly strong preference to keep whatever
> construction we come up with consistent with something that could
> actually happen with invocations of unmodified SipHash -- then all the
> security analysis on SipHash goes through.

Okay.  I don't think it makes a difference, but it's not a *big* waste
of time.  If we have finalization rounds, we can reduce the secret
to 128 bits.

If we include the padding byte, we can do one of two things:
1) Make the secret 184 bits, to fill up the final partial word as
   much as possible, or
2) Make the entropy 1 byte smaller and conceptually misalign the
   secret.  What we'd actually do is remove the last byte of
   the secret and include it in the entropy words, but that's
   just a rotation of the secret between storage and hashing.

Also, I assume you'd like SipHash-2-4, since you want to rely
on a security analysis.

(Regarding the padding byte, getting it right might be annoying
to do exactly.  All of the security analysis depends *only* on
its low 3 bits indicating how much of the final block is used.
As it says in the SipHash paper, they included 8 bits just because
it was easy.  But if you want it exact, it's just one more byte of
state.)

> The one thing I don't like is
> that I don't see how to prove that you can't run it backwards if you
> manage to acquire a memory dump.  In fact, I that that there exist, at
> least in theory, hash functions that are secure in the random oracle
> model but that *can* be run backwards given the full state.  From
> memory, SHA-3 has exactly that property, and it would be a bit sad for
> a CSPRNG to be reversible.

Er...  get_random_int() is specifically *not* designed to be resistant
to state capture, and I didn't try.  Remember, what it's used for
is ASLR, what we're worried about is somene learning the layouts
of still-running processes, and and if you get a memory dump, you have
the memory layout!

If you want anti-backtracking, though, it's easy to add.  What we
hash is:

entropy_0 || secret || output_0 || entropy_1 || secret || output_1 || ...

You mix the output word right back in to the (unfinalized) state after
generating it.  This is still equivalent to unmodified back-box SipHash,
you're just using a (conceptually independent) SipHash invocation to
produce some of its input.

Each output is produced by copying the state, padding & finalizing after the
secret.

In fact, to make our lives easier, let's define the secret to end with
a counter byte that happens to be equal to the padding byte.  The input
stream will be:

Previous output: 8 (or 4 for HalfSipHash) bytes
Entropy: 15 bytes (8 bytes timer, 4 bytes jiffies, 3 bytes pid)
Secret: 16 bytes
Counter: 1 byte
...repeat...

> We could also periodically mix in a big (128-bit?) chunk of fresh
> urandom output to keep the bad guys guessing.

Simpler and faster to just update the global master secret.
The state is per-CPU, so mixing in has to be repeated per CPU.

With these changes, I'm satisifed that it's secure, cheap, has a
sufficiently wide state size, *and* all standard SipHash analysis applies.

The only remaining issues are:
1) How many rounds, and
2) May we use HalfSipHash?

I'd *like* to persuade you that skipping the padding byte wouldn't
invalidate any security proofs, because it's true and would simplify
the code.  But if you want 100% stock, I'm willing to cater to that.

Ted, what do you think?

^ permalink raw reply

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: Jason A. Donenfeld @ 2016-12-22  4:40 UTC (permalink / raw)
  To: George Spelvin
  Cc: Andi Kleen, David Miller, David Laight, Daniel J . Bernstein,
	Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jean-Philippe Aumasson, kernel-hardening,
	Linux Crypto Mailing List, LKML, Andy Lutomirski, Netdev,
	Tom Herbert, Linus Torvalds, Theodore Ts'o, Vegard Nossum
In-Reply-To: <20161222035549.10827.qmail@ns.sciencehorizons.net>

Hi George,

On Thu, Dec 22, 2016 at 4:55 AM, George Spelvin
<linux@sciencehorizons.net> wrote:
> Do we have to go through this?  No, the benchmark was *not* bogus.
> Then I replaced the kernel #includes with the necessary typedefs
> and #defines to make it compile in user-space.
> * I didn't iterate 100K times, I timed the functions *once*.
> * I saved the times in a buffer and printed them all at the end
>   so printf() wouldn't pollute the caches.
> * Before every even-numbered iteration, I flushed the I-cache
>   of everything from _init to _fini (i.e. all the non-library code).
>   This cold-cache case is what is going to happen in the kernel.

Wow! Great. Thanks for the pointers on the right way to do this. Very
helpful, and enlightening results indeed. Think you could send me the
whole .c of what you finally came up with? I'd like to run this on
some other architectures; I've got a few odd boxes laying around here.

> The P4 results were:
> SipHash actually wins slightly in the cold-cache case, because
> it iterates more.  In the hot-cache case, it loses
> Core 2 duo:
> Pretty much a tie, honestly.
> Ivy Bridge:
> Modern processors *hate* cold caches.  But notice how md5 is *faster*
> than SipHash on hot-cache IPv6.
> Ivy Bridge, -m64:
> Of course, when you compile -m64, SipHash is unbeatable.

Okay, so I think these results are consistent with some of the
assessments from before -- that SipHash is really just fine as a
replacement for MD5. Not great on older 32-bit x86, but not too
horrible, and the performance improvements on every other architecture
and the security improvements everywhere are a net good.

> Here's the modified benchmark() code.  The entire package is
> a bit voluminous for the mailing list, but anyone is welcome to it.

Please do send! I'm sure I'll learn from reading it. Thanks again for
doing the hardwork of putting something proper together.

Thanks,
Jason

^ permalink raw reply

* Re: [kernel-hardening] Re: HalfSipHash Acceptable Usage
From: George Spelvin @ 2016-12-22  3:55 UTC (permalink / raw)
  To: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, hannes,
	Jason, jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, linux, luto, netdev, tom, torvalds, tytso,
	vegard.nossum
In-Reply-To: <CAHmME9pww5Q0Wy9MtkO7PAx2Tstfp=6Og3qZLZ=Rh8NaFo0Gog@mail.gmail.com>

> Plus the benchmark was bogus anyway, and when I built a more specific
> harness -- actually comparing the TCP sequence number functions --
> SipHash was faster than MD5, even on register starved x86. So I think
> we're fine and this chapter of the discussion can come to a close, in
> order to move on to more interesting things.

Do we have to go through this?  No, the benchmark was *not* bogus.

Here's myresults from *your* benchmark.  I can't reboot some of my test
machines, so I took net/core/secure_seq.c, lib/siphash.c, lib/md5.c and
include/linux/siphash.h straight out of your test tree.

Then I replaced the kernel #includes with the necessary typedefs
and #defines to make it compile in user-space.  (Voluminous but
straightforward.)  E.g.

#define __aligned(x) __attribute__((__aligned__(x)))
#define ____cacheline_aligned __aligned(64)
#define CONFIG_INET 1
#define IS_ENABLED(x) 1
#define ktime_get_real_ns() 0
#define sysctl_tcp_timestamps 0

... etc.

Then I modified your benchmark code into the appended code.  The
differences are:
* I didn't iterate 100K times, I timed the functions *once*.
* I saved the times in a buffer and printed them all at the end
  so printf() wouldn't pollute the caches.
* Before every even-numbered iteration, I flushed the I-cache
  of everything from _init to _fini (i.e. all the non-library code).
  This cold-cache case is what is going to happen in the kernel.

In the results below, note that I did *not* re-flush between phases
of the test.  The effects of cacheing is clearly apparent in the tcpv4
results, where the tcpv6 code loaded the cache.

You can also see that the SipHash code benefits more from cacheing when
entered with a cold cache, as it iterates over the input words, while
the MD5 code is one big unrolled blob.

Order of computation is down the columns first, across second.

The P4 results were:
tcpv6 md5 cold:		4084	3488	3584	3584	3568
tcpv4 md5 cold:		1052	 996	 996	1060	 996
tcpv6 siphash cold:	4080	3296	3312	3296	3312
tcpv4 siphash cold:	2968	2748	2972	2716	2716
tcpv6 md5 hot:		 900	 712	 712	712	 712
tcpv4 md5 hot:		 632	 672	 672	672	 672
tcpv6 siphash hot:	2484	2292	2340	2340	2340
tcpv4 siphash hot:	1660	1560	1564	2340	1564

SipHash actually wins slightly in the cold-cache case, because
it iterates more.  In the hot-cache case, it loses horribly.

Core 2 duo:
tcpv6 md5 cold:		3396	2868	2964	3012	2832
tcpv4 md5 cold:		1368	1044	1320	1332	1308
tcpv6 siphash cold:	2940	2952	2916	2448	2604
tcpv4 siphash cold:	3192	2988	3576	3504	3624
tcpv6 md5 hot:		1116	1032	 996	1008	1008
tcpv4 md5 hot:		 936	 936	 936	 936	 936
tcpv6 siphash hot:	1200	1236	1236	1188	1188
tcpv4 siphash hot:	 936	 804	 804	 804	 804

Pretty much a tie, honestly.

Ivy Bridge:
tcpv6 md5 cold:		6086	6136	6962	6358	6060
tcpv4 md5 cold:		 816	 732	1046	1054	1012
tcpv6 siphash cold:	3756	1886	2152	2390	2566
tcpv4 siphash cold:	3264	2108	3026	3120	3526
tcpv6 md5 hot:		1062	 808	 824	 824	 832
tcpv4 md5 hot:		 730	 730	 740	 748	 748
tcpv6 siphash hot:	 960	 952	 936	1112	 926
tcpv4 siphash hot:	 638	 544	 562	 552	 560

Modern processors *hate* cold caches.  But notice how md5 is *faster*
than SipHash on hot-cache IPv6.

Ivy Bridge, -m64:
tcpv6 md5 cold:		4680	3672	3956	3616	3525
tcpv4 md5 cold:		1066	1416	1179	1179	1134
tcpv6 siphash cold:	 940	1258	1995	1609	2255
tcpv4 siphash cold:	1440	1269	1292	1870	1621
tcpv6 md5 hot:		1372	1111	1122	1088	1088
tcpv4 md5 hot:		 997	 997	 997	 997	 998
tcpv6 siphash hot:	 340	 340	 340	 352	 340
tcpv4 siphash hot:	 227	 238	 238	 238	 238

Of course, when you compile -m64, SipHash is unbeatable.


Here's the modified benchmark() code.  The entire package is
a bit voluminous for the mailing list, but anyone is welcome to it.

static void clflush(void)
{
	extern char const _init, _fini;
	char const *p = &_init;

	while (p < &_fini) {
		asm("clflush %0" : : "m" (*p));
		p += 64;
	}
}

typedef uint32_t cycles_t;
static cycles_t get_cycles(void)
{
	uint32_t eax, edx;
	asm volatile("rdtsc" : "=a" (eax), "=d" (edx));
	return eax;
}

static int benchmark(void)
{
	cycles_t start, finish;
	int i;
	u32 seq_number = 0;
	__be32 saddr6[4] = { 1, 4, 182, 393 }, daddr6[4] = { 9192, 18288, 2222222, 0xffffff10 };
	__be32 saddr4 = 28888, daddr4 = 182112;
	__be16 sport = 22, dport = 41992;
	u32 tsoff;
	cycles_t result[4];

	printf("seq num benchmark\n");

	for (i = 0; i < 10; i++) {
		if ((i & 1) == 0)
			clflush();

		start = get_cycles();
		seq_number += secure_tcpv6_sequence_number_md5(saddr6, daddr6, sport, dport, &tsoff);
		finish = get_cycles();
		result[0] = finish - start;

		start = get_cycles();
		seq_number += secure_tcp_sequence_number_md5(saddr4, daddr4, sport, dport, &tsoff);
		finish = get_cycles();
		result[1] = finish - start;

		start = get_cycles();
		seq_number += secure_tcpv6_sequence_number(saddr6, daddr6, sport, dport, &tsoff);
		finish = get_cycles();
		result[2] = finish - start;

		start = get_cycles();
		seq_number += secure_tcp_sequence_number(saddr4, daddr4, sport, dport, &tsoff);
		finish = get_cycles();
		result[3] = finish - start;

		printf("* Iteration %d results:\n", i);
		printf("secure_tcpv6_sequence_number_md5# cycles: %u\n", result[0]);
		printf("secure_tcp_sequence_number_md5# cycles: %u\n", result[1]);
		printf("secure_tcpv6_sequence_number_siphash# cycles: %u\n", result[2]);
		printf("secure_tcp_sequence_number_siphash# cycles: %u\n", result[3]);
		printf("benchmark result: %u\n", seq_number);
	}

	printf("benchmark result: %u\n", seq_number);
	return 0;
}
//device_initcall(benchmark);

int
main(void)
{
	memset(net_secret, 0xff, sizeof net_secret);
	memset(net_secret_md5, 0xff, sizeof net_secret_md5);
	return benchmark();
}

^ permalink raw reply

* Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22  3:12 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Netdev, kernel-hardening@lists.openwall.com,
	LKML, Linux Crypto Mailing List, David Laight, Ted Tso,
	Eric Dumazet, Linus Torvalds, Eric Biggers, Tom Herbert,
	Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CAHmME9phg=GuhEUaMxxv_=RexffPDqrOEhmaKffy_ZSt7bfC7g@mail.gmail.com>

On Thu, Dec 22, 2016 at 3:49 AM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I did have two objections to this. The first was that my SipHash
> construction is faster. But in any case, they're both faster than the
> current MD5, so it's just extra rice. The second, and the more
> important one, was that batching entropy up like this means that 32
> calls will be really fast, and then the 33rd will be slow, since it
> has to do a whole ChaCha round, because get_random_bytes must be
> called to refill the batch. Since get_random_long is called for every
> process startup, I didn't really like there being inconsistent
> performance on process startup. And I'm pretty sure that one ChaCha
> whole block is slower than computing MD5, even though it lasts 32
> times as long, though I need to measure this. But maybe that's dumb in
> the end? Are these concerns that should point us toward the
> determinism (and speed) of SipHash? Are these concerns that don't
> matter and so we should roll with the simplicity of reusing ChaCha?

I ran some measurements in order to quantify what I'm talking about.
Repeatedly running md5_transform is about 2.3 times faster than
repeatedly running extract_crng. What does this mean?

One call to extract_crng gives us 32 times as many longs as one call
to md5_transform. This means that spread over 32 process creations,
chacha will be 13.9 times faster. However, every 32nd process will
take 2.3 times as long to generate its ASLR value as it would with the
old md5_transform code.

Personally, I don't think that 2.3 is a big deal. And I really like
how much this simplifies the analysis.
But if it's a big deal to you, then we can continue to discuss my
SipHash construction, which gives faster and more consistent
performance, at the cost of a more complicated and probably less
impressive security analysis.

Jason

^ permalink raw reply

* Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22  2:49 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Netdev, kernel-hardening@lists.openwall.com,
	LKML, Linux Crypto Mailing List, David Laight, Ted Tso,
	Eric Dumazet, Linus Torvalds, Eric Biggers, Tom Herbert,
	Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <17bd0c70-d2c1-165b-f5b2-252dfca404e8@stressinduktion.org>

Hi Andy & Hannes,

On Thu, Dec 22, 2016 at 3:07 AM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide cprng alike properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.

On Thu, Dec 22, 2016 at 3:09 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> Unless I've misunderstood it, Ted's proposal causes get_random_int()
> to return bytes straight from urandom (effectively), which should make
> it very strong.  And if urandom is competitively fast now, I don't see
> the problem.  ChaCha20 is designed for speed, after all.

Funny -- while you guys were sending this back & forth, I was writing
my reply to Andy which essentially arrives at the same conclusion.
Given that we're all arriving to the same thing, and that Ted shot in
this direction long before we all did, I'm leaning toward abandoning
SipHash for the de-MD5-ification of get_random_int/long, and working
on polishing Ted's idea into something shiny for this patchset.

I did have two objections to this. The first was that my SipHash
construction is faster. But in any case, they're both faster than the
current MD5, so it's just extra rice. The second, and the more
important one, was that batching entropy up like this means that 32
calls will be really fast, and then the 33rd will be slow, since it
has to do a whole ChaCha round, because get_random_bytes must be
called to refill the batch. Since get_random_long is called for every
process startup, I didn't really like there being inconsistent
performance on process startup. And I'm pretty sure that one ChaCha
whole block is slower than computing MD5, even though it lasts 32
times as long, though I need to measure this. But maybe that's dumb in
the end? Are these concerns that should point us toward the
determinism (and speed) of SipHash? Are these concerns that don't
matter and so we should roll with the simplicity of reusing ChaCha?

Jason

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Jason A. Donenfeld @ 2016-12-22  2:40 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: George Spelvin, Ted Ts'o, Andi Kleen, David S. Miller,
	David Laight, D. J. Bernstein, Eric Biggers, Eric Dumazet,
	Hannes Frederic Sowa, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds, Vegard Nossum

> On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
>> After some thinking, I still like the "state-preserving" construct
>> that's equivalent to the current MD5 code.  Yes, we could just do
>> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
>> preserve a bit more.
>>
>> It requires library support from the SipHash code to return the full
>> SipHash state, but I hope that's a fair thing to ask for.

This is not a good idea. If I understand correctly, the idea here is
to just keep around SipHash's internal state variables, and chain them
over to the next call, sort of like how md5_transform with the current
code works on the same scratch space. There has been no security
analysis in the literature on this use of the primitive, and I have no
confidence that this is a secure use of the function. Unless somebody
can point me toward a paper I missed or a comment from a real
cryptographer about the specifics of SipHash, I think I'm right to
admonish against this dangerous road.

Let's talk about constructions. And let's only decide on a
construction that we're actually equipped to analyze. Let's definitely
not talk about making our own primitives, or retrofitting nice
primitive's internals into our own Frankenstein.

Alternatively, if I'm wrong, please send me an eprint/arXiv link to a
paper that discusses this use of SipHash.

^ permalink raw reply

* Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Jason A. Donenfeld @ 2016-12-22  2:31 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Netdev, kernel-hardening@lists.openwall.com, LKML,
	Linux Crypto Mailing List, David Laight, Ted Tso,
	Hannes Frederic Sowa, Eric Dumazet, Linus Torvalds, Eric Biggers,
	Tom Herbert, Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrVttVoZMvCYZcrAqM1c=YQP_nCfdfO1MsrSHjvjTFxH+A@mail.gmail.com>

Hi Andy,

On Thu, Dec 22, 2016 at 12:42 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> So this is probably good enough, and making it better is hard.  Changing it to:
>
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
>
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.

Ahh, so that's the reasoning behind a similar suggestion of yours in a
previous email. Makes sense to me. I'll include this in the next merge
if we don't come up with a different idea before then. Your reasoning
seems good for it.

Part of what makes this process a bit goofy is that it's not all
together clear what the design goals are. Right now we're going for
"not worse than before", which we've nearly achieved. How good of an
RNG do we want? I'm willing to examine and analyze the security and
performance of all constructions we can come up with. One thing I
don't want to do, however, is start tweaking the primitives themselves
in ways not endorsed by the designers. So, I believe that precludes
things like carrying over SipHash's internal state (like what was done
with MD5), because there hasn't been a formal security analysis of
this like there has with other uses of SipHash. I also don't want to
change any internals of how SipHash actually works. I mention that
because of some of the suggestions on other threads, which make me
rather uneasy.

So with that said, while writing this reply to you, I was
simultaneously reading some other crypto code and was reminded that
there's a variant of SipHash which outputs an additional 64-bits; it's
part of the siphash reference code, which they call the "128-bit
mode". It has the benefit that we can return 64-bits to the caller and
save 64-bits for the chaining key. That way there's no correlation
between the returned secret and the chaining key, which I think would
completely alleviate all of your concerns, and simplify the analysis a
bit.

Here's what it looks like:
https://git.zx2c4.com/linux-dev/commit/?h=siphash&id=46fbe5b408e66b2d16b4447860f8083480e1c08d

The downside is that it takes 4 extra Sip rounds. This puts the
performance still better than MD5, though, and likely still better
than the other batched entropy solution. We could optimize this, I
suppose, by giving it only two parameters -- chaining,
jiffies+entropy+pid -- instead of the current three -- chaining,
jiffies, entropy+pid -- which would then shave off 2 Sip rounds. But I
liked the idea of having a bit more spread in the entropy input field.

Anyway, with this in mind, we now have three possibilities:

1. result = siphash(chaining, entropy, key); chaining += result + entropy
2. result = siphash_extra_output(chaining, entropy, key, &chaining);
3. Ted's batched entropy idea using chacha20

The more I think about this, the more I suspect that we should just
use chacha20. It will still be faster than MD5. I don't like the
non-determinism of it (some processes will start slower than others,
if the batched entropy has run out and ASLR demands more), but I guess
I can live with that. But, most importantly, it greatly simplifies
both the security analysis and what we can promise to callers about
the function. Right now in the comment documentation, we're coy with
callers about the security of the RNG. If we moved to a known
construction like chacha20/get_random_bytes_batched, then we could
just be straight up with a promise that the numbers it returns are
high quality.

Thoughts on 2 and 3, and on 1 vs 2 vs 3?

Jason

^ permalink raw reply

* Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Andy Lutomirski @ 2016-12-22  2:09 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Jason A. Donenfeld, Netdev, kernel-hardening@lists.openwall.com,
	LKML, Linux Crypto Mailing List, David Laight, Ted Tso,
	Eric Dumazet, Linus Torvalds, Eric Biggers, Tom Herbert,
	Andi Kleen, David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <17bd0c70-d2c1-165b-f5b2-252dfca404e8@stressinduktion.org>

On Wed, Dec 21, 2016 at 6:07 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> On 22.12.2016 00:42, Andy Lutomirski wrote:
>> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>>  unsigned int get_random_int(void)
>>>  {
>>> -       __u32 *hash;
>>> -       unsigned int ret;
>>> -
>>> -       if (arch_get_random_int(&ret))
>>> -               return ret;
>>> -
>>> -       hash = get_cpu_var(get_random_int_hash);
>>> -
>>> -       hash[0] += current->pid + jiffies + random_get_entropy();
>>> -       md5_transform(hash, random_int_secret);
>>> -       ret = hash[0];
>>> -       put_cpu_var(get_random_int_hash);
>>> -
>>> -       return ret;
>>> +       unsigned int arch_result;
>>> +       u64 result;
>>> +       struct random_int_secret *secret;
>>> +
>>> +       if (arch_get_random_int(&arch_result))
>>> +               return arch_result;
>>> +
>>> +       secret = get_random_int_secret();
>>> +       result = siphash_3u64(secret->chaining, jiffies,
>>> +                             (u64)random_get_entropy() + current->pid,
>>> +                             secret->secret);
>>> +       secret->chaining += result;
>>> +       put_cpu_var(secret);
>>> +       return result;
>>>  }
>>>  EXPORT_SYMBOL(get_random_int);
>>
>> Hmm.  I haven't tried to prove anything for real.  But here goes (in
>> the random oracle model):
>>
>> Suppose I'm an attacker and I don't know the secret or the chaining
>> value.  Then, regardless of what the entropy is, I can't predict the
>> numbers.
>>
>> Now suppose I do know the secret and the chaining value due to some
>> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
>> to find a value "result" such that prev_chaining + result = chaining
>> and result = H(prev_chaining, ..., secret);.  I don't think this can
>> be done efficiently in the random oracle model regardless of what the
>> "..." is.
>>
>> But, if I know the secret and chaining value, I can predict the next
>> output assuming I can guess the entropy.  What's worse is that, even
>> if I can't guess the entropy, if I *observe* the next output then I
>> can calculate the next chaining value.
>>
>> So this is probably good enough, and making it better is hard.  Changing it to:
>>
>> u64 entropy = (u64)random_get_entropy() + current->pid;
>> result = siphash(..., entropy, ...);
>> secret->chaining += result + entropy;
>>
>> would reduce this problem by forcing an attacker to brute-force the
>> entropy on each iteration, which is probably an improvement.
>>
>> To fully fix it, something like "catastrophic reseeding" would be
>> needed, but that's hard to get right.
>
> I wonder if Ted's proposal was analyzed further in terms of performance
> if get_random_int should provide cprng alike properties?
>
> For reference: https://lkml.org/lkml/2016/12/14/351
>
> The proposal made sense to me and would completely solve the above
> mentioned problem on the cost of repeatedly reseeding from the crng.
>

Unless I've misunderstood it, Ted's proposal causes get_random_int()
to return bytes straight from urandom (effectively), which should make
it very strong.  And if urandom is competitively fast now, I don't see
the problem.  ChaCha20 is designed for speed, after all.

^ permalink raw reply

* George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Andy Lutomirski @ 2016-12-22  2:07 UTC (permalink / raw)
  To: George Spelvin
  Cc: Ted Ts'o, Andi Kleen, David S. Miller, David Laight,
	D. J. Bernstein, Eric Biggers, Eric Dumazet, Hannes Frederic Sowa,
	Jason A. Donenfeld, Jean-Philippe Aumasson,
	kernel-hardening@lists.openwall.com, Linux Crypto Mailing List,
	linux-kernel@vger.kernel.org, Network Development, Tom Herbert,
	Linus Torvalds, Vegard Nossum

On Wed, Dec 21, 2016 at 5:13 PM, George Spelvin
<linux@sciencehorizons.net> wrote:
> As a separate message, to disentangle the threads, I'd like to
> talk about get_random_long().
>
> After some thinking, I still like the "state-preserving" construct
> that's equivalent to the current MD5 code.  Yes, we could just do
> siphash(current_cpu || per_cpu_counter, global_key), but it's nice to
> preserve a bit more.
>
> It requires library support from the SipHash code to return the full
> SipHash state, but I hope that's a fair thing to ask for.

I don't even think it needs that.  This is just adding a
non-destructive final operation, right?

>
> Here's my current straw man design for comment.  It's very similar to
> the current MD5-based design, but feeds all the seed material in the
> "correct" way, as opposed to Xring directly into the MD5 state.
>
> * Each CPU has a (Half)SipHash state vector,
>   "unsigned long get_random_int_hash[4]".  Unlike the current
>   MD5 code, we take care to initialize it to an asymmetric state.
>
> * There's a global 256-bit random_int_secret (which we could
>   reseed periodically).
>
> To generate a random number:
> * If get_random_int_hash is all-zero, seed it with fresh a half-sized
>   SipHash key and the appropriate XOR constants.
> * Generate three words of random_get_entropy(), jiffies, and current->pid.
>   (This is arbitary seed material, copied from the current code.)
> * Crank through that with (Half)SipHash-1-0.
> * Crank through the random_int_secret with (Half)SipHash-1-0.
> * Return v1 ^ v3.

Just to clarify, if we replace SipHash with a black box, I think this
effectively means, where "entropy" is random_get_entropy() || jiffies
|| current->pid:

The first call returns H(random seed || entropy_0 || secret).  The
second call returns H(random seed || entropy_0 || secret || entropy_1
|| secret).  Etc.

If not, then I have a fairly strong preference to keep whatever
construction we come up with consistent with something that could
actually happen with invocations of unmodified SipHash -- then all the
security analysis on SipHash goes through.

Anyway, I have mixed thoughts about the construction.  It manages to
have a wide state at essentially no cost, which buys us quite a bit of
work factor to break it.  Even with full knowledge of the state, an
output doesn't reveal the entropy except to the extent that it can be
brute-force (this is just whatever the appropriate extended version of
first preimage resistance gives us).  The one thing I don't like is
that I don't see how to prove that you can't run it backwards if you
manage to acquire a memory dump.  In fact, I that that there exist, at
least in theory, hash functions that are secure in the random oracle
model but that *can* be run backwards given the full state.  From
memory, SHA-3 has exactly that property, and it would be a bit sad for
a CSPRNG to be reversible.

We could also periodically mix in a big (128-bit?) chunk of fresh
urandom output to keep the bad guys guessing.

(P.S.  This kind of resembles the duplex sponge construction.  If
hardware SHA-3 ever shows up, a duplex sponge RNG might nice indeed.)

^ permalink raw reply

* Re: [PATCH v7 3/6] random: use SipHash in place of MD5
From: Hannes Frederic Sowa @ 2016-12-22  2:07 UTC (permalink / raw)
  To: Andy Lutomirski, Jason A. Donenfeld
  Cc: Netdev, kernel-hardening@lists.openwall.com, LKML,
	Linux Crypto Mailing List, David Laight, Ted Tso, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrVttVoZMvCYZcrAqM1c=YQP_nCfdfO1MsrSHjvjTFxH+A@mail.gmail.com>

On 22.12.2016 00:42, Andy Lutomirski wrote:
> On Wed, Dec 21, 2016 at 3:02 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
>>  unsigned int get_random_int(void)
>>  {
>> -       __u32 *hash;
>> -       unsigned int ret;
>> -
>> -       if (arch_get_random_int(&ret))
>> -               return ret;
>> -
>> -       hash = get_cpu_var(get_random_int_hash);
>> -
>> -       hash[0] += current->pid + jiffies + random_get_entropy();
>> -       md5_transform(hash, random_int_secret);
>> -       ret = hash[0];
>> -       put_cpu_var(get_random_int_hash);
>> -
>> -       return ret;
>> +       unsigned int arch_result;
>> +       u64 result;
>> +       struct random_int_secret *secret;
>> +
>> +       if (arch_get_random_int(&arch_result))
>> +               return arch_result;
>> +
>> +       secret = get_random_int_secret();
>> +       result = siphash_3u64(secret->chaining, jiffies,
>> +                             (u64)random_get_entropy() + current->pid,
>> +                             secret->secret);
>> +       secret->chaining += result;
>> +       put_cpu_var(secret);
>> +       return result;
>>  }
>>  EXPORT_SYMBOL(get_random_int);
> 
> Hmm.  I haven't tried to prove anything for real.  But here goes (in
> the random oracle model):
> 
> Suppose I'm an attacker and I don't know the secret or the chaining
> value.  Then, regardless of what the entropy is, I can't predict the
> numbers.
> 
> Now suppose I do know the secret and the chaining value due to some
> leak.  If I want to deduce prior outputs, I think I'm stuck: I'd need
> to find a value "result" such that prev_chaining + result = chaining
> and result = H(prev_chaining, ..., secret);.  I don't think this can
> be done efficiently in the random oracle model regardless of what the
> "..." is.
> 
> But, if I know the secret and chaining value, I can predict the next
> output assuming I can guess the entropy.  What's worse is that, even
> if I can't guess the entropy, if I *observe* the next output then I
> can calculate the next chaining value.
> 
> So this is probably good enough, and making it better is hard.  Changing it to:
> 
> u64 entropy = (u64)random_get_entropy() + current->pid;
> result = siphash(..., entropy, ...);
> secret->chaining += result + entropy;
> 
> would reduce this problem by forcing an attacker to brute-force the
> entropy on each iteration, which is probably an improvement.
> 
> To fully fix it, something like "catastrophic reseeding" would be
> needed, but that's hard to get right.

I wonder if Ted's proposal was analyzed further in terms of performance
if get_random_int should provide cprng alike properties?

For reference: https://lkml.org/lkml/2016/12/14/351

The proposal made sense to me and would completely solve the above
mentioned problem on the cost of repeatedly reseeding from the crng.

Bye,
Hannes

^ permalink raw reply

page: next (older) | prev (newer) | latest
- recent:[subjects (threaded)|topics (new)|topics (active)]

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox