Linux cryptographic layer development
 help / color / mirror / Atom feed
* [sparc64] cryptomgr_test OOPS kernel 4.9.0+
From: Anatoly Pugachev @ 2016-12-25 17:56 UTC (permalink / raw)
  To: linux-crypto; +Cc: Sparc kernel list

Hello!

Disabling kernel config option
CRYPTO_MANAGER_DISABLE_TESTS
i.e. enable run-time self tests, makes kernel unbootable:

tested with git kernels v4.9-8648-g5cc60aeedf31 and v4.9-12259-g7c0f6ba682b9


SILO Version 1.4.14
boot:
Allocated 64 Megs of memory at 0x40000000 for kernel
Uncompressing image...
Loaded kernel version 4.9.0
Loading initial ramdisk (14000758 bytes at 0x74000000 phys, 0x40C00000 virt)...
/
[    0.000000] PROMLIB: Sun IEEE Boot Prom 'OBP 4.38.5 2016/06/22 19:36'
[    0.000000] PROMLIB: Root node compatible: sun4v
[    0.000000] Linux version 4.9.0+ (mator@ttip) (gcc version 6.2.1
20161215 (Debian 6.2.1-7) ) #38 SMP Sun Dec 25 13:35:48 MSK 2016
[    0.000000] debug: skip boot console de-registration.
[    0.000000] bootconsole [earlyprom0] enabled
[    0.000000] ARCH: SUN4V
[    0.000000] Ethernet address: 00:14:4f:fa:06:f2
[    0.000000] MM: PAGE_OFFSET is 0xfff8000000000000 (max_phys_bits == 47)
[    0.000000] MM: VMALLOC [0x0000000100000000 --> 0x0006000000000000]
[    0.000000] MM: VMEMMAP [0x0006000000000000 --> 0x000c000000000000]
[    0.000000] Kernel: Using 5 locked TLB entries for main kernel image.
[    0.000000] Remapping the kernel...
[    0.000000] done.
[    0.000000] kmemleak: Kernel memory leak detector disabled
[    0.000000] OF stdout device is: /virtual-devices@100/console@1
[    0.000000] PROM: Built device tree with 85327 bytes of memory.
[    0.000000] MDESC: Size is 35552 bytes.
[    0.000000] PLATFORM: banner-name [SPARC T5-2]
[    0.000000] PLATFORM: name [ORCL,SPARC-T5-2]
[    0.000000] PLATFORM: hostid [84fa06f2]
[    0.000000] PLATFORM: serial# [0035260e]
[    0.000000] PLATFORM: stick-frequency [3b9aca00]
[    0.000000] PLATFORM: mac-address [144ffa06f2]
[    0.000000] PLATFORM: watchdog-resolution [1000 ms]
[    0.000000] PLATFORM: watchdog-max-timeout [31536000000 ms]
[    0.000000] PLATFORM: max-cpus [1024]
[    0.000000] Top of RAM: 0x82f93a000, Total RAM: 0x7ff350000
[    0.000000] Memory hole size: 773MB
[    0.000000] Allocated 24576 bytes for kernel page tables.
[    0.000000] Zone ranges:
[    0.000000]   Normal   [mem 0x0000000030400000-0x000000082f939fff]
[    0.000000] Movable zone start for each node
[    0.000000] Early memory node ranges
[    0.000000]   node   0: [mem 0x0000000030400000-0x000000006fe7ffff]
[    0.000000]   node   0: [mem 0x000000006ff00000-0x000000006ff25fff]
[    0.000000]   node   0: [mem 0x0000000070000000-0x000000082f87ffff]
[    0.000000]   node   0: [mem 0x000000082f900000-0x000000082f921fff]
[    0.000000]   node   0: [mem 0x000000082f932000-0x000000082f939fff]
[    0.000000] Initmem setup node 0 [mem 0x0000000030400000-0x000000082f939fff]
[    0.000000] Booting Linux...
[    0.000000] CPU CAPS: [flush,stbar,swap,muldiv,v9,blkinit,n2,mul32]
[    0.000000] CPU CAPS: [div32,v8plus,popc,vis,vis2,ASIBlkInit,fmaf,vis3]
[    0.000000] CPU CAPS: [hpc,ima,pause,cbcond,aes,des,kasumi,camellia]
[    0.000000] CPU CAPS: [md5,sha1,sha256,sha512,mpmul,montmul,montsqr,crc32c]
[    0.000000] percpu: Embedded 11 pages/cpu @fff800082d000000 s46024
r8192 d35896 u131072
[    0.000000] SUN4V: Mondo queue sizes [cpu(131072) dev(16384) r(8192) nr(256)]
[    0.000000] Built 1 zonelists in Zone order, mobility grouping on.
Total pages: 4155828
[    0.000000] Kernel command line: root=/dev/vdiska2 ro
zswap.enabled=1 keep_bootcon noresume
[    0.000000] log_buf_len individual max cpu contribution: 4096 bytes
[    0.000000] log_buf_len total cpu_extra contributions: 1044480 bytes
[    0.000000] log_buf_len min size: 131072 bytes
[    0.000000] log_buf_len: 2097152 bytes
[    0.000000] early log buf free: 126208(96%)
[    0.000000] PID hash table entries: 4096 (order: 2, 32768 bytes)
[    0.000000] Dentry cache hash table entries: 4194304 (order: 12,
33554432 bytes)
[    0.000000] Inode-cache hash table entries: 2097152 (order: 11,
16777216 bytes)
[    0.000000] Sorting __ex_table...
[    0.000000] Memory: 33114224K/33541440K available (6603K kernel
code, 894K rwdata, 1824K rodata, 608K init, 9985K bss, 427216K
reserved, 0K cma-reserved)
[    0.000000] Running RCU self tests
[    0.000000] Hierarchical RCU implementation.
[    0.000000]  RCU lockdep checking is enabled.
[    0.000000]  Build-time adjustment of leaf fanout to 64.
[    0.000000] NR_IRQS:2048 nr_irqs:2048 1
[    0.000000] SUN4V: Using IRQ API major 3, cookie only virqs enabled
[11059882.082988] clocksource: stick: mask: 0xffffffffffffffff
max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[11059882.083097] clocksource: mult[800000] shift[23]
[11059882.083148] clockevent: mult[80000000] shift[31]
[11059882.085633] Console: colour dummy device 80x25
[11059882.085696] console [tty0] enabled
[11059882.085740] Lock dependency validator: Copyright (c) 2006 Red
Hat, Inc., Ingo Molnar
[11059882.085819] ... MAX_LOCKDEP_SUBCLASSES:  8
[11059882.085866] ... MAX_LOCK_DEPTH:          48
[11059882.085912] ... MAX_LOCKDEP_KEYS:        8191
[11059882.085962] ... CLASSHASH_SIZE:          4096
[11059882.086011] ... MAX_LOCKDEP_ENTRIES:     16384
[11059882.086061] ... MAX_LOCKDEP_CHAINS:      32768
[11059882.086110] ... CHAINHASH_SIZE:          16384
[11059882.086160]  memory used by lock dependency info: 5855 kB
[11059882.086221]  per task-struct memory footprint: 1920 bytes
[11059882.086971] kmemleak: Early log buffer exceeded (26295), please
increase DEBUG_KMEMLEAK_EARLY_LOG_SIZE
[11059884.094525] Calibrating delay using timer specific routine..
2006.23 BogoMIPS (lpj=4012467)
[11059884.094652] pid_max: default: 262144 minimum: 2048
[11059884.095661] Security Framework initialized
[11059884.095713] Yama: becoming mindful.
[11059884.095776] AppArmor: AppArmor disabled by boot time parameter
[11059884.096179] Mount-cache hash table entries: 65536 (order: 6, 524288 bytes)
[11059884.096261] Mountpoint-cache hash table entries: 65536 (order:
6, 524288 bytes)
[11059884.098289] ftrace: allocating 19365 entries in 38 pages
[11059884.119485] smp: Bringing up secondary CPUs ...
[11059884.173617] smp: Brought up 1 node, 32 CPUs
[11059884.179296] devtmpfs: initialized
[11059884.185352] Performance events:
[11059884.185387] Testing NMI watchdog ...
[11059884.265513] OK.
[11059884.265632] Supported PMU type is 'niagara5'
[11059884.299456] ldc.c:v1.1 (July 22, 2008)
[11059884.301371] clocksource: jiffies: mask: 0xffffffff max_cycles:
0xffffffff, max_idle_ns: 7645041785100000 ns
[11059884.306042] NET: Registered protocol family 16
[11059884.316862] VIO: Adding device channel-devices
[11059884.317163] VIO: Adding device vnet-port-0-0
[11059884.317437] VIO: Adding device vnet-port-0-1
[11059884.317694] VIO: Adding device vnet-port-0-2
[11059884.317961] VIO: Adding device vnet-port-0-3
[11059884.318852] VIO: Adding device vnet-port-0-4
[11059884.319757] VIO: Adding device vdc-port-0-0
[11059884.320640] VIO: Adding device vdc-port-1-0
[11059884.321523] VIO: Adding device vdc-port-2-0
[11059884.322420] VIO: Adding device vdc-port-3-0
[11059884.323303] VIO: Adding device vlds-port-0-0
[11059884.324207] VIO: Adding device ds-0
[11059884.393536] HugeTLB registered 8 MB page size, pre-allocated 0 pages
[11059884.403418] vgaarb: loaded
[11059884.407260] SUN4V: Reboot data supported (maj=1,min=0).
[11059884.407477] ds.c:v1.0 (Jul 11, 2007)
[11059884.407815] ds-0: ds_conn_reset() from send_events
[11059884.407968] ds-0: Registered md-update service.
[11059884.408021] ds-0: Registered domain-shutdown service.
[11059884.408090] ds-0: Registered domain-panic service.
[11059884.408155] ds-0: Registered dr-cpu service.
[11059884.408211] ds-0: Registered pri service.
[11059884.408266] ds-0: Registered var-config service.
[11059884.408413] clocksource: Switched to clocksource stick
[11059884.509681] VFS: Disk quotas dquot_6.6.0
[11059884.511529] VFS: Dquot-cache hash table entries: 1024 (order 0,
8192 bytes)
[11059884.531075] NET: Registered protocol family 2
[11059884.538440] TCP established hash table entries: 262144 (order:
8, 2097152 bytes)
[11059884.539519] TCP bind hash table entries: 65536 (order: 9, 4194304 bytes)
[11059884.550419] TCP: Hash tables configured (established 262144 bind 65536)
[11059884.550753] UDP hash table entries: 16384 (order: 8, 2621440 bytes)
[11059884.557183] UDP-Lite hash table entries: 16384 (order: 8, 2621440 bytes)
[11059884.568838] NET: Registered protocol family 1
[11059884.569346] Unpacking initramfs...
[11059884.902162] Freeing initrd memory: 13672K
[11059884.944211] futex hash table entries: 65536 (order: 10, 8388608 bytes)
[11059884.955799] audit: initializing netlink subsys (disabled)
[11059884.956191] audit: type=2000 audit(0.936:1): initialized
[11059884.958931] workingset: timestamp_bits=46 max_order=22 bucket_order=0
[11059884.960142] zbud: loaded
[11059884.979364] Unable to handle kernel paging request at virtual
address 000612000001c000
[11059884.979448] tsk->{mm,active_mm}->context = 0000000000000000
[11059884.979502] tsk->{mm,active_mm}->pgd = fff8000070002000
[11059884.979555]               \|/ ____ \|/
[11059884.979555]               "@'/ .. \`@"
[11059884.979555]               /_| \__/ |_\
[11059884.979555]                  \__U_/
[11059884.979695] cryptomgr_test(229): Oops [#1]
[11059884.979743] CPU: 27 PID: 229 Comm: cryptomgr_test Not tainted 4.9.0+ #38
[11059884.979808] task: fff800080c42c040 task.stack: fff8000808c74000
[11059884.979866] TSTATE: 0000009980001602 TPC: 0000000000745208 TNPC:
000000000074520c Y: 000000d1    Not tainted
[11059884.979966] TPC: <scatterwalk_copychunks+0xa8/0x1e0>
[11059884.980015] g0: 0000000000000000 g1: 000612000001cc20 g2:
00000000000000b8 g3: 0000000000000002
[11059884.980096] g4: fff800080c42c040 g5: fff800082c5ac000 g6:
fff8000808c74000 g7: 0000000000001b00
[11059884.980177] o0: fff8000808c77b08 o1: 00000001003a4000 o2:
0000000000000046 o3: 00000000024002c0
[11059884.980258] o4: fff8000030404920 o5: 0000000000cdec00 sp:
fff8000808c76f11 ret_pc: 0000000000000000
[11059884.980341] RPC: <          (null)>
[11059884.980405] l0: 0000000000001fff l1: 0000000000000000 l2:
0000000000baddf0 l3: 00000000000011e8
[11059884.980492] l4: 00000fff00000000 l5: 0006000000000000 l6:
0000000000d0afc0 l7: 0000000000dc13c8
[11059884.980573] i0: 00000001003a4000 i1: fff8000808c77870 i2:
0000000000000046 i3: 0000000000000000
[11059884.980651] i4: 0000000000000046 i5: 0000000000000046 i6:
fff8000808c76fc1 i7: 000000000074537c
[11059884.980734] I7: <scatterwalk_map_and_copy+0x3c/0xc0>
[11059884.980784] Call Trace:
[11059884.980814]  [000000000074537c] scatterwalk_map_and_copy+0x3c/0xc0
[11059884.980879]  [000000000074ba94] scomp_acomp_comp_decomp+0xb4/0x260
[11059884.980940]  [000000000074bc70] scomp_acomp_compress+0x10/0x20
[11059884.981000]  [0000000000751080] test_acomp+0x160/0x4c0
[11059884.981052]  [0000000000751474] alg_test_comp+0x94/0x100
[11059884.981104]  [000000000074f6fc] alg_test+0x15c/0x300
[11059884.981157]  [000000000074c748] cryptomgr_test+0x48/0x60
[11059884.982214]  [00000000004927cc] kthread+0xec/0x140
[11059884.982265]  [0000000000406084] ret_from_fork+0x1c/0x2c
[11059884.982317]  [0000000000000000]           (null)
[11059884.982364] Disabling lock debugging due to kernel taint
[11059884.982416] Caller[000000000074537c]: scatterwalk_map_and_copy+0x3c/0xc0
[11059884.982479] Caller[000000000074ba94]: scomp_acomp_comp_decomp+0xb4/0x260
[11059884.982544] Caller[000000000074bc70]: scomp_acomp_compress+0x10/0x20
[11059884.982606] Caller[0000000000751080]: test_acomp+0x160/0x4c0
[11059884.982661] Caller[0000000000751474]: alg_test_comp+0x94/0x100
[11059884.982717] Caller[000000000074f6fc]: alg_test+0x15c/0x300
[11059884.982773] Caller[000000000074c748]: cryptomgr_test+0x48/0x60
[11059884.982829] Caller[00000000004927cc]: kthread+0xec/0x140
[11059884.982881] Caller[0000000000406084]: ret_from_fork+0x1c/0x2c
[11059884.982937] Caller[0000000000000000]:           (null)
[11059884.982987] Instruction DUMP:
[11059884.982989]  c4066008
[11059884.983021]  92100018
[11059884.983050]  9410001c
[11059884.983079] <d0586040>
[11059884.983106]  82088010
[11059884.983135]  90020001
[11059884.983162]  937ec408
[11059884.983190]  40019e7a
[11059884.983219]  917ec418
[11059884.983247]
[11059884.983298] note: cryptomgr_test[229] exited with preempt_count 2

^ permalink raw reply

* [PATCH 2/2] crypto: skcipher AF_ALG - overhaul memory management
From: Stephan Müller @ 2016-12-25 17:15 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <1486189.x0AQ4O6r2j@positron.chronox.de>

The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operation is now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

In addition, the code structure is adjusted to match the structure of
algif_aead for easier code assessment.

The user space interface changed slightly as follows: the old AIO
operation returned zero upon success and < 0 in case of an error to user
space. As all other AF_ALG interfaces (including the sync skcipher
interface) returned the number of processed bytes upon success and < 0
in case of an error, the new skcipher interface (regardless of AIO or
sync) returns the number of processed bytes in case of success.

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/algif_skcipher.c | 438 +++++++++++++++++++-----------------------------
 1 file changed, 174 insertions(+), 264 deletions(-)

diff --git a/crypto/algif_skcipher.c b/crypto/algif_skcipher.c
index a9e79d8..437e60a 100644
--- a/crypto/algif_skcipher.c
+++ b/crypto/algif_skcipher.c
@@ -10,6 +10,24 @@
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
  *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The TX SGL is released once all parts of it are
+ * processed.
  */
 
 #include <crypto/scatterwalk.h>
@@ -31,6 +49,22 @@ struct skcipher_sg_list {
 	struct scatterlist sg[0];
 };
 
+struct skcipher_async_rsgl {
+	struct af_alg_sgl sgl;
+	struct list_head list;
+};
+
+struct skcipher_async_req {
+	struct kiocb *iocb;
+	struct sock *sk;
+
+	struct skcipher_async_rsgl first_sgl;
+	struct list_head list;
+
+	unsigned int areqlen;
+	struct skcipher_request req;
+};
+
 struct skcipher_tfm {
 	struct crypto_skcipher *skcipher;
 	bool has_key;
@@ -44,65 +78,22 @@ struct skcipher_ctx {
 
 	struct af_alg_completion completion;
 
-	atomic_t inflight;
+	unsigned int inflight;
 	size_t used;
+	size_t processed;
 
-	unsigned int len;
 	bool more;
 	bool merge;
 	bool enc;
 
-	struct skcipher_request req;
-};
-
-struct skcipher_async_rsgl {
-	struct af_alg_sgl sgl;
-	struct list_head list;
+	unsigned int len;
 };
 
-struct skcipher_async_req {
-	struct kiocb *iocb;
-	struct skcipher_async_rsgl first_sgl;
-	struct list_head list;
-	struct scatterlist *tsg;
-	atomic_t *inflight;
-	struct skcipher_request req;
-};
+static DECLARE_WAIT_QUEUE_HEAD(skcipher_aio_finish_wait);
 
 #define MAX_SGL_ENTS ((4096 - sizeof(struct skcipher_sg_list)) / \
 		      sizeof(struct scatterlist) - 1)
 
-static void skcipher_free_async_sgls(struct skcipher_async_req *sreq)
-{
-	struct skcipher_async_rsgl *rsgl, *tmp;
-	struct scatterlist *sgl;
-	struct scatterlist *sg;
-	int i, n;
-
-	list_for_each_entry_safe(rsgl, tmp, &sreq->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &sreq->first_sgl)
-			kfree(rsgl);
-	}
-	sgl = sreq->tsg;
-	n = sg_nents(sgl);
-	for_each_sg(sgl, sg, n, i)
-		put_page(sg_page(sg));
-
-	kfree(sreq->tsg);
-}
-
-static void skcipher_async_cb(struct crypto_async_request *req, int err)
-{
-	struct skcipher_async_req *sreq = req->data;
-	struct kiocb *iocb = sreq->iocb;
-
-	atomic_dec(sreq->inflight);
-	skcipher_free_async_sgls(sreq);
-	kzfree(sreq);
-	iocb->ki_complete(iocb, err, err);
-}
-
 static inline int skcipher_sndbuf(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
@@ -117,7 +108,7 @@ static inline bool skcipher_writable(struct sock *sk)
 	return PAGE_SIZE <= skcipher_sndbuf(sk);
 }
 
-static int skcipher_alloc_sgl(struct sock *sk)
+static int skcipher_alloc_tsgl(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
@@ -147,7 +138,7 @@ static int skcipher_alloc_sgl(struct sock *sk)
 	return 0;
 }
 
-static void skcipher_pull_sgl(struct sock *sk, size_t used, int put)
+static void skcipher_pull_tsgl(struct sock *sk, size_t used)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
@@ -171,30 +162,35 @@ static void skcipher_pull_sgl(struct sock *sk, size_t used, int put)
 
 			used -= plen;
 			ctx->used -= plen;
+			ctx->processed -= plen;
 
 			if (sg[i].length)
 				return;
-			if (put)
-				put_page(sg_page(sg + i));
+
+			put_page(sg_page(sg + i));
 			sg_assign_page(sg + i, NULL);
 		}
 
 		list_del(&sgl->list);
-		sock_kfree_s(sk, sgl,
-			     sizeof(*sgl) + sizeof(sgl->sg[0]) *
-					    (MAX_SGL_ENTS + 1));
+		sock_kfree_s(sk, sgl, sizeof(*sgl) + sizeof(sgl->sg[0]) *
+						     (MAX_SGL_ENTS + 1));
 	}
 
 	if (!ctx->used)
 		ctx->merge = 0;
 }
 
-static void skcipher_free_sgl(struct sock *sk)
+static void skcipher_free_rsgl(struct skcipher_async_req *areq)
 {
-	struct alg_sock *ask = alg_sk(sk);
-	struct skcipher_ctx *ctx = ask->private;
+	struct sock *sk = areq->sk;
+	struct skcipher_async_rsgl *rsgl, *tmp;
 
-	skcipher_pull_sgl(sk, ctx->used, 1);
+	list_for_each_entry_safe(rsgl, tmp, &areq->list, list) {
+		af_alg_free_sg(&rsgl->sgl);
+		if (rsgl != &areq->first_sgl)
+			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
+		list_del(&rsgl->list);
+	}
 }
 
 static int skcipher_wait_for_wmem(struct sock *sk, unsigned flags)
@@ -378,7 +374,7 @@ static int skcipher_sendmsg(struct socket *sock, struct msghdr *msg,
 
 		len = min_t(unsigned long, len, skcipher_sndbuf(sk));
 
-		err = skcipher_alloc_sgl(sk);
+		err = skcipher_alloc_tsgl(sk);
 		if (err)
 			goto unlock;
 
@@ -453,7 +449,7 @@ static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
 			goto unlock;
 	}
 
-	err = skcipher_alloc_sgl(sk);
+	err = skcipher_alloc_tsgl(sk);
 	if (err)
 		goto unlock;
 
@@ -479,25 +475,33 @@ static ssize_t skcipher_sendpage(struct socket *sock, struct page *page,
 	return err ?: size;
 }
 
-static int skcipher_all_sg_nents(struct skcipher_ctx *ctx)
+static void skcipher_async_cb(struct crypto_async_request *req, int err)
 {
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	int nents = 0;
+	struct skcipher_async_req *areq = req->data;
+	struct sock *sk = areq->sk;
+	struct alg_sock *ask = alg_sk(sk);
+	struct skcipher_ctx *ctx = ask->private;
+	struct kiocb *iocb = areq->iocb;
 
-	list_for_each_entry(sgl, &ctx->tsgl, list) {
-		sg = sgl->sg;
+	lock_sock(sk);
 
-		while (!sg->length)
-			sg++;
+	BUG_ON(!ctx->inflight);
 
-		nents += sg_nents(sg);
-	}
-	return nents;
+	skcipher_free_rsgl(areq);
+	sock_kfree_s(sk, areq, areq->areqlen);
+	skcipher_pull_tsgl(sk, areq->req.cryptlen);
+	__sock_put(sk);
+	ctx->inflight--;
+
+	iocb->ki_complete(iocb, err, err);
+
+	release_sock(sk);
+
+	wake_up_interruptible(&skcipher_aio_finish_wait);
 }
 
-static int skcipher_recvmsg_async(struct socket *sock, struct msghdr *msg,
-				  int flags)
+static int skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
+			    size_t ignored, int flags)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
@@ -506,215 +510,131 @@ static int skcipher_recvmsg_async(struct socket *sock, struct msghdr *msg,
 	struct skcipher_ctx *ctx = ask->private;
 	struct skcipher_tfm *skc = pask->private;
 	struct crypto_skcipher *tfm = skc->skcipher;
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	struct skcipher_async_req *sreq;
-	struct skcipher_request *req;
+	unsigned bs = crypto_skcipher_blocksize(tfm);
+	unsigned int areqlen = sizeof(struct skcipher_async_req) +
+			       crypto_skcipher_reqsize(tfm);
+	struct skcipher_sg_list *tsgl;
+	struct skcipher_async_req *areq;
 	struct skcipher_async_rsgl *last_rsgl = NULL;
-	unsigned int txbufs = 0, len = 0, tx_nents;
-	unsigned int reqsize = crypto_skcipher_reqsize(tfm);
-	unsigned int ivsize = crypto_skcipher_ivsize(tfm);
+	struct scatterlist tsgl_head[2], *tsgl_head_p;
 	int err = -ENOMEM;
-	bool mark = false;
-	char *iv;
-
-	sreq = kzalloc(sizeof(*sreq) + reqsize + ivsize, GFP_KERNEL);
-	if (unlikely(!sreq))
-		goto out;
-
-	req = &sreq->req;
-	iv = (char *)(req + 1) + reqsize;
-	sreq->iocb = msg->msg_iocb;
-	INIT_LIST_HEAD(&sreq->list);
-	sreq->inflight = &ctx->inflight;
+	size_t len = 0;
 
 	lock_sock(sk);
-	tx_nents = skcipher_all_sg_nents(ctx);
-	sreq->tsg = kcalloc(tx_nents, sizeof(*sg), GFP_KERNEL);
-	if (unlikely(!sreq->tsg))
+
+	/* Allocate cipher request for current operation. */
+	areq = sock_kmalloc(sk, areqlen, GFP_KERNEL);
+	if (unlikely(!areq))
 		goto unlock;
-	sg_init_table(sreq->tsg, tx_nents);
-	memcpy(iv, ctx->iv, ivsize);
-	skcipher_request_set_tfm(req, tfm);
-	skcipher_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
-				      skcipher_async_cb, sreq);
+	areq->areqlen = areqlen;
+	areq->sk = sk;
+	INIT_LIST_HEAD(&areq->list);
 
-	while (iov_iter_count(&msg->msg_iter)) {
+ 	/* convert iovecs of output buffers into RX SGL */
+	while (len < ctx->used && iov_iter_count(&msg->msg_iter)) {
 		struct skcipher_async_rsgl *rsgl;
-		int used;
+		size_t seglen;
 
 		if (!ctx->used) {
 			err = skcipher_wait_for_data(sk, flags);
 			if (err)
 				goto free;
 		}
-		sgl = list_first_entry(&ctx->tsgl,
-				       struct skcipher_sg_list, list);
-		sg = sgl->sg;
 
-		while (!sg->length)
-			sg++;
-
-		used = min_t(unsigned long, ctx->used,
-			     iov_iter_count(&msg->msg_iter));
-		used = min_t(unsigned long, used, sg->length);
-
-		if (txbufs == tx_nents) {
-			struct scatterlist *tmp;
-			int x;
-			/* Ran out of tx slots in async request
-			 * need to expand */
-			tmp = kcalloc(tx_nents * 2, sizeof(*tmp),
-				      GFP_KERNEL);
-			if (!tmp) {
-				err = -ENOMEM;
-				goto free;
-			}
+		seglen = min_t(size_t, ctx->used,
+			       iov_iter_count(&msg->msg_iter));
 
-			sg_init_table(tmp, tx_nents * 2);
-			for (x = 0; x < tx_nents; x++)
-				sg_set_page(&tmp[x], sg_page(&sreq->tsg[x]),
-					    sreq->tsg[x].length,
-					    sreq->tsg[x].offset);
-			kfree(sreq->tsg);
-			sreq->tsg = tmp;
-			tx_nents *= 2;
-			mark = true;
-		}
-		/* Need to take over the tx sgl from ctx
-		 * to the asynch req - these sgls will be freed later */
-		sg_set_page(sreq->tsg + txbufs++, sg_page(sg), sg->length,
-			    sg->offset);
-
-		if (list_empty(&sreq->list)) {
-			rsgl = &sreq->first_sgl;
-			list_add_tail(&rsgl->list, &sreq->list);
+		if (list_empty(&areq->list)) {
+			rsgl = &areq->first_sgl;
 		} else {
-			rsgl = kmalloc(sizeof(*rsgl), GFP_KERNEL);
+			rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
 			if (!rsgl) {
 				err = -ENOMEM;
 				goto free;
 			}
-			list_add_tail(&rsgl->list, &sreq->list);
 		}
 
-		used = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, used);
-		err = used;
-		if (used < 0)
+		rsgl->sgl.npages = 0;
+		list_add_tail(&rsgl->list, &areq->list);
+
+		/* make one iovec available as scatterlist */
+		err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen);
+		if (err < 0)
 			goto free;
+
+		/* chain the new scatterlist with previous one */
 		if (last_rsgl)
 			af_alg_link_sg(&last_rsgl->sgl, &rsgl->sgl);
 
 		last_rsgl = rsgl;
-		len += used;
-		skcipher_pull_sgl(sk, used, 0);
-		iov_iter_advance(&msg->msg_iter, used);
+		len += err;
+		iov_iter_advance(&msg->msg_iter, err);
 	}
 
-	if (mark)
-		sg_mark_end(sreq->tsg + txbufs - 1);
+	/* Process only as much RX buffers for which we have TX data */
+	if (len > ctx->used)
+		len = ctx->used;
+
+	/*
+	 * If more buffers are to be expected to be processed, process only
+	 * full block size buffers.
+	 */
+	if (ctx->more || len < ctx->used)
+		len -= len % bs;
+
+	tsgl = list_first_entry(&ctx->tsgl, struct skcipher_sg_list, list);
+	/* Get the head of the SGL we want to process */
+	tsgl_head_p = scatterwalk_ffwd(tsgl_head, tsgl->sg, ctx->processed);
+	BUG_ON(!tsgl_head_p);
+
+	/* Initialize the crypto operation */
+	skcipher_request_set_tfm(&areq->req, tfm);
+	skcipher_request_set_crypt(&areq->req, tsgl_head_p,
+				   areq->first_sgl.sgl.sg, len, ctx->iv);
+
+	if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
+		/* AIO operation */
+		areq->iocb = msg->msg_iocb;
+		skcipher_request_set_callback(&areq->req,
+					      CRYPTO_TFM_REQ_MAY_SLEEP,
+					      skcipher_async_cb, areq);
+		err = ctx->enc ? crypto_skcipher_encrypt(&areq->req) :
+				 crypto_skcipher_decrypt(&areq->req);
+	} else {
+		/* Synchronous operation */
+		skcipher_request_set_callback(&areq->req,
+					      CRYPTO_TFM_REQ_MAY_SLEEP |
+					      CRYPTO_TFM_REQ_MAY_BACKLOG,
+					      af_alg_complete,
+					      &ctx->completion);
+		err = af_alg_wait_for_completion(ctx->enc ?
+					crypto_skcipher_encrypt(&areq->req) :
+					crypto_skcipher_decrypt(&areq->req),
+						 &ctx->completion);
+	}
 
-	skcipher_request_set_crypt(req, sreq->tsg, sreq->first_sgl.sgl.sg,
-				   len, iv);
-	err = ctx->enc ? crypto_skcipher_encrypt(req) :
-			 crypto_skcipher_decrypt(req);
+	/* AIO operation in progress */
 	if (err == -EINPROGRESS) {
-		atomic_inc(&ctx->inflight);
+		sock_hold(sk);
 		err = -EIOCBQUEUED;
-		sreq = NULL;
+		ctx->inflight++;
+		/* Remember the TX bytes that were processed. */
+		ctx->processed += len;
 		goto unlock;
-	}
-free:
-	skcipher_free_async_sgls(sreq);
-unlock:
-	skcipher_wmem_wakeup(sk);
-	release_sock(sk);
-	kzfree(sreq);
-out:
-	return err;
-}
-
-static int skcipher_recvmsg_sync(struct socket *sock, struct msghdr *msg,
-				 int flags)
-{
-	struct sock *sk = sock->sk;
-	struct alg_sock *ask = alg_sk(sk);
-	struct sock *psk = ask->parent;
-	struct alg_sock *pask = alg_sk(psk);
-	struct skcipher_ctx *ctx = ask->private;
-	struct skcipher_tfm *skc = pask->private;
-	struct crypto_skcipher *tfm = skc->skcipher;
-	unsigned bs = crypto_skcipher_blocksize(tfm);
-	struct skcipher_sg_list *sgl;
-	struct scatterlist *sg;
-	int err = -EAGAIN;
-	int used;
-	long copied = 0;
-
-	lock_sock(sk);
-	while (msg_data_left(msg)) {
-		if (!ctx->used) {
-			err = skcipher_wait_for_data(sk, flags);
-			if (err)
-				goto unlock;
-		}
-
-		used = min_t(unsigned long, ctx->used, msg_data_left(msg));
-
-		used = af_alg_make_sg(&ctx->rsgl, &msg->msg_iter, used);
-		err = used;
-		if (err < 0)
-			goto unlock;
-
-		if (ctx->more || used < ctx->used)
-			used -= used % bs;
-
-		err = -EINVAL;
-		if (!used)
-			goto free;
-
-		sgl = list_first_entry(&ctx->tsgl,
-				       struct skcipher_sg_list, list);
-		sg = sgl->sg;
-
-		while (!sg->length)
-			sg++;
-
-		skcipher_request_set_crypt(&ctx->req, sg, ctx->rsgl.sg, used,
-					   ctx->iv);
-
-		err = af_alg_wait_for_completion(
-				ctx->enc ?
-					crypto_skcipher_encrypt(&ctx->req) :
-					crypto_skcipher_decrypt(&ctx->req),
-				&ctx->completion);
+	} else if (!err)
+		/* Remember the TX bytes that were processed. */
+		ctx->processed += len;
 
 free:
-		af_alg_free_sg(&ctx->rsgl);
-
-		if (err)
-			goto unlock;
-
-		copied += used;
-		skcipher_pull_sgl(sk, used, 1);
-		iov_iter_advance(&msg->msg_iter, used);
-	}
-
-	err = 0;
+	skcipher_free_rsgl(areq);
+	if (areq)
+		sock_kfree_s(sk, areq, areqlen);
+	skcipher_pull_tsgl(sk, len);
 
 unlock:
 	skcipher_wmem_wakeup(sk);
 	release_sock(sk);
-
-	return copied ?: err;
-}
-
-static int skcipher_recvmsg(struct socket *sock, struct msghdr *msg,
-			    size_t ignored, int flags)
-{
-	return (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) ?
-		skcipher_recvmsg_async(sock, msg, flags) :
-		skcipher_recvmsg_sync(sock, msg, flags);
+	return err ? err : len;
 }
 
 static unsigned int skcipher_poll(struct file *file, struct socket *sock,
@@ -894,26 +814,20 @@ static int skcipher_setkey(void *private, const u8 *key, unsigned int keylen)
 	return err;
 }
 
-static void skcipher_wait(struct sock *sk)
-{
-	struct alg_sock *ask = alg_sk(sk);
-	struct skcipher_ctx *ctx = ask->private;
-	int ctr = 0;
-
-	while (atomic_read(&ctx->inflight) && ctr++ < 100)
-		msleep(100);
-}
-
 static void skcipher_sock_destruct(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_ctx *ctx = ask->private;
-	struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(&ctx->req);
+	struct sock *psk = ask->parent;
+	struct alg_sock *pask = alg_sk(psk);
+	struct skcipher_tfm *skc = pask->private;
+	struct crypto_skcipher *tfm = skc->skcipher;
 
-	if (atomic_read(&ctx->inflight))
-		skcipher_wait(sk);
+	/* Suspend caller if AIO operations are in flight. */
+	wait_event_interruptible(skcipher_aio_finish_wait,
+				 (ctx->inflight == 0));
 
-	skcipher_free_sgl(sk);
+	skcipher_pull_tsgl(sk, ctx->used);
 	sock_kzfree_s(sk, ctx->iv, crypto_skcipher_ivsize(tfm));
 	sock_kfree_s(sk, ctx, ctx->len);
 	af_alg_release_parent(sk);
@@ -925,7 +839,7 @@ static int skcipher_accept_parent_nokey(void *private, struct sock *sk)
 	struct alg_sock *ask = alg_sk(sk);
 	struct skcipher_tfm *tfm = private;
 	struct crypto_skcipher *skcipher = tfm->skcipher;
-	unsigned int len = sizeof(*ctx) + crypto_skcipher_reqsize(skcipher);
+	unsigned int len = sizeof(*ctx);
 
 	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
 	if (!ctx)
@@ -943,19 +857,15 @@ static int skcipher_accept_parent_nokey(void *private, struct sock *sk)
 	INIT_LIST_HEAD(&ctx->tsgl);
 	ctx->len = len;
 	ctx->used = 0;
+	ctx->processed = 0;
+	ctx->inflight = 0;
 	ctx->more = 0;
 	ctx->merge = 0;
 	ctx->enc = 0;
-	atomic_set(&ctx->inflight, 0);
 	af_alg_init_completion(&ctx->completion);
 
 	ask->private = ctx;
 
-	skcipher_request_set_tfm(&ctx->req, skcipher);
-	skcipher_request_set_callback(&ctx->req, CRYPTO_TFM_REQ_MAY_SLEEP |
-						 CRYPTO_TFM_REQ_MAY_BACKLOG,
-				      af_alg_complete, &ctx->completion);
-
 	sk->sk_destruct = skcipher_sock_destruct;
 
 	return 0;
-- 
2.9.3

^ permalink raw reply related

* [PATCH 1/2] crypto: aead AF_ALG - overhaul memory management
From: Stephan Müller @ 2016-12-25 17:15 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto
In-Reply-To: <1486189.x0AQ4O6r2j@positron.chronox.de>

The updated memory management is described in the top part of the code.
As one benefit of the changed memory management, the AIO and synchronous
operation is now implemented in one common function. The AF_ALG
operation uses the async kernel crypto API interface for each cipher
operation. Thus, the only difference between the AIO and sync operation
types visible from user space is:

1. the callback function to be invoked when the asynchronous operation
   is completed

2. whether to wait for the completion of the kernel crypto API operation
   or not

Signed-off-by: Stephan Mueller <smueller@chronox.de>
---
 crypto/algif_aead.c | 439 +++++++++++++++++++++++-----------------------------
 1 file changed, 195 insertions(+), 244 deletions(-)

diff --git a/crypto/algif_aead.c b/crypto/algif_aead.c
index f849311..3ca68fb 100644
--- a/crypto/algif_aead.c
+++ b/crypto/algif_aead.c
@@ -11,6 +11,25 @@
  * under the terms of the GNU General Public License as published by the Free
  * Software Foundation; either version 2 of the License, or (at your option)
  * any later version.
+ *
+ * The following concept of the memory management is used:
+ *
+ * The kernel maintains two SGLs, the TX SGL and the RX SGL. The TX SGL is
+ * filled by user space with the data submitted via sendpage/sendmsg. Filling
+ * up the TX SGL does not cause a crypto operation -- the data will only be
+ * tracked by the kernel. Upon receipt of one recvmsg call, the caller must
+ * provide a buffer which is tracked with the RX SGL.
+ *
+ * During the processing of the recvmsg operation, the cipher request is
+ * allocated and prepared. To support multiple recvmsg operations operating
+ * on one TX SGL, an offset pointer into the TX SGL is maintained. The TX SGL
+ * that is used for the crypto request is scatterwalk_ffwd by the offset
+ * pointer to obtain the start address the crypto operation shall use for
+ * the input data.
+ *
+ * After the completion of the crypto operation, the RX SGL and the cipher
+ * request is released. The TX SGL is released once all parts of it are
+ * processed.
  */
 
 #include <crypto/internal/aead.h>
@@ -24,7 +43,7 @@
 #include <linux/net.h>
 #include <net/sock.h>
 
-struct aead_sg_list {
+struct aead_async_tsgl {
 	unsigned int cur;
 	struct scatterlist sg[ALG_MAX_PAGES];
 };
@@ -35,34 +54,38 @@ struct aead_async_rsgl {
 };
 
 struct aead_async_req {
-	struct scatterlist *tsgl;
-	struct aead_async_rsgl first_rsgl;
-	struct list_head list;
 	struct kiocb *iocb;
-	unsigned int tsgls;
-	char iv[];
+	struct sock *sk;
+
+	struct aead_async_rsgl first_rsgl;	/* First RX SG */
+	struct list_head list;			/* Track RX SGs */
+
+	unsigned int areqlen;			/* Length of this data struct */
+	struct aead_request aead_req;		/* req ctx trails this struct */
 };
 
 struct aead_ctx {
-	struct aead_sg_list tsgl;
-	struct aead_async_rsgl first_rsgl;
-	struct list_head list;
+	struct aead_async_tsgl tsgl;
 
 	void *iv;
+	size_t aead_assoclen;
 
-	struct af_alg_completion completion;
+	struct af_alg_completion completion;	/* sync work queue */
 
-	unsigned long used;
+	unsigned int inflight;			/* outstanding AIO ops */
+	size_t used;				/* TX bytes sent to kernel */
+	size_t processed;			/* number of bytes processed */
 
-	unsigned int len;
 	bool more;
 	bool merge;
 	bool enc;
 
-	size_t aead_assoclen;
-	struct aead_request aead_req;
+	unsigned int len;			/* length data structure */
+	struct crypto_aead *aead_tfm;
 };
 
+static DECLARE_WAIT_QUEUE_HEAD(aead_aio_finish_wait);
+
 static inline int aead_sndbuf(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
@@ -79,34 +102,43 @@ static inline bool aead_writable(struct sock *sk)
 
 static inline bool aead_sufficient_data(struct aead_ctx *ctx)
 {
-	unsigned as = crypto_aead_authsize(crypto_aead_reqtfm(&ctx->aead_req));
+	unsigned as = crypto_aead_authsize(ctx->aead_tfm);
 
 	/*
 	 * The minimum amount of memory needed for an AEAD cipher is
 	 * the AAD and in case of decryption the tag.
+	 *
+	 * Also, sufficient data must be available after disregarding the
+	 * already processed data.
 	 */
-	return ctx->used >= ctx->aead_assoclen + (ctx->enc ? 0 : as);
+	return ctx->used >= ctx->aead_assoclen + (ctx->enc ? 0 : as) +
+			    ctx->processed;
 }
 
 static void aead_reset_ctx(struct aead_ctx *ctx)
 {
-	struct aead_sg_list *sgl = &ctx->tsgl;
+	struct aead_async_tsgl *sgl = &ctx->tsgl;
 
 	sg_init_table(sgl->sg, ALG_MAX_PAGES);
 	sgl->cur = 0;
 	ctx->used = 0;
+	ctx->processed = 0;
 	ctx->more = 0;
 	ctx->merge = 0;
 }
 
-static void aead_put_sgl(struct sock *sk)
+static void aead_pull_tsgl(struct sock *sk, bool force)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	struct aead_sg_list *sgl = &ctx->tsgl;
+	struct aead_async_tsgl *sgl = &ctx->tsgl;
 	struct scatterlist *sg = sgl->sg;
 	unsigned int i;
 
+	/* leave provided data in tact if we did not finish AIO work */
+	if (!force && ctx->used > ctx->processed)
+		return;
+
 	for (i = 0; i < sgl->cur; i++) {
 		if (!sg_page(sg + i))
 			continue;
@@ -117,6 +149,19 @@ static void aead_put_sgl(struct sock *sk)
 	aead_reset_ctx(ctx);
 }
 
+static void aead_free_rsgl(struct aead_async_req *areq)
+{
+	struct sock *sk = areq->sk;
+	struct aead_async_rsgl *rsgl, *tmp;
+
+	list_for_each_entry_safe(rsgl, tmp, &areq->list, list) {
+		af_alg_free_sg(&rsgl->sgl);
+		if (rsgl != &areq->first_rsgl)
+			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
+		list_del(&rsgl->list);
+	}
+}
+
 static void aead_wmem_wakeup(struct sock *sk)
 {
 	struct socket_wq *wq;
@@ -189,9 +234,8 @@ static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	unsigned ivsize =
-		crypto_aead_ivsize(crypto_aead_reqtfm(&ctx->aead_req));
-	struct aead_sg_list *sgl = &ctx->tsgl;
+	unsigned ivsize = crypto_aead_ivsize(ctx->aead_tfm);
+	struct aead_async_tsgl *sgl = &ctx->tsgl;
 	struct af_alg_control con = {};
 	long copied = 0;
 	bool enc = 0;
@@ -258,7 +302,7 @@ static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 
 		if (!aead_writable(sk)) {
 			/* user space sent too much data */
-			aead_put_sgl(sk);
+			aead_pull_tsgl(sk, 1);
 			err = -EMSGSIZE;
 			goto unlock;
 		}
@@ -269,7 +313,7 @@ static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 			size_t plen = 0;
 
 			if (sgl->cur >= ALG_MAX_PAGES) {
-				aead_put_sgl(sk);
+				aead_pull_tsgl(sk, 1);
 				err = -E2BIG;
 				goto unlock;
 			}
@@ -305,7 +349,7 @@ static int aead_sendmsg(struct socket *sock, struct msghdr *msg, size_t size)
 
 	ctx->more = msg->msg_flags & MSG_MORE;
 	if (!ctx->more && !aead_sufficient_data(ctx)) {
-		aead_put_sgl(sk);
+		aead_pull_tsgl(sk, 1);
 		err = -EMSGSIZE;
 	}
 
@@ -322,7 +366,7 @@ static ssize_t aead_sendpage(struct socket *sock, struct page *page,
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	struct aead_sg_list *sgl = &ctx->tsgl;
+	struct aead_async_tsgl *sgl = &ctx->tsgl;
 	int err = -EINVAL;
 
 	if (flags & MSG_SENDPAGE_NOTLAST)
@@ -340,7 +384,7 @@ static ssize_t aead_sendpage(struct socket *sock, struct page *page,
 
 	if (!aead_writable(sk)) {
 		/* user space sent too much data */
-		aead_put_sgl(sk);
+		aead_pull_tsgl(sk, 1);
 		err = -EMSGSIZE;
 		goto unlock;
 	}
@@ -357,7 +401,7 @@ static ssize_t aead_sendpage(struct socket *sock, struct page *page,
 done:
 	ctx->more = flags & MSG_MORE;
 	if (!ctx->more && !aead_sufficient_data(ctx)) {
-		aead_put_sgl(sk);
+		aead_pull_tsgl(sk, 1);
 		err = -EMSGSIZE;
 	}
 
@@ -368,190 +412,48 @@ static ssize_t aead_sendpage(struct socket *sock, struct page *page,
 	return err ?: size;
 }
 
-#define GET_ASYM_REQ(req, tfm) (struct aead_async_req *) \
-		((char *)req + sizeof(struct aead_request) + \
-		 crypto_aead_reqsize(tfm))
-
- #define GET_REQ_SIZE(tfm) sizeof(struct aead_async_req) + \
-	crypto_aead_reqsize(tfm) + crypto_aead_ivsize(tfm) + \
-	sizeof(struct aead_request)
-
 static void aead_async_cb(struct crypto_async_request *_req, int err)
 {
-	struct sock *sk = _req->data;
+	struct aead_async_req *areq = _req->data;
+	struct sock *sk = areq->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	struct crypto_aead *tfm = crypto_aead_reqtfm(&ctx->aead_req);
-	struct aead_request *req = aead_request_cast(_req);
-	struct aead_async_req *areq = GET_ASYM_REQ(req, tfm);
-	struct scatterlist *sg = areq->tsgl;
-	struct aead_async_rsgl *rsgl;
 	struct kiocb *iocb = areq->iocb;
-	unsigned int i, reqlen = GET_REQ_SIZE(tfm);
 
-	list_for_each_entry(rsgl, &areq->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &areq->first_rsgl)
-			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
-	}
+	lock_sock(sk);
 
-	for (i = 0; i < areq->tsgls; i++)
-		put_page(sg_page(sg + i));
+	BUG_ON(!ctx->inflight);
 
-	sock_kfree_s(sk, areq->tsgl, sizeof(*areq->tsgl) * areq->tsgls);
-	sock_kfree_s(sk, req, reqlen);
+	aead_free_rsgl(areq);
+	sock_kfree_s(sk, areq, areq->areqlen);
+	aead_pull_tsgl(sk, 0);
 	__sock_put(sk);
+	ctx->inflight--;
 	iocb->ki_complete(iocb, err, err);
-}
-
-static int aead_recvmsg_async(struct socket *sock, struct msghdr *msg,
-			      int flags)
-{
-	struct sock *sk = sock->sk;
-	struct alg_sock *ask = alg_sk(sk);
-	struct aead_ctx *ctx = ask->private;
-	struct crypto_aead *tfm = crypto_aead_reqtfm(&ctx->aead_req);
-	struct aead_async_req *areq;
-	struct aead_request *req = NULL;
-	struct aead_sg_list *sgl = &ctx->tsgl;
-	struct aead_async_rsgl *last_rsgl = NULL, *rsgl;
-	unsigned int as = crypto_aead_authsize(tfm);
-	unsigned int i, reqlen = GET_REQ_SIZE(tfm);
-	int err = -ENOMEM;
-	unsigned long used;
-	size_t outlen = 0;
-	size_t usedpages = 0;
 
-	lock_sock(sk);
-	if (ctx->more) {
-		err = aead_wait_for_data(sk, flags);
-		if (err)
-			goto unlock;
-	}
-
-	if (!aead_sufficient_data(ctx))
-		goto unlock;
-
-	used = ctx->used;
-	if (ctx->enc)
-		outlen = used + as;
-	else
-		outlen = used - as;
-
-	req = sock_kmalloc(sk, reqlen, GFP_KERNEL);
-	if (unlikely(!req))
-		goto unlock;
-
-	areq = GET_ASYM_REQ(req, tfm);
-	memset(&areq->first_rsgl, '\0', sizeof(areq->first_rsgl));
-	INIT_LIST_HEAD(&areq->list);
-	areq->iocb = msg->msg_iocb;
-	memcpy(areq->iv, ctx->iv, crypto_aead_ivsize(tfm));
-	aead_request_set_tfm(req, tfm);
-	aead_request_set_ad(req, ctx->aead_assoclen);
-	aead_request_set_callback(req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-				  aead_async_cb, sk);
-	used -= ctx->aead_assoclen;
-
-	/* take over all tx sgls from ctx */
-	areq->tsgl = sock_kmalloc(sk,
-				  sizeof(*areq->tsgl) * max_t(u32, sgl->cur, 1),
-				  GFP_KERNEL);
-	if (unlikely(!areq->tsgl))
-		goto free;
-
-	sg_init_table(areq->tsgl, max_t(u32, sgl->cur, 1));
-	for (i = 0; i < sgl->cur; i++)
-		sg_set_page(&areq->tsgl[i], sg_page(&sgl->sg[i]),
-			    sgl->sg[i].length, sgl->sg[i].offset);
-
-	areq->tsgls = sgl->cur;
-
-	/* create rx sgls */
-	while (outlen > usedpages && iov_iter_count(&msg->msg_iter)) {
-		size_t seglen = min_t(size_t, iov_iter_count(&msg->msg_iter),
-				      (outlen - usedpages));
-
-		if (list_empty(&areq->list)) {
-			rsgl = &areq->first_rsgl;
-
-		} else {
-			rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
-			if (unlikely(!rsgl)) {
-				err = -ENOMEM;
-				goto free;
-			}
-		}
-		rsgl->sgl.npages = 0;
-		list_add_tail(&rsgl->list, &areq->list);
-
-		/* make one iovec available as scatterlist */
-		err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen);
-		if (err < 0)
-			goto free;
-
-		usedpages += err;
-
-		/* chain the new scatterlist with previous one */
-		if (last_rsgl)
-			af_alg_link_sg(&last_rsgl->sgl, &rsgl->sgl);
-
-		last_rsgl = rsgl;
-
-		iov_iter_advance(&msg->msg_iter, err);
-	}
-
-	/* ensure output buffer is sufficiently large */
-	if (usedpages < outlen) {
-		err = -EINVAL;
-		goto unlock;
-	}
-
-	aead_request_set_crypt(req, areq->tsgl, areq->first_rsgl.sgl.sg, used,
-			       areq->iv);
-	err = ctx->enc ? crypto_aead_encrypt(req) : crypto_aead_decrypt(req);
-	if (err) {
-		if (err == -EINPROGRESS) {
-			sock_hold(sk);
-			err = -EIOCBQUEUED;
-			aead_reset_ctx(ctx);
-			goto unlock;
-		} else if (err == -EBADMSG) {
-			aead_put_sgl(sk);
-		}
-		goto free;
-	}
-	aead_put_sgl(sk);
-
-free:
-	list_for_each_entry(rsgl, &areq->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &areq->first_rsgl)
-			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
-	}
-	if (areq->tsgl)
-		sock_kfree_s(sk, areq->tsgl, sizeof(*areq->tsgl) * areq->tsgls);
-	if (req)
-		sock_kfree_s(sk, req, reqlen);
-unlock:
-	aead_wmem_wakeup(sk);
 	release_sock(sk);
-	return err ? err : outlen;
+
+	wake_up_interruptible(&aead_aio_finish_wait);
 }
 
-static int aead_recvmsg_sync(struct socket *sock, struct msghdr *msg, int flags)
+static int aead_recvmsg(struct socket *sock, struct msghdr *msg, size_t ignored,
+			int flags)
 {
 	struct sock *sk = sock->sk;
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	unsigned as = crypto_aead_authsize(crypto_aead_reqtfm(&ctx->aead_req));
-	struct aead_sg_list *sgl = &ctx->tsgl;
+	struct crypto_aead *tfm = ctx->aead_tfm;
+	unsigned int as = crypto_aead_authsize(tfm);
+	unsigned int areqlen =
+		sizeof(struct aead_async_req) + crypto_aead_reqsize(tfm);
+	struct aead_async_tsgl *tsgl = &ctx->tsgl;
+	struct aead_async_req *areq;
 	struct aead_async_rsgl *last_rsgl = NULL;
-	struct aead_async_rsgl *rsgl, *tmp;
+	struct scatterlist tsgl_head[2], *tsgl_head_p;
 	int err = -EINVAL;
-	unsigned long used = 0;
-	size_t outlen = 0;
-	size_t usedpages = 0;
+	size_t used = 0;		/* [in]  TX bufs to be en/decrypted */
+	size_t outlen = 0;		/* [out] RX bufs produced by kernel */
+	size_t usedpages = 0;		/* [in]  RX bufs to be used from user */
 
 	lock_sock(sk);
 
@@ -566,8 +468,11 @@ static int aead_recvmsg_sync(struct socket *sock, struct msghdr *msg, int flags)
 			goto unlock;
 	}
 
-	/* data length provided by caller via sendmsg/sendpage */
-	used = ctx->used;
+	/*
+	 * Data length provided by caller via sendmsg/sendpage that has not
+	 * yet been processed.
+	 */
+	used = ctx->used - ctx->processed;
 
 	/*
 	 * Make sure sufficient data is present -- note, the same check is
@@ -600,86 +505,131 @@ static int aead_recvmsg_sync(struct socket *sock, struct msghdr *msg, int flags)
 	 */
 	used -= ctx->aead_assoclen;
 
-	/* convert iovecs of output buffers into scatterlists */
+	/* Allocate cipher request for current operation. */
+	areq = sock_kmalloc(sk, areqlen, GFP_KERNEL);
+	if (unlikely(!areq)) {
+		err = -ENOMEM;
+		goto unlock;
+	}
+	areq->areqlen = areqlen;
+	areq->sk = sk;
+	INIT_LIST_HEAD(&areq->list);
+
+	/* convert iovecs of output buffers into RX SGL */
 	while (outlen > usedpages && iov_iter_count(&msg->msg_iter)) {
+		struct aead_async_rsgl *rsgl;
 		size_t seglen = min_t(size_t, iov_iter_count(&msg->msg_iter),
 				      (outlen - usedpages));
 
-		if (list_empty(&ctx->list)) {
-			rsgl = &ctx->first_rsgl;
+		if (list_empty(&areq->list)) {
+			rsgl = &areq->first_rsgl;
 		} else {
 			rsgl = sock_kmalloc(sk, sizeof(*rsgl), GFP_KERNEL);
 			if (unlikely(!rsgl)) {
 				err = -ENOMEM;
-				goto unlock;
+				goto free;
 			}
 		}
+
 		rsgl->sgl.npages = 0;
-		list_add_tail(&rsgl->list, &ctx->list);
+		list_add_tail(&rsgl->list, &areq->list);
 
 		/* make one iovec available as scatterlist */
 		err = af_alg_make_sg(&rsgl->sgl, &msg->msg_iter, seglen);
 		if (err < 0)
-			goto unlock;
-		usedpages += err;
+			goto free;
+
 		/* chain the new scatterlist with previous one */
 		if (last_rsgl)
 			af_alg_link_sg(&last_rsgl->sgl, &rsgl->sgl);
 
 		last_rsgl = rsgl;
-
+		usedpages += err;
 		iov_iter_advance(&msg->msg_iter, err);
 	}
 
-	/* ensure output buffer is sufficiently large */
+	/*
+	 * Ensure output buffer is sufficiently large. If the caller provides
+	 * less buffer space, only use the relative required input size. This
+	 * allows AIO operation where the caller sent all data to be processed
+	 * and the AIO operation performs the operation on the different chunks
+	 * of the input data.
+	 */
 	if (usedpages < outlen) {
-		err = -EINVAL;
-		goto unlock;
-	}
+		size_t less = outlen - usedpages;
 
-	sg_mark_end(sgl->sg + sgl->cur - 1);
-	aead_request_set_crypt(&ctx->aead_req, sgl->sg, ctx->first_rsgl.sgl.sg,
-			       used, ctx->iv);
-	aead_request_set_ad(&ctx->aead_req, ctx->aead_assoclen);
+		if (used < less) {
+			err = -EINVAL;
+			goto free;
+		}
+		used -= less;
+		outlen -= less;
+	}
 
-	err = af_alg_wait_for_completion(ctx->enc ?
-					 crypto_aead_encrypt(&ctx->aead_req) :
-					 crypto_aead_decrypt(&ctx->aead_req),
+	sg_mark_end(tsgl->sg + tsgl->cur - 1);
+
+	/* Get the head of the SGL we want to process */
+	tsgl_head_p = scatterwalk_ffwd(tsgl_head, tsgl->sg, ctx->processed);
+	BUG_ON(!tsgl_head_p);
+
+	/* Initialize the crypto operation */
+	aead_request_set_crypt(&areq->aead_req, tsgl_head_p,
+			       areq->first_rsgl.sgl.sg, used, ctx->iv);
+	aead_request_set_ad(&areq->aead_req, ctx->aead_assoclen);
+	aead_request_set_tfm(&areq->aead_req, tfm);
+
+	if (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) {
+		/* AIO operation */
+		areq->iocb = msg->msg_iocb;
+		aead_request_set_callback(&areq->aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG,
+					  aead_async_cb, areq);
+		err = ctx->enc ? crypto_aead_encrypt(&areq->aead_req) :
+				 crypto_aead_decrypt(&areq->aead_req);
+	} else {
+		/* Synchronous operation */
+		aead_request_set_callback(&areq->aead_req,
+					  CRYPTO_TFM_REQ_MAY_BACKLOG,
+					  af_alg_complete, &ctx->completion);
+		err = af_alg_wait_for_completion(ctx->enc ?
+					 crypto_aead_encrypt(&areq->aead_req) :
+					 crypto_aead_decrypt(&areq->aead_req),
 					 &ctx->completion);
+	}
 
 	if (err) {
-		/* EBADMSG implies a valid cipher operation took place */
-		if (err == -EBADMSG)
-			aead_put_sgl(sk);
+		/* AIO operation in progress */
+		if (err == -EINPROGRESS) {
+			sock_hold(sk);
+			err = -EIOCBQUEUED;
 
-		goto unlock;
+			/* Remember the TX bytes that were processed. */
+			ctx->processed += ctx->enc ? (outlen - as) :
+						     (outlen + as);
+			ctx->inflight++;
+
+			goto unlock;
+		}
+		/* EBADMSG implies a valid cipher operation took place */
+		else if (err != -EBADMSG)
+			goto free;
 	}
 
-	aead_put_sgl(sk);
-	err = 0;
+	/* Remember the TX bytes that were processed. */
+	ctx->processed += ctx->enc ? (outlen - as) : (outlen + as);
+
+free:
+	aead_free_rsgl(areq);
+	if (areq)
+		sock_kfree_s(sk, areq, areqlen);
+	aead_pull_tsgl(sk, 0);
 
 unlock:
-	list_for_each_entry_safe(rsgl, tmp, &ctx->list, list) {
-		af_alg_free_sg(&rsgl->sgl);
-		if (rsgl != &ctx->first_rsgl)
-			sock_kfree_s(sk, rsgl, sizeof(*rsgl));
-		list_del(&rsgl->list);
-	}
-	INIT_LIST_HEAD(&ctx->list);
 	aead_wmem_wakeup(sk);
 	release_sock(sk);
-
 	return err ? err : outlen;
 }
 
-static int aead_recvmsg(struct socket *sock, struct msghdr *msg, size_t ignored,
-			int flags)
-{
-	return (msg->msg_iocb && !is_sync_kiocb(msg->msg_iocb)) ?
-		aead_recvmsg_async(sock, msg, flags) :
-		aead_recvmsg_sync(sock, msg, flags);
-}
-
 static unsigned int aead_poll(struct file *file, struct socket *sock,
 			      poll_table *wait)
 {
@@ -746,11 +696,14 @@ static void aead_sock_destruct(struct sock *sk)
 {
 	struct alg_sock *ask = alg_sk(sk);
 	struct aead_ctx *ctx = ask->private;
-	unsigned int ivlen = crypto_aead_ivsize(
-				crypto_aead_reqtfm(&ctx->aead_req));
+	unsigned int ivlen = crypto_aead_ivsize(ctx->aead_tfm);
 
 	WARN_ON(atomic_read(&sk->sk_refcnt) != 0);
-	aead_put_sgl(sk);
+
+	/* Suspend caller if AIO operations are in flight. */
+	wait_event_interruptible(aead_aio_finish_wait, (ctx->inflight == 0));
+
+	aead_pull_tsgl(sk, 1);
 	sock_kzfree_s(sk, ctx->iv, ivlen);
 	sock_kfree_s(sk, ctx, ctx->len);
 	af_alg_release_parent(sk);
@@ -760,7 +713,7 @@ static int aead_accept_parent(void *private, struct sock *sk)
 {
 	struct aead_ctx *ctx;
 	struct alg_sock *ask = alg_sk(sk);
-	unsigned int len = sizeof(*ctx) + crypto_aead_reqsize(private);
+	unsigned int len = sizeof(*ctx);
 	unsigned int ivlen = crypto_aead_ivsize(private);
 
 	ctx = sock_kmalloc(sk, len, GFP_KERNEL);
@@ -777,20 +730,18 @@ static int aead_accept_parent(void *private, struct sock *sk)
 
 	ctx->len = len;
 	ctx->used = 0;
+	ctx->processed = 0;
 	ctx->more = 0;
 	ctx->merge = 0;
 	ctx->enc = 0;
+	ctx->inflight = 0;
 	ctx->tsgl.cur = 0;
 	ctx->aead_assoclen = 0;
 	af_alg_init_completion(&ctx->completion);
 	sg_init_table(ctx->tsgl.sg, ALG_MAX_PAGES);
-	INIT_LIST_HEAD(&ctx->list);
 
 	ask->private = ctx;
-
-	aead_request_set_tfm(&ctx->aead_req, private);
-	aead_request_set_callback(&ctx->aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
-				  af_alg_complete, &ctx->completion);
+	ctx->aead_tfm = private;
 
 	sk->sk_destruct = aead_sock_destruct;
 
-- 
2.9.3

^ permalink raw reply related

* [PATCH 0/2] crypto: AF_ALG memory management fix
From: Stephan Müller @ 2016-12-25 17:13 UTC (permalink / raw)
  To: herbert; +Cc: linux-crypto

Hi Herbert,

please find attached memory management updates to

- simplify the code: the old AIO memory management is very
  complex and seemingly very fragile -- the update now
  eliminates all reported bugs in the skcipher and AEAD
  interfaces which allowed the kernel to be crashed by
  an unprivileged user

- streamline the code: there is one code path for AIO and sync
  operation; the code between algif_skcipher and algif_aead
  is very similar (if that patch set is accepted, I volunteer
  to reduce code duplication by moving service operations
  into af_alg.c and to further unify the TX SGL handling)

- unify the AIO and sync operation which only differ in the
  kernel crypto API callback and whether to wait for the
  crypto operation or not

- fix all reported bugs regarding the handling of multiple
  IOCBs.

The following testing was performed:

- stress testing to verify that no memleaks exist

- testing using Tadeusz Struck AIO test tool (see
  https://github.com/tstruk/afalg_async_test) -- the AEAD test
  is not applicable any more due to the changed user space
  interface; the skcipher test works once the user space
  interface change is honored in the test code

- using the libkcapi test suite, all tests including the
  originally failing ones (AIO with multiple IOCBs) work now --
  the current libkcapi code artificially limits the AEAD
  operation to one IOCB. After altering the libkcapi code
  to allow multiple IOCBs, the testing works flawless.

Stephan Mueller (2):
  crypto: aead AF_ALG - overhaul memory management
  crypto: skcipher AF_ALG - overhaul memory management

 crypto/algif_aead.c     | 439 ++++++++++++++++++++
+---------------------------
 crypto/algif_skcipher.c | 438 +++++++++++++++++++----------------------------
 2 files changed, 369 insertions(+), 508 deletions(-)

-- 
2.9.3

^ permalink raw reply

* Re: [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests
From: Daniel Borkmann @ 2016-12-24 19:59 UTC (permalink / raw)
  To: Andy Lutomirski, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Alexei Starovoitov
In-Reply-To: <bbfd69226fd74391045bafc695bba9a46cacca85.1482545792.git.luto@kernel.org>

On 12/24/2016 03:22 AM, Andy Lutomirski wrote:
> BPF digests are intended to be used to avoid reloading programs that
> are already loaded.  For use cases (CRIU?) where untrusted programs
> are involved, intentional hash collisions could cause the wrong BPF
> program to execute.  Additionally, if BPF digests are ever used
> in-kernel to skip verification, a hash collision could give privilege
> escalation directly.

Just for the record, digests will never ever be used to skip the
verification step, so I don't know why this idea even comes up
here (?) or is part of the changelog? As this will never be done
anyway, rather drop that part so we can avoid confusion on this?

Wrt untrusted programs, I don't see much of a use on this facility
in general for them. Something like a tail call map would quite
likely only be private to the application. And again, I really doubt
we'll have something like user namespace support in the foreseeable
future. Anyway, that said, I don't really have a big issue if you
want to switch to sha256, though.

> SHA1 is no longer considered adequately collision-resistant (see, for
> example, all the major browsers dropping support for SHA1
> certificates).  Use SHA256 instead.
>
> I moved the digest field to keep all of the bpf program metadata in
> the same cache line.
>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Alexei Starovoitov <ast@kernel.org>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>

^ permalink raw reply

* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Andy Lutomirski @ 2016-12-24 17:57 UTC (permalink / raw)
  To: Ard Biesheuvel
  Cc: Andy Lutomirski, Daniel Borkmann, Netdev, LKML,
	Linux Crypto Mailing List, Jason A. Donenfeld,
	Hannes Frederic Sowa, Alexei Starovoitov, Eric Dumazet,
	Eric Biggers, Tom Herbert, David S. Miller, Herbert Xu
In-Reply-To: <CAKv+Gu-UCqwchgbhg8VN4s=Yxk=PkjXG46NZo=8P4wAQtJ2TXw@mail.gmail.com>

On Sat, Dec 24, 2016 at 2:33 AM, Ard Biesheuvel
<ard.biesheuvel@linaro.org> wrote:
> Hi Andy,
>
> On 24 December 2016 at 02:22, Andy Lutomirski <luto@kernel.org> wrote:
>> There are some pieecs of kernel code that want to compute SHA256
>> directly without going through the crypto core.  Adjust the exported
>> API to decouple it from the crypto core.
>>
>
> There are a bunch of things happening at the same time in this patch,
> i.e., unnecessary renames of functions with static linkage, return
> type changes to the base prototypes (int (*)(...) to void (*)(...))
> and the change for the base functions to take a struct sha256_state
> ctx rather than a shash_desc. I suppose you are mainly after the
> latter, so could we please drop the other changes?
>
> For the name clashes, could we simply use the crypto_ prefix for the
> globally visible functions rather than using names that are already in
> use? (and having to go around clean up the conflicts)
> As for the return type changes, the base functions intentionally
> return int to allow tail calls from the functions exposed by the
> crypto API (whose prototypes cannot be changed). Unlikely to matter in
> the grand scheme of things (especially now that the base layer
> consists of static inline functions primarily), but it is equally
> pointless to go around and change them to return void IMO.
>
> So what remains is the struct shash_desc to struct sha256_state
> change, which makes sense given that you are after a sha256_digest()
> function that does not require the crypto API. But it seems your use
> case does not rely on incremental hashing, and so there is no reason
> for the state to be exposed outside of the implementation, and we
> could simply expose a crypto_sha256_digest() routine from the
> sha256_generic.c implementation instead.

I actually do use incremental hashing later on.   BPF currently
vmallocs() a big temporary buffer just so it can fill it and hash it.
I change it to hash as it goes.

I painted the bike shed the other way because I thought that crypto_
names should indicate that they're the versions compatible with the
crypto API, but I take your point about churn.  Part of the reason I
didn't want to use crypto_sha256_update is because that function is
currently like this:

int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
                          unsigned int len)

and I wanted to avoid churn.  The sha256_update() functions scattered
all over were static, so I didn't worry about them.

I'm going to give this another try as a split-up series that tries to
avoid making any changes beyond simple function renames to the
drivers.

>
> Also, I strongly feel that crypto and other security related patches
> should be tested before being posted, even if they are only RFC,
> especially when they are posted by high profile kernel devs like
> yourself. (Your code incorrectly calls crypto_sha2_final() in a couple
> of places, resulting in the finalization being performed twice, once
> with the accelerated block transform and again with the generic
> transform)
>

I tested it, albeit poorly.  I wanted feedback on the API (thanks!)
and I figured I could more carefully check the implementation once the
API survives a bit of review.  Since it looks like I have to rework
this, I'd need to re-test anyway.

>> I suspect this will very slightly speed up the SHA256 shash operations
>> as well by reducing the amount of indirection involved.
>>
>
> I think you have a valid point when it comes to the complexity of the
> crypto API in general. But the struct sha256_state is embedded in the
> shash_desc rather than referred to via a pointer, so the level of
> indirection does not appear to change. And given how 99.9% of the
> SHA256 execution time is spent in the block transform routine anyway,
> I expect the performance delta to be in the noise tbh.

s/very slightly/negligibly?  There's an extra speedup from avoiding a
variable-length stack allocation, but that probably doesn't matter
much either.

>
> Finally, another thing to keep in mind is that the base layers of
> SHA-1, SHA-256 and SHA-512 are intentionally structured in the same
> way. If there is a need for a digest() entry point, I'd prefer to add
> them for all flavours.

I want to get sha256 right first.  Once it's in good shape, making the
same changes to the other variants should be easy.

>
> Whether this still belongs under crypto or under lib/sha256.c as a
> library function (allowing archs to override it) is open for debate.
> If hashing BPF programs becomes a hot spot, we probably have bigger
> problems.
>
> Regards,
> Ard.
>
> P.S. I do take your point regarding the arch_sha256_block_transform()
> proposed in your follow up email, but there are some details (SIMD,
> availability of the instructions etc) that would make it only suitable
> for the generic implementation anyway, and the base layer was already
> a huge improvement compared to the open coded implementations of the
> SHA boilerplate.

Agreed, and my model may not be quite right.  It might have to be
something like:

if (arch_begin_sha256(len)) {
  ... do it with arch helpers...
  arch_end_sha256();
} else {
  ... do it generically ...
}

>> -       return sha256_base_finish(desc, out);
>> +       return crypto_sha2_final(desc, out);
>
> This is wrong: your crypto_sha2_final also calls sha256_base_do_finalize()

Ugh, right.  I clearly need to organize this change better to avoid
this kind of mistake.

^ permalink raw reply

* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Ard Biesheuvel @ 2016-12-24 10:33 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List,
	Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Herbert Xu
In-Reply-To: <942b91f25a63b22ec4946378a1fffe78d655cf18.1482545792.git.luto@kernel.org>

Hi Andy,

On 24 December 2016 at 02:22, Andy Lutomirski <luto@kernel.org> wrote:
> There are some pieecs of kernel code that want to compute SHA256
> directly without going through the crypto core.  Adjust the exported
> API to decouple it from the crypto core.
>

There are a bunch of things happening at the same time in this patch,
i.e., unnecessary renames of functions with static linkage, return
type changes to the base prototypes (int (*)(...) to void (*)(...))
and the change for the base functions to take a struct sha256_state
ctx rather than a shash_desc. I suppose you are mainly after the
latter, so could we please drop the other changes?

For the name clashes, could we simply use the crypto_ prefix for the
globally visible functions rather than using names that are already in
use? (and having to go around clean up the conflicts)
As for the return type changes, the base functions intentionally
return int to allow tail calls from the functions exposed by the
crypto API (whose prototypes cannot be changed). Unlikely to matter in
the grand scheme of things (especially now that the base layer
consists of static inline functions primarily), but it is equally
pointless to go around and change them to return void IMO.

So what remains is the struct shash_desc to struct sha256_state
change, which makes sense given that you are after a sha256_digest()
function that does not require the crypto API. But it seems your use
case does not rely on incremental hashing, and so there is no reason
for the state to be exposed outside of the implementation, and we
could simply expose a crypto_sha256_digest() routine from the
sha256_generic.c implementation instead.

Also, I strongly feel that crypto and other security related patches
should be tested before being posted, even if they are only RFC,
especially when they are posted by high profile kernel devs like
yourself. (Your code incorrectly calls crypto_sha2_final() in a couple
of places, resulting in the finalization being performed twice, once
with the accelerated block transform and again with the generic
transform)

> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I think you have a valid point when it comes to the complexity of the
crypto API in general. But the struct sha256_state is embedded in the
shash_desc rather than referred to via a pointer, so the level of
indirection does not appear to change. And given how 99.9% of the
SHA256 execution time is spent in the block transform routine anyway,
I expect the performance delta to be in the noise tbh.

Finally, another thing to keep in mind is that the base layers of
SHA-1, SHA-256 and SHA-512 are intentionally structured in the same
way. If there is a need for a digest() entry point, I'd prefer to add
them for all flavours.

E.g, something like

"""
@@ -126,3 +128,23 @@ static inline int sha256_base_finish(struct
shash_desc *desc, u8 *out)
        *sctx = (struct sha256_state){};
        return 0;
 }
+
+static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+{
+       unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
+       struct sha256_state *sctx = shash_desc_ctx(desc);
+
+       return __sha256_base_finish(sctx, out, digest_size);
+}
+
+static inline int sha256_base_digest(const u8 *data, unsigned int len, u8 *out,
+                                    sha256_block_fn *block_fn)
+{
+       struct sha256_state sctx;
+
+       __sha256_base_init(&sctx);
+       sha256_base_do_update(&sctx, data, len, block_fn);
+       sha256_base_do_finalize(&sctx, block_fn);
+
+       return __sha256_base_finish(&sctx, out, SHA256_DIGEST_SIZE);
+}
"""

(where __sha256_base_init() and __sha256_base_finish() are the
existing functions modified to take a struct sha256_state rather than
a shash_desc) should be sufficient to allow a generic
crypto_sha256_digest() to be composed that does not rely on the crypto
API.

Whether this still belongs under crypto or under lib/sha256.c as a
library function (allowing archs to override it) is open for debate.
If hashing BPF programs becomes a hot spot, we probably have bigger
problems.

Regards,
Ard.

P.S. I do take your point regarding the arch_sha256_block_transform()
proposed in your follow up email, but there are some details (SIMD,
availability of the instructions etc) that would make it only suitable
for the generic implementation anyway, and the base layer was already
a huge improvement compared to the open coded implementations of the
SHA boilerplate.


> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
> Cc: Herbert Xu <herbert@gondor.apana.org.au>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
>  arch/arm/crypto/sha2-ce-glue.c      | 10 ++++---
>  arch/arm/crypto/sha256_glue.c       | 23 ++++++++++-----
>  arch/arm/crypto/sha256_neon_glue.c  | 34 +++++++++++----------
>  arch/arm64/crypto/sha2-ce-glue.c    | 13 ++++----
>  arch/arm64/crypto/sha256-glue.c     | 59 +++++++++++++++++++++----------------
>  arch/x86/crypto/sha256_ssse3_glue.c | 46 +++++++++++++++++------------
>  arch/x86/purgatory/purgatory.c      |  2 +-
>  arch/x86/purgatory/sha256.c         | 25 ++--------------
>  arch/x86/purgatory/sha256.h         | 22 --------------
>  crypto/sha256_generic.c             | 50 +++++++++++++++++++++++--------
>  include/crypto/sha.h                | 29 ++++++++++++++----
>  include/crypto/sha256_base.h        | 40 ++++++++-----------------
>  12 files changed, 184 insertions(+), 169 deletions(-)
>  delete mode 100644 arch/x86/purgatory/sha256.h
>
> diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
> index 0755b2d657f3..8832c2f85591 100644
> --- a/arch/arm/crypto/sha2-ce-glue.c
> +++ b/arch/arm/crypto/sha2-ce-glue.c
> @@ -38,7 +38,7 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
>                 return crypto_sha256_arm_update(desc, data, len);
>
>         kernel_neon_begin();
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(sctx, data, len,
>                               (sha256_block_fn *)sha2_ce_transform);
>         kernel_neon_end();
>
> @@ -48,17 +48,19 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
>  static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
>                          unsigned int len, u8 *out)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         if (!may_use_simd())
>                 return crypto_sha256_arm_finup(desc, data, len, out);
>
>         kernel_neon_begin();
>         if (len)
> -               sha256_base_do_update(desc, data, len,
> +               sha256_base_do_update(sctx, data, len,
>                                       (sha256_block_fn *)sha2_ce_transform);
> -       sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
> +       sha256_base_do_finalize(sctx, (sha256_block_fn *)sha2_ce_transform);
>         kernel_neon_end();
>
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
>  static int sha2_ce_final(struct shash_desc *desc, u8 *out)
> diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
> index a84e869ef900..405a29a9a9d3 100644
> --- a/arch/arm/crypto/sha256_glue.c
> +++ b/arch/arm/crypto/sha256_glue.c
> @@ -36,27 +36,34 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
>  int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
>                              unsigned int len)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         /* make sure casting to sha256_block_fn() is safe */
>         BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
>
> -       return sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_data_order);
> +       return 0;
>  }
>  EXPORT_SYMBOL(crypto_sha256_arm_update);
>
> -static int sha256_final(struct shash_desc *desc, u8 *out)
> +static int sha256_arm_final(struct shash_desc *desc, u8 *out)
>  {
> -       sha256_base_do_finalize(desc,
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
> +       sha256_base_do_finalize(sctx,
>                                 (sha256_block_fn *)sha256_block_data_order);
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);

This is wrong: your crypto_sha2_final also calls sha256_base_do_finalize()

>  }
>
>  int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
>                             unsigned int len, u8 *out)
>  {
> -       sha256_base_do_update(desc, data, len,
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
> +       sha256_base_do_update(sctx, data, len,
>                               (sha256_block_fn *)sha256_block_data_order);
> -       return sha256_final(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>  EXPORT_SYMBOL(crypto_sha256_arm_finup);
>
> @@ -64,7 +71,7 @@ static struct shash_alg algs[] = { {
>         .digestsize     =       SHA256_DIGEST_SIZE,
>         .init           =       sha256_base_init,
>         .update         =       crypto_sha256_arm_update,
> -       .final          =       sha256_final,
> +       .final          =       sha256_arm_final,
>         .finup          =       crypto_sha256_arm_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
> @@ -79,7 +86,7 @@ static struct shash_alg algs[] = { {
>         .digestsize     =       SHA224_DIGEST_SIZE,
>         .init           =       sha224_base_init,
>         .update         =       crypto_sha256_arm_update,
> -       .final          =       sha256_final,
> +       .final          =       sha256_arm_final,
>         .finup          =       crypto_sha256_arm_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
> diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
> index 39ccd658817e..40c85d1d4c1e 100644
> --- a/arch/arm/crypto/sha256_neon_glue.c
> +++ b/arch/arm/crypto/sha256_neon_glue.c
> @@ -29,8 +29,8 @@
>  asmlinkage void sha256_block_data_order_neon(u32 *digest, const void *data,
>                                              unsigned int num_blks);
>
> -static int sha256_update(struct shash_desc *desc, const u8 *data,
> -                        unsigned int len)
> +static int sha256_neon_update(struct shash_desc *desc, const u8 *data,
> +                             unsigned int len)
>  {
>         struct sha256_state *sctx = shash_desc_ctx(desc);
>
> @@ -39,41 +39,43 @@ static int sha256_update(struct shash_desc *desc, const u8 *data,
>                 return crypto_sha256_arm_update(desc, data, len);
>
>         kernel_neon_begin();
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(sctx, data, len,
>                         (sha256_block_fn *)sha256_block_data_order_neon);
>         kernel_neon_end();
>
>         return 0;
>  }
>
> -static int sha256_finup(struct shash_desc *desc, const u8 *data,
> -                       unsigned int len, u8 *out)
> +static int sha256_neon_finup(struct shash_desc *desc, const u8 *data,
> +                            unsigned int len, u8 *out)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         if (!may_use_simd())
>                 return crypto_sha256_arm_finup(desc, data, len, out);
>
>         kernel_neon_begin();
>         if (len)
> -               sha256_base_do_update(desc, data, len,
> +               sha256_base_do_update(sctx, data, len,
>                         (sha256_block_fn *)sha256_block_data_order_neon);
> -       sha256_base_do_finalize(desc,
> +       sha256_base_do_finalize(sctx,
>                         (sha256_block_fn *)sha256_block_data_order_neon);
>         kernel_neon_end();
>
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
> -static int sha256_final(struct shash_desc *desc, u8 *out)
> +static int sha256_neon_final(struct shash_desc *desc, u8 *out)
>  {
> -       return sha256_finup(desc, NULL, 0, out);
> +       return sha256_neon_finup(desc, NULL, 0, out);
>  }
>
>  struct shash_alg sha256_neon_algs[] = { {
>         .digestsize     =       SHA256_DIGEST_SIZE,
>         .init           =       sha256_base_init,
> -       .update         =       sha256_update,
> -       .final          =       sha256_final,
> -       .finup          =       sha256_finup,
> +       .update         =       sha256_neon_update,
> +       .final          =       sha256_neon_final,
> +       .finup          =       sha256_neon_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
>                 .cra_name       =       "sha256",
> @@ -86,9 +88,9 @@ struct shash_alg sha256_neon_algs[] = { {
>  }, {
>         .digestsize     =       SHA224_DIGEST_SIZE,
>         .init           =       sha224_base_init,
> -       .update         =       sha256_update,
> -       .final          =       sha256_final,
> -       .finup          =       sha256_finup,
> +       .update         =       sha256_neon_update,
> +       .final          =       sha256_neon_final,
> +       .finup          =       sha256_neon_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
>                 .cra_name       =       "sha224",
> diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
> index 7cd587564a41..e38dd301abce 100644
> --- a/arch/arm64/crypto/sha2-ce-glue.c
> +++ b/arch/arm64/crypto/sha2-ce-glue.c
> @@ -39,7 +39,7 @@ static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
>
>         sctx->finalize = 0;
>         kernel_neon_begin_partial(28);
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(&sctx->sst, data, len,
>                               (sha256_block_fn *)sha2_ce_transform);
>         kernel_neon_end();
>
> @@ -64,13 +64,13 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
>         sctx->finalize = finalize;
>
>         kernel_neon_begin_partial(28);
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(&sctx->sst, data, len,
>                               (sha256_block_fn *)sha2_ce_transform);
>         if (!finalize)
> -               sha256_base_do_finalize(desc,
> +               sha256_base_do_finalize(&sctx->sst,
>                                         (sha256_block_fn *)sha2_ce_transform);
>         kernel_neon_end();
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
>  static int sha256_ce_final(struct shash_desc *desc, u8 *out)
> @@ -79,9 +79,10 @@ static int sha256_ce_final(struct shash_desc *desc, u8 *out)
>
>         sctx->finalize = 0;
>         kernel_neon_begin_partial(28);
> -       sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
> +       sha256_base_do_finalize(&sctx->sst,
> +                               (sha256_block_fn *)sha2_ce_transform);
>         kernel_neon_end();
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
>  static struct shash_alg algs[] = { {
> diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
> index a2226f841960..132a1ef89a71 100644
> --- a/arch/arm64/crypto/sha256-glue.c
> +++ b/arch/arm64/crypto/sha256-glue.c
> @@ -33,36 +33,39 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
>  asmlinkage void sha256_block_neon(u32 *digest, const void *data,
>                                   unsigned int num_blks);
>
> -static int sha256_update(struct shash_desc *desc, const u8 *data,
> -                        unsigned int len)
> +static int sha256_update_arm64(struct shash_desc *desc, const u8 *data,
> +                              unsigned int len)
>  {
> -       return sha256_base_do_update(desc, data, len,
> -                               (sha256_block_fn *)sha256_block_data_order);
> +       sha256_base_do_update(shash_desc_ctx(desc), data, len,
> +                             (sha256_block_fn *)sha256_block_data_order);
> +       return 0;
>  }
>
> -static int sha256_finup(struct shash_desc *desc, const u8 *data,
> -                       unsigned int len, u8 *out)
> +static int sha256_finup_arm64(struct shash_desc *desc, const u8 *data,
> +                             unsigned int len, u8 *out)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         if (len)
> -               sha256_base_do_update(desc, data, len,
> +               sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_data_order);
> -       sha256_base_do_finalize(desc,
> +       sha256_base_do_finalize(sctx,
>                                 (sha256_block_fn *)sha256_block_data_order);
>
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
> -static int sha256_final(struct shash_desc *desc, u8 *out)
> +static int sha256_final_arm64(struct shash_desc *desc, u8 *out)
>  {
> -       return sha256_finup(desc, NULL, 0, out);
> +       return sha256_finup_arm64(desc, NULL, 0, out);
>  }
>
>  static struct shash_alg algs[] = { {
>         .digestsize             = SHA256_DIGEST_SIZE,
>         .init                   = sha256_base_init,
> -       .update                 = sha256_update,
> -       .final                  = sha256_final,
> -       .finup                  = sha256_finup,
> +       .update                 = sha256_update_arm64,
> +       .final                  = sha256_final_arm64,
> +       .finup                  = sha256_finup_arm64,
>         .descsize               = sizeof(struct sha256_state),
>         .base.cra_name          = "sha256",
>         .base.cra_driver_name   = "sha256-arm64",
> @@ -73,9 +76,9 @@ static struct shash_alg algs[] = { {
>  }, {
>         .digestsize             = SHA224_DIGEST_SIZE,
>         .init                   = sha224_base_init,
> -       .update                 = sha256_update,
> -       .final                  = sha256_final,
> -       .finup                  = sha256_finup,
> +       .update                 = sha256_update_arm64,
> +       .final                  = sha256_final_arm64,
> +       .finup                  = sha256_finup_arm64,
>         .descsize               = sizeof(struct sha256_state),
>         .base.cra_name          = "sha224",
>         .base.cra_driver_name   = "sha224-arm64",
> @@ -88,18 +91,22 @@ static struct shash_alg algs[] = { {
>  static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
>                               unsigned int len)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         /*
>          * Stacking and unstacking a substantial slice of the NEON register
>          * file may significantly affect performance for small updates when
>          * executing in interrupt context, so fall back to the scalar code
>          * in that case.
>          */
> -       if (!may_use_simd())
> -               return sha256_base_do_update(desc, data, len,
> +       if (!may_use_simd()) {
> +               sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_data_order);
> +               return 0;
> +       }
>
>         kernel_neon_begin();
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_neon);
>         kernel_neon_end();
>
> @@ -109,22 +116,24 @@ static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
>  static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
>                              unsigned int len, u8 *out)
>  {
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
>         if (!may_use_simd()) {
>                 if (len)
> -                       sha256_base_do_update(desc, data, len,
> +                       sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_data_order);
> -               sha256_base_do_finalize(desc,
> +               sha256_base_do_finalize(sctx,
>                                 (sha256_block_fn *)sha256_block_data_order);
>         } else {
>                 kernel_neon_begin();
>                 if (len)
> -                       sha256_base_do_update(desc, data, len,
> +                       sha256_base_do_update(sctx, data, len,
>                                 (sha256_block_fn *)sha256_block_neon);
> -               sha256_base_do_finalize(desc,
> +               sha256_base_do_finalize(sctx,
>                                 (sha256_block_fn *)sha256_block_neon);
>                 kernel_neon_end();
>         }
> -       return sha256_base_finish(desc, out);
> +       return crypto_sha2_final(desc, out);
>  }
>
>  static int sha256_final_neon(struct shash_desc *desc, u8 *out)
> diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
> index 9e79baf03a4b..e722fbaf0558 100644
> --- a/arch/x86/crypto/sha256_ssse3_glue.c
> +++ b/arch/x86/crypto/sha256_ssse3_glue.c
> @@ -44,52 +44,60 @@ asmlinkage void sha256_transform_ssse3(u32 *digest, const char *data,
>                                        u64 rounds);
>  typedef void (sha256_transform_fn)(u32 *digest, const char *data, u64 rounds);
>
> -static int sha256_update(struct shash_desc *desc, const u8 *data,
> -                        unsigned int len, sha256_transform_fn *sha256_xform)
> +static int sha256_fpu_update(struct shash_desc *desc, const u8 *data,
> +                              unsigned int len,
> +                              sha256_transform_fn *sha256_xform)
>  {
>         struct sha256_state *sctx = shash_desc_ctx(desc);
>
>         if (!irq_fpu_usable() ||
> -           (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
> -               return crypto_sha256_update(desc, data, len);
> +           (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE) {
> +               sha256_update(sctx, data, len);
> +               return 0;
> +       }
>
>         /* make sure casting to sha256_block_fn() is safe */
>         BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
>
>         kernel_fpu_begin();
> -       sha256_base_do_update(desc, data, len,
> +       sha256_base_do_update(sctx, data, len,
>                               (sha256_block_fn *)sha256_xform);
>         kernel_fpu_end();
>
>         return 0;
>  }
>
> -static int sha256_finup(struct shash_desc *desc, const u8 *data,
> +static int sha256_fpu_finup(struct shash_desc *desc, const u8 *data,
>               unsigned int len, u8 *out, sha256_transform_fn *sha256_xform)
>  {
> -       if (!irq_fpu_usable())
> -               return crypto_sha256_finup(desc, data, len, out);
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
> +       if (!irq_fpu_usable()) {
> +               sha256_finup(sctx, data, len, out);
> +               return 0;
> +       }
>
>         kernel_fpu_begin();
>         if (len)
> -               sha256_base_do_update(desc, data, len,
> +               sha256_base_do_update(sctx, data, len,
>                                       (sha256_block_fn *)sha256_xform);
> -       sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_xform);
> +       sha256_base_do_finalize(sctx, (sha256_block_fn *)sha256_xform);
>         kernel_fpu_end();
>
> -       return sha256_base_finish(desc, out);
> +       crypto_sha2_final(desc, out);
> +       return 0;
>  }
>
>  static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
>                          unsigned int len)
>  {
> -       return sha256_update(desc, data, len, sha256_transform_ssse3);
> +       return sha256_fpu_update(desc, data, len, sha256_transform_ssse3);
>  }
>
>  static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
>               unsigned int len, u8 *out)
>  {
> -       return sha256_finup(desc, data, len, out, sha256_transform_ssse3);
> +       return sha256_fpu_finup(desc, data, len, out, sha256_transform_ssse3);
>  }
>
>  /* Add padding and return the message digest. */
> @@ -152,13 +160,13 @@ asmlinkage void sha256_transform_avx(u32 *digest, const char *data,
>  static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
>                          unsigned int len)
>  {
> -       return sha256_update(desc, data, len, sha256_transform_avx);
> +       return sha256_fpu_update(desc, data, len, sha256_transform_avx);
>  }
>
>  static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
>                       unsigned int len, u8 *out)
>  {
> -       return sha256_finup(desc, data, len, out, sha256_transform_avx);
> +       return sha256_fpu_finup(desc, data, len, out, sha256_transform_avx);
>  }
>
>  static int sha256_avx_final(struct shash_desc *desc, u8 *out)
> @@ -236,13 +244,13 @@ asmlinkage void sha256_transform_rorx(u32 *digest, const char *data,
>  static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
>                          unsigned int len)
>  {
> -       return sha256_update(desc, data, len, sha256_transform_rorx);
> +       return sha256_fpu_update(desc, data, len, sha256_transform_rorx);
>  }
>
>  static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
>                       unsigned int len, u8 *out)
>  {
> -       return sha256_finup(desc, data, len, out, sha256_transform_rorx);
> +       return sha256_fpu_finup(desc, data, len, out, sha256_transform_rorx);
>  }
>
>  static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
> @@ -318,13 +326,13 @@ asmlinkage void sha256_ni_transform(u32 *digest, const char *data,
>  static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
>                          unsigned int len)
>  {
> -       return sha256_update(desc, data, len, sha256_ni_transform);
> +       return sha256_fpu_update(desc, data, len, sha256_ni_transform);
>  }
>
>  static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
>                       unsigned int len, u8 *out)
>  {
> -       return sha256_finup(desc, data, len, out, sha256_ni_transform);
> +       return sha256_fpu_finup(desc, data, len, out, sha256_ni_transform);
>  }
>
>  static int sha256_ni_final(struct shash_desc *desc, u8 *out)
> diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
> index 25e068ba3382..ed6e80b844cf 100644
> --- a/arch/x86/purgatory/purgatory.c
> +++ b/arch/x86/purgatory/purgatory.c
> @@ -10,7 +10,7 @@
>   * Version 2.  See the file COPYING for more details.
>   */
>
> -#include "sha256.h"
> +#include <crypto/sha.h>
>  #include "../boot/string.h"
>
>  struct sha_region {
> diff --git a/arch/x86/purgatory/sha256.c b/arch/x86/purgatory/sha256.c
> index 548ca675a14a..724925d5da61 100644
> --- a/arch/x86/purgatory/sha256.c
> +++ b/arch/x86/purgatory/sha256.c
> @@ -17,7 +17,7 @@
>
>  #include <linux/bitops.h>
>  #include <asm/byteorder.h>
> -#include "sha256.h"
> +#include <crypto/sha.h>
>  #include "../boot/string.h"
>
>  static inline u32 Ch(u32 x, u32 y, u32 z)
> @@ -208,22 +208,7 @@ static void sha256_transform(u32 *state, const u8 *input)
>         memset(W, 0, 64 * sizeof(u32));
>  }
>
> -int sha256_init(struct sha256_state *sctx)
> -{
> -       sctx->state[0] = SHA256_H0;
> -       sctx->state[1] = SHA256_H1;
> -       sctx->state[2] = SHA256_H2;
> -       sctx->state[3] = SHA256_H3;
> -       sctx->state[4] = SHA256_H4;
> -       sctx->state[5] = SHA256_H5;
> -       sctx->state[6] = SHA256_H6;
> -       sctx->state[7] = SHA256_H7;
> -       sctx->count = 0;
> -
> -       return 0;
> -}
> -
> -int sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
> +void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
>  {
>         unsigned int partial, done;
>         const u8 *src;
> @@ -249,11 +234,9 @@ int sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
>                 partial = 0;
>         }
>         memcpy(sctx->buf + partial, src, len - done);
> -
> -       return 0;
>  }
>
> -int sha256_final(struct sha256_state *sctx, u8 *out)
> +void sha256_final(struct sha256_state *sctx, u8 *out)
>  {
>         __be32 *dst = (__be32 *)out;
>         __be64 bits;
> @@ -278,6 +261,4 @@ int sha256_final(struct sha256_state *sctx, u8 *out)
>
>         /* Zeroize sensitive information. */
>         memset(sctx, 0, sizeof(*sctx));
> -
> -       return 0;
>  }
> diff --git a/arch/x86/purgatory/sha256.h b/arch/x86/purgatory/sha256.h
> deleted file mode 100644
> index bd15a4127735..000000000000
> --- a/arch/x86/purgatory/sha256.h
> +++ /dev/null
> @@ -1,22 +0,0 @@
> -/*
> - *  Copyright (C) 2014 Red Hat Inc.
> - *
> - *  Author: Vivek Goyal <vgoyal@redhat.com>
> - *
> - * This source code is licensed under the GNU General Public License,
> - * Version 2.  See the file COPYING for more details.
> - */
> -
> -#ifndef SHA256_H
> -#define SHA256_H
> -
> -
> -#include <linux/types.h>
> -#include <crypto/sha.h>
> -
> -extern int sha256_init(struct sha256_state *sctx);
> -extern int sha256_update(struct sha256_state *sctx, const u8 *input,
> -                               unsigned int length);
> -extern int sha256_final(struct sha256_state *sctx, u8 *hash);
> -
> -#endif /* SHA256_H */
> diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
> index 8f9c47e1a96e..f2747893402c 100644
> --- a/crypto/sha256_generic.c
> +++ b/crypto/sha256_generic.c
> @@ -231,6 +231,13 @@ static void sha256_transform(u32 *state, const u8 *input)
>         memzero_explicit(W, 64 * sizeof(u32));
>  }
>
> +int sha256_base_init(struct shash_desc *desc)
> +{
> +       sha256_init(shash_desc_ctx(desc));
> +       return 0;
> +}
> +EXPORT_SYMBOL(sha256_base_init);
> +
>  static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
>                                     int blocks)
>  {
> @@ -240,32 +247,49 @@ static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
>         }
>  }
>
> -int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
> +void sha256_update(struct sha256_state *sctx, const u8 *data,
>                           unsigned int len)
>  {
> -       return sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
> +       sha256_base_do_update(sctx, data, len, sha256_generic_block_fn);
> +}
> +EXPORT_SYMBOL(sha256_update);
> +
> +void sha256_final(struct sha256_state *sctx, u8 *out)
> +{
> +       sha256_base_do_finalize(sctx, sha256_generic_block_fn);
> +       sha256_base_finish(sctx, out);
>  }
> -EXPORT_SYMBOL(crypto_sha256_update);
> +EXPORT_SYMBOL(sha256_final);
>
> -static int sha256_final(struct shash_desc *desc, u8 *out)
> +static int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
> +                               unsigned int len)
>  {
> -       sha256_base_do_finalize(desc, sha256_generic_block_fn);
> -       return sha256_base_finish(desc, out);
> +       sha256_update(shash_desc_ctx(desc), data, len);
> +       return 0;
> +}
> +
> +int crypto_sha2_final(struct shash_desc *desc, u8 *out)
> +{
> +       struct sha256_state *sctx = shash_desc_ctx(desc);
> +
> +       sha256_base_do_finalize(sctx, sha256_generic_block_fn);
> +       sha2_base_finish(sctx, crypto_shash_digestsize(desc->tfm), out);
> +       return 0;
>  }
> +EXPORT_SYMBOL(crypto_sha2_final);
>
> -int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
> -                       unsigned int len, u8 *hash)
> +static int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
> +                              unsigned int len, u8 *hash)
>  {
> -       sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
> -       return sha256_final(desc, hash);
> +       sha256_finup(shash_desc_ctx(desc), data, len, hash);
> +       return 0;
>  }
> -EXPORT_SYMBOL(crypto_sha256_finup);
>
>  static struct shash_alg sha256_algs[2] = { {
>         .digestsize     =       SHA256_DIGEST_SIZE,
>         .init           =       sha256_base_init,
>         .update         =       crypto_sha256_update,
> -       .final          =       sha256_final,
> +       .final          =       crypto_sha2_final,
>         .finup          =       crypto_sha256_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
> @@ -279,7 +303,7 @@ static struct shash_alg sha256_algs[2] = { {
>         .digestsize     =       SHA224_DIGEST_SIZE,
>         .init           =       sha224_base_init,
>         .update         =       crypto_sha256_update,
> -       .final          =       sha256_final,
> +       .final          =       crypto_sha2_final,
>         .finup          =       crypto_sha256_finup,
>         .descsize       =       sizeof(struct sha256_state),
>         .base           =       {
> diff --git a/include/crypto/sha.h b/include/crypto/sha.h
> index c94d3eb1cefd..2b6978471605 100644
> --- a/include/crypto/sha.h
> +++ b/include/crypto/sha.h
> @@ -96,11 +96,30 @@ extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
>  extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
>                              unsigned int len, u8 *hash);
>
> -extern int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
> -                             unsigned int len);
> -
> -extern int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
> -                              unsigned int len, u8 *hash);
> +static inline void sha256_init(struct sha256_state *sctx)
> +{
> +       sctx->state[0] = SHA256_H0;
> +       sctx->state[1] = SHA256_H1;
> +       sctx->state[2] = SHA256_H2;
> +       sctx->state[3] = SHA256_H3;
> +       sctx->state[4] = SHA256_H4;
> +       sctx->state[5] = SHA256_H5;
> +       sctx->state[6] = SHA256_H6;
> +       sctx->state[7] = SHA256_H7;
> +       sctx->count = 0;
> +}
> +
> +extern void sha256_update(struct sha256_state *sctx, const u8 *data,
> +                         unsigned int len);
> +
> +extern void sha256_final(struct sha256_state *sctx, u8 *out);
> +
> +static inline void sha256_finup(struct sha256_state *sctx, const u8 *data,
> +                               unsigned int len, u8 *hash)
> +{
> +       sha256_update(sctx, data, len);
> +       sha256_final(sctx, hash);
> +}
>
>  extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
>                               unsigned int len);
> diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
> index d1f2195bb7de..f65d9a516b36 100644
> --- a/include/crypto/sha256_base.h
> +++ b/include/crypto/sha256_base.h
> @@ -35,29 +35,13 @@ static inline int sha224_base_init(struct shash_desc *desc)
>         return 0;
>  }
>
> -static inline int sha256_base_init(struct shash_desc *desc)
> -{
> -       struct sha256_state *sctx = shash_desc_ctx(desc);
> -
> -       sctx->state[0] = SHA256_H0;
> -       sctx->state[1] = SHA256_H1;
> -       sctx->state[2] = SHA256_H2;
> -       sctx->state[3] = SHA256_H3;
> -       sctx->state[4] = SHA256_H4;
> -       sctx->state[5] = SHA256_H5;
> -       sctx->state[6] = SHA256_H6;
> -       sctx->state[7] = SHA256_H7;
> -       sctx->count = 0;
> -
> -       return 0;
> -}
> +extern int sha256_base_init(struct shash_desc *desc);
>
> -static inline int sha256_base_do_update(struct shash_desc *desc,
> +static inline void sha256_base_do_update(struct sha256_state *sctx,
>                                         const u8 *data,
>                                         unsigned int len,
>                                         sha256_block_fn *block_fn)
>  {
> -       struct sha256_state *sctx = shash_desc_ctx(desc);
>         unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
>
>         sctx->count += len;
> @@ -86,15 +70,12 @@ static inline int sha256_base_do_update(struct shash_desc *desc,
>         }
>         if (len)
>                 memcpy(sctx->buf + partial, data, len);
> -
> -       return 0;
>  }
>
> -static inline int sha256_base_do_finalize(struct shash_desc *desc,
> +static inline void sha256_base_do_finalize(struct sha256_state *sctx,
>                                           sha256_block_fn *block_fn)
>  {
>         const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
> -       struct sha256_state *sctx = shash_desc_ctx(desc);
>         __be64 *bits = (__be64 *)(sctx->buf + bit_offset);
>         unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
>
> @@ -109,14 +90,11 @@ static inline int sha256_base_do_finalize(struct shash_desc *desc,
>         memset(sctx->buf + partial, 0x0, bit_offset - partial);
>         *bits = cpu_to_be64(sctx->count << 3);
>         block_fn(sctx, sctx->buf, 1);
> -
> -       return 0;
>  }
>
> -static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
> +static inline void sha2_base_finish(struct sha256_state *sctx,
> +                                   unsigned int digest_size, u8 *out)
>  {
> -       unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
> -       struct sha256_state *sctx = shash_desc_ctx(desc);
>         __be32 *digest = (__be32 *)out;
>         int i;
>
> @@ -124,5 +102,11 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
>                 put_unaligned_be32(sctx->state[i], digest++);
>
>         *sctx = (struct sha256_state){};
> -       return 0;
>  }
> +
> +static inline void sha256_base_finish(struct sha256_state *sctx, u8 *out)
> +{
> +       sha2_base_finish(sctx, SHA256_DIGEST_SIZE, out);
> +}
> +
> +extern int crypto_sha2_final(struct shash_desc *desc, u8 *out);
> --
> 2.9.3
>

^ permalink raw reply

* Re: [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Andy Lutomirski @ 2016-12-24  2:26 UTC (permalink / raw)
  To: Andy Lutomirski
  Cc: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List,
	Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Ard Biesheuvel, Herbert Xu
In-Reply-To: <942b91f25a63b22ec4946378a1fffe78d655cf18.1482545792.git.luto@kernel.org>

On Fri, Dec 23, 2016 at 6:22 PM, Andy Lutomirski <luto@kernel.org> wrote:
> There are some pieecs of kernel code that want to compute SHA256
> directly without going through the crypto core.  Adjust the exported
> API to decouple it from the crypto core.
>
> I suspect this will very slightly speed up the SHA256 shash operations
> as well by reducing the amount of indirection involved.
>

I should also mention: there's a nice potential cleanup that's
possible on top of this.  Currently, most of the accelerated SHA256
implementations just swap out the block function.  Another approach to
enabling this would be to restructure sha256_update along the lines
of:

sha256_block_fn_t fn = arch_sha256_block_fn(len);
sha256_base_do_update(sctx, data, len, arch_sha256_block_fn(len));

The idea being that arch code can decide whether to use an accelerated
block function based on context (x86, for example, can't always use
xmm regs) and length (on x86, using the accelerated versions for short
digests is very slow due to the state save/restore that happens) and
then the core code can just use it.

This would allow a lot of the boilerplate that this patch was forced
to modify to be deleted outright.

--Andy

^ permalink raw reply

* [RFC PATCH 4.10 3/6] bpf: Use SHA256 instead of SHA1 for bpf digests
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski, Alexei Starovoitov
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

BPF digests are intended to be used to avoid reloading programs that
are already loaded.  For use cases (CRIU?) where untrusted programs
are involved, intentional hash collisions could cause the wrong BPF
program to execute.  Additionally, if BPF digests are ever used
in-kernel to skip verification, a hash collision could give privilege
escalation directly.

SHA1 is no longer considered adequately collision-resistant (see, for
example, all the major browsers dropping support for SHA1
certificates).  Use SHA256 instead.

I moved the digest field to keep all of the bpf program metadata in
the same cache line.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/linux/filter.h | 11 +++--------
 init/Kconfig           |  1 +
 kernel/bpf/core.c      | 41 +++++++----------------------------------
 3 files changed, 11 insertions(+), 42 deletions(-)

diff --git a/include/linux/filter.h b/include/linux/filter.h
index 702314253797..23df2574e30c 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -14,7 +14,8 @@
 #include <linux/workqueue.h>
 #include <linux/sched.h>
 #include <linux/capability.h>
-#include <linux/cryptohash.h>
+
+#include <crypto/sha.h>
 
 #include <net/sch_generic.h>
 
@@ -408,11 +409,11 @@ struct bpf_prog {
 	kmemcheck_bitfield_end(meta);
 	enum bpf_prog_type	type;		/* Type of BPF program */
 	u32			len;		/* Number of filter blocks */
-	u32			digest[SHA_DIGEST_WORDS]; /* Program digest */
 	struct bpf_prog_aux	*aux;		/* Auxiliary fields */
 	struct sock_fprog_kern	*orig_prog;	/* Original BPF program */
 	unsigned int		(*bpf_func)(const void *ctx,
 					    const struct bpf_insn *insn);
+	u8			digest[SHA256_DIGEST_SIZE]; /* Program digest */
 	/* Instructions for interpreter */
 	union {
 		struct sock_filter	insns[0];
@@ -519,12 +520,6 @@ static inline u32 bpf_prog_insn_size(const struct bpf_prog *prog)
 	return prog->len * sizeof(struct bpf_insn);
 }
 
-static inline u32 bpf_prog_digest_scratch_size(const struct bpf_prog *prog)
-{
-	return round_up(bpf_prog_insn_size(prog) +
-			sizeof(__be64) + 1, SHA_MESSAGE_BYTES);
-}
-
 static inline unsigned int bpf_prog_size(unsigned int proglen)
 {
 	return max(sizeof(struct bpf_prog),
diff --git a/init/Kconfig b/init/Kconfig
index 223b734abccd..5a4e2d99cc38 100644
--- a/init/Kconfig
+++ b/init/Kconfig
@@ -1634,6 +1634,7 @@ config BPF_SYSCALL
 	bool "Enable bpf() system call"
 	select ANON_INODES
 	select BPF
+	select CRYPTO_SHA256_LIB
 	default n
 	help
 	  Enable the bpf() system call that allows to manipulate eBPF
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 1eb4f1303756..911993863799 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -148,22 +148,18 @@ void __bpf_prog_free(struct bpf_prog *fp)
 
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
-	const u32 bits_offset = SHA_MESSAGE_BYTES - sizeof(__be64);
-	u32 raw_size = bpf_prog_digest_scratch_size(fp);
-	u32 ws[SHA_WORKSPACE_WORDS];
-	u32 i, bsize, psize, blocks;
+	struct sha256_state sha;
+	u32 i, psize;
 	struct bpf_insn *dst;
 	bool was_ld_map;
-	u8 *raw, *todo;
-	__be32 *result;
-	__be64 *bits;
+	u8 *raw;
 
-	raw = vmalloc(raw_size);
+	psize = bpf_prog_insn_size(fp);
+	raw = vmalloc(psize);
 	if (!raw)
 		return -ENOMEM;
 
-	sha_init(fp->digest);
-	memset(ws, 0, sizeof(ws));
+	sha256_init(&sha);
 
 	/* We need to take out the map fd for the digest calculation
 	 * since they are unstable from user space side.
@@ -188,30 +184,7 @@ int bpf_prog_calc_digest(struct bpf_prog *fp)
 		}
 	}
 
-	psize = bpf_prog_insn_size(fp);
-	memset(&raw[psize], 0, raw_size - psize);
-	raw[psize++] = 0x80;
-
-	bsize  = round_up(psize, SHA_MESSAGE_BYTES);
-	blocks = bsize / SHA_MESSAGE_BYTES;
-	todo   = raw;
-	if (bsize - psize >= sizeof(__be64)) {
-		bits = (__be64 *)(todo + bsize - sizeof(__be64));
-	} else {
-		bits = (__be64 *)(todo + bsize + bits_offset);
-		blocks++;
-	}
-	*bits = cpu_to_be64((psize - 1) << 3);
-
-	while (blocks--) {
-		sha_transform(fp->digest, todo, ws);
-		todo += SHA_MESSAGE_BYTES;
-	}
-
-	result = (__force __be32 *)fp->digest;
-	for (i = 0; i < SHA_DIGEST_WORDS; i++)
-		result[i] = cpu_to_be32(fp->digest[i]);
-
+	sha256_finup(&sha, raw, psize, fp->digest);
 	vfree(raw);
 	return 0;
 }
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 4.10 0/6] Switch BPF's digest to SHA256
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski

Since there are plenty of uses for the new-in-4.10 BPF digest feature
that would be problematic if malicious users could produce collisions,
the BPF digest should be collision-resistant.  SHA-1 is no longer
considered collision-resistant, so switch it to SHA-256.

The actual switchover is trivial.  Most of this series consists of
cleanups to the SHA256 code to make it usable as a standalone library
(since BPF should not depend on crypto).

The cleaned up library is much more user-friendly than the SHA-1 code,
so this also significantly tidies up the BPF digest code.

This is intended for 4.10.  If this series misses 4.10 and nothing
takes its place, then we'll have an unpleasant ABI stability
situation.

Andy Lutomirski (6):
  crypto/sha256: Refactor the API so it can be used without shash
  crypto/sha256: Make the sha256 library functions selectable
  bpf: Use SHA256 instead of SHA1 for bpf digests
  bpf: Avoid copying the entire BPF program when hashing it
  bpf: Rename fdinfo's prog_digest to prog_sha256
  net: Rename TCA*BPF_DIGEST to ..._SHA256

 arch/arm/crypto/sha2-ce-glue.c      | 10 +++---
 arch/arm/crypto/sha256_glue.c       | 23 +++++++++-----
 arch/arm/crypto/sha256_neon_glue.c  | 34 ++++++++++----------
 arch/arm64/crypto/sha2-ce-glue.c    | 13 ++++----
 arch/arm64/crypto/sha256-glue.c     | 59 +++++++++++++++++++---------------
 arch/x86/crypto/sha256_ssse3_glue.c | 46 ++++++++++++++++-----------
 arch/x86/purgatory/purgatory.c      |  2 +-
 arch/x86/purgatory/sha256.c         | 25 ++-------------
 arch/x86/purgatory/sha256.h         | 22 -------------
 crypto/Kconfig                      |  8 +++++
 crypto/Makefile                     |  2 +-
 crypto/sha256_generic.c             | 54 +++++++++++++++++++++++--------
 include/crypto/sha.h                | 33 ++++++++++++++++---
 include/crypto/sha256_base.h        | 40 +++++++----------------
 include/linux/filter.h              | 11 ++-----
 include/uapi/linux/pkt_cls.h        |  2 +-
 include/uapi/linux/tc_act/tc_bpf.h  |  2 +-
 init/Kconfig                        |  1 +
 kernel/bpf/core.c                   | 63 +++++++++----------------------------
 kernel/bpf/syscall.c                |  2 +-
 net/sched/act_bpf.c                 |  2 +-
 net/sched/cls_bpf.c                 |  2 +-
 22 files changed, 225 insertions(+), 231 deletions(-)
 delete mode 100644 arch/x86/purgatory/sha256.h

-- 
2.9.3

^ permalink raw reply

* [RFC PATCH 4.10 6/6] net: Rename TCA*BPF_DIGEST to ..._SHA256
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski, Alexei Starovoitov
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

This makes it easier to add another digest algorithm down the road if
needed.  It also serves to force any programs that might have been
written against a kernel that had the old field name to notice the
change and make any necessary changes.

This shouldn't violate any stable API policies, as no released kernel
has ever had TCA*BPF_DIGEST.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 include/uapi/linux/pkt_cls.h       | 2 +-
 include/uapi/linux/tc_act/tc_bpf.h | 2 +-
 net/sched/act_bpf.c                | 2 +-
 net/sched/cls_bpf.c                | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/uapi/linux/pkt_cls.h b/include/uapi/linux/pkt_cls.h
index cb4bcdc58543..ac6b300c1550 100644
--- a/include/uapi/linux/pkt_cls.h
+++ b/include/uapi/linux/pkt_cls.h
@@ -397,7 +397,7 @@ enum {
 	TCA_BPF_NAME,
 	TCA_BPF_FLAGS,
 	TCA_BPF_FLAGS_GEN,
-	TCA_BPF_DIGEST,
+	TCA_BPF_SHA256,
 	__TCA_BPF_MAX,
 };
 
diff --git a/include/uapi/linux/tc_act/tc_bpf.h b/include/uapi/linux/tc_act/tc_bpf.h
index a6b88a6f7f71..eae18a7430eb 100644
--- a/include/uapi/linux/tc_act/tc_bpf.h
+++ b/include/uapi/linux/tc_act/tc_bpf.h
@@ -27,7 +27,7 @@ enum {
 	TCA_ACT_BPF_FD,
 	TCA_ACT_BPF_NAME,
 	TCA_ACT_BPF_PAD,
-	TCA_ACT_BPF_DIGEST,
+	TCA_ACT_BPF_SHA256,
 	__TCA_ACT_BPF_MAX,
 };
 #define TCA_ACT_BPF_MAX (__TCA_ACT_BPF_MAX - 1)
diff --git a/net/sched/act_bpf.c b/net/sched/act_bpf.c
index 1c60317f0121..3868a66d0b24 100644
--- a/net/sched/act_bpf.c
+++ b/net/sched/act_bpf.c
@@ -123,7 +123,7 @@ static int tcf_bpf_dump_ebpf_info(const struct tcf_bpf *prog,
 	    nla_put_string(skb, TCA_ACT_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
-	nla = nla_reserve(skb, TCA_ACT_BPF_DIGEST,
+	nla = nla_reserve(skb, TCA_ACT_BPF_SHA256,
 			  sizeof(prog->filter->digest));
 	if (nla == NULL)
 		return -EMSGSIZE;
diff --git a/net/sched/cls_bpf.c b/net/sched/cls_bpf.c
index adc776048d1a..6fc110321621 100644
--- a/net/sched/cls_bpf.c
+++ b/net/sched/cls_bpf.c
@@ -555,7 +555,7 @@ static int cls_bpf_dump_ebpf_info(const struct cls_bpf_prog *prog,
 	    nla_put_string(skb, TCA_BPF_NAME, prog->bpf_name))
 		return -EMSGSIZE;
 
-	nla = nla_reserve(skb, TCA_BPF_DIGEST, sizeof(prog->filter->digest));
+	nla = nla_reserve(skb, TCA_BPF_SHA256, sizeof(prog->filter->digest));
 	if (nla == NULL)
 		return -EMSGSIZE;
 
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 4.10 5/6] bpf: Rename fdinfo's prog_digest to prog_sha256
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski, Alexei Starovoitov
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

This makes it easier to add another digest algorithm down the road
if needed.  It also serves to force any programs that might have
been written against a kernel that had 'prog_digest' to be updated.

This shouldn't violate any stable API policies, as no released
kernel has ever had 'prog_digest'.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/bpf/syscall.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index e89acea22ecf..956370b80296 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -694,7 +694,7 @@ static void bpf_prog_show_fdinfo(struct seq_file *m, struct file *filp)
 	seq_printf(m,
 		   "prog_type:\t%u\n"
 		   "prog_jited:\t%u\n"
-		   "prog_digest:\t%s\n"
+		   "prog_sha256:\t%s\n"
 		   "memlock:\t%llu\n",
 		   prog->type,
 		   prog->jited,
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 4.10 4/6] bpf: Avoid copying the entire BPF program when hashing it
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski, Alexei Starovoitov
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

The sha256 helpers can consume a message incrementally, so there's no need
to allocate a buffer to store the whole blob to be hashed.

This may be a slight slowdown for very long messages because gcc can't
inline the sha256_update() calls.  For reasonably-sized programs,
however, this should be a considerable speedup as vmalloc() is quite
slow.

Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Alexei Starovoitov <ast@kernel.org>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 kernel/bpf/core.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)

diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index 911993863799..1c2931f505af 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -149,43 +149,37 @@ void __bpf_prog_free(struct bpf_prog *fp)
 int bpf_prog_calc_digest(struct bpf_prog *fp)
 {
 	struct sha256_state sha;
-	u32 i, psize;
-	struct bpf_insn *dst;
+	u32 i;
 	bool was_ld_map;
-	u8 *raw;
-
-	psize = bpf_prog_insn_size(fp);
-	raw = vmalloc(psize);
-	if (!raw)
-		return -ENOMEM;
 
 	sha256_init(&sha);
 
 	/* We need to take out the map fd for the digest calculation
 	 * since they are unstable from user space side.
 	 */
-	dst = (void *)raw;
 	for (i = 0, was_ld_map = false; i < fp->len; i++) {
-		dst[i] = fp->insnsi[i];
+		struct bpf_insn insn = fp->insnsi[i];
+
 		if (!was_ld_map &&
-		    dst[i].code == (BPF_LD | BPF_IMM | BPF_DW) &&
-		    dst[i].src_reg == BPF_PSEUDO_MAP_FD) {
+		    insn.code == (BPF_LD | BPF_IMM | BPF_DW) &&
+		    insn.src_reg == BPF_PSEUDO_MAP_FD) {
 			was_ld_map = true;
-			dst[i].imm = 0;
+			insn.imm = 0;
 		} else if (was_ld_map &&
-			   dst[i].code == 0 &&
-			   dst[i].dst_reg == 0 &&
-			   dst[i].src_reg == 0 &&
-			   dst[i].off == 0) {
+			   insn.code == 0 &&
+			   insn.dst_reg == 0 &&
+			   insn.src_reg == 0 &&
+			   insn.off == 0) {
 			was_ld_map = false;
-			dst[i].imm = 0;
+			insn.imm = 0;
 		} else {
 			was_ld_map = false;
 		}
+
+		sha256_update(&sha, (const u8 *)&insn, sizeof(insn));
 	}
 
-	sha256_finup(&sha, raw, psize, fp->digest);
-	vfree(raw);
+	sha256_final(&sha, fp->digest);
 	return 0;
 }
 
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 4.10 2/6] crypto/sha256: Make the sha256 library functions selectable
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

This will let other kernel code call into sha256_init(), etc. without
pulling in the core crypto code.

Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 crypto/Kconfig          | 8 ++++++++
 crypto/Makefile         | 2 +-
 crypto/sha256_generic.c | 4 ++++
 include/crypto/sha.h    | 4 ++++
 4 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/crypto/Kconfig b/crypto/Kconfig
index 160f08e721cc..85a2b3440c2b 100644
--- a/crypto/Kconfig
+++ b/crypto/Kconfig
@@ -10,6 +10,13 @@ config XOR_BLOCKS
 source "crypto/async_tx/Kconfig"
 
 #
+# Cryptographic algorithms that are usable without the Crypto API.
+# None of these should have visible config options.
+#
+config CRYPTO_SHA256_LIB
+	bool
+
+#
 # Cryptographic API Configuration
 #
 menuconfig CRYPTO
@@ -763,6 +770,7 @@ config CRYPTO_SHA512_MB
 
 config CRYPTO_SHA256
 	tristate "SHA224 and SHA256 digest algorithm"
+	select CRYPTO_SHA256_LIB
 	select CRYPTO_HASH
 	help
 	  SHA256 secure hash standard (DFIPS 180-2).
diff --git a/crypto/Makefile b/crypto/Makefile
index b8f0e3eb0791..d147d4c911f5 100644
--- a/crypto/Makefile
+++ b/crypto/Makefile
@@ -71,7 +71,7 @@ obj-$(CONFIG_CRYPTO_RMD160) += rmd160.o
 obj-$(CONFIG_CRYPTO_RMD256) += rmd256.o
 obj-$(CONFIG_CRYPTO_RMD320) += rmd320.o
 obj-$(CONFIG_CRYPTO_SHA1) += sha1_generic.o
-obj-$(CONFIG_CRYPTO_SHA256) += sha256_generic.o
+obj-$(CONFIG_CRYPTO_SHA256_LIB) += sha256_generic.o
 obj-$(CONFIG_CRYPTO_SHA512) += sha512_generic.o
 obj-$(CONFIG_CRYPTO_SHA3) += sha3_generic.o
 obj-$(CONFIG_CRYPTO_WP512) += wp512.o
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index f2747893402c..9df71ac66dc4 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -261,6 +261,8 @@ void sha256_final(struct sha256_state *sctx, u8 *out)
 }
 EXPORT_SYMBOL(sha256_final);
 
+#ifdef CONFIG_CRYPTO_HASH
+
 static int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
 				unsigned int len)
 {
@@ -328,6 +330,8 @@ static void __exit sha256_generic_mod_fini(void)
 module_init(sha256_generic_mod_init);
 module_exit(sha256_generic_mod_fini);
 
+#endif /* CONFIG_CRYPTO_HASH */
+
 MODULE_LICENSE("GPL");
 MODULE_DESCRIPTION("SHA-224 and SHA-256 Secure Hash Algorithm");
 
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index 2b6978471605..381ba7fa5e3f 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -96,6 +96,8 @@ extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
 			     unsigned int len, u8 *hash);
 
+#ifdef CONFIG_CRYPTO_SHA256_LIB
+
 static inline void sha256_init(struct sha256_state *sctx)
 {
 	sctx->state[0] = SHA256_H0;
@@ -121,6 +123,8 @@ static inline void sha256_finup(struct sha256_state *sctx, const u8 *data,
 	sha256_final(sctx, hash);
 }
 
+#endif /* CONFIG_CRYPTO_SHA256_LIB */
+
 extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
 
-- 
2.9.3

^ permalink raw reply related

* [RFC PATCH 4.10 1/6] crypto/sha256: Refactor the API so it can be used without shash
From: Andy Lutomirski @ 2016-12-24  2:22 UTC (permalink / raw)
  To: Daniel Borkmann, Netdev, LKML, Linux Crypto Mailing List
  Cc: Jason A. Donenfeld, Hannes Frederic Sowa, Alexei Starovoitov,
	Eric Dumazet, Eric Biggers, Tom Herbert, David S. Miller,
	Andy Lutomirski, Ard Biesheuvel, Herbert Xu
In-Reply-To: <cover.1482545792.git.luto@kernel.org>

There are some pieecs of kernel code that want to compute SHA256
directly without going through the crypto core.  Adjust the exported
API to decouple it from the crypto core.

I suspect this will very slightly speed up the SHA256 shash operations
as well by reducing the amount of indirection involved.

Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org>
Cc: Herbert Xu <herbert@gondor.apana.org.au>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
 arch/arm/crypto/sha2-ce-glue.c      | 10 ++++---
 arch/arm/crypto/sha256_glue.c       | 23 ++++++++++-----
 arch/arm/crypto/sha256_neon_glue.c  | 34 +++++++++++----------
 arch/arm64/crypto/sha2-ce-glue.c    | 13 ++++----
 arch/arm64/crypto/sha256-glue.c     | 59 +++++++++++++++++++++----------------
 arch/x86/crypto/sha256_ssse3_glue.c | 46 +++++++++++++++++------------
 arch/x86/purgatory/purgatory.c      |  2 +-
 arch/x86/purgatory/sha256.c         | 25 ++--------------
 arch/x86/purgatory/sha256.h         | 22 --------------
 crypto/sha256_generic.c             | 50 +++++++++++++++++++++++--------
 include/crypto/sha.h                | 29 ++++++++++++++----
 include/crypto/sha256_base.h        | 40 ++++++++-----------------
 12 files changed, 184 insertions(+), 169 deletions(-)
 delete mode 100644 arch/x86/purgatory/sha256.h

diff --git a/arch/arm/crypto/sha2-ce-glue.c b/arch/arm/crypto/sha2-ce-glue.c
index 0755b2d657f3..8832c2f85591 100644
--- a/arch/arm/crypto/sha2-ce-glue.c
+++ b/arch/arm/crypto/sha2-ce-glue.c
@@ -38,7 +38,7 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
 		return crypto_sha256_arm_update(desc, data, len);
 
 	kernel_neon_begin();
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(sctx, data, len,
 			      (sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
 
@@ -48,17 +48,19 @@ static int sha2_ce_update(struct shash_desc *desc, const u8 *data,
 static int sha2_ce_finup(struct shash_desc *desc, const u8 *data,
 			 unsigned int len, u8 *out)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	if (!may_use_simd())
 		return crypto_sha256_arm_finup(desc, data, len, out);
 
 	kernel_neon_begin();
 	if (len)
-		sha256_base_do_update(desc, data, len,
+		sha256_base_do_update(sctx, data, len,
 				      (sha256_block_fn *)sha2_ce_transform);
-	sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
+	sha256_base_do_finalize(sctx, (sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
 
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
 static int sha2_ce_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/arm/crypto/sha256_glue.c b/arch/arm/crypto/sha256_glue.c
index a84e869ef900..405a29a9a9d3 100644
--- a/arch/arm/crypto/sha256_glue.c
+++ b/arch/arm/crypto/sha256_glue.c
@@ -36,27 +36,34 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
 int crypto_sha256_arm_update(struct shash_desc *desc, const u8 *data,
 			     unsigned int len)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	/* make sure casting to sha256_block_fn() is safe */
 	BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
 
-	return sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_data_order);
+	return 0;
 }
 EXPORT_SYMBOL(crypto_sha256_arm_update);
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int sha256_arm_final(struct shash_desc *desc, u8 *out)
 {
-	sha256_base_do_finalize(desc,
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sha256_base_do_finalize(sctx,
 				(sha256_block_fn *)sha256_block_data_order);
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
 int crypto_sha256_arm_finup(struct shash_desc *desc, const u8 *data,
 			    unsigned int len, u8 *out)
 {
-	sha256_base_do_update(desc, data, len,
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sha256_base_do_update(sctx, data, len,
 			      (sha256_block_fn *)sha256_block_data_order);
-	return sha256_final(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 EXPORT_SYMBOL(crypto_sha256_arm_finup);
 
@@ -64,7 +71,7 @@ static struct shash_alg algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
 	.init		=	sha256_base_init,
 	.update		=	crypto_sha256_arm_update,
-	.final		=	sha256_final,
+	.final		=	sha256_arm_final,
 	.finup		=	crypto_sha256_arm_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -79,7 +86,7 @@ static struct shash_alg algs[] = { {
 	.digestsize	=	SHA224_DIGEST_SIZE,
 	.init		=	sha224_base_init,
 	.update		=	crypto_sha256_arm_update,
-	.final		=	sha256_final,
+	.final		=	sha256_arm_final,
 	.finup		=	crypto_sha256_arm_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
diff --git a/arch/arm/crypto/sha256_neon_glue.c b/arch/arm/crypto/sha256_neon_glue.c
index 39ccd658817e..40c85d1d4c1e 100644
--- a/arch/arm/crypto/sha256_neon_glue.c
+++ b/arch/arm/crypto/sha256_neon_glue.c
@@ -29,8 +29,8 @@
 asmlinkage void sha256_block_data_order_neon(u32 *digest, const void *data,
 					     unsigned int num_blks);
 
-static int sha256_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int len)
+static int sha256_neon_update(struct shash_desc *desc, const u8 *data,
+			      unsigned int len)
 {
 	struct sha256_state *sctx = shash_desc_ctx(desc);
 
@@ -39,41 +39,43 @@ static int sha256_update(struct shash_desc *desc, const u8 *data,
 		return crypto_sha256_arm_update(desc, data, len);
 
 	kernel_neon_begin();
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(sctx, data, len,
 			(sha256_block_fn *)sha256_block_data_order_neon);
 	kernel_neon_end();
 
 	return 0;
 }
 
-static int sha256_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
+static int sha256_neon_finup(struct shash_desc *desc, const u8 *data,
+			     unsigned int len, u8 *out)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	if (!may_use_simd())
 		return crypto_sha256_arm_finup(desc, data, len, out);
 
 	kernel_neon_begin();
 	if (len)
-		sha256_base_do_update(desc, data, len,
+		sha256_base_do_update(sctx, data, len,
 			(sha256_block_fn *)sha256_block_data_order_neon);
-	sha256_base_do_finalize(desc,
+	sha256_base_do_finalize(sctx,
 			(sha256_block_fn *)sha256_block_data_order_neon);
 	kernel_neon_end();
 
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int sha256_neon_final(struct shash_desc *desc, u8 *out)
 {
-	return sha256_finup(desc, NULL, 0, out);
+	return sha256_neon_finup(desc, NULL, 0, out);
 }
 
 struct shash_alg sha256_neon_algs[] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
 	.init		=	sha256_base_init,
-	.update		=	sha256_update,
-	.final		=	sha256_final,
-	.finup		=	sha256_finup,
+	.update		=	sha256_neon_update,
+	.final		=	sha256_neon_final,
+	.finup		=	sha256_neon_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
 		.cra_name	=	"sha256",
@@ -86,9 +88,9 @@ struct shash_alg sha256_neon_algs[] = { {
 }, {
 	.digestsize	=	SHA224_DIGEST_SIZE,
 	.init		=	sha224_base_init,
-	.update		=	sha256_update,
-	.final		=	sha256_final,
-	.finup		=	sha256_finup,
+	.update		=	sha256_neon_update,
+	.final		=	sha256_neon_final,
+	.finup		=	sha256_neon_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
 		.cra_name	=	"sha224",
diff --git a/arch/arm64/crypto/sha2-ce-glue.c b/arch/arm64/crypto/sha2-ce-glue.c
index 7cd587564a41..e38dd301abce 100644
--- a/arch/arm64/crypto/sha2-ce-glue.c
+++ b/arch/arm64/crypto/sha2-ce-glue.c
@@ -39,7 +39,7 @@ static int sha256_ce_update(struct shash_desc *desc, const u8 *data,
 
 	sctx->finalize = 0;
 	kernel_neon_begin_partial(28);
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(&sctx->sst, data, len,
 			      (sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
 
@@ -64,13 +64,13 @@ static int sha256_ce_finup(struct shash_desc *desc, const u8 *data,
 	sctx->finalize = finalize;
 
 	kernel_neon_begin_partial(28);
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(&sctx->sst, data, len,
 			      (sha256_block_fn *)sha2_ce_transform);
 	if (!finalize)
-		sha256_base_do_finalize(desc,
+		sha256_base_do_finalize(&sctx->sst,
 					(sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
 static int sha256_ce_final(struct shash_desc *desc, u8 *out)
@@ -79,9 +79,10 @@ static int sha256_ce_final(struct shash_desc *desc, u8 *out)
 
 	sctx->finalize = 0;
 	kernel_neon_begin_partial(28);
-	sha256_base_do_finalize(desc, (sha256_block_fn *)sha2_ce_transform);
+	sha256_base_do_finalize(&sctx->sst,
+				(sha256_block_fn *)sha2_ce_transform);
 	kernel_neon_end();
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
 static struct shash_alg algs[] = { {
diff --git a/arch/arm64/crypto/sha256-glue.c b/arch/arm64/crypto/sha256-glue.c
index a2226f841960..132a1ef89a71 100644
--- a/arch/arm64/crypto/sha256-glue.c
+++ b/arch/arm64/crypto/sha256-glue.c
@@ -33,36 +33,39 @@ asmlinkage void sha256_block_data_order(u32 *digest, const void *data,
 asmlinkage void sha256_block_neon(u32 *digest, const void *data,
 				  unsigned int num_blks);
 
-static int sha256_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int len)
+static int sha256_update_arm64(struct shash_desc *desc, const u8 *data,
+			       unsigned int len)
 {
-	return sha256_base_do_update(desc, data, len,
-				(sha256_block_fn *)sha256_block_data_order);
+	sha256_base_do_update(shash_desc_ctx(desc), data, len,
+			      (sha256_block_fn *)sha256_block_data_order);
+	return 0;
 }
 
-static int sha256_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *out)
+static int sha256_finup_arm64(struct shash_desc *desc, const u8 *data,
+			      unsigned int len, u8 *out)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	if (len)
-		sha256_base_do_update(desc, data, len,
+		sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_data_order);
-	sha256_base_do_finalize(desc,
+	sha256_base_do_finalize(sctx,
 				(sha256_block_fn *)sha256_block_data_order);
 
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int sha256_final_arm64(struct shash_desc *desc, u8 *out)
 {
-	return sha256_finup(desc, NULL, 0, out);
+	return sha256_finup_arm64(desc, NULL, 0, out);
 }
 
 static struct shash_alg algs[] = { {
 	.digestsize		= SHA256_DIGEST_SIZE,
 	.init			= sha256_base_init,
-	.update			= sha256_update,
-	.final			= sha256_final,
-	.finup			= sha256_finup,
+	.update			= sha256_update_arm64,
+	.final			= sha256_final_arm64,
+	.finup			= sha256_finup_arm64,
 	.descsize		= sizeof(struct sha256_state),
 	.base.cra_name		= "sha256",
 	.base.cra_driver_name	= "sha256-arm64",
@@ -73,9 +76,9 @@ static struct shash_alg algs[] = { {
 }, {
 	.digestsize		= SHA224_DIGEST_SIZE,
 	.init			= sha224_base_init,
-	.update			= sha256_update,
-	.final			= sha256_final,
-	.finup			= sha256_finup,
+	.update			= sha256_update_arm64,
+	.final			= sha256_final_arm64,
+	.finup			= sha256_finup_arm64,
 	.descsize		= sizeof(struct sha256_state),
 	.base.cra_name		= "sha224",
 	.base.cra_driver_name	= "sha224-arm64",
@@ -88,18 +91,22 @@ static struct shash_alg algs[] = { {
 static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
 			      unsigned int len)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	/*
 	 * Stacking and unstacking a substantial slice of the NEON register
 	 * file may significantly affect performance for small updates when
 	 * executing in interrupt context, so fall back to the scalar code
 	 * in that case.
 	 */
-	if (!may_use_simd())
-		return sha256_base_do_update(desc, data, len,
+	if (!may_use_simd()) {
+		sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_data_order);
+		return 0;
+	}
 
 	kernel_neon_begin();
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_neon);
 	kernel_neon_end();
 
@@ -109,22 +116,24 @@ static int sha256_update_neon(struct shash_desc *desc, const u8 *data,
 static int sha256_finup_neon(struct shash_desc *desc, const u8 *data,
 			     unsigned int len, u8 *out)
 {
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
 	if (!may_use_simd()) {
 		if (len)
-			sha256_base_do_update(desc, data, len,
+			sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_data_order);
-		sha256_base_do_finalize(desc,
+		sha256_base_do_finalize(sctx,
 				(sha256_block_fn *)sha256_block_data_order);
 	} else {
 		kernel_neon_begin();
 		if (len)
-			sha256_base_do_update(desc, data, len,
+			sha256_base_do_update(sctx, data, len,
 				(sha256_block_fn *)sha256_block_neon);
-		sha256_base_do_finalize(desc,
+		sha256_base_do_finalize(sctx,
 				(sha256_block_fn *)sha256_block_neon);
 		kernel_neon_end();
 	}
-	return sha256_base_finish(desc, out);
+	return crypto_sha2_final(desc, out);
 }
 
 static int sha256_final_neon(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index 9e79baf03a4b..e722fbaf0558 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -44,52 +44,60 @@ asmlinkage void sha256_transform_ssse3(u32 *digest, const char *data,
 				       u64 rounds);
 typedef void (sha256_transform_fn)(u32 *digest, const char *data, u64 rounds);
 
-static int sha256_update(struct shash_desc *desc, const u8 *data,
-			 unsigned int len, sha256_transform_fn *sha256_xform)
+static int sha256_fpu_update(struct shash_desc *desc, const u8 *data,
+			       unsigned int len,
+			       sha256_transform_fn *sha256_xform)
 {
 	struct sha256_state *sctx = shash_desc_ctx(desc);
 
 	if (!irq_fpu_usable() ||
-	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
-		return crypto_sha256_update(desc, data, len);
+	    (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE) {
+		sha256_update(sctx, data, len);
+		return 0;
+	}
 
 	/* make sure casting to sha256_block_fn() is safe */
 	BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
 
 	kernel_fpu_begin();
-	sha256_base_do_update(desc, data, len,
+	sha256_base_do_update(sctx, data, len,
 			      (sha256_block_fn *)sha256_xform);
 	kernel_fpu_end();
 
 	return 0;
 }
 
-static int sha256_finup(struct shash_desc *desc, const u8 *data,
+static int sha256_fpu_finup(struct shash_desc *desc, const u8 *data,
 	      unsigned int len, u8 *out, sha256_transform_fn *sha256_xform)
 {
-	if (!irq_fpu_usable())
-		return crypto_sha256_finup(desc, data, len, out);
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	if (!irq_fpu_usable()) {
+		sha256_finup(sctx, data, len, out);
+		return 0;
+	}
 
 	kernel_fpu_begin();
 	if (len)
-		sha256_base_do_update(desc, data, len,
+		sha256_base_do_update(sctx, data, len,
 				      (sha256_block_fn *)sha256_xform);
-	sha256_base_do_finalize(desc, (sha256_block_fn *)sha256_xform);
+	sha256_base_do_finalize(sctx, (sha256_block_fn *)sha256_xform);
 	kernel_fpu_end();
 
-	return sha256_base_finish(desc, out);
+	crypto_sha2_final(desc, out);
+	return 0;
 }
 
 static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
 			 unsigned int len)
 {
-	return sha256_update(desc, data, len, sha256_transform_ssse3);
+	return sha256_fpu_update(desc, data, len, sha256_transform_ssse3);
 }
 
 static int sha256_ssse3_finup(struct shash_desc *desc, const u8 *data,
 	      unsigned int len, u8 *out)
 {
-	return sha256_finup(desc, data, len, out, sha256_transform_ssse3);
+	return sha256_fpu_finup(desc, data, len, out, sha256_transform_ssse3);
 }
 
 /* Add padding and return the message digest. */
@@ -152,13 +160,13 @@ asmlinkage void sha256_transform_avx(u32 *digest, const char *data,
 static int sha256_avx_update(struct shash_desc *desc, const u8 *data,
 			 unsigned int len)
 {
-	return sha256_update(desc, data, len, sha256_transform_avx);
+	return sha256_fpu_update(desc, data, len, sha256_transform_avx);
 }
 
 static int sha256_avx_finup(struct shash_desc *desc, const u8 *data,
 		      unsigned int len, u8 *out)
 {
-	return sha256_finup(desc, data, len, out, sha256_transform_avx);
+	return sha256_fpu_finup(desc, data, len, out, sha256_transform_avx);
 }
 
 static int sha256_avx_final(struct shash_desc *desc, u8 *out)
@@ -236,13 +244,13 @@ asmlinkage void sha256_transform_rorx(u32 *digest, const char *data,
 static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
 			 unsigned int len)
 {
-	return sha256_update(desc, data, len, sha256_transform_rorx);
+	return sha256_fpu_update(desc, data, len, sha256_transform_rorx);
 }
 
 static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
 		      unsigned int len, u8 *out)
 {
-	return sha256_finup(desc, data, len, out, sha256_transform_rorx);
+	return sha256_fpu_finup(desc, data, len, out, sha256_transform_rorx);
 }
 
 static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
@@ -318,13 +326,13 @@ asmlinkage void sha256_ni_transform(u32 *digest, const char *data,
 static int sha256_ni_update(struct shash_desc *desc, const u8 *data,
 			 unsigned int len)
 {
-	return sha256_update(desc, data, len, sha256_ni_transform);
+	return sha256_fpu_update(desc, data, len, sha256_ni_transform);
 }
 
 static int sha256_ni_finup(struct shash_desc *desc, const u8 *data,
 		      unsigned int len, u8 *out)
 {
-	return sha256_finup(desc, data, len, out, sha256_ni_transform);
+	return sha256_fpu_finup(desc, data, len, out, sha256_ni_transform);
 }
 
 static int sha256_ni_final(struct shash_desc *desc, u8 *out)
diff --git a/arch/x86/purgatory/purgatory.c b/arch/x86/purgatory/purgatory.c
index 25e068ba3382..ed6e80b844cf 100644
--- a/arch/x86/purgatory/purgatory.c
+++ b/arch/x86/purgatory/purgatory.c
@@ -10,7 +10,7 @@
  * Version 2.  See the file COPYING for more details.
  */
 
-#include "sha256.h"
+#include <crypto/sha.h>
 #include "../boot/string.h"
 
 struct sha_region {
diff --git a/arch/x86/purgatory/sha256.c b/arch/x86/purgatory/sha256.c
index 548ca675a14a..724925d5da61 100644
--- a/arch/x86/purgatory/sha256.c
+++ b/arch/x86/purgatory/sha256.c
@@ -17,7 +17,7 @@
 
 #include <linux/bitops.h>
 #include <asm/byteorder.h>
-#include "sha256.h"
+#include <crypto/sha.h>
 #include "../boot/string.h"
 
 static inline u32 Ch(u32 x, u32 y, u32 z)
@@ -208,22 +208,7 @@ static void sha256_transform(u32 *state, const u8 *input)
 	memset(W, 0, 64 * sizeof(u32));
 }
 
-int sha256_init(struct sha256_state *sctx)
-{
-	sctx->state[0] = SHA256_H0;
-	sctx->state[1] = SHA256_H1;
-	sctx->state[2] = SHA256_H2;
-	sctx->state[3] = SHA256_H3;
-	sctx->state[4] = SHA256_H4;
-	sctx->state[5] = SHA256_H5;
-	sctx->state[6] = SHA256_H6;
-	sctx->state[7] = SHA256_H7;
-	sctx->count = 0;
-
-	return 0;
-}
-
-int sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
+void sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 {
 	unsigned int partial, done;
 	const u8 *src;
@@ -249,11 +234,9 @@ int sha256_update(struct sha256_state *sctx, const u8 *data, unsigned int len)
 		partial = 0;
 	}
 	memcpy(sctx->buf + partial, src, len - done);
-
-	return 0;
 }
 
-int sha256_final(struct sha256_state *sctx, u8 *out)
+void sha256_final(struct sha256_state *sctx, u8 *out)
 {
 	__be32 *dst = (__be32 *)out;
 	__be64 bits;
@@ -278,6 +261,4 @@ int sha256_final(struct sha256_state *sctx, u8 *out)
 
 	/* Zeroize sensitive information. */
 	memset(sctx, 0, sizeof(*sctx));
-
-	return 0;
 }
diff --git a/arch/x86/purgatory/sha256.h b/arch/x86/purgatory/sha256.h
deleted file mode 100644
index bd15a4127735..000000000000
--- a/arch/x86/purgatory/sha256.h
+++ /dev/null
@@ -1,22 +0,0 @@
-/*
- *  Copyright (C) 2014 Red Hat Inc.
- *
- *  Author: Vivek Goyal <vgoyal@redhat.com>
- *
- * This source code is licensed under the GNU General Public License,
- * Version 2.  See the file COPYING for more details.
- */
-
-#ifndef SHA256_H
-#define SHA256_H
-
-
-#include <linux/types.h>
-#include <crypto/sha.h>
-
-extern int sha256_init(struct sha256_state *sctx);
-extern int sha256_update(struct sha256_state *sctx, const u8 *input,
-				unsigned int length);
-extern int sha256_final(struct sha256_state *sctx, u8 *hash);
-
-#endif /* SHA256_H */
diff --git a/crypto/sha256_generic.c b/crypto/sha256_generic.c
index 8f9c47e1a96e..f2747893402c 100644
--- a/crypto/sha256_generic.c
+++ b/crypto/sha256_generic.c
@@ -231,6 +231,13 @@ static void sha256_transform(u32 *state, const u8 *input)
 	memzero_explicit(W, 64 * sizeof(u32));
 }
 
+int sha256_base_init(struct shash_desc *desc)
+{
+	sha256_init(shash_desc_ctx(desc));
+	return 0;
+}
+EXPORT_SYMBOL(sha256_base_init);
+
 static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
 				    int blocks)
 {
@@ -240,32 +247,49 @@ static void sha256_generic_block_fn(struct sha256_state *sst, u8 const *src,
 	}
 }
 
-int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
+void sha256_update(struct sha256_state *sctx, const u8 *data,
 			  unsigned int len)
 {
-	return sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
+	sha256_base_do_update(sctx, data, len, sha256_generic_block_fn);
+}
+EXPORT_SYMBOL(sha256_update);
+
+void sha256_final(struct sha256_state *sctx, u8 *out)
+{
+	sha256_base_do_finalize(sctx, sha256_generic_block_fn);
+	sha256_base_finish(sctx, out);
 }
-EXPORT_SYMBOL(crypto_sha256_update);
+EXPORT_SYMBOL(sha256_final);
 
-static int sha256_final(struct shash_desc *desc, u8 *out)
+static int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
+				unsigned int len)
 {
-	sha256_base_do_finalize(desc, sha256_generic_block_fn);
-	return sha256_base_finish(desc, out);
+	sha256_update(shash_desc_ctx(desc), data, len);
+	return 0;
+}
+
+int crypto_sha2_final(struct shash_desc *desc, u8 *out)
+{
+	struct sha256_state *sctx = shash_desc_ctx(desc);
+
+	sha256_base_do_finalize(sctx, sha256_generic_block_fn);
+	sha2_base_finish(sctx, crypto_shash_digestsize(desc->tfm), out);
+	return 0;
 }
+EXPORT_SYMBOL(crypto_sha2_final);
 
-int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
-			unsigned int len, u8 *hash)
+static int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
+			       unsigned int len, u8 *hash)
 {
-	sha256_base_do_update(desc, data, len, sha256_generic_block_fn);
-	return sha256_final(desc, hash);
+	sha256_finup(shash_desc_ctx(desc), data, len, hash);
+	return 0;
 }
-EXPORT_SYMBOL(crypto_sha256_finup);
 
 static struct shash_alg sha256_algs[2] = { {
 	.digestsize	=	SHA256_DIGEST_SIZE,
 	.init		=	sha256_base_init,
 	.update		=	crypto_sha256_update,
-	.final		=	sha256_final,
+	.final		=	crypto_sha2_final,
 	.finup		=	crypto_sha256_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
@@ -279,7 +303,7 @@ static struct shash_alg sha256_algs[2] = { {
 	.digestsize	=	SHA224_DIGEST_SIZE,
 	.init		=	sha224_base_init,
 	.update		=	crypto_sha256_update,
-	.final		=	sha256_final,
+	.final		=	crypto_sha2_final,
 	.finup		=	crypto_sha256_finup,
 	.descsize	=	sizeof(struct sha256_state),
 	.base		=	{
diff --git a/include/crypto/sha.h b/include/crypto/sha.h
index c94d3eb1cefd..2b6978471605 100644
--- a/include/crypto/sha.h
+++ b/include/crypto/sha.h
@@ -96,11 +96,30 @@ extern int crypto_sha1_update(struct shash_desc *desc, const u8 *data,
 extern int crypto_sha1_finup(struct shash_desc *desc, const u8 *data,
 			     unsigned int len, u8 *hash);
 
-extern int crypto_sha256_update(struct shash_desc *desc, const u8 *data,
-			      unsigned int len);
-
-extern int crypto_sha256_finup(struct shash_desc *desc, const u8 *data,
-			       unsigned int len, u8 *hash);
+static inline void sha256_init(struct sha256_state *sctx)
+{
+	sctx->state[0] = SHA256_H0;
+	sctx->state[1] = SHA256_H1;
+	sctx->state[2] = SHA256_H2;
+	sctx->state[3] = SHA256_H3;
+	sctx->state[4] = SHA256_H4;
+	sctx->state[5] = SHA256_H5;
+	sctx->state[6] = SHA256_H6;
+	sctx->state[7] = SHA256_H7;
+	sctx->count = 0;
+}
+
+extern void sha256_update(struct sha256_state *sctx, const u8 *data,
+			  unsigned int len);
+
+extern void sha256_final(struct sha256_state *sctx, u8 *out);
+
+static inline void sha256_finup(struct sha256_state *sctx, const u8 *data,
+				unsigned int len, u8 *hash)
+{
+	sha256_update(sctx, data, len);
+	sha256_final(sctx, hash);
+}
 
 extern int crypto_sha512_update(struct shash_desc *desc, const u8 *data,
 			      unsigned int len);
diff --git a/include/crypto/sha256_base.h b/include/crypto/sha256_base.h
index d1f2195bb7de..f65d9a516b36 100644
--- a/include/crypto/sha256_base.h
+++ b/include/crypto/sha256_base.h
@@ -35,29 +35,13 @@ static inline int sha224_base_init(struct shash_desc *desc)
 	return 0;
 }
 
-static inline int sha256_base_init(struct shash_desc *desc)
-{
-	struct sha256_state *sctx = shash_desc_ctx(desc);
-
-	sctx->state[0] = SHA256_H0;
-	sctx->state[1] = SHA256_H1;
-	sctx->state[2] = SHA256_H2;
-	sctx->state[3] = SHA256_H3;
-	sctx->state[4] = SHA256_H4;
-	sctx->state[5] = SHA256_H5;
-	sctx->state[6] = SHA256_H6;
-	sctx->state[7] = SHA256_H7;
-	sctx->count = 0;
-
-	return 0;
-}
+extern int sha256_base_init(struct shash_desc *desc);
 
-static inline int sha256_base_do_update(struct shash_desc *desc,
+static inline void sha256_base_do_update(struct sha256_state *sctx,
 					const u8 *data,
 					unsigned int len,
 					sha256_block_fn *block_fn)
 {
-	struct sha256_state *sctx = shash_desc_ctx(desc);
 	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
 
 	sctx->count += len;
@@ -86,15 +70,12 @@ static inline int sha256_base_do_update(struct shash_desc *desc,
 	}
 	if (len)
 		memcpy(sctx->buf + partial, data, len);
-
-	return 0;
 }
 
-static inline int sha256_base_do_finalize(struct shash_desc *desc,
+static inline void sha256_base_do_finalize(struct sha256_state *sctx,
 					  sha256_block_fn *block_fn)
 {
 	const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
-	struct sha256_state *sctx = shash_desc_ctx(desc);
 	__be64 *bits = (__be64 *)(sctx->buf + bit_offset);
 	unsigned int partial = sctx->count % SHA256_BLOCK_SIZE;
 
@@ -109,14 +90,11 @@ static inline int sha256_base_do_finalize(struct shash_desc *desc,
 	memset(sctx->buf + partial, 0x0, bit_offset - partial);
 	*bits = cpu_to_be64(sctx->count << 3);
 	block_fn(sctx, sctx->buf, 1);
-
-	return 0;
 }
 
-static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
+static inline void sha2_base_finish(struct sha256_state *sctx,
+				    unsigned int digest_size, u8 *out)
 {
-	unsigned int digest_size = crypto_shash_digestsize(desc->tfm);
-	struct sha256_state *sctx = shash_desc_ctx(desc);
 	__be32 *digest = (__be32 *)out;
 	int i;
 
@@ -124,5 +102,11 @@ static inline int sha256_base_finish(struct shash_desc *desc, u8 *out)
 		put_unaligned_be32(sctx->state[i], digest++);
 
 	*sctx = (struct sha256_state){};
-	return 0;
 }
+
+static inline void sha256_base_finish(struct sha256_state *sctx, u8 *out)
+{
+	sha2_base_finish(sctx, SHA256_DIGEST_SIZE, out);
+}
+
+extern int crypto_sha2_final(struct shash_desc *desc, u8 *out);
-- 
2.9.3

^ permalink raw reply related

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-24  1:17 UTC (permalink / raw)
  To: daniel, hannes, linux
  Cc: ak, davem, David.Laight, ebiggers3, eric.dumazet, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	tytso, vegard.nossum
In-Reply-To: <254f8cd7-d045-172d-8692-6052d9da287e@stressinduktion.org>

Hannes Frederic Sowa wrote:
> On 24.12.2016 00:39, George Spelvin wrote:
>> We just finished discussing why 8 bytes isn't enough.  If you only
>> feed back 8 bytes, an attacker who can do 2^64 computation can find it
>> (by guessing and computing forward to verify the guess) and recover the
>> previous state.  You need to feed back at least as much output as your
>> security targete.  For /dev/urandom's ChaCha20, that's 256 bits.

> I followed the discussion but it appeared to me that this has the
> additional constraint of time until the next reseeding event happenes,
> which is 300s (under the assumption that calls to get_random_int happen
> regularly, which I expect right now). After that the existing reseeding
> mechansim will ensure enough backtracking protection. The number of
> bytes can easily be increased here, given that reseeding was shown to be
> quite fast already and we produce enough output. But I am not sure if
> this is a bit overengineered in the end?

I'm not following your description of how the time-based and call-based
mechanisms interact, but for any mix-back, you should either do enough
or none at all.  (Also called "catastrophic reseeding".)

For example, two mix-backs of 64 bits gives you 65 bit security, not 128.
(Because each mixback can be guessed and verified separately.)

> Also agreed. Given your solution below to prandom_u32, I do think it
> might also work without the seqlock now.

It's not technically a seqlock; in particular the reader doesn't
spin.  But the write side, and general logic is so similar it's
a good mental model.

Basically, assume a 64-byte buffer.  The reader has gone through
32 bytes of it, and has 32 left, and when he reads another 8 bytes,
has to distinguish three cases:

1) No update; we read the old bytes and there are now 32 - 24 bytes left.
2) Update completed while we weren't looking.  There are now new
   bytes in the buffer, and we took 8 leaving 64 - 8 = 56.
3) Update in progress at the time of the read.  We don't know if we
   are seeing old bytes or new bytes, so we have to assume the worst
   and not proceeed unless 32 >= 8, but assume at the end there are
   64 - 8 = 56 new bytes left.

> I wouldn't have added a disable irqs, but given that I really like your
> proposal, I would take it in my todo branch and submit it when net-next
> window opens.

If you want that, I have a pile of patches to prandom I really
should push upstream.  Shall I refresh them and send them to you?


commit 4cf1b3d9f4fbccc29ffc2fe4ca4ff52ea77253f1
Author: George Spelvin <linux@horizon.com>
Date:   Mon Aug 31 00:05:00 2015 -0400

    net: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    The net/802 code was already efficient, but prandom_u32_max() is simpler.
    
    In net/batman-adv/bat_iv_ogm.c, batadv_iv_ogm_fwd_send_time() got changed
    from picking a random number of milliseconds and converting to jiffies to
    picking a random number of jiffies, since the number of milliseconds (and
    thus the conversion to jiffies) is a compile-time constant.  The equivalent
    code in batadv_iv_ogm_emit_send_time was not changed, because the number
    of milliseconds is variable.
    
    In net/ipv6/ip6_flowlabel.c, ip6_flowlabel had htonl(prandom_u32()),
    which is silly.  Just cast to __be32 without permuting the bits.
    
    net/sched/sch_netem.c got adjusted to only need one call to prandom_u32
    instead of 2.  (Assuming skb_headlen can't exceed 512 MiB, which is
    hopefully safe for some time yet.)
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 9c8fb80e1fd2be42c35cab1af27187d600fd85e3
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 15:20:47 2014 -0400

    mm/swapfile.c: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 2743eb01e5c5958fd88ae78d19c5fea772d4b117
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 15:19:53 2014 -0400

    lib: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 6a5e91bf395060a3351bfe5efc40ac20ffba2c1b
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 15:18:50 2014 -0400

    fs/xfs: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range".
    
    Also changed the last argument of xfs_error_test() from "unsigned long"
    to "unsigned", since the code never did support values > 2^32, and
    the largest value ever passed is 100.
    
    The code could be improved even further by passing in 2^32/rf rather
    than rf, but I'll leave that to some XFS developers.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 6f6d485d9179ca6ec4e30caa06ade0e0c6931810
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 15:00:17 2014 -0400

    fs/ubifs: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range".
    
    In fs/ubifs/debug.c, the chance() function got rewritten entirely
    to take advantage of the fact its arguments are always constant.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 0b6bf2c874bbbcfa74f6be0413c772b3ac134d12
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 14:52:17 2014 -0400

    fs/extX: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range".
    
    In ext4/ialloc.c:436, there's one more chance to do that, but
    the modulo is required to keep the deterministic option consistent,
    so I left it alone.
    
    Switching to "parent_group = ((u64)grp * ngroups) >> 32;" would be more
    efficient, but would change user-visible behavior.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit e79e0e8b491bc976c0b4e1b2070eccdf55b34f43
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 14:47:15 2014 -0400

    fs/ceph: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit fc628326d8cf4abe364ea01259bc392c0bbaf5a0
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 14:46:29 2014 -0400

    drivers/scsi/fcoe/fcoe_ctlr.c: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 4810d67dd2edf08e7801ef47550d46b7397fe2dc
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 14:45:55 2014 -0400

    drivers/ntb/ntb_hw.c: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit f4a806abbc0785e8f0363e02fac613246eed9e34
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 14:45:27 2014 -0400

    drivers/net: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    In some cases (like ethernet/broadcom/cnic.c and drivers/net/hamradio),
    the range is a compile-time constant power of 2, so the code doesn't
    get any better, but I'm trying to do a full sweep.
    
    In drivers/net/wireless/brcm80211/brcmfmac/p2p.c, a timeout is selected from
    100, 200 or 300 ms.  It would be easy enough to make the granularity finer,
    but I assume the existing code is that way for a reason.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 603e0c311fac09d633ff6c0fd9242108f1c52ead
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:55:09 2014 -0400

    drivers/mtd: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range".
    
    And rewrote the 1-in-N code in drivers/mtd/ubi/debug.h to avoid
    even more arithmetic.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit e0657cc865e8e02768906935b8e8bf63af58aa46
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:52:13 2014 -0400

    drivers/mmc/core: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 017ee6841ec8d416093fc1f18bdd3df0dfc6aacc
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:51:33 2014 -0400

    drivers/lguest/page_tables.c: Use prandom_u32_max()
    
    This doesn't actually change the code because the array size is
    a power of 2 (it's a 4-element array).  But I'm on a roll.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 87556165f46eb42c756bcb94e062c3fbd272a4bf
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:49:41 2014 -0400

    drivers/infiniband: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 1eafe1d429f442218810e8c604d4e7c466414cf3
Author: George Spelvin <linux@horizon.com>
Date:   Sun Aug 30 23:42:41 2015 -0400

    block/blk-mq-tag.c: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit f03ee59a63098d244d5b8932fc68c9fc3e2bb222
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:46:52 2014 -0400

    arch/x86: Use prandom_u32_max()
    
    It's slightly more efficient than "prandom_u32() % range"
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit feafd3a3fb09924ea633d677a7ab8a25a817f39d
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 13:44:49 2014 -0400

    lib/random32.c: Remove redundant U suffixes on integers
    
    Get rid of a few of the extraneous U suffixes on ordinary integers.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit f14328d248e59c862478633479528c9cb8554d7a
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 12:40:19 2014 -0400

    lib/random32.c: Randomize timeout to the millisecond, not the second.
    
    If you're going to bother randomizing it, do it right.
    And use prandom_u32_max(), which is designed for the job, rather
    than "% 40".
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 143342006adfff718afedf58f639b72337d7c816
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 12:51:26 2014 -0400

    lib/random32.c: Make prandom_u32_max efficient for powers of 2
    
    The multiply-and-shift is efficient in the general case, but slower
    than a simple bitwise AND if the range is a power of 2.  Make the function
    handle this case so callers don't have to worry about micro-optimizing it.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 99864db5cb023e6b09d71cde4997a5f6cafb5cca
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 12:02:17 2014 -0400

    lib/random32.c: Use <asm/unaligned.h> instead of hand-rolling it
    
    The functions exist for a reason; the manual byte-at-a-time code
    is unnecessarily slow (and bloated).
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 4ecd45f6eadd4d171782dc6b451ed1040e47d419
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 11:55:59 2014 -0400

    lib/random32.c: Debloat non-speed-critical code
    
    Unrolling code in rarely-used code paths is just silly.  There are
    two places that static calls to prandom_u32_state() can be removed:
    
    1) prandom_warmup() calls prandom_u32_state 10 times.
    2) prandom_state_selftest() can avoid one call and simplify the
       loop logic by repeating an assignment to a local variable
       (which probably adds zero code anyway).
    
    Signed-off-by: George Spelvin <linux@horizon.com>

commit 8765cff45da1d96e4310d50dd49231790c49b612
Author: George Spelvin <linux@horizon.com>
Date:   Sat May 24 11:52:34 2014 -0400

    lib/random32.c: Mark self-test data as __initconst
    
    So it can be thrown away along with the code that uses it.
    
    Signed-off-by: George Spelvin <linux@horizon.com>

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Hannes Frederic Sowa @ 2016-12-24  0:12 UTC (permalink / raw)
  To: George Spelvin, daniel
  Cc: ak, davem, David.Laight, ebiggers3, eric.dumazet, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	tytso, vegard.nossum
In-Reply-To: <20161223233904.11739.qmail@ns.sciencehorizons.net>

Hi,

On 24.12.2016 00:39, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
>> In general this looks good, but bitbuffer needs to be protected from
>> concurrent access, thus needing at least some atomic instruction and
>> disabling of interrupts for the locking if done outside of
>> get_random_long. Thus I liked your previous approach more where you
>> simply embed this in the already necessary get_random_long and aliased
>> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.
> 
> It's meant to be part of the same approach, and I didn't include locking
> because that's a requirement for *any* solution, and isn't specific
> to the part I was trying to illustrate.
> 
> (As for re-using the name "get_random_long", that was just so
> I didn't have to explain it.  Call it "get_random_long_internal"
> if you like.)
> 
> Possible locking implementations include:
> 1) Use the same locking as applies to get_random_long_internal(), or
> 2) Make bitbuffer a per-CPU variable (note that we currently have 128
>    bits of per-CPU state in get_random_int_hash[]), and this is all a
>    fast-path to bypass heavier locking in get_random_long_internal().

I understood that this is definitely a solvable problem.

>>> But, I just realized I've been overlooking something glaringly obvious...
>>> there's no reason you can't compute multple blocks in advance.
>>
>> In the current code on the cost of per cpu allocations thus memory.
> 
> Yes, but on 4-core machines it's still not much, and 4096-core
> behemoths have RAM to burn.
> 
>> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
>> return block to feed it directly back into the state chacha? So we pass
>> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
>> state. This would make the window max shorter than the anti
>> backtracking protection right now from 300s to 14 get_random_int calls.
>> Not sure if this is worth it.
> 
> We just finished discussing why 8 bytes isn't enough.  If you only
> feed back 8 bytes, an attacker who can do 2^64 computation can find it
> (by guessing and computing forward to verify the guess) and recover the
> previous state.  You need to feed back at least as much output as your
> security targete.  For /dev/urandom's ChaCha20, that's 256 bits.

I followed the discussion but it appeared to me that this has the
additional constraint of time until the next reseeding event happenes,
which is 300s (under the assumption that calls to get_random_int happen
regularly, which I expect right now). After that the existing reseeding
mechansim will ensure enough backtracking protection. The number of
bytes can easily be increased here, given that reseeding was shown to be
quite fast already and we produce enough output. But I am not sure if
this is a bit overengineered in the end?

>>> For example, suppose we gave each CPU a small pool to minimize locking.
>>> When one runs out and calls the global refill, it could refill *all*
>>> of the CPU pools.  (You don't even need locking; there may be a race to
>>> determine *which* random numbers the reader sees, but they're equally
>>> good either way.)
> 
>> Yes, but still disabled interrupts, otherwise the same state could be
>> used twice on the same CPU. Uff, I think we have this problem in
>> prandom_u32.
> 
> There are some locking gotchas, but it is doable lock-free.
> 
> Basically, it's a seqlock.  The writer increments it once (to an odd
> number) before starting to overwrite the buffer, and a second time (to
> an even number) after.  "Before" and "after" mean smp_wmb().
> 
> The reader can use this to figure out how much of the data in the buffer
> is safely fresh.  The full sequence of checks is a bit intricate,
> but straightforward.
> 
> I didn't discuss the locking because I'm confident it's solvable,
> not because I wasn't aware it has to be solved.

Also agreed. Given your solution below to prandom_u32, I do think it
might also work without the seqlock now.

> As for prandom_u32(), what's the problem?  Are you worried that
> get_cpu_var disables preemption but not interrupts, and so an
> ISR might return the same value as process-level code?

Yes, exactly those were my thoughts.

> First of all, that's not a problem because prandom_u32() doesn't
> have security guarantees.  Occasionally returning a dupicate number
> is okay.
>
> Second, if you do care, that could be trivially fixed by throwing
> a barrier() in the middle of the code.  (Patch appended; S-o-B
> if anyone wants it.)

I wouldn't have added a disable irqs, but given that I really like your
proposal, I would take it in my todo branch and submit it when net-next
window opens.

> diff --git a/lib/random32.c b/lib/random32.c
> index c750216d..6bee4a36 100644
> --- a/lib/random32.c
> +++ b/lib/random32.c
> [...]

Thanks,
Hannes

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-23 23:39 UTC (permalink / raw)
  To: daniel, hannes, linux
  Cc: ak, davem, David.Laight, ebiggers3, eric.dumazet, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	tytso, vegard.nossum
In-Reply-To: <1482526135.2554.5.camel@stressinduktion.org>

Hannes Frederic Sowa wrote:
> In general this looks good, but bitbuffer needs to be protected from
> concurrent access, thus needing at least some atomic instruction and
> disabling of interrupts for the locking if done outside of
> get_random_long. Thus I liked your previous approach more where you
> simply embed this in the already necessary get_random_long and aliased
> get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

It's meant to be part of the same approach, and I didn't include locking
because that's a requirement for *any* solution, and isn't specific
to the part I was trying to illustrate.

(As for re-using the name "get_random_long", that was just so
I didn't have to explain it.  Call it "get_random_long_internal"
if you like.)

Possible locking implementations include:
1) Use the same locking as applies to get_random_long_internal(), or
2) Make bitbuffer a per-CPU variable (note that we currently have 128
   bits of per-CPU state in get_random_int_hash[]), and this is all a
   fast-path to bypass heavier locking in get_random_long_internal().

>> But, I just realized I've been overlooking something glaringly obvious...
>> there's no reason you can't compute multple blocks in advance.
>
> In the current code on the cost of per cpu allocations thus memory.

Yes, but on 4-core machines it's still not much, and 4096-core
behemoths have RAM to burn.

> In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
> return block to feed it directly back into the state chacha? So we pass
> on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
> state. This would make the window max shorter than the anti
> backtracking protection right now from 300s to 14 get_random_int calls.
> Not sure if this is worth it.

We just finished discussing why 8 bytes isn't enough.  If you only
feed back 8 bytes, an attacker who can do 2^64 computation can find it
(by guessing and computing forward to verify the guess) and recover the
previous state.  You need to feed back at least as much output as your
security targete.  For /dev/urandom's ChaCha20, that's 256 bits.

>> For example, suppose we gave each CPU a small pool to minimize locking.
>> When one runs out and calls the global refill, it could refill *all*
>> of the CPU pools.  (You don't even need locking; there may be a race to
>> determine *which* random numbers the reader sees, but they're equally
>> good either way.)

> Yes, but still disabled interrupts, otherwise the same state could be
> used twice on the same CPU. Uff, I think we have this problem in
> prandom_u32.

There are some locking gotchas, but it is doable lock-free.

Basically, it's a seqlock.  The writer increments it once (to an odd
number) before starting to overwrite the buffer, and a second time (to
an even number) after.  "Before" and "after" mean smp_wmb().

The reader can use this to figure out how much of the data in the buffer
is safely fresh.  The full sequence of checks is a bit intricate,
but straightforward.

I didn't discuss the locking because I'm confident it's solvable,
not because I wasn't aware it has to be solved.

As for prandom_u32(), what's the problem?  Are you worried that
get_cpu_var disables preemption but not interrupts, and so an
ISR might return the same value as process-level code?

First of all, that's not a problem because prandom_u32() doesn't
have security guarantees.  Occasionally returning a dupicate number
is okay.

Second, if you do care, that could be trivially fixed by throwing
a barrier() in the middle of the code.  (Patch appended; S-o-B
if anyone wants it.)


diff --git a/lib/random32.c b/lib/random32.c
index c750216d..6bee4a36 100644
--- a/lib/random32.c
+++ b/lib/random32.c
@@ -55,16 +55,29 @@ static DEFINE_PER_CPU(struct rnd_state, net_rand_state);
  *
  *	This is used for pseudo-randomness with no outside seeding.
  *	For more random results, use prandom_u32().
+ *
+ *	The barrier() is to allow prandom_u32() to be called from interupt
+ *	context without locking.  An interrupt will run to completion,
+ *	updating all four state variables.  The barrier() ensures that
+ *	the interrupted code will compute a different result.  Either it
+ *	will have written s1 and s2 (so the interrupt will start with
+ *	the updated values), or it will use the values of s3 and s4
+ *	updated by the interrupt.
+ *
+ *	(The same logic applies recursively to nested interrupts, trap
+ *	handlers, and NMIs.)
  */
 u32 prandom_u32_state(struct rnd_state *state)
 {
+	register u32 x;
 #define TAUSWORTHE(s, a, b, c, d) ((s & c) << d) ^ (((s << a) ^ s) >> b)
-	state->s1 = TAUSWORTHE(state->s1,  6, 13, 4294967294U, 18U);
-	state->s2 = TAUSWORTHE(state->s2,  2, 27, 4294967288U,  2U);
-	state->s3 = TAUSWORTHE(state->s3, 13, 21, 4294967280U,  7U);
-	state->s4 = TAUSWORTHE(state->s4,  3, 12, 4294967168U, 13U);
+	x  = state->s1 = TAUSWORTHE(state->s1,  6, 13,   ~1u, 18);
+	x += state->s2 = TAUSWORTHE(state->s2,  2, 27,   ~7u,  2);
+	barrier();
+	x ^= state->s3 = TAUSWORTHE(state->s3, 13, 21,  ~15u,  7);
+	x += state->s4 = TAUSWORTHE(state->s4,  3, 12, ~127u, 13);
 
-	return (state->s1 ^ state->s2 ^ state->s3 ^ state->s4);
+	return x;
 }
 EXPORT_SYMBOL(prandom_u32_state);
 

^ permalink raw reply related

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Jason A. Donenfeld @ 2016-12-23 21:18 UTC (permalink / raw)
  To: Hannes Frederic Sowa
  Cc: Andy Lutomirski, Daniel Borkmann, Alexei Starovoitov,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <ec43894a-7b54-ff30-9874-bb4a25314e09@stressinduktion.org>

On Fri, Dec 23, 2016 at 7:19 PM, Hannes Frederic Sowa
<hannes@stressinduktion.org> wrote:
> Factoring out sha3

Per the other thread, you probably don't actually want SHA3, because
it's slow in software. You want SHA2. If you want something faster and
better, then Blake2 is most certainly the way to go. I'll be
submitting some patches in a month or so adding Blake2 for future use
in the tree.

Jason

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Hannes Frederic Sowa @ 2016-12-23 20:48 UTC (permalink / raw)
  To: George Spelvin
  Cc: ak, davem, David.Laight, ebiggers3, eric.dumazet, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	tytso, vegard.nossum
In-Reply-To: <20161223182626.6290.qmail@ns.sciencehorizons.net>

Hi,

On Fri, 2016-12-23 at 13:26 -0500, George Spelvin wrote:
> (Cc: list trimmed slightly as the topic is wandering a bit.)
> 
> Hannes Frederic Sowa wrote:
> > On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
> > > Adding might_lock() annotations will improve coverage a lot.
> > 
> > Might be hard to find the correct lock we take later down the code
> > path, but if that is possible, certainly.
> 
> The point of might_lock() is that you don't have to.  You find the
> worst case (most global) lock that the code *might* take if all the
> buffer-empty conditions are true, and tell lockdep "assume this lock is
> taken all the time".

Yes, of course. Probably indicating input_pool's and primary_crng's
locks are good to indicate here, but on NUMA other locks might be
taken.

> > > Hannes Frederic Sowa wrote:
> > > > Yes, that does look nice indeed. Accounting for bits instead of bytes
> > > > shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> > > > case you can't satisfy a request with one batched entropy block and have
> > > > to consume randomness from two.
> 
> For example, here's a simple bit-buffer implementation that wraps around
> a get_random_long.  The bitbuffer is of the form "00001xxxx", where the
> x bits are valid, and the position of the msbit indicates how many bits
> are valid.
> 
> extern unsigned long get_random_long();
> static unsigned long bitbuffer = 1;	/* Holds 0..BITS_PER_LONG-1 bits */
> unsigned long get_random_bits(unsigned char bits)
> {
> 	/* We handle bits == BITS_PER_LONG,and not bits == 0 */
> 	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
> 	unsigned long val;
> 
> 	if (bitbuffer > mask) {
> 		/* Request can be satisfied out of the bit buffer */
> 		val = bitbuffer;
> 		bitbuffer >>= bits;
> 	} else {
> 		/*
> 		 * Not enough bits, but enough room in bitbuffer for the
> 		 * leftovers.  avail < bits, so avail + 64 <= bits + 63.
> 		 */
> 		val = get_random_long();
> 		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
> 			| val >> 1 >> (bits-1);
> 	}
> 	return val & mask;
> }

In general this looks good, but bitbuffer needs to be protected from
concurrent access, thus needing at least some atomic instruction and
disabling of interrupts for the locking if done outside of
get_random_long. Thus I liked your previous approach more where you
simply embed this in the already necessary get_random_long and aliased
get_random_long as get_random_bits(BITS_PER_LONG) accordingly, IMHO.

> > When we hit the chacha20 without doing a reseed we only mutate the
> > state of chacha, but being an invertible function in its own, a
> > proposal would be to mix parts of the chacha20 output back into the
> > state, which, as a result, would cause slowdown because we couldn't
> > propagate the complete output of the cipher back to the caller (looking
> > at the function _extract_crng).
> 
> Basically, yes.  Half of the output goes to rekeying itself.

Okay, thanks and understood. :)

> But, I just realized I've been overlooking something glaringly obvious...
> there's no reason you can't compute multple blocks in advance.

In the current code on the cost of per cpu allocations thus memory.

> The standard assumption in antibacktracking is that you'll *notice* the
> state capture and stop trusting the random numbers afterward; you just
> want the output *before* to be secure.  In other words, cops busting
> down the door can't find the session key used in the message you just sent.

Yep, analogous to forward secrecy and future secrecy (self healing) is
provided by catastrophic reseeding from /dev/random.

In the extract_crng case, couldn't we simply use 8 bytes of the 64 byte
return block to feed it directly back into the state chacha? So we pass
on 56 bytes into the pcpu buffer, and consume 8 bytes for the next
state. This would make the window max shorter than the anti
backtracking protection right now from 300s to 14 get_random_int calls.
Not sure if this is worth it.

> So you can compute and store random numbers ahead of need.
> 
> This can amortize the antibacktracking as much as you'd like.
> 
> For example, suppose we gave each CPU a small pool to minimize locking.
> When one runs out and calls the global refill, it could refill *all*
> of the CPU pools.  (You don't even need locking; there may be a race to
> determine *which* random numbers the reader sees, but they're equally
> good either way.)

Yes, but still disabled interrupts, otherwise the same state could be
used twice on the same CPU. Uff, I think we have this problem in
prandom_u32.

> > Or are you referring that the anti-backtrack protection should happen
> > in every call from get_random_int?
> 
> If you ask for anti-backtracking without qualification, that's the
> goal, since you don't know how long will elapse until the next call.  
> 
> It's like fsync().  There are lots of more limited forms of "keep my
> data safe in case of a crash", but the most basic one is "if we lost
> power the very instant the call returned, the data would be safe."

Yes, it absolutely makes sense.

Thanks a lot,
Hannes

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: George Spelvin @ 2016-12-23 18:26 UTC (permalink / raw)
  To: hannes, linux
  Cc: ak, davem, David.Laight, ebiggers3, eric.dumazet, Jason,
	kernel-hardening, linux-crypto, linux-kernel, luto, netdev, tom,
	tytso, vegard.nossum
In-Reply-To: <1482494724.3353.7.camel@stressinduktion.org>

(Cc: list trimmed slightly as the topic is wandering a bit.)

Hannes Frederic Sowa wrote:
> On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
>> Adding might_lock() annotations will improve coverage a lot.
>
> Might be hard to find the correct lock we take later down the code
> path, but if that is possible, certainly.

The point of might_lock() is that you don't have to.  You find the
worst case (most global) lock that the code *might* take if all the
buffer-empty conditions are true, and tell lockdep "assume this lock is
taken all the time".

>> Hannes Frederic Sowa wrote:
>>> Yes, that does look nice indeed. Accounting for bits instead of bytes
>>> shouldn't be a huge problem either. Maybe it gets a bit more verbose in
>>> case you can't satisfy a request with one batched entropy block and have
>>> to consume randomness from two.

For example, here's a simple bit-buffer implementation that wraps around
a get_random_long.  The bitbuffer is of the form "00001xxxx", where the
x bits are valid, and the position of the msbit indicates how many bits
are valid.

extern unsigned long get_random_long();
static unsigned long bitbuffer = 1;	/* Holds 0..BITS_PER_LONG-1 bits */
unsigned long get_random_bits(unsigned char bits)
{
	/* We handle bits == BITS_PER_LONG,and not bits == 0 */
	unsigned long mask = -1ul >> (BITS_PER_LONG - bits);
	unsigned long val;

	if (bitbuffer > mask) {
		/* Request can be satisfied out of the bit buffer */
		val = bitbuffer;
		bitbuffer >>= bits;
	} else {
		/*
		 * Not enough bits, but enough room in bitbuffer for the
		 * leftovers.  avail < bits, so avail + 64 <= bits + 63.
		 */
		val = get_random_long();
		bitbuffer = bitbuffer << (BITS_PER_LONG - bits)
			| val >> 1 >> (bits-1);
	}
	return val & mask;
}

> When we hit the chacha20 without doing a reseed we only mutate the
> state of chacha, but being an invertible function in its own, a
> proposal would be to mix parts of the chacha20 output back into the
> state, which, as a result, would cause slowdown because we couldn't
> propagate the complete output of the cipher back to the caller (looking
> at the function _extract_crng).

Basically, yes.  Half of the output goes to rekeying itself.

But, I just realized I've been overlooking something glaringly obvious...
there's no reason you can't compute multple blocks in advance.

The standard assumption in antibacktracking is that you'll *notice* the
state capture and stop trusting the random numbers afterward; you just
want the output *before* to be secure.  In other words, cops busting
down the door can't find the session key used in the message you just sent.

So you can compute and store random numbers ahead of need.

This can amortize the antibacktracking as much as you'd like.

For example, suppose we gave each CPU a small pool to minimize locking.
When one runs out and calls the global refill, it could refill *all*
of the CPU pools.  (You don't even need locking; there may be a race to
determine *which* random numbers the reader sees, but they're equally
good either way.)

> Or are you referring that the anti-backtrack protection should happen
> in every call from get_random_int?

If you ask for anti-backtracking without qualification, that's the
goal, since you don't know how long will elapse until the next call.  

It's like fsync().  There are lots of more limited forms of "keep my
data safe in case of a crash", but the most basic one is "if we lost
power the very instant the call returned, the data would be safe."

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Hannes Frederic Sowa @ 2016-12-23 18:19 UTC (permalink / raw)
  To: Andy Lutomirski, Daniel Borkmann
  Cc: Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrXHoPBV1UpLwtLjjmqJXVpgEUTCnwePDLx4g6a1SFMZaw@mail.gmail.com>

On 23.12.2016 17:42, Andy Lutomirski wrote:
> On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
>> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>>>
>>>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>>>
>>>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>>>
>>>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>>
>>> [...]
>>>
>>>>>> The hashing is not a proper sha1 neither, unfortunately. I think that
>>>>>> is why it will have a custom implementation in iproute2?
>>>>>
>>>>>
>>>>> Still trying to catch up on this admittedly bit confusing thread. I
>>>>> did run automated tests over couple of days comparing the data I got
>>>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>>>> cases varying from min to max possible program sizes. In the process
>>>>> of testing, as you might have seen on netdev, I found couple of other
>>>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>>>> do you or Andy or anyone participating in claiming this have any
>>>>> concrete data or test cases that suggests something different? If yes,
>>>>> I'm very curious to hear about it and willing fix it up, of course.
>>>>> When I'm back from pto I'll prep and cook up my test suite to be
>>>>> included into the selftests/bpf/, should have done this initially,
>>>>> sorry about that. I'll also post something to expose the alg, that
>>>>> sounds fine to me.
>>>>
>>>>
>>>> Looking into your code closer, I noticed that you indeed seem to do the
>>>> finalization of sha-1 by hand by aligning and padding the buffer
>>>> accordingly and also patching in the necessary payload length.
>>>>
>>>> Apologies for my side for claiming that this is not correct sha1
>>>> output, I was only looking at sha_transform and its implementation and
>>>> couldn't see the padding and finalization round with embedding the data
>>>> length in there and hadn't thought of it being done manually.
>>>>
>>>> Anyway, is it difficult to get the sha finalization into some common
>>>> code library? It is not very bpf specific and crypto code reviewers
>>>> won't find it there at all.
>>>
>>>
>>> Yes, sure, I'll rework it that way (early next year when I'm back if
>>> that's fine with you).
>>
>> Can we make it SHA-256 before 4.10 comes out, though?  This really
>> looks like it will be used in situations where collisions matter and
>> it will be exposed to malicious programs, and SHA-1 should not be used
>> for new designs for this purpose because it simply isn't long enough.
>>
>> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
>> misleading.  That should be u8 or, at the very least, __be32.
>>
>> I realize that there isn't a sha-256 implementation in lib, but would
>> it really be so bad to make the bpf digest only work (for now) when
>> crypto is enabled?  I would *love* to see the crypto core learn how to
>> export simple primitives for direct use without needing the whole
>> crypto core, and this doesn't seem particularly hard to do, but I
>> don't think that's 4.10 material.
> 
> I'm going to try to send out RFC patches for all of this today or
> tomorrow.  It doesn't look bad at all.

Factoring out sha3 to lib/ and use it as standalone and in crypto api
doesn't seem hard, yep. I also proposed this to Daniel offlist.

Bye,
Hannes

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-23 16:42 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <CALCETrVgnnbstPoPf-0bZ_ySDydXNCp5dLAEvhAD9cApozFsng@mail.gmail.com>

On Fri, Dec 23, 2016 at 8:23 AM, Andy Lutomirski <luto@amacapital.net> wrote:
> On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>>
>>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>>
>>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>>
>>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>>
>> [...]
>>
>>>>> The hashing is not a proper sha1 neither, unfortunately. I think that
>>>>> is why it will have a custom implementation in iproute2?
>>>>
>>>>
>>>> Still trying to catch up on this admittedly bit confusing thread. I
>>>> did run automated tests over couple of days comparing the data I got
>>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>>> cases varying from min to max possible program sizes. In the process
>>>> of testing, as you might have seen on netdev, I found couple of other
>>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>>> do you or Andy or anyone participating in claiming this have any
>>>> concrete data or test cases that suggests something different? If yes,
>>>> I'm very curious to hear about it and willing fix it up, of course.
>>>> When I'm back from pto I'll prep and cook up my test suite to be
>>>> included into the selftests/bpf/, should have done this initially,
>>>> sorry about that. I'll also post something to expose the alg, that
>>>> sounds fine to me.
>>>
>>>
>>> Looking into your code closer, I noticed that you indeed seem to do the
>>> finalization of sha-1 by hand by aligning and padding the buffer
>>> accordingly and also patching in the necessary payload length.
>>>
>>> Apologies for my side for claiming that this is not correct sha1
>>> output, I was only looking at sha_transform and its implementation and
>>> couldn't see the padding and finalization round with embedding the data
>>> length in there and hadn't thought of it being done manually.
>>>
>>> Anyway, is it difficult to get the sha finalization into some common
>>> code library? It is not very bpf specific and crypto code reviewers
>>> won't find it there at all.
>>
>>
>> Yes, sure, I'll rework it that way (early next year when I'm back if
>> that's fine with you).
>
> Can we make it SHA-256 before 4.10 comes out, though?  This really
> looks like it will be used in situations where collisions matter and
> it will be exposed to malicious programs, and SHA-1 should not be used
> for new designs for this purpose because it simply isn't long enough.
>
> Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
> misleading.  That should be u8 or, at the very least, __be32.
>
> I realize that there isn't a sha-256 implementation in lib, but would
> it really be so bad to make the bpf digest only work (for now) when
> crypto is enabled?  I would *love* to see the crypto core learn how to
> export simple primitives for direct use without needing the whole
> crypto core, and this doesn't seem particularly hard to do, but I
> don't think that's 4.10 material.

I'm going to try to send out RFC patches for all of this today or
tomorrow.  It doesn't look bad at all.

--Andy

^ permalink raw reply

* Re: BPF hash algo (Re: [kernel-hardening] Re: [PATCH v7 3/6] random: use SipHash in place of MD5)
From: Andy Lutomirski @ 2016-12-23 16:23 UTC (permalink / raw)
  To: Daniel Borkmann
  Cc: Hannes Frederic Sowa, Alexei Starovoitov, Jason A. Donenfeld,
	kernel-hardening@lists.openwall.com, Theodore Ts'o, Netdev,
	LKML, Linux Crypto Mailing List, David Laight, Eric Dumazet,
	Linus Torvalds, Eric Biggers, Tom Herbert, Andi Kleen,
	David S. Miller, Jean-Philippe Aumasson
In-Reply-To: <585D11BF.60903@iogearbox.net>

On Fri, Dec 23, 2016 at 3:59 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
> On 12/23/2016 11:59 AM, Hannes Frederic Sowa wrote:
>>
>> On Fri, 2016-12-23 at 11:04 +0100, Daniel Borkmann wrote:
>>>
>>> On 12/22/2016 05:59 PM, Hannes Frederic Sowa wrote:
>>>>
>>>> On Thu, 2016-12-22 at 08:07 -0800, Andy Lutomirski wrote:
>
> [...]
>
>>>> The hashing is not a proper sha1 neither, unfortunately. I think that
>>>> is why it will have a custom implementation in iproute2?
>>>
>>>
>>> Still trying to catch up on this admittedly bit confusing thread. I
>>> did run automated tests over couple of days comparing the data I got
>>> from fdinfo with the one from af_alg and found no mismatch on the test
>>> cases varying from min to max possible program sizes. In the process
>>> of testing, as you might have seen on netdev, I found couple of other
>>> bugs in bpf code along the way and fixed them up as well. So my question,
>>> do you or Andy or anyone participating in claiming this have any
>>> concrete data or test cases that suggests something different? If yes,
>>> I'm very curious to hear about it and willing fix it up, of course.
>>> When I'm back from pto I'll prep and cook up my test suite to be
>>> included into the selftests/bpf/, should have done this initially,
>>> sorry about that. I'll also post something to expose the alg, that
>>> sounds fine to me.
>>
>>
>> Looking into your code closer, I noticed that you indeed seem to do the
>> finalization of sha-1 by hand by aligning and padding the buffer
>> accordingly and also patching in the necessary payload length.
>>
>> Apologies for my side for claiming that this is not correct sha1
>> output, I was only looking at sha_transform and its implementation and
>> couldn't see the padding and finalization round with embedding the data
>> length in there and hadn't thought of it being done manually.
>>
>> Anyway, is it difficult to get the sha finalization into some common
>> code library? It is not very bpf specific and crypto code reviewers
>> won't find it there at all.
>
>
> Yes, sure, I'll rework it that way (early next year when I'm back if
> that's fine with you).

Can we make it SHA-256 before 4.10 comes out, though?  This really
looks like it will be used in situations where collisions matter and
it will be exposed to malicious programs, and SHA-1 should not be used
for new designs for this purpose because it simply isn't long enough.

Also, a SHA-1 digest isn't a pile of u32s, so u32 digest[...] is very
misleading.  That should be u8 or, at the very least, __be32.

I realize that there isn't a sha-256 implementation in lib, but would
it really be so bad to make the bpf digest only work (for now) when
crypto is enabled?  I would *love* to see the crypto core learn how to
export simple primitives for direct use without needing the whole
crypto core, and this doesn't seem particularly hard to do, but I
don't think that's 4.10 material.

--Andy

^ permalink raw reply

* Re: George's crazy full state idea (Re: HalfSipHash Acceptable Usage)
From: Hannes Frederic Sowa @ 2016-12-23 12:05 UTC (permalink / raw)
  To: George Spelvin, luto
  Cc: ak, davem, David.Laight, djb, ebiggers3, eric.dumazet, Jason,
	jeanphilippe.aumasson, kernel-hardening, linux-crypto,
	linux-kernel, netdev, tom, torvalds, tytso, vegard.nossum
In-Reply-To: <20161223000716.5768.qmail@ns.sciencehorizons.net>

On Thu, 2016-12-22 at 19:07 -0500, George Spelvin wrote:
> Hannes Frederic Sowa wrote:
> > A lockdep test should still be done. ;)
> 
> Adding might_lock() annotations will improve coverage a lot.

Might be hard to find the correct lock we take later down the code
path, but if that is possible, certainly.

> > Yes, that does look nice indeed. Accounting for bits instead of bytes
> > shouldn't be a huge problem either. Maybe it gets a bit more verbose in
> > case you can't satisfy a request with one batched entropy block and have
> > to consume randomness from two.
> 
> The bit granularity is also for the callers' convenience, so they don't
> have to mask again.  Whether get_random_bits rounds up to byte boundaries
> internally or not is something else.
> 
> When the current batch runs low, I was actually thinking of throwing
> away the remaining bits and computing a new batch of 512.  But it's
> whatever works best at implementation time.
> 
> > > > It could only mix the output back in every two calls, in which case
> > > > you can backtrack up to one call but you need to do 2^128 work to
> > > > backtrack farther.  But yes, this is getting excessively complicated.
> > > No, if you're willing to accept limited backtrack, this is a perfectly
> > > acceptable solution, and not too complicated.  You could do it phase-less
> > > if you like; store the previous output, then after generating the new
> > > one, mix in both.  Then overwrite the previous output.  (But doing two
> > > rounds of a crypto primtive to avoid one conditional jump is stupid,
> > > so forget that.)
> > Can you quickly explain why we lose the backtracking capability?
> 
> Sure.  An RNG is (state[i], output[i]) = f(state[i-1]).  The goal of
> backtracking is to compute output[i], or better yet state[i-1], given
> state[i].
> 
> For example, consider an OFB or CTR mode generator.  The state is a key
> and and IV, and you encrypt the IV with the key to produce output, then
> either replace the IV with the output, or increment it.  Either way,
> since you still have the key, you can invert the transformation and
> recover the previous IV.
> 
> The standard way around this is to use the Davies-Meyer construction:
> 
> IV[i] = IV[i-1] + E(IV[i-1], key)
> 
> This is the standard way to make a non-invertible random function
> out of an invertible random permutation.
> 
> From the sum, there's no easy way to find the ciphertext *or* the
> plaintext that was encrypted.  Assuming the encryption is secure,
> the only way to reverse it is brute force: guess IV[i-1] and run the
> operation forward to see if the resultant IV[i] matches.
> 
> There are a variety of ways to organize this computation, since the
> guess gives toy both IV[i-1] and E(IV[i-1], key) = IV[i] - IV[i-1], including
> running E forward, backward, or starting from both ends to see if you
> meet in the middle.
> 
> The way you add the encryption output to the IV is not very important.
> It can be addition, xor, or some more complex invertible transformation.
> In the case of SipHash, the "encryption" output is smaller than the
> input, so we have to get a bit more creative, but it's still basically
> the same thing.
> 
> The problem is that the output which is combined with the IV is too small.
> With only 64 bits, trying all possible values is practical.  (The world's
> Bitcoin miners are collectively computing SHA-256(SHA-256(input)) 1.7 * 2^64
> times per second.)
> 
> By basically doing two iterations at once and mixing in 128 bits of
> output, the guessing attack is rendered impractical.  The only downside
> is that you need to remember and store one result between when it's
> computed and last used.  This is part of the state, so an attack can
> find output[i-1], but not anything farther back.

Thanks a lot for the explanation!

> > ChaCha as a block cipher gives a "perfect" permutation from the output
> > of either the CRNG or the CPRNG, which actually itself has backtracking
> > protection.
> 
> I'm not quite understanding.  The /dev/random implementation uses some
> of the ChaCha output as a new ChaCha key (that's another way to mix output
> back into the state) to prevent backtracking.  But this slows it down, and
> again if you want to be efficient, you're generating and storing large batches
> of entropy and storing it in the RNG state.

I was actually referring to the anti-backtrack protection in
/dev/random and also /dev/urandom, from where we reseed every 300
seconds and if our batched entropy runs low with Ted's/Jason's current
patch for get_random_int.

As far as I can understand it, backtracking is not a problem in case of
a reseed event inside extract_crng.

When we hit the chacha20 without doing a reseed we only mutate the
state of chacha, but being an invertible function in its own, a
proposal would be to mix parts of the chacha20 output back into the
state, which, as a result, would cause slowdown because we couldn't
propagate the complete output of the cipher back to the caller (looking
at the function _extract_crng).

Or are you referring that the anti-backtrack protection should happen
in every call from get_random_int?

Thanks,
Hannes

^ permalink raw reply


This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox