* [RFC PATCH 2/3] crypto: zip - Wire-up Compression / decompression HW offload
From: Jan Glauber @ 2016-12-12 15:04 UTC
To: Herbert Xu
Cc: linux-crypto, linux-kernel, David S . Miller, Mahipal Challa,
Vishnu Nair, Jan Glauber
In-Reply-To: <20161212150439.18627-1-jglauber@cavium.com>
From: Mahipal Challa <Mahipal.Challa@cavium.com>
Add compression/decompression hardware offload support for both
DEFLATE and LZS.
Signed-off-by: Mahipal Challa <Mahipal.Challa@cavium.com>
Signed-off-by: Vishnu Nair <Vishnu.Nair@cavium.com>
Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
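Note on the completion wait: zip_deflate() and zip_inflate() below spin on
result_ptr->s.compcode with no upper bound. The following is only a
user-space sketch of a bounded poll, not what the patch does. The helper
name poll_compcode() and the limit POLL_MAX_SPINS are hypothetical and do
not exist in the driver; in kernel code the read would more likely use
READ_ONCE() with cpu_relax() and a jiffies-based timeout.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical spin budget; the driver defines no such constant. */
#define POLL_MAX_SPINS 1000000u

/*
 * Spin until *compcode becomes non-zero or the spin budget runs out.
 * Returns the completion code, or 0 if the budget was exhausted.
 */
static uint8_t poll_compcode(_Atomic uint8_t *compcode)
{
	uint32_t spins;

	for (spins = 0; spins < POLL_MAX_SPINS; spins++) {
		/* Acquire load: the result words are read only after this. */
		uint8_t code = atomic_load_explicit(compcode,
						    memory_order_acquire);

		if (code)
			return code;
	}

	return 0; /* timed out: the caller should report an error */
}
```

A caller would treat a return value of 0 as a timeout and fail the request
instead of waiting forever on an engine that never answers.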
drivers/crypto/cavium/zip/Makefile | 5 +-
drivers/crypto/cavium/zip/zip_crypto.c | 243 ++++++++++++++++++++++++++++++++
drivers/crypto/cavium/zip/zip_crypto.h | 6 +
drivers/crypto/cavium/zip/zip_deflate.c | 190 +++++++++++++++++++++++++
drivers/crypto/cavium/zip/zip_deflate.h | 62 ++++++++
drivers/crypto/cavium/zip/zip_device.c | 1 +
drivers/crypto/cavium/zip/zip_inflate.c | 211 +++++++++++++++++++++++++++
drivers/crypto/cavium/zip/zip_inflate.h | 62 ++++++++
drivers/crypto/cavium/zip/zip_main.c | 29 ----
9 files changed, 779 insertions(+), 30 deletions(-)
create mode 100644 drivers/crypto/cavium/zip/zip_crypto.c
create mode 100644 drivers/crypto/cavium/zip/zip_deflate.c
create mode 100644 drivers/crypto/cavium/zip/zip_deflate.h
create mode 100644 drivers/crypto/cavium/zip/zip_inflate.c
create mode 100644 drivers/crypto/cavium/zip/zip_inflate.h
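Note on the bounce buffers: zip_deflate_comp() and zip_inflate_comp() copy
slen bytes into a fixed-size buffer, and the inflate path appends one more
byte. A minimal sketch of a length check that could precede such a copy is
shown here. BOUNCE_BUF_SIZE stands in for MAX_INPUT_BUFFER_SIZE (whose value
is not shown in this patch), and bounce_copy() is a hypothetical helper, not
a function of this driver.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical capacity; stands in for MAX_INPUT_BUFFER_SIZE. */
#define BOUNCE_BUF_SIZE 64u

/*
 * Copy src into a bounce buffer of BOUNCE_BUF_SIZE bytes while keeping
 * `reserve` bytes free at the end (e.g. one padding byte for inflate).
 * Returns 0 on success, -EINVAL on NULL arguments, or -E2BIG if the
 * data does not fit.
 */
static int bounce_copy(unsigned char *buf, const unsigned char *src,
		       size_t slen, size_t reserve)
{
	if (!buf || !src)
		return -EINVAL;
	if (reserve > BOUNCE_BUF_SIZE || slen > BOUNCE_BUF_SIZE - reserve)
		return -E2BIG;

	memcpy(buf, src, slen);
	return 0;
}
```

The comparison is written as slen > BOUNCE_BUF_SIZE - reserve so that the
check cannot overflow for large slen values.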
diff --git a/drivers/crypto/cavium/zip/Makefile b/drivers/crypto/cavium/zip/Makefile
index 2c07508..b2f3baaf 100644
--- a/drivers/crypto/cavium/zip/Makefile
+++ b/drivers/crypto/cavium/zip/Makefile
@@ -5,4 +5,7 @@
obj-$(CONFIG_CRYPTO_DEV_CAVIUM_ZIP) += thunderx_zip.o
thunderx_zip-y := zip_main.o \
zip_device.o \
- zip_mem.o
+ zip_crypto.o \
+ zip_mem.o \
+ zip_deflate.o \
+ zip_inflate.o
diff --git a/drivers/crypto/cavium/zip/zip_crypto.c b/drivers/crypto/cavium/zip/zip_crypto.c
new file mode 100644
index 0000000..888e18b
--- /dev/null
+++ b/drivers/crypto/cavium/zip/zip_crypto.c
@@ -0,0 +1,243 @@
+/***********************license start************************************
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ * All rights reserved.
+ *
+ * License: one of 'Cavium License' or 'GNU General Public License Version 2'
+ *
+ * This file is provided under the terms of the Cavium License (see below)
+ * or under the terms of GNU General Public License, Version 2, as
+ * published by the Free Software Foundation. When using or redistributing
+ * this file, you may do so under either license.
+ *
+ * Cavium License: Redistribution and use in source and binary forms, with
+ * or without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ *
+ * * Neither the name of Cavium Inc. nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * This Software, including technical data, may be subject to U.S. export
+ * control laws, including the U.S. Export Administration Act and its
+ * associated regulations, and may be subject to export or import
+ * regulations in other countries.
+ *
+ * TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED "AS IS"
+ * AND WITH ALL FAULTS AND CAVIUM INC. MAKES NO PROMISES, REPRESENTATIONS
+ * OR WARRANTIES, EITHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, WITH
+ * RESPECT TO THE SOFTWARE, INCLUDING ITS CONDITION, ITS CONFORMITY TO ANY
+ * REPRESENTATION OR DESCRIPTION, OR THE EXISTENCE OF ANY LATENT OR PATENT
+ * DEFECTS, AND CAVIUM SPECIFICALLY DISCLAIMS ALL IMPLIED (IF ANY)
+ * WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR A
+ * PARTICULAR PURPOSE, LACK OF VIRUSES, ACCURACY OR COMPLETENESS, QUIET
+ * ENJOYMENT, QUIET POSSESSION OR CORRESPONDENCE TO DESCRIPTION. THE
+ * ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE LIES
+ * WITH YOU.
+ ***********************license end**************************************/
+
+#include "zip_crypto.h"
+
+static void zip_static_init_zip_ops(struct zip_operation *zip_ops,
+ int lzs_flag)
+{
+ zip_ops->flush = ZIP_FLUSH_FINISH;
+
+ /* equivalent to level 6 of opensource zlib */
+ zip_ops->speed = 1;
+
+ if (!lzs_flag) {
+ zip_ops->ccode = 0; /* Auto Huffman */
+ zip_ops->lzs_flag = 0;
+ zip_ops->format = ZLIB_FORMAT;
+ } else {
+ zip_ops->ccode = 3; /* LZS Encoding */
+ zip_ops->lzs_flag = 1;
+ zip_ops->format = LZS_FORMAT;
+ }
+ zip_ops->begin_file = 1;
+ zip_ops->history_len = 0;
+ zip_ops->end_file = 1;
+ zip_ops->compcode = 0;
+ zip_ops->csum = 1; /* Adler checksum desired */
+}
+
+/* Legacy Compress framework start */
+
+int zip_alloc_zip_ctx(struct crypto_tfm *tfm)
+{
+ struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
+ struct zip_operation *comp_ctx = &zip_ctx->zip_comp;
+ struct zip_operation *decomp_ctx = &zip_ctx->zip_decomp;
+
+ zip_static_init_zip_ops(comp_ctx, 0);
+ zip_static_init_zip_ops(decomp_ctx, 0);
+
+ comp_ctx->input = zip_data_buf_alloc(MAX_INPUT_BUFFER_SIZE);
+ if (!comp_ctx->input)
+ return -ENOMEM;
+
+ comp_ctx->output = zip_data_buf_alloc(MAX_OUTPUT_BUFFER_SIZE);
+ if (!comp_ctx->output)
+ goto err_comp_input;
+
+ decomp_ctx->input = zip_data_buf_alloc(MAX_INPUT_BUFFER_SIZE);
+ if (!decomp_ctx->input)
+ goto err_comp_output;
+
+ decomp_ctx->output = zip_data_buf_alloc(MAX_OUTPUT_BUFFER_SIZE);
+ if (!decomp_ctx->output)
+ goto err_decomp_input;
+
+ return 0;
+
+err_decomp_input:
+ zip_data_buf_free(decomp_ctx->input, MAX_INPUT_BUFFER_SIZE);
+
+err_comp_output:
+ zip_data_buf_free(comp_ctx->output, MAX_OUTPUT_BUFFER_SIZE);
+
+err_comp_input:
+ zip_data_buf_free(comp_ctx->input, MAX_INPUT_BUFFER_SIZE);
+
+ return -ENOMEM;
+}
+
+int zip_alloc_lzs_ctx(struct crypto_tfm *tfm)
+{
+ struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
+ struct zip_operation *comp_ctx = &zip_ctx->zip_comp;
+ struct zip_operation *decomp_ctx = &zip_ctx->zip_decomp;
+
+ zip_static_init_zip_ops(comp_ctx, 1);
+ zip_static_init_zip_ops(decomp_ctx, 1);
+
+ comp_ctx->input = zip_data_buf_alloc(MAX_INPUT_BUFFER_SIZE);
+ if (!comp_ctx->input)
+ return -ENOMEM;
+
+ comp_ctx->output = zip_data_buf_alloc(MAX_OUTPUT_BUFFER_SIZE);
+ if (!comp_ctx->output)
+ goto err_comp_input;
+
+ decomp_ctx->input = zip_data_buf_alloc(MAX_INPUT_BUFFER_SIZE);
+ if (!decomp_ctx->input)
+ goto err_comp_output;
+
+ decomp_ctx->output = zip_data_buf_alloc(MAX_OUTPUT_BUFFER_SIZE);
+ if (!decomp_ctx->output)
+ goto err_decomp_input;
+
+ return 0;
+
+err_decomp_input:
+ zip_data_buf_free(decomp_ctx->input, MAX_INPUT_BUFFER_SIZE);
+
+err_comp_output:
+ zip_data_buf_free(comp_ctx->output, MAX_OUTPUT_BUFFER_SIZE);
+
+err_comp_input:
+ zip_data_buf_free(comp_ctx->input, MAX_INPUT_BUFFER_SIZE);
+
+ return -ENOMEM;
+}
+
+void zip_free_zip_ctx(struct crypto_tfm *tfm)
+{
+ struct zip_kernel_ctx *zip_ctx = crypto_tfm_ctx(tfm);
+ struct zip_operation *comp_ctx = &zip_ctx->zip_comp;
+ struct zip_operation *dec_ctx = &zip_ctx->zip_decomp;
+
+ zip_data_buf_free(comp_ctx->input, MAX_INPUT_BUFFER_SIZE);
+ zip_data_buf_free(comp_ctx->output, MAX_OUTPUT_BUFFER_SIZE);
+
+ zip_data_buf_free(dec_ctx->input, MAX_INPUT_BUFFER_SIZE);
+ zip_data_buf_free(dec_ctx->output, MAX_OUTPUT_BUFFER_SIZE);
+}
+
+int zip_deflate_comp(struct crypto_tfm *tfm,
+ const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen)
+{
+ struct zip_kernel_ctx *zip_ctx = NULL;
+ struct zip_operation *zip_ops = NULL;
+ struct zip_state zip_state;
+ struct zip_device *zip = NULL;
+ int ret;
+
+ if (!tfm || !src || !dst || !dlen)
+ return -EINVAL;
+
+ zip = zip_get_device(zip_get_node_id());
+ if (!zip)
+ return -ENODEV;
+
+ memset(&zip_state, 0, sizeof(struct zip_state));
+
+ zip_ctx = crypto_tfm_ctx(tfm);
+ zip_ops = &zip_ctx->zip_comp;
+
+ zip_ops->input_len = slen;
+ zip_ops->output_len = *dlen;
+
+ memcpy(zip_ops->input, src, slen);
+
+ ret = zip_deflate(zip_ops, &zip_state, zip);
+
+ if (!ret) {
+ *dlen = zip_ops->output_len;
+ memcpy(dst, zip_ops->output, *dlen);
+ }
+
+ return ret;
+}
+
+int zip_inflate_comp(struct crypto_tfm *tfm,
+ const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen)
+{
+ struct zip_kernel_ctx *zip_ctx = NULL;
+ struct zip_operation *zip_ops = NULL;
+ struct zip_state zip_state;
+ struct zip_device *zip = NULL;
+ int ret;
+
+ if (!tfm || !src || !dst || !dlen)
+ return -EINVAL;
+
+ zip = zip_get_device(zip_get_node_id());
+ if (!zip)
+ return -ENODEV;
+
+ memset(&zip_state, 0, sizeof(struct zip_state));
+
+ zip_ctx = crypto_tfm_ctx(tfm);
+ zip_ops = &zip_ctx->zip_decomp;
+
+ memcpy(zip_ops->input, src, slen);
+
+ /* Workaround for a bug in zlib, which sometimes needs an extra byte */
+ if (zip_ops->ccode != 3) /* Not LZS Encoding */
+ zip_ops->input[slen++] = 0;
+
+ zip_ops->input_len = slen;
+ zip_ops->output_len = *dlen;
+
+ ret = zip_inflate(zip_ops, &zip_state, zip);
+
+ if (!ret) {
+ *dlen = zip_ops->output_len;
+ memcpy(dst, zip_ops->output, *dlen);
+ }
+
+ return ret;
+}
+
+/* Legacy compress framework end */
diff --git a/drivers/crypto/cavium/zip/zip_crypto.h b/drivers/crypto/cavium/zip/zip_crypto.h
index 1215049..26792e9 100644
--- a/drivers/crypto/cavium/zip/zip_crypto.h
+++ b/drivers/crypto/cavium/zip/zip_crypto.h
@@ -48,6 +48,8 @@
#include <linux/crypto.h>
#include "common.h"
+#include "zip_deflate.h"
+#include "zip_inflate.h"
struct zip_kernel_ctx {
struct zip_operation zip_comp;
@@ -57,5 +59,9 @@ struct zip_kernel_ctx {
int zip_alloc_zip_ctx(struct crypto_tfm *tfm);
int zip_alloc_lzs_ctx(struct crypto_tfm *tfm);
void zip_free_zip_ctx(struct crypto_tfm *tfm);
+int zip_deflate_comp(struct crypto_tfm *tfm, const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen);
+int zip_inflate_comp(struct crypto_tfm *tfm, const u8 *src, unsigned int slen,
+ u8 *dst, unsigned int *dlen);
#endif
diff --git a/drivers/crypto/cavium/zip/zip_deflate.c b/drivers/crypto/cavium/zip/zip_deflate.c
new file mode 100644
index 0000000..913cc25
--- /dev/null
+++ b/drivers/crypto/cavium/zip/zip_deflate.c
@@ -0,0 +1,190 @@
+/***********************license start************************************
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ * All rights reserved.
+ *
+ * License: one of 'Cavium License' or 'GNU General Public License Version 2'
+ *
+ * This file is provided under the terms of the Cavium License (see below)
+ * or under the terms of GNU General Public License, Version 2, as
+ * published by the Free Software Foundation. When using or redistributing
+ * this file, you may do so under either license.
+ *
+ * Cavium License: Redistribution and use in source and binary forms, with
+ * or without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ *
+ * * Neither the name of Cavium Inc. nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * This Software, including technical data, may be subject to U.S. export
+ * control laws, including the U.S. Export Administration Act and its
+ * associated regulations, and may be subject to export or import
+ * regulations in other countries.
+ *
+ * TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED "AS IS"
+ * AND WITH ALL FAULTS AND CAVIUM INC. MAKES NO PROMISES, REPRESENTATIONS
+ * OR WARRANTIES, EITHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, WITH
+ * RESPECT TO THE SOFTWARE, INCLUDING ITS CONDITION, ITS CONFORMITY TO ANY
+ * REPRESENTATION OR DESCRIPTION, OR THE EXISTENCE OF ANY LATENT OR PATENT
+ * DEFECTS, AND CAVIUM SPECIFICALLY DISCLAIMS ALL IMPLIED (IF ANY)
+ * WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR A
+ * PARTICULAR PURPOSE, LACK OF VIRUSES, ACCURACY OR COMPLETENESS, QUIET
+ * ENJOYMENT, QUIET POSSESSION OR CORRESPONDENCE TO DESCRIPTION. THE
+ * ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE LIES
+ * WITH YOU.
+ ***********************license end**************************************/
+
+#include <linux/delay.h>
+#include <linux/sched.h>
+
+#include "common.h"
+#include "zip_deflate.h"
+
+/* Prepares the deflate zip command */
+static int prepare_zip_command(struct zip_operation *zip_ops,
+ struct zip_state *s, union zip_inst_s *zip_cmd)
+{
+ union zip_zres_s *result_ptr = &s->result;
+
+ memset(zip_cmd, 0, sizeof(s->zip_cmd));
+ memset(result_ptr, 0, sizeof(s->result));
+
+ /* IWORD #0 */
+ /* History gather */
+ zip_cmd->s.hg = 0;
+ /* compression enable = 1 for deflate */
+ zip_cmd->s.ce = 1;
+ /* sf (sync flush) */
+ zip_cmd->s.sf = 1;
+ /* ef (end of file) */
+ if (zip_ops->flush == ZIP_FLUSH_FINISH) {
+ zip_cmd->s.ef = 1;
+ zip_cmd->s.sf = 0;
+ }
+
+ zip_cmd->s.cc = zip_ops->ccode;
+ /* ss (compression speed/storage) */
+ zip_cmd->s.ss = zip_ops->speed;
+
+ /* IWORD #1 */
+ /* adler checksum */
+ zip_cmd->s.adlercrc32 = zip_ops->csum;
+ zip_cmd->s.historylength = zip_ops->history_len;
+ zip_cmd->s.dg = 0;
+
+ /* IWORD # 6 and 7 - compression input/history pointer */
+ zip_cmd->s.inp_ptr_addr.s.addr = __pa(zip_ops->input);
+ zip_cmd->s.inp_ptr_ctl.s.length = (zip_ops->input_len +
+ zip_ops->history_len);
+ zip_cmd->s.ds = 0;
+
+ /* IWORD # 8 and 9 - Output pointer */
+ zip_cmd->s.out_ptr_addr.s.addr = __pa(zip_ops->output);
+ zip_cmd->s.out_ptr_ctl.s.length = zip_ops->output_len;
+ /* maximum number of output-stream bytes that can be written */
+ zip_cmd->s.totaloutputlength = zip_ops->output_len;
+
+ /* IWORD # 10 and 11 - Result pointer */
+ zip_cmd->s.res_ptr_addr.s.addr = __pa(result_ptr);
+ /* Clearing completion code */
+ result_ptr->s.compcode = 0;
+
+ return 0;
+}
+
+/**
+ * zip_deflate - API to offload deflate operation to hardware
+ * @zip_ops: Pointer to zip operation structure
+ * @s: Pointer to the structure representing zip state
+ * @zip_dev: Pointer to zip device structure
+ *
+ * This function prepares the zip deflate command and submits it to the zip
+ * engine for processing.
+ *
+ * Return: 0 if successful or error code
+ */
+int zip_deflate(struct zip_operation *zip_ops, struct zip_state *s,
+ struct zip_device *zip_dev)
+{
+ union zip_inst_s *zip_cmd = &s->zip_cmd;
+ union zip_zres_s *result_ptr = &s->result;
+ u32 queue;
+
+ /* Prepares zip command based on the input parameters */
+ prepare_zip_command(zip_ops, s, zip_cmd);
+
+ /* Loads zip command into command queues and rings door bell */
+ queue = zip_load_instr(zip_cmd, zip_dev);
+
+ while (!result_ptr->s.compcode)
+ continue;
+
+ zip_ops->compcode = result_ptr->s.compcode;
+ switch (zip_ops->compcode) {
+ case ZIP_NOTDONE:
+ zip_dbg("Zip instruction not yet completed");
+ return ZIP_ERROR;
+
+ case ZIP_SUCCESS:
+ zip_dbg("Zip instruction completed successfully");
+ zip_update_cmd_bufs(zip_dev, queue);
+ break;
+
+ case ZIP_DTRUNC:
+ zip_dbg("Output Truncate error");
+ /* Returning ZIP_ERROR to avoid copy to user */
+ return ZIP_ERROR;
+
+ default:
+ zip_err("Zip instruction failed. Code:%d", zip_ops->compcode);
+ return ZIP_ERROR;
+ }
+
+ /* Update the CRC depending on the format */
+ switch (zip_ops->format) {
+ case RAW_FORMAT:
+ zip_dbg("RAW Format: %d ", zip_ops->format);
+ /* Get checksum from engine, need to feed it again */
+ zip_ops->csum = result_ptr->s.adler32;
+ break;
+
+ case ZLIB_FORMAT:
+ zip_dbg("ZLIB Format: %d ", zip_ops->format);
+ zip_ops->csum = result_ptr->s.adler32;
+ break;
+
+ case GZIP_FORMAT:
+ zip_dbg("GZIP Format: %d ", zip_ops->format);
+ zip_ops->csum = result_ptr->s.crc32;
+ break;
+
+ case LZS_FORMAT:
+ zip_dbg("LZS Format: %d ", zip_ops->format);
+ break;
+
+ default:
+ zip_err("Unknown Format:%d\n", zip_ops->format);
+ }
+
+ /* Update output_len */
+ if (zip_ops->output_len < result_ptr->s.totalbyteswritten) {
+ /* Dynamic stop && strm->output_len < zipconstants[onfsize] */
+ zip_err("output_len (%d) < total bytes written(%d)\n",
+ zip_ops->output_len, result_ptr->s.totalbyteswritten);
+ zip_ops->output_len = 0;
+
+ } else {
+ zip_ops->output_len = result_ptr->s.totalbyteswritten;
+ }
+
+ return 0;
+}
diff --git a/drivers/crypto/cavium/zip/zip_deflate.h b/drivers/crypto/cavium/zip/zip_deflate.h
new file mode 100644
index 0000000..bdb5207
--- /dev/null
+++ b/drivers/crypto/cavium/zip/zip_deflate.h
@@ -0,0 +1,62 @@
+/***********************license start************************************
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ * All rights reserved.
+ *
+ * License: one of 'Cavium License' or 'GNU General Public License Version 2'
+ *
+ * This file is provided under the terms of the Cavium License (see below)
+ * or under the terms of GNU General Public License, Version 2, as
+ * published by the Free Software Foundation. When using or redistributing
+ * this file, you may do so under either license.
+ *
+ * Cavium License: Redistribution and use in source and binary forms, with
+ * or without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ *
+ * * Neither the name of Cavium Inc. nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * This Software, including technical data, may be subject to U.S. export
+ * control laws, including the U.S. Export Administration Act and its
+ * associated regulations, and may be subject to export or import
+ * regulations in other countries.
+ *
+ * TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED "AS IS"
+ * AND WITH ALL FAULTS AND CAVIUM INC. MAKES NO PROMISES, REPRESENTATIONS
+ * OR WARRANTIES, EITHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, WITH
+ * RESPECT TO THE SOFTWARE, INCLUDING ITS CONDITION, ITS CONFORMITY TO ANY
+ * REPRESENTATION OR DESCRIPTION, OR THE EXISTENCE OF ANY LATENT OR PATENT
+ * DEFECTS, AND CAVIUM SPECIFICALLY DISCLAIMS ALL IMPLIED (IF ANY)
+ * WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR A
+ * PARTICULAR PURPOSE, LACK OF VIRUSES, ACCURACY OR COMPLETENESS, QUIET
+ * ENJOYMENT, QUIET POSSESSION OR CORRESPONDENCE TO DESCRIPTION. THE
+ * ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE LIES
+ * WITH YOU.
+ ***********************license end**************************************/
+
+#ifndef __ZIP_DEFLATE_H__
+#define __ZIP_DEFLATE_H__
+
+/**
+ * zip_deflate - API to offload deflate operation to hardware
+ * @zip_ops: Pointer to zip operation structure
+ * @s: Pointer to the structure representing zip state
+ * @zip_dev: Pointer to the structure representing zip device
+ *
+ * This function prepares the zip deflate command and submits it to the zip
+ * engine by ringing the doorbell.
+ *
+ * Return: 0 if successful or error code
+ */
+int zip_deflate(struct zip_operation *zip_ops, struct zip_state *s,
+ struct zip_device *zip_dev);
+#endif
diff --git a/drivers/crypto/cavium/zip/zip_device.c b/drivers/crypto/cavium/zip/zip_device.c
index ed21c5a..a72cdcf0 100644
--- a/drivers/crypto/cavium/zip/zip_device.c
+++ b/drivers/crypto/cavium/zip/zip_device.c
@@ -44,6 +44,7 @@
***********************license end**************************************/
#include "common.h"
+#include "zip_deflate.h"
/**
* zip_cmd_queue_consumed - Calculates the space consumed in the command queue.
diff --git a/drivers/crypto/cavium/zip/zip_inflate.c b/drivers/crypto/cavium/zip/zip_inflate.c
new file mode 100644
index 0000000..849c4c85
--- /dev/null
+++ b/drivers/crypto/cavium/zip/zip_inflate.c
@@ -0,0 +1,211 @@
+/***********************license start************************************
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ * All rights reserved.
+ *
+ * License: one of 'Cavium License' or 'GNU General Public License Version 2'
+ *
+ * This file is provided under the terms of the Cavium License (see below)
+ * or under the terms of GNU General Public License, Version 2, as
+ * published by the Free Software Foundation. When using or redistributing
+ * this file, you may do so under either license.
+ *
+ * Cavium License: Redistribution and use in source and binary forms, with
+ * or without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ *
+ * * Neither the name of Cavium Inc. nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * This Software, including technical data, may be subject to U.S. export
+ * control laws, including the U.S. Export Administration Act and its
+ * associated regulations, and may be subject to export or import
+ * regulations in other countries.
+ *
+ * TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED "AS IS"
+ * AND WITH ALL FAULTS AND CAVIUM INC. MAKES NO PROMISES, REPRESENTATIONS
+ * OR WARRANTIES, EITHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, WITH
+ * RESPECT TO THE SOFTWARE, INCLUDING ITS CONDITION, ITS CONFORMITY TO ANY
+ * REPRESENTATION OR DESCRIPTION, OR THE EXISTENCE OF ANY LATENT OR PATENT
+ * DEFECTS, AND CAVIUM SPECIFICALLY DISCLAIMS ALL IMPLIED (IF ANY)
+ * WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR A
+ * PARTICULAR PURPOSE, LACK OF VIRUSES, ACCURACY OR COMPLETENESS, QUIET
+ * ENJOYMENT, QUIET POSSESSION OR CORRESPONDENCE TO DESCRIPTION. THE
+ * ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE LIES
+ * WITH YOU.
+ ***********************license end**************************************/
+
+#include <linux/delay.h>
+#include <linux/sched.h>
+
+#include "common.h"
+#include "zip_inflate.h"
+
+static int prepare_inflate_zcmd(struct zip_operation *zip_ops,
+ struct zip_state *s, union zip_inst_s *zip_cmd)
+{
+ union zip_zres_s *result_ptr = &s->result;
+
+ memset(zip_cmd, 0, sizeof(s->zip_cmd));
+ memset(result_ptr, 0, sizeof(s->result));
+
+ /* IWORD#0 */
+
+ /* Decompression History Gather list - no gather list */
+ zip_cmd->s.hg = 0;
+ /* For decompression, CE must be 0x0. */
+ zip_cmd->s.ce = 0;
+ /* For decompression, SS must be 0x0. */
+ zip_cmd->s.ss = 0;
+ /* For decompression, SF should always be set. */
+ zip_cmd->s.sf = 1;
+
+ /* Begin File */
+ if (zip_ops->begin_file == 0)
+ zip_cmd->s.bf = 0;
+ else
+ zip_cmd->s.bf = 1;
+
+ zip_cmd->s.ef = 1;
+ /* 0: for Deflate decompression, 3: for LZS decompression */
+ zip_cmd->s.cc = zip_ops->ccode;
+
+ /* IWORD #1*/
+
+ /* adler checksum */
+ zip_cmd->s.adlercrc32 = zip_ops->csum;
+
+ /*
+ * HISTORYLENGTH must be 0x0 for any ZIP decompress operation.
+ * History data is added to a decompression operation via IWORD3.
+ */
+ zip_cmd->s.historylength = 0;
+ zip_cmd->s.ds = 0;
+
+ /* IWORD # 8 and 9 - Output pointer */
+ zip_cmd->s.out_ptr_addr.s.addr = __pa(zip_ops->output);
+ zip_cmd->s.out_ptr_ctl.s.length = zip_ops->output_len;
+
+ /* Maximum number of output-stream bytes that can be written */
+ zip_cmd->s.totaloutputlength = zip_ops->output_len;
+
+ zip_dbg("Data Direct Input case ");
+
+ /* IWORD # 6 and 7 - input pointer */
+ zip_cmd->s.dg = 0;
+ zip_cmd->s.inp_ptr_addr.s.addr = __pa((u8 *)zip_ops->input);
+ zip_cmd->s.inp_ptr_ctl.s.length = zip_ops->input_len;
+
+ /* IWORD # 10 and 11 - Result pointer */
+ zip_cmd->s.res_ptr_addr.s.addr = __pa(result_ptr);
+
+ /* Clearing completion code */
+ result_ptr->s.compcode = 0;
+
+ /* Returning 0 for time being.*/
+ return 0;
+}
+
+/**
+ * zip_inflate - API to offload inflate operation to hardware
+ * @zip_ops: Pointer to zip operation structure
+ * @s: Pointer to the structure representing zip state
+ * @zip_dev: Pointer to zip device structure
+ *
+ * This function prepares the zip inflate command and submits it to the zip
+ * engine for processing.
+ *
+ * Return: 0 if successful or error code
+ */
+int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s,
+ struct zip_device *zip_dev)
+{
+ union zip_inst_s *zip_cmd = &s->zip_cmd;
+ union zip_zres_s *result_ptr = &s->result;
+ u32 queue;
+
+ /* Prepare inflate zip command */
+ prepare_inflate_zcmd(zip_ops, s, zip_cmd);
+
+ /* Load inflate command to zip queue and ring the doorbell */
+ queue = zip_load_instr(zip_cmd, zip_dev);
+
+ while (!result_ptr->s.compcode)
+ continue;
+
+ zip_ops->compcode = result_ptr->s.compcode;
+ switch (zip_ops->compcode) {
+ case ZIP_NOTDONE:
+ zip_dbg("Zip Instruction not yet completed\n");
+ return ZIP_ERROR;
+
+ case ZIP_SUCCESS:
+ zip_dbg("Zip Instruction completed successfully\n");
+ break;
+
+ case ZIP_DYNAMIC_STOP:
+ zip_dbg(" Dynamic stop Initiated\n");
+ break;
+
+ default:
+ zip_dbg("Instruction failed. Code = %d\n", zip_ops->compcode);
+ zip_update_cmd_bufs(zip_dev, queue);
+ return ZIP_ERROR;
+ }
+
+ zip_update_cmd_bufs(zip_dev, queue);
+
+ if ((zip_ops->ccode == 3) && (zip_ops->flush == 4) &&
+ (zip_ops->compcode != ZIP_DYNAMIC_STOP))
+ result_ptr->s.ef = 1;
+
+ zip_ops->csum = result_ptr->s.adler32;
+
+ if (zip_ops->output_len < result_ptr->s.totalbyteswritten) {
+ zip_err("output_len (%d) < total bytes written (%d)\n",
+ zip_ops->output_len, result_ptr->s.totalbyteswritten);
+ zip_ops->output_len = 0;
+ } else {
+ zip_ops->output_len = result_ptr->s.totalbyteswritten;
+ }
+
+ zip_ops->bytes_read = result_ptr->s.totalbytesread;
+ zip_ops->bits_processed = result_ptr->s.totalbitsprocessed;
+ zip_ops->end_file = result_ptr->s.ef;
+ if (zip_ops->end_file) {
+ switch (zip_ops->format) {
+ case RAW_FORMAT:
+ zip_dbg("RAW Format: %d ", zip_ops->format);
+ /* Get checksum from engine */
+ zip_ops->csum = result_ptr->s.adler32;
+ break;
+
+ case ZLIB_FORMAT:
+ zip_dbg("ZLIB Format: %d ", zip_ops->format);
+ zip_ops->csum = result_ptr->s.adler32;
+ break;
+
+ case GZIP_FORMAT:
+ zip_dbg("GZIP Format: %d ", zip_ops->format);
+ zip_ops->csum = result_ptr->s.crc32;
+ break;
+
+ case LZS_FORMAT:
+ zip_dbg("LZS Format: %d ", zip_ops->format);
+ break;
+
+ default:
+ zip_err("Format error:%d\n", zip_ops->format);
+ }
+ }
+
+ return 0;
+}
diff --git a/drivers/crypto/cavium/zip/zip_inflate.h b/drivers/crypto/cavium/zip/zip_inflate.h
new file mode 100644
index 0000000..4cee4c9
--- /dev/null
+++ b/drivers/crypto/cavium/zip/zip_inflate.h
@@ -0,0 +1,62 @@
+/***********************license start************************************
+ * Copyright (c) 2003-2016 Cavium, Inc.
+ * All rights reserved.
+ *
+ * License: one of 'Cavium License' or 'GNU General Public License Version 2'
+ *
+ * This file is provided under the terms of the Cavium License (see below)
+ * or under the terms of GNU General Public License, Version 2, as
+ * published by the Free Software Foundation. When using or redistributing
+ * this file, you may do so under either license.
+ *
+ * Cavium License: Redistribution and use in source and binary forms, with
+ * or without modification, are permitted provided that the following
+ * conditions are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ *
+ * * Redistributions in binary form must reproduce the above
+ * copyright notice, this list of conditions and the following
+ * disclaimer in the documentation and/or other materials provided
+ * with the distribution.
+ *
+ * * Neither the name of Cavium Inc. nor the names of its contributors may be
+ * used to endorse or promote products derived from this software without
+ * specific prior written permission.
+ *
+ * This Software, including technical data, may be subject to U.S. export
+ * control laws, including the U.S. Export Administration Act and its
+ * associated regulations, and may be subject to export or import
+ * regulations in other countries.
+ *
+ * TO THE MAXIMUM EXTENT PERMITTED BY LAW, THE SOFTWARE IS PROVIDED "AS IS"
+ * AND WITH ALL FAULTS AND CAVIUM INC. MAKES NO PROMISES, REPRESENTATIONS
+ * OR WARRANTIES, EITHER EXPRESS, IMPLIED, STATUTORY, OR OTHERWISE, WITH
+ * RESPECT TO THE SOFTWARE, INCLUDING ITS CONDITION, ITS CONFORMITY TO ANY
+ * REPRESENTATION OR DESCRIPTION, OR THE EXISTENCE OF ANY LATENT OR PATENT
+ * DEFECTS, AND CAVIUM SPECIFICALLY DISCLAIMS ALL IMPLIED (IF ANY)
+ * WARRANTIES OF TITLE, MERCHANTABILITY, NONINFRINGEMENT, FITNESS FOR A
+ * PARTICULAR PURPOSE, LACK OF VIRUSES, ACCURACY OR COMPLETENESS, QUIET
+ * ENJOYMENT, QUIET POSSESSION OR CORRESPONDENCE TO DESCRIPTION. THE
+ * ENTIRE RISK ARISING OUT OF USE OR PERFORMANCE OF THE SOFTWARE LIES
+ * WITH YOU.
+ ***********************license end**************************************/
+
+#ifndef __ZIP_INFLATE_H__
+#define __ZIP_INFLATE_H__
+
+/**
+ * zip_inflate - API to offload inflate operation to hardware
+ * @zip_ops: Pointer to zip operation structure
+ * @s: Pointer to the structure representing zip state
+ * @zip_dev: Pointer to the structure representing zip device
+ *
+ * This function prepares the zip inflate command and submits it to the zip
+ * engine for processing.
+ *
+ * Return: 0 if successful or error code
+ */
+int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s,
+ struct zip_device *zip_dev);
+#endif
diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c
index 052c42d..ae3395f 100644
--- a/drivers/crypto/cavium/zip/zip_main.c
+++ b/drivers/crypto/cavium/zip/zip_main.c
@@ -364,35 +364,6 @@ static void zip_remove(struct pci_dev *pdev)
zip_dbg_exit();
}
-/* Dummy Functions */
-int zip_alloc_lzs_ctx(struct crypto_tfm *tfm)
-{
- return 0;
-}
-
-int zip_alloc_zip_ctx(struct crypto_tfm *tfm)
-{
- return 0;
-}
-
-void zip_free_zip_ctx(struct crypto_tfm *tfm)
-{
-}
-
-int zip_deflate_comp(struct crypto_tfm *tfm,
- const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen)
-{
- return 0;
-}
-
-int zip_inflate_comp(struct crypto_tfm *tfm,
- const u8 *src, unsigned int slen,
- u8 *dst, unsigned int *dlen)
-{
- return 0;
-}
-
/* PCI Sub-System Interface */
static struct pci_driver zip_driver = {
.name = DRV_NAME,
--
2.9.0.rc0.21.g7777322
* [RFC PATCH 3/3] crypto: zip - Add Compression/decompression statistics
From: Jan Glauber @ 2016-12-12 15:04 UTC
To: Herbert Xu
Cc: linux-crypto, linux-kernel, David S . Miller, Mahipal Challa,
Vishnu Nair, Jan Glauber
In-Reply-To: <20161212150439.18627-1-jglauber@cavium.com>
From: Mahipal Challa <Mahipal.Challa@cavium.com>
Add statistics for compression/decompression hardware offload
under debugfs.
Signed-off-by: Mahipal Challa <Mahipal.Challa@cavium.com>
Signed-off-by: Vishnu Nair <Vishnu.Nair@cavium.com>
Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
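For readers unfamiliar with the counter pattern used below, here is a
user-space sketch that assumes C11 <stdatomic.h> in place of the kernel's
atomic64_t. The struct and helper names (zip_stats_sketch,
stats_record_comp, stats_comp_ratio_pct) are illustrative only; the field
names merely mirror those touched in this diff.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Illustrative stand-in for the driver's stats structure. */
struct zip_stats_sketch {
	atomic_uint_fast64_t comp_req_submit;
	atomic_uint_fast64_t comp_req_complete;
	atomic_uint_fast64_t comp_in_bytes;
	atomic_uint_fast64_t comp_out_bytes;
};

/* Account for one submitted and completed compression request. */
static void stats_record_comp(struct zip_stats_sketch *st,
			      uint64_t in_bytes, uint64_t out_bytes)
{
	atomic_fetch_add(&st->comp_in_bytes, in_bytes);
	atomic_fetch_add(&st->comp_req_submit, 1);
	atomic_fetch_add(&st->comp_req_complete, 1);
	atomic_fetch_add(&st->comp_out_bytes, out_bytes);
}

/* Compressed size as a percentage of input; 0 if no input was seen. */
static uint64_t stats_comp_ratio_pct(struct zip_stats_sketch *st)
{
	uint64_t in = atomic_load(&st->comp_in_bytes);
	uint64_t out = atomic_load(&st->comp_out_bytes);

	return in ? (out * 100) / in : 0;
}
```

Each counter is updated independently, so a reader may see a submit count
that is briefly ahead of the complete count; that is acceptable for
debugfs statistics but would not be for accounting that must be exact.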
drivers/crypto/cavium/zip/zip_deflate.c | 10 ++
drivers/crypto/cavium/zip/zip_inflate.c | 12 ++
drivers/crypto/cavium/zip/zip_main.c | 227 ++++++++++++++++++++++++++++++++
drivers/crypto/cavium/zip/zip_main.h | 15 +++
4 files changed, 264 insertions(+)
diff --git a/drivers/crypto/cavium/zip/zip_deflate.c b/drivers/crypto/cavium/zip/zip_deflate.c
index 913cc25..11052d8 100644
--- a/drivers/crypto/cavium/zip/zip_deflate.c
+++ b/drivers/crypto/cavium/zip/zip_deflate.c
@@ -122,12 +122,19 @@ int zip_deflate(struct zip_operation *zip_ops, struct zip_state *s,
/* Prepares zip command based on the input parameters */
prepare_zip_command(zip_ops, s, zip_cmd);
+ atomic64_add(zip_ops->input_len, &zip_dev->stats.comp_in_bytes);
/* Loads zip command into command queues and rings door bell */
queue = zip_load_instr(zip_cmd, zip_dev);
+ /* Stats update for compression requests submitted */
+ atomic64_inc(&zip_dev->stats.comp_req_submit);
+
while (!result_ptr->s.compcode)
continue;
+ /* Stats update for compression requests completed */
+ atomic64_inc(&zip_dev->stats.comp_req_complete);
+
zip_ops->compcode = result_ptr->s.compcode;
switch (zip_ops->compcode) {
case ZIP_NOTDONE:
@@ -175,6 +182,9 @@ int zip_deflate(struct zip_operation *zip_ops, struct zip_state *s,
zip_err("Unknown Format:%d\n", zip_ops->format);
}
+ atomic64_add(result_ptr->s.totalbyteswritten,
+ &zip_dev->stats.comp_out_bytes);
+
/* Update output_len */
if (zip_ops->output_len < result_ptr->s.totalbyteswritten) {
/* Dynamic stop && strm->output_len < zipconstants[onfsize] */
diff --git a/drivers/crypto/cavium/zip/zip_inflate.c b/drivers/crypto/cavium/zip/zip_inflate.c
index 849c4c85..44503d8 100644
--- a/drivers/crypto/cavium/zip/zip_inflate.c
+++ b/drivers/crypto/cavium/zip/zip_inflate.c
@@ -135,12 +135,20 @@ int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s,
/* Prepare inflate zip command */
prepare_inflate_zcmd(zip_ops, s, zip_cmd);
+ atomic64_add(zip_ops->input_len, &zip_dev->stats.decomp_in_bytes);
+
/* Load inflate command to zip queue and ring the doorbell */
queue = zip_load_instr(zip_cmd, zip_dev);
+ /* Decompression requests submitted stats update */
+ atomic64_inc(&zip_dev->stats.decomp_req_submit);
+
while (!result_ptr->s.compcode)
continue;
+ /* Decompression requests completed stats update */
+ atomic64_inc(&zip_dev->stats.decomp_req_complete);
+
zip_ops->compcode = result_ptr->s.compcode;
switch (zip_ops->compcode) {
case ZIP_NOTDONE:
@@ -157,6 +165,7 @@ int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s,
default:
zip_dbg("Instruction failed. Code = %d\n", zip_ops->compcode);
+ atomic64_inc(&zip_dev->stats.decomp_bad_reqs);
zip_update_cmd_bufs(zip_dev, queue);
return ZIP_ERROR;
}
@@ -169,6 +178,9 @@ int zip_inflate(struct zip_operation *zip_ops, struct zip_state *s,
zip_ops->csum = result_ptr->s.adler32;
+ atomic64_add(result_ptr->s.totalbyteswritten,
+ &zip_dev->stats.decomp_out_bytes);
+
if (zip_ops->output_len < result_ptr->s.totalbyteswritten) {
zip_err("output_len (%d) < total bytes written (%d)\n",
zip_ops->output_len, result_ptr->s.totalbyteswritten);
diff --git a/drivers/crypto/cavium/zip/zip_main.c b/drivers/crypto/cavium/zip/zip_main.c
index ae3395f..56631bf 100644
--- a/drivers/crypto/cavium/zip/zip_main.c
+++ b/drivers/crypto/cavium/zip/zip_main.c
@@ -427,6 +427,228 @@ static void zip_unregister_compression_device(void)
crypto_unregister_alg(&zip_comp_lzs);
}
+/*
+ * debugfs functions
+ */
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+/* Displays ZIP device statistics */
+static int zip_show_stats(struct seq_file *s, void *unused)
+{
+ u64 val = 0ull;
+ u64 avg_chunk = 0ull, avg_cr = 0ull;
+ u32 q = 0;
+
+ int index = 0;
+ struct zip_device *zip;
+ struct zip_stats *st;
+
+ for (index = 0; index < MAX_ZIP_DEVICES; index++) {
+ if (zip_dev[index]) {
+ zip = zip_dev[index];
+ st = &zip->stats;
+
+ /* Get all the pending requests */
+ for (q = 0; q < ZIP_NUM_QUEUES; q++) {
+ val = zip_reg_read((zip->reg_base +
+ ZIP_DBG_COREX_STA(q)));
+ val = (val >> 32);
+ val = val & 0xffffff;
+ atomic64_add(val, &st->pending_req);
+ }
+
+ avg_chunk = (atomic64_read(&st->comp_in_bytes) /
+ atomic64_read(&st->comp_req_complete));
+ avg_cr = (atomic64_read(&st->comp_in_bytes) /
+ atomic64_read(&st->comp_out_bytes));
+ seq_printf(s, " ZIP Device %d Stats\n"
+ "-----------------------------------\n"
+ "Comp Req Submitted : \t%ld\n"
+ "Comp Req Completed : \t%ld\n"
+ "Compress In Bytes : \t%ld\n"
+ "Compressed Out Bytes : \t%ld\n"
+ "Average Chunk size : \t%llu\n"
+ "Average Compression ratio : \t%llu\n"
+ "Decomp Req Submitted : \t%ld\n"
+ "Decomp Req Completed : \t%ld\n"
+ "Decompress In Bytes : \t%ld\n"
+ "Decompressed Out Bytes : \t%ld\n"
+ "Decompress Bad requests : \t%ld\n"
+ "Pending Req : \t%ld\n"
+ "---------------------------------\n",
+ index,
+ atomic64_read(&st->comp_req_submit),
+ atomic64_read(&st->comp_req_complete),
+ atomic64_read(&st->comp_in_bytes),
+ atomic64_read(&st->comp_out_bytes),
+ avg_chunk,
+ avg_cr,
+ atomic64_read(&st->decomp_req_submit),
+ atomic64_read(&st->decomp_req_complete),
+ atomic64_read(&st->decomp_in_bytes),
+ atomic64_read(&st->decomp_out_bytes),
+ atomic64_read(&st->decomp_bad_reqs),
+ atomic64_read(&st->pending_req));
+
+ /* Reset pending requests count */
+ atomic64_set(&st->pending_req, 0);
+ }
+ }
+ return 0;
+}
+
+/* Clears stats data */
+static int zip_clear_stats(struct seq_file *s, void *unused)
+{
+ int index = 0;
+
+ for (index = 0; index < MAX_ZIP_DEVICES; index++) {
+ if (zip_dev[index]) {
+ memset(&zip_dev[index]->stats, 0,
+ sizeof(struct zip_stats));
+ seq_printf(s, "Cleared stats for zip %d\n", index);
+ }
+ }
+
+ return 0;
+}
+
+static struct zip_registers zipregs[64] = {
+ {"ZIP_CMD_CTL ", 0x0000ull},
+ {"ZIP_THROTTLE ", 0x0010ull},
+ {"ZIP_CONSTANTS ", 0x00A0ull},
+ {"ZIP_QUE0_MAP ", 0x1400ull},
+ {"ZIP_QUE1_MAP ", 0x1408ull},
+ {"ZIP_QUE_ENA ", 0x0500ull},
+ {"ZIP_QUE_PRI ", 0x0508ull},
+ {"ZIP_QUE0_DONE ", 0x2000ull},
+ {"ZIP_QUE1_DONE ", 0x2008ull},
+ {"ZIP_QUE0_DOORBELL ", 0x4000ull},
+ {"ZIP_QUE1_DOORBELL ", 0x4008ull},
+ {"ZIP_QUE0_SBUF_ADDR ", 0x1000ull},
+ {"ZIP_QUE1_SBUF_ADDR ", 0x1008ull},
+ {"ZIP_QUE0_SBUF_CTL ", 0x1200ull},
+ {"ZIP_QUE1_SBUF_CTL ", 0x1208ull},
+ { NULL, 0}
+};
+
+/* Prints registers' contents */
+static int zip_print_regs(struct seq_file *s, void *unused)
+{
+ u64 val = 0;
+ int i = 0, index = 0;
+
+ for (index = 0; index < MAX_ZIP_DEVICES; index++) {
+ if (zip_dev[index]) {
+ seq_printf(s, "--------------------------------\n"
+ " ZIP Device %d Registers\n"
+ "--------------------------------\n",
+ index);
+
+ i = 0;
+
+ while (zipregs[i].reg_name) {
+ val = zip_reg_read((zip_dev[index]->reg_base +
+ zipregs[i].reg_offset));
+ seq_printf(s, "%s: 0x%016llx\n",
+ zipregs[i].reg_name, val);
+ i++;
+ }
+ }
+ }
+ return 0;
+}
+
+static int zip_stats_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, zip_show_stats, NULL);
+}
+
+static const struct file_operations zip_stats_fops = {
+ .owner = THIS_MODULE,
+ .open = zip_stats_open,
+ .read = seq_read,
+};
+
+static int zip_clear_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, zip_clear_stats, NULL);
+}
+
+static const struct file_operations zip_clear_fops = {
+ .owner = THIS_MODULE,
+ .open = zip_clear_open,
+ .read = seq_read,
+};
+
+static int zip_regs_open(struct inode *inode, struct file *file)
+{
+ return single_open(file, zip_print_regs, NULL);
+}
+
+static const struct file_operations zip_regs_fops = {
+ .owner = THIS_MODULE,
+ .open = zip_regs_open,
+ .read = seq_read,
+};
+
+/* Root directory for thunderx_zip debugfs entry */
+static struct dentry *zip_debugfs_root;
+
+static int __init zip_debugfs_init(void)
+{
+ struct dentry *zip_stats, *zip_clear, *zip_regs;
+
+ if (!debugfs_initialized())
+ return -ENODEV;
+
+ zip_debugfs_root = debugfs_create_dir("thunderx_zip", NULL);
+ if (!zip_debugfs_root)
+ return -ENOMEM;
+
+ /* Creating files for entries inside thunderx_zip directory */
+ zip_stats = debugfs_create_file("zip_stats", S_IRUGO,
+ zip_debugfs_root,
+ NULL, &zip_stats_fops);
+ if (!zip_stats)
+ goto failed_to_create;
+
+ zip_clear = debugfs_create_file("zip_clear", S_IRUGO,
+ zip_debugfs_root,
+ NULL, &zip_clear_fops);
+ if (!zip_clear)
+ goto failed_to_create;
+
+ zip_regs = debugfs_create_file("zip_regs", S_IRUGO,
+ zip_debugfs_root,
+ NULL, &zip_regs_fops);
+ if (!zip_regs)
+ goto failed_to_create;
+
+ return 0;
+
+failed_to_create:
+ debugfs_remove_recursive(zip_debugfs_root);
+ return -ENOENT;
+}
+
+static void __exit zip_debugfs_exit(void)
+{
+ debugfs_remove_recursive(zip_debugfs_root);
+}
+
+#else
+static int __init zip_debugfs_init(void)
+{
+ return 0;
+}
+
+static void __exit zip_debugfs_exit(void) { }
+
+#endif
+/* debugfs - end */
+
static int __init zip_init_module(void)
{
int ret;
@@ -448,11 +670,16 @@ static int __init zip_init_module(void)
return 1;
}
+ if (zip_debugfs_init())
+ zip_msg("debugfs initialization failed\n");
+
return ret;
}
static void __exit zip_cleanup_module(void)
{
+ zip_debugfs_exit();
+
/* Unregister this driver for pci zip devices */
pci_unregister_driver(&zip_driver);
diff --git a/drivers/crypto/cavium/zip/zip_main.h b/drivers/crypto/cavium/zip/zip_main.h
index 73b9e6d..cd7963e 100644
--- a/drivers/crypto/cavium/zip/zip_main.h
+++ b/drivers/crypto/cavium/zip/zip_main.h
@@ -87,6 +87,20 @@ struct zip_registers {
u64 reg_offset;
};
+/* ZIP Compression - Decompression stats */
+struct zip_stats {
+ atomic64_t comp_req_submit;
+ atomic64_t comp_req_complete;
+ atomic64_t decomp_req_submit;
+ atomic64_t decomp_req_complete;
+ atomic64_t pending_req;
+ atomic64_t comp_in_bytes;
+ atomic64_t comp_out_bytes;
+ atomic64_t decomp_in_bytes;
+ atomic64_t decomp_out_bytes;
+ atomic64_t decomp_bad_reqs;
+};
+
/* ZIP Instruction Queue */
struct zip_iq {
u64 *sw_head;
@@ -112,6 +126,7 @@ struct zip_device {
u64 ctxsize;
struct zip_iq iq[ZIP_MAX_NUM_QUEUES];
+ struct zip_stats stats;
};
/* Prototypes */
--
2.9.0.rc0.21.g7777322
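
A note on the averages computed in zip_show_stats() in the patch above:
comp_in_bytes is divided by comp_req_complete and by comp_out_bytes, and both
divisors appear to stay at zero until the first compression request completes,
so reading zip_stats on an idle device could divide by zero. A minimal sketch
of a guarded division in plain C follows. The helper name safe_div_u64 is
hypothetical and not part of the driver, and plain 64-bit integers stand in
for the atomic64_t counters.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not in the driver): integer division that
 * returns 0 instead of faulting when the divisor is 0.
 */
uint64_t safe_div_u64(uint64_t num, uint64_t den)
{
	return den ? num / den : 0;
}
```

With such a helper, avg_chunk would be safe_div_u64(in_bytes, completed) and
avg_cr would be safe_div_u64(in_bytes, out_bytes). Both remain truncating
integer divisions, so a compression ratio of 2.7 would still be shown as 2.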
^ permalink raw reply related
* [RFC PATCH 0/3] Cavium ThunderX ZIP driver
From: Jan Glauber @ 2016-12-12 15:04 UTC (permalink / raw)
To: Herbert Xu
Cc: linux-crypto, linux-kernel, David S . Miller, Mahipal Challa,
Vishnu Nair, Jan Glauber
Hi Herbert,
This series adds support for hardware-accelerated compression and decompression
as found on ThunderX (arm64) SoCs. I've been reviewing this driver internally
for some time and would like feedback on this RFC to see whether it is heading
in the right direction and whether there are any concerns.
We've discussed switching to the new acomp algorithm but for the time being
decided against acomp because our test cases are not yet supported with it.
To test the ZIP driver we've used ZSWAP and IPComp.
Performance numbers from ZSWAP look promising.
The "average time" for compressing or decompressing a 4KB page:
Compression Software : 128 usec
Compression HW deflate : 16 usec
Compression HW LZS : 10 usec
Decompression Software : 20 usec
Decompression HW deflate: 7 usec
Decompression HW LZS : 5 usec
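
As a quick cross-check of the figures above, the speedup over software is the
software time divided by the hardware time. A minimal sketch in plain C (the
helper name speedup is made up for illustration and is not part of the driver):

```c
/*
 * Illustrative helper: ratio of software time to hardware time,
 * both given in microseconds.
 */
double speedup(double sw_usec, double hw_usec)
{
	return sw_usec / hw_usec;
}
```

With the numbers above this gives 8x (deflate) and 12.8x (LZS) for
compression, and roughly 2.9x (deflate) and 4x (LZS) for decompression.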
Patches are on top of 4.9.
Feedback welcome!
Jan
---------------------
Mahipal Challa (3):
crypto: zip - Add ThunderX ZIP driver core
crypto: zip - Wire-up Compression / decompression HW offload
crypto: zip - Add Compression/decompression statistics
drivers/crypto/Kconfig | 7 +
drivers/crypto/Makefile | 1 +
drivers/crypto/cavium/Makefile | 4 +
drivers/crypto/cavium/zip/Makefile | 11 +
drivers/crypto/cavium/zip/common.h | 258 ++++++
drivers/crypto/cavium/zip/zip_crypto.c | 243 ++++++
drivers/crypto/cavium/zip/zip_crypto.h | 67 ++
drivers/crypto/cavium/zip/zip_deflate.c | 200 +++++
drivers/crypto/cavium/zip/zip_deflate.h | 62 ++
drivers/crypto/cavium/zip/zip_device.c | 209 +++++
drivers/crypto/cavium/zip/zip_device.h | 138 ++++
drivers/crypto/cavium/zip/zip_inflate.c | 223 ++++++
drivers/crypto/cavium/zip/zip_inflate.h | 62 ++
drivers/crypto/cavium/zip/zip_main.c | 698 ++++++++++++++++
drivers/crypto/cavium/zip/zip_main.h | 141 ++++
drivers/crypto/cavium/zip/zip_mem.c | 120 +++
drivers/crypto/cavium/zip/zip_mem.h | 78 ++
drivers/crypto/cavium/zip/zip_regs.h | 1326 +++++++++++++++++++++++++++++++
18 files changed, 3848 insertions(+)
create mode 100644 drivers/crypto/cavium/Makefile
create mode 100644 drivers/crypto/cavium/zip/Makefile
create mode 100644 drivers/crypto/cavium/zip/common.h
create mode 100644 drivers/crypto/cavium/zip/zip_crypto.c
create mode 100644 drivers/crypto/cavium/zip/zip_crypto.h
create mode 100644 drivers/crypto/cavium/zip/zip_deflate.c
create mode 100644 drivers/crypto/cavium/zip/zip_deflate.h
create mode 100644 drivers/crypto/cavium/zip/zip_device.c
create mode 100644 drivers/crypto/cavium/zip/zip_device.h
create mode 100644 drivers/crypto/cavium/zip/zip_inflate.c
create mode 100644 drivers/crypto/cavium/zip/zip_inflate.h
create mode 100644 drivers/crypto/cavium/zip/zip_main.c
create mode 100644 drivers/crypto/cavium/zip/zip_main.h
create mode 100644 drivers/crypto/cavium/zip/zip_mem.c
create mode 100644 drivers/crypto/cavium/zip/zip_mem.h
create mode 100644 drivers/crypto/cavium/zip/zip_regs.h
--
2.9.0.rc0.21.g7777322
^ permalink raw reply
* Re: [PATCH 1/1] crypto: asymmetric_keys: set error code on failure
From: David Howells @ 2016-12-12 16:10 UTC (permalink / raw)
To: Pan Bian
Cc: dhowells, Herbert Xu, David S. Miller, keyrings, linux-crypto,
linux-kernel, Pan Bian
In-Reply-To: <1480777024-7410-1-git-send-email-bianpan201602@163.com>
Pan Bian <bianpan201602@163.com> wrote:
> outlen = crypto_akcipher_maxsize(tfm);
> output = kmalloc(outlen, GFP_KERNEL);
> - if (!output)
> + if (!output) {
> + ret = -ENOMEM;
> goto error_free_req;
> + }
This is preferred:
+ ret = -ENOMEM;
outlen = crypto_akcipher_maxsize(tfm);
output = kmalloc(outlen, GFP_KERNEL);
if (!output)
goto error_free_req;
I'll alter your patch.
David
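
The preferred form sets the error code before the allocation, so the failure
branch stays a bare goto. A minimal userspace sketch of that pattern, under
stated assumptions: alloc_output and failing_alloc are hypothetical names, and
an allocator callback stands in for kmalloc() so the failure path can be
exercised. This is not the asymmetric_keys code itself.

```c
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>

/* Stub allocator that always fails; only used to exercise the error path. */
void *failing_alloc(size_t n)
{
	(void)n;
	return NULL;
}

/*
 * Hypothetical stand-in for the function under review: ret is preset
 * to -ENOMEM, so the allocation failure needs no extra assignment.
 */
int alloc_output(void *(*alloc)(size_t), size_t outlen, void **out)
{
	void *output;
	int ret;

	ret = -ENOMEM;
	output = alloc(outlen);
	if (!output)
		goto error;

	*out = output;
	ret = 0;
error:
	return ret;
}
```

Calling alloc_output(malloc, 16, &p) returns 0 and sets p, while
alloc_output(failing_alloc, 16, &p) returns -ENOMEM and leaves p untouched.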
^ permalink raw reply
* [PATCH] crypto: arm64/aes: reimplement bit-sliced ARM/NEON implementation for arm64
From: Ard Biesheuvel @ 2016-12-12 17:45 UTC (permalink / raw)
To: linux-crypto, herbert; +Cc: linux-arm-kernel, nico, will.deacon, Ard Biesheuvel
This is a reimplementation of the NEON version of the bit-sliced AES
algorithm. This code is heavily based on Andy Polyakov's OpenSSL version
for ARM, which is also available in the kernel. This is an alternative to
the existing NEON implementation for arm64 authored by me, which suffers
from poor performance due to its reliance on the pathologically slow
four-register variant of the tbl/tbx NEON instruction.
This version is about 30% (*) faster than the generic C code, but only in
cases where the input can be 8x interleaved (this is a fundamental property
of bit slicing). For this reason, only the chaining modes ECB, XTS and CTR
are implemented. (The significance of ECB is that it could potentially be
used by other chaining modes.)
* Measured on Cortex-A57. Note that this is still an order of magnitude
slower than the implementations that use the dedicated AES instructions
introduced in ARMv8, but those are part of an optional extension, and so
it is good to have a fallback.
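
To illustrate why bit slicing wants eight inputs at a time, here is a scalar
model of the swapmove step and the three-stage transpose built from it. It is
a sketch in plain C on eight bytes, not the NEON code: the function names
swapmove and bitslice8 and the operand order are chosen for this model, and
only the shift/mask schedule (1/0x55, 2/0x33, 4/0x0f) is taken from the
swapmove_2x and bitslice macros in this patch.

```c
#include <stdint.h>

/*
 * Swap the bits of *a selected by mask with the bits of *b selected
 * by (mask << n).
 */
void swapmove(uint8_t *a, uint8_t *b, unsigned int n, uint8_t mask)
{
	uint8_t t = ((*b >> n) ^ *a) & mask;

	*a ^= t;
	*b ^= (uint8_t)(t << n);
}

/*
 * Transpose an 8x8 bit matrix in place: bit j of x[i] ends up as
 * bit i of x[j], so each output byte collects one bit position from
 * all eight inputs.
 */
void bitslice8(uint8_t x[8])
{
	int i;

	/* Stage 1: exchange single bits between neighbouring rows. */
	for (i = 0; i < 8; i += 2)
		swapmove(&x[i + 1], &x[i], 1, 0x55);

	/* Stage 2: exchange 2-bit groups between rows two apart. */
	for (i = 0; i < 8; i++)
		if (!(i & 2))
			swapmove(&x[i + 2], &x[i], 2, 0x33);

	/* Stage 3: exchange nibbles between rows four apart. */
	for (i = 0; i < 4; i++)
		swapmove(&x[i + 4], &x[i], 4, 0x0f);
}
```

Each output byte is only fully populated when all eight input bytes carry
data, which is the scalar analogue of needing eight interleaved AES blocks.
Applying bitslice8() twice restores the original bytes.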
Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
---
arch/arm64/crypto/Kconfig | 6 +
arch/arm64/crypto/Makefile | 3 +
arch/arm64/crypto/aes-neonbs-core.S | 905 ++++++++++++++++++++++++++++++++++++
arch/arm64/crypto/aes-neonbs-glue.c | 300 ++++++++++++
4 files changed, 1214 insertions(+)
create mode 100644 arch/arm64/crypto/aes-neonbs-core.S
create mode 100644 arch/arm64/crypto/aes-neonbs-glue.c
diff --git a/arch/arm64/crypto/Kconfig b/arch/arm64/crypto/Kconfig
index 450a85df041a..cd0e7a6146b7 100644
--- a/arch/arm64/crypto/Kconfig
+++ b/arch/arm64/crypto/Kconfig
@@ -72,4 +72,10 @@ config CRYPTO_CRC32_ARM64
depends on ARM64
select CRYPTO_HASH
+config CRYPTO_AES_NEON_BS
+ tristate "AES in ECB/CBC/CTR/XTS modes using bit-sliced NEON algorithm"
+ depends on KERNEL_MODE_NEON
+ select CRYPTO_BLKCIPHER
+ select CRYPTO_AES
+
endif
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index aa8888d7b744..11d20714ec48 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -41,6 +41,9 @@ sha256-arm64-y := sha256-glue.o sha256-core.o
obj-$(CONFIG_CRYPTO_SHA512_ARM64) += sha512-arm64.o
sha512-arm64-y := sha512-glue.o sha512-core.o
+obj-$(CONFIG_CRYPTO_AES_NEON_BS) += aes-neon-bs.o
+aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
+
AFLAGS_aes-ce.o := -DINTERLEAVE=4
AFLAGS_aes-neon.o := -DINTERLEAVE=4
diff --git a/arch/arm64/crypto/aes-neonbs-core.S b/arch/arm64/crypto/aes-neonbs-core.S
new file mode 100644
index 000000000000..d027c276cc75
--- /dev/null
+++ b/arch/arm64/crypto/aes-neonbs-core.S
@@ -0,0 +1,905 @@
+/*
+ * Bit sliced AES using NEON instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+/*
+ * The algorithm implemented here is described in detail by the paper
+ * 'Faster and Timing-Attack Resistant AES-GCM' by Emilia Kaesper and
+ * Peter Schwabe (https://eprint.iacr.org/2009/129.pdf)
+ *
+ * This implementation is based primarily on the OpenSSL implementation
+ * for 32-bit ARM written by Andy Polyakov <appro@openssl.org>
+ */
+
+#include <linux/linkage.h>
+#include <asm/assembler.h>
+
+ .text
+
+ rounds .req x11
+ bskey .req x12
+
+ .macro in_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+ eor \b2, \b2, \b1
+ eor \b5, \b5, \b6
+ eor \b3, \b3, \b0
+ eor \b6, \b6, \b2
+ eor \b5, \b5, \b0
+ eor \b6, \b6, \b3
+ eor \b3, \b3, \b7
+ eor \b7, \b7, \b5
+ eor \b3, \b3, \b4
+ eor \b4, \b4, \b5
+ eor \b2, \b2, \b7
+ eor \b3, \b3, \b1
+ eor \b1, \b1, \b5
+ .endm
+
+ .macro out_bs_ch, b0, b1, b2, b3, b4, b5, b6, b7
+ eor \b0, \b0, \b6
+ eor \b1, \b1, \b4
+ eor \b4, \b4, \b6
+ eor \b2, \b2, \b0
+ eor \b6, \b6, \b1
+ eor \b1, \b1, \b5
+ eor \b5, \b5, \b3
+ eor \b3, \b3, \b7
+ eor \b7, \b7, \b5
+ eor \b2, \b2, \b5
+ eor \b4, \b4, \b7
+ .endm
+
+ .macro inv_in_bs_ch, b6, b1, b2, b4, b7, b0, b3, b5
+ eor \b1, \b1, \b7
+ eor \b4, \b4, \b7
+ eor \b7, \b7, \b5
+ eor \b1, \b1, \b3
+ eor \b2, \b2, \b5
+ eor \b3, \b3, \b7
+ eor \b6, \b6, \b1
+ eor \b2, \b2, \b0
+ eor \b5, \b5, \b3
+ eor \b4, \b4, \b6
+ eor \b0, \b0, \b6
+ eor \b1, \b1, \b4
+ .endm
+
+ .macro inv_out_bs_ch, b6, b5, b0, b3, b7, b1, b4, b2
+ eor \b1, \b1, \b5
+ eor \b2, \b2, \b7
+ eor \b3, \b3, \b1
+ eor \b4, \b4, \b5
+ eor \b7, \b7, \b5
+ eor \b3, \b3, \b4
+ eor \b5, \b5, \b0
+ eor \b3, \b3, \b7
+ eor \b6, \b6, \b2
+ eor \b2, \b2, \b1
+ eor \b6, \b6, \b3
+ eor \b3, \b3, \b0
+ eor \b5, \b5, \b6
+ .endm
+
+ .macro mul_gf4, x0, x1, y0, y1, t0, t1
+ eor \t0, \y0, \y1
+ and \t0, \t0, \x0
+ eor \x0, \x0, \x1
+ and \t1, \x1, \y0
+ and \x0, \x0, \y1
+ eor \x1, \t1, \t0
+ eor \x0, \x0, \t1
+ .endm
+
+ .macro mul_gf4_n, x0, x1, y0, y1, t0
+ eor \t0, \y0, \y1
+ and \t0, \t0, \x0
+ eor \x0, \x0, \x1
+ and \x1, \x1, \y0
+ and \x0, \x0, \y1
+ eor \x1, \x1, \x0
+ eor \x0, \x0, \t0
+ .endm
+
+ .macro mul_gf4_n_gf4, x0, x1, y0, y1, t0, x2, x3, y2, y3, t1
+ eor \t0, \y0, \y1
+ eor \t1, \y2, \y3
+ and \t0, \t0, \x0
+ and \t1, \t1, \x2
+ eor \x0, \x0, \x1
+ eor \x2, \x2, \x3
+ and \x1, \x1, \y0
+ and \x3, \x3, \y2
+ and \x0, \x0, \y1
+ and \x2, \x2, \y3
+ eor \x1, \x1, \x0
+ eor \x2, \x2, \x3
+ eor \x0, \x0, \t0
+ eor \x3, \x3, \t1
+ .endm
+
+ .macro mul_gf16_2, x0, x1, x2, x3, x4, x5, x6, x7, \
+ y0, y1, y2, y3, t0, t1, t2, t3
+ eor \t0, \x0, \x2
+ eor \t1, \x1, \x3
+ mul_gf4 \x0, \x1, \y0, \y1, \t2, \t3
+ eor \y0, \y0, \y2
+ eor \y1, \y1, \y3
+ mul_gf4_n_gf4 \t0, \t1, \y0, \y1, \t3, \x2, \x3, \y2, \y3, \t2
+ eor \x0, \x0, \t0
+ eor \x2, \x2, \t0
+ eor \x1, \x1, \t1
+ eor \x3, \x3, \t1
+ eor \t0, \x4, \x6
+ eor \t1, \x5, \x7
+ mul_gf4_n_gf4 \t0, \t1, \y0, \y1, \t3, \x6, \x7, \y2, \y3, \t2
+ eor \y0, \y0, \y2
+ eor \y1, \y1, \y3
+ mul_gf4 \x4, \x5, \y0, \y1, \t2, \t3
+ eor \x4, \x4, \t0
+ eor \x6, \x6, \t0
+ eor \x5, \x5, \t1
+ eor \x7, \x7, \t1
+ .endm
+
+ .macro inv_gf256, x0, x1, x2, x3, x4, x5, x6, x7, \
+ t0, t1, t2, t3, s0, s1, s2, s3
+ eor \t3, \x4, \x6
+ eor \t2, \x5, \x7
+ eor \t1, \x1, \x3
+ eor \s1, \x7, \x6
+ mov \t0, \t2
+ eor \s0, \x0, \x2
+ orr \t2, \t2, \t1
+ eor \s3, \t3, \t0
+ and \s2, \t3, \s0
+ orr \t3, \t3, \s0
+ eor \s0, \s0, \t1
+ and \t0, \t0, \t1
+ eor \t1, \x3, \x2
+ and \s3, \s3, \s0
+ and \s1, \s1, \t1
+ eor \t1, \x4, \x5
+ eor \s0, \x1, \x0
+ eor \t3, \t3, \s1
+ eor \t2, \t2, \s1
+ and \s1, \t1, \s0
+ orr \t1, \t1, \s0
+ eor \t3, \t3, \s3
+ eor \t0, \t0, \s1
+ eor \t2, \t2, \s2
+ eor \t1, \t1, \s3
+ eor \t0, \t0, \s2
+ and \s0, \x7, \x3
+ eor \t1, \t1, \s2
+ and \s1, \x6, \x2
+ and \s2, \x5, \x1
+ orr \s3, \x4, \x0
+ eor \t3, \t3, \s0
+ eor \t1, \t1, \s2
+ eor \t0, \t0, \s3
+ eor \t2, \t2, \s1
+ and \s2, \t3, \t1
+ mov \s0, \t0
+ eor \s1, \t2, \s2
+ eor \s3, \t0, \s2
+ eor \s2, \t0, \s2
+ bsl \s1, \t1, \t0
+ bsl \s3, \t3, \t2
+ eor \t3, \t3, \t2
+ bsl \s0, \s1, \s2
+ bsl \t0, \s2, \s1
+ and \s2, \s0, \s3
+ eor \t1, \t1, \t0
+ eor \s2, \s2, \t3
+ mul_gf16_2 \x0, \x1, \x2, \x3, \x4, \x5, \x6, \x7, \
+ \s3, \s2, \s1, \t1, \s0, \t0, \t2, \t3
+ .endm
+
+ .macro sbox, b0, b1, b2, b3, b4, b5, b6, b7, \
+ t0, t1, t2, t3, s0, s1, s2, s3
+ in_bs_ch \b0\().16b, \b1\().16b, \b2\().16b, \b3\().16b, \
+ \b4\().16b, \b5\().16b, \b6\().16b, \b7\().16b
+ inv_gf256 \b6\().16b, \b5\().16b, \b0\().16b, \b3\().16b, \
+ \b7\().16b, \b1\().16b, \b4\().16b, \b2\().16b, \
+ \t0\().16b, \t1\().16b, \t2\().16b, \t3\().16b, \
+ \s0\().16b, \s1\().16b, \s2\().16b, \s3\().16b
+ out_bs_ch \b7\().16b, \b1\().16b, \b4\().16b, \b2\().16b, \
+ \b6\().16b, \b5\().16b, \b0\().16b, \b3\().16b
+ .endm
+
+ .macro inv_sbox, b0, b1, b2, b3, b4, b5, b6, b7, \
+ t0, t1, t2, t3, s0, s1, s2, s3
+ inv_in_bs_ch \b0\().16b, \b1\().16b, \b2\().16b, \b3\().16b, \
+ \b4\().16b, \b5\().16b, \b6\().16b, \b7\().16b
+ inv_gf256 \b5\().16b, \b1\().16b, \b2\().16b, \b6\().16b, \
+ \b3\().16b, \b7\().16b, \b0\().16b, \b4\().16b, \
+ \t0\().16b, \t1\().16b, \t2\().16b, \t3\().16b, \
+ \s0\().16b, \s1\().16b, \s2\().16b, \s3\().16b
+ inv_out_bs_ch \b3\().16b, \b7\().16b, \b0\().16b, \b4\().16b, \
+ \b5\().16b, \b1\().16b, \b2\().16b, \b6\().16b
+ .endm
+
+ .macro enc_next_rk
+ ldp q16, q17, [bskey], #32
+ ldp q18, q19, [bskey], #32
+ ldp q20, q21, [bskey], #32
+ ldp q22, q23, [bskey], #32
+ .endm
+
+ .macro dec_next_rk
+ ldp q16, q17, [bskey, #-128]!
+ ldp q18, q19, [bskey, #32]
+ ldp q20, q21, [bskey, #64]
+ ldp q22, q23, [bskey, #96]
+ .endm
+
+ .macro add_round_key, x0, x1, x2, x3, x4, x5, x6, x7
+ eor \x0\().16b, \x0\().16b, v16.16b
+ eor \x1\().16b, \x1\().16b, v17.16b
+ eor \x2\().16b, \x2\().16b, v18.16b
+ eor \x3\().16b, \x3\().16b, v19.16b
+ eor \x4\().16b, \x4\().16b, v20.16b
+ eor \x5\().16b, \x5\().16b, v21.16b
+ eor \x6\().16b, \x6\().16b, v22.16b
+ eor \x7\().16b, \x7\().16b, v23.16b
+ .endm
+
+ .macro shift_rows, x0, x1, x2, x3, x4, x5, x6, x7, mask
+ tbl \x0\().16b, {\x0\().16b}, \mask\().16b
+ tbl \x1\().16b, {\x1\().16b}, \mask\().16b
+ tbl \x2\().16b, {\x2\().16b}, \mask\().16b
+ tbl \x3\().16b, {\x3\().16b}, \mask\().16b
+ tbl \x4\().16b, {\x4\().16b}, \mask\().16b
+ tbl \x5\().16b, {\x5\().16b}, \mask\().16b
+ tbl \x6\().16b, {\x6\().16b}, \mask\().16b
+ tbl \x7\().16b, {\x7\().16b}, \mask\().16b
+ .endm
+
+ .macro mix_cols, x0, x1, x2, x3, x4, x5, x6, x7, \
+ t0, t1, t2, t3, t4, t5, t6, t7, inv
+ ext \t0\().16b, \x0\().16b, \x0\().16b, #12
+ ext \t1\().16b, \x1\().16b, \x1\().16b, #12
+ eor \x0\().16b, \x0\().16b, \t0\().16b
+ ext \t2\().16b, \x2\().16b, \x2\().16b, #12
+ eor \x1\().16b, \x1\().16b, \t1\().16b
+ ext \t3\().16b, \x3\().16b, \x3\().16b, #12
+ eor \x2\().16b, \x2\().16b, \t2\().16b
+ ext \t4\().16b, \x4\().16b, \x4\().16b, #12
+ eor \x3\().16b, \x3\().16b, \t3\().16b
+ ext \t5\().16b, \x5\().16b, \x5\().16b, #12
+ eor \x4\().16b, \x4\().16b, \t4\().16b
+ ext \t6\().16b, \x6\().16b, \x6\().16b, #12
+ eor \x5\().16b, \x5\().16b, \t5\().16b
+ ext \t7\().16b, \x7\().16b, \x7\().16b, #12
+ eor \x6\().16b, \x6\().16b, \t6\().16b
+ eor \t1\().16b, \t1\().16b, \x0\().16b
+ eor \x7\().16b, \x7\().16b, \t7\().16b
+ ext \x0\().16b, \x0\().16b, \x0\().16b, #8
+ eor \t2\().16b, \t2\().16b, \x1\().16b
+ eor \t0\().16b, \t0\().16b, \x7\().16b
+ eor \t1\().16b, \t1\().16b, \x7\().16b
+ ext \x1\().16b, \x1\().16b, \x1\().16b, #8
+ eor \t5\().16b, \t5\().16b, \x4\().16b
+ eor \x0\().16b, \x0\().16b, \t0\().16b
+ eor \t6\().16b, \t6\().16b, \x5\().16b
+ eor \x1\().16b, \x1\().16b, \t1\().16b
+ ext \t0\().16b, \x4\().16b, \x4\().16b, #8
+ eor \t4\().16b, \t4\().16b, \x3\().16b
+ ext \t1\().16b, \x5\().16b, \x5\().16b, #8
+ eor \t7\().16b, \t7\().16b, \x6\().16b
+ ext \x4\().16b, \x3\().16b, \x3\().16b, #8
+ eor \t3\().16b, \t3\().16b, \x2\().16b
+ ext \x5\().16b, \x7\().16b, \x7\().16b, #8
+ eor \t4\().16b, \t4\().16b, \x7\().16b
+ ext \x3\().16b, \x6\().16b, \x6\().16b, #8
+ eor \t3\().16b, \t3\().16b, \x7\().16b
+ ext \x6\().16b, \x2\().16b, \x2\().16b, #8
+ eor \x7\().16b, \t1\().16b, \t5\().16b
+ .ifb \inv
+ eor \x2\().16b, \t0\().16b, \t4\().16b
+ eor \x4\().16b, \x4\().16b, \t3\().16b
+ eor \x5\().16b, \x5\().16b, \t7\().16b
+ eor \x3\().16b, \x3\().16b, \t6\().16b
+ eor \x6\().16b, \x6\().16b, \t2\().16b
+ .else
+ eor \t3\().16b, \t3\().16b, \x4\().16b
+ eor \x5\().16b, \x5\().16b, \t7\().16b
+ eor \x2\().16b, \x3\().16b, \t6\().16b
+ eor \x3\().16b, \t0\().16b, \t4\().16b
+ eor \x4\().16b, \x6\().16b, \t2\().16b
+ mov \x6\().16b, \t3\().16b
+ .endif
+ .endm
+
+ .macro inv_mix_cols, x0, x1, x2, x3, x4, x5, x6, x7, \
+ t0, t1, t2, t3, t4, t5, t6, t7
+ ext \t0\().16b, \x0\().16b, \x0\().16b, #8
+ ext \t6\().16b, \x6\().16b, \x6\().16b, #8
+ ext \t7\().16b, \x7\().16b, \x7\().16b, #8
+ eor \t0\().16b, \t0\().16b, \x0\().16b
+ ext \t1\().16b, \x1\().16b, \x1\().16b, #8
+ eor \t6\().16b, \t6\().16b, \x6\().16b
+ ext \t2\().16b, \x2\().16b, \x2\().16b, #8
+ eor \t7\().16b, \t7\().16b, \x7\().16b
+ ext \t3\().16b, \x3\().16b, \x3\().16b, #8
+ eor \t1\().16b, \t1\().16b, \x1\().16b
+ ext \t4\().16b, \x4\().16b, \x4\().16b, #8
+ eor \t2\().16b, \t2\().16b, \x2\().16b
+ ext \t5\().16b, \x5\().16b, \x5\().16b, #8
+ eor \t3\().16b, \t3\().16b, \x3\().16b
+ eor \t4\().16b, \t4\().16b, \x4\().16b
+ eor \t5\().16b, \t5\().16b, \x5\().16b
+ eor \x0\().16b, \x0\().16b, \t6\().16b
+ eor \x1\().16b, \x1\().16b, \t6\().16b
+ eor \x2\().16b, \x2\().16b, \t0\().16b
+ eor \x4\().16b, \x4\().16b, \t2\().16b
+ eor \x3\().16b, \x3\().16b, \t1\().16b
+ eor \x1\().16b, \x1\().16b, \t7\().16b
+ eor \x2\().16b, \x2\().16b, \t7\().16b
+ eor \x4\().16b, \x4\().16b, \t6\().16b
+ eor \x5\().16b, \x5\().16b, \t3\().16b
+ eor \x3\().16b, \x3\().16b, \t6\().16b
+ eor \x6\().16b, \x6\().16b, \t4\().16b
+ eor \x4\().16b, \x4\().16b, \t7\().16b
+ eor \x5\().16b, \x5\().16b, \t7\().16b
+ eor \x7\().16b, \x7\().16b, \t5\().16b
+ mix_cols \x0, \x1, \x2, \x3, \x4, \x5, \x6, \x7, \
+ \t0, \t1, \t2, \t3, \t4, \t5, \t6, \t7, 1
+ .endm
+
+ .macro swapmove_2x, a0, b0, a1, b1, n, mask, t0, t1
+ ushr \t0\().2d, \b0\().2d, #\n
+ ushr \t1\().2d, \b1\().2d, #\n
+ eor \t0\().16b, \t0\().16b, \a0\().16b
+ eor \t1\().16b, \t1\().16b, \a1\().16b
+ and \t0\().16b, \t0\().16b, \mask\().16b
+ and \t1\().16b, \t1\().16b, \mask\().16b
+ eor \a0\().16b, \a0\().16b, \t0\().16b
+ shl \t0\().2d, \t0\().2d, #\n
+ eor \a1\().16b, \a1\().16b, \t1\().16b
+ shl \t1\().2d, \t1\().2d, #\n
+ eor \b0\().16b, \b0\().16b, \t0\().16b
+ eor \b1\().16b, \b1\().16b, \t1\().16b
+ .endm
+
+ .macro bitslice, x7, x6, x5, x4, x3, x2, x1, x0, t0, t1, t2, t3
+ movi \t0\().16b, #0x55
+ movi \t1\().16b, #0x33
+ swapmove_2x \x0, \x1, \x2, \x3, 1, \t0, \t2, \t3
+ swapmove_2x \x4, \x5, \x6, \x7, 1, \t0, \t2, \t3
+ movi \t0\().16b, #0x0f
+ swapmove_2x \x0, \x2, \x1, \x3, 2, \t1, \t2, \t3
+ swapmove_2x \x4, \x6, \x5, \x7, 2, \t1, \t2, \t3
+ swapmove_2x \x0, \x4, \x1, \x5, 4, \t0, \t2, \t3
+ swapmove_2x \x2, \x6, \x3, \x7, 4, \t0, \t2, \t3
+ .endm
+
+
+ .align 6
+M0: .octa 0x0004080c0105090d02060a0e03070b0f
+
+M0SR: .octa 0x0004080c05090d010a0e02060f03070b
+SR: .octa 0x0f0e0d0c0a09080b0504070600030201
+SRM0: .octa 0x01060b0c0207080d0304090e00050a0f
+
+M0ISR: .octa 0x0004080c0d0105090a0e0206070b0f03
+ISR: .octa 0x0f0e0d0c080b0a090504070602010003
+ISRM0: .octa 0x0306090c00070a0d01040b0e0205080f
+
+ /*
+ * void aesbs_convert_key(u8 out[], u32 const rk[], int rounds)
+ */
+ENTRY(aesbs_convert_key)
+ ld1 {v7.4s}, [x1], #16 // load round 0 key
+ ld1 {v17.4s}, [x1], #16 // load round 1 key
+
+ movi v8.16b, #0x01 // bit masks
+ movi v9.16b, #0x02
+ movi v10.16b, #0x04
+ movi v11.16b, #0x08
+ movi v12.16b, #0x10
+ movi v13.16b, #0x20
+ movi v14.16b, #0x40
+ movi v15.16b, #0x80
+ ldr q16, M0
+
+ sub x2, x2, #1
+ str q7, [x0], #16 // save round 0 key
+
+.Lkey_loop:
+ tbl v7.16b ,{v17.16b}, v16.16b
+ ld1 {v17.4s}, [x1], #16 // load next round key
+
+ cmtst v0.16b, v7.16b, v8.16b
+ cmtst v1.16b, v7.16b, v9.16b
+ cmtst v2.16b, v7.16b, v10.16b
+ cmtst v3.16b, v7.16b, v11.16b
+ cmtst v4.16b, v7.16b, v12.16b
+ cmtst v5.16b, v7.16b, v13.16b
+ cmtst v6.16b, v7.16b, v14.16b
+ cmtst v7.16b, v7.16b, v15.16b
+ not v0.16b, v0.16b
+ not v1.16b, v1.16b
+ not v5.16b, v5.16b
+ not v6.16b, v6.16b
+
+ subs x2, x2, #1
+ stp q2, q3, [x0, #32]
+ stp q4, q5, [x0, #64]
+ stp q6, q7, [x0, #96]
+ stp q0, q1, [x0], #128
+ b.ne .Lkey_loop
+
+ movi v7.16b, #0x63 // compose .L63
+ eor v17.16b, v17.16b, v7.16b
+ str q17, [x0]
+ ret
+ENDPROC(aesbs_convert_key)
+
+ .align 4
+aesbs_encrypt8:
+ ldr q9, [bskey], #16 // round 0 key
+ ldr q8, M0SR
+ ldr q24, SR
+
+ eor v10.16b, v0.16b, v9.16b // xor with round0 key
+ eor v11.16b, v1.16b, v9.16b
+ tbl v0.16b, {v10.16b}, v8.16b
+ eor v12.16b, v2.16b, v9.16b
+ tbl v1.16b, {v11.16b}, v8.16b
+ eor v13.16b, v3.16b, v9.16b
+ tbl v2.16b, {v12.16b}, v8.16b
+ eor v14.16b, v4.16b, v9.16b
+ tbl v3.16b, {v13.16b}, v8.16b
+ eor v15.16b, v5.16b, v9.16b
+ tbl v4.16b, {v14.16b}, v8.16b
+ eor v10.16b, v6.16b, v9.16b
+ tbl v5.16b, {v15.16b}, v8.16b
+ eor v11.16b, v7.16b, v9.16b
+ tbl v6.16b, {v10.16b}, v8.16b
+ tbl v7.16b, {v11.16b}, v8.16b
+
+ bitslice v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
+
+ sub rounds, rounds, #1
+ b .Lenc_sbox
+
+.Lenc_loop:
+ shift_rows v0, v1, v2, v3, v4, v5, v6, v7, v24
+.Lenc_sbox:
+ sbox v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, \
+ v13, v14, v15
+ subs rounds, rounds, #1
+ b.cc .Lenc_done
+
+ enc_next_rk
+
+ mix_cols v0, v1, v4, v6, v3, v7, v2, v5, v8, v9, v10, v11, v12, \
+ v13, v14, v15
+
+ add_round_key v0, v1, v2, v3, v4, v5, v6, v7
+
+ b.ne .Lenc_loop
+ ldr q24, SRM0
+ b .Lenc_loop
+
+.Lenc_done:
+ ldr q12, [bskey] // last round key
+
+ bitslice v0, v1, v4, v6, v3, v7, v2, v5, v8, v9, v10, v11
+
+ eor v0.16b, v0.16b, v12.16b
+ eor v1.16b, v1.16b, v12.16b
+ eor v4.16b, v4.16b, v12.16b
+ eor v6.16b, v6.16b, v12.16b
+ eor v3.16b, v3.16b, v12.16b
+ eor v7.16b, v7.16b, v12.16b
+ eor v2.16b, v2.16b, v12.16b
+ eor v5.16b, v5.16b, v12.16b
+ ret
+ENDPROC(aesbs_encrypt8)
+
+ .align 4
+aesbs_decrypt8:
+ lsl x9, rounds, #7
+ add bskey, bskey, x9
+
+ ldr q9, [bskey, #-112]! // round 0 key
+ ldr q8, M0ISR
+ ldr q24, ISR
+
+ eor v10.16b, v0.16b, v9.16b // xor with round0 key
+ eor v11.16b, v1.16b, v9.16b
+ tbl v0.16b, {v10.16b}, v8.16b
+ eor v12.16b, v2.16b, v9.16b
+ tbl v1.16b, {v11.16b}, v8.16b
+ eor v13.16b, v3.16b, v9.16b
+ tbl v2.16b, {v12.16b}, v8.16b
+ eor v14.16b, v4.16b, v9.16b
+ tbl v3.16b, {v13.16b}, v8.16b
+ eor v15.16b, v5.16b, v9.16b
+ tbl v4.16b, {v14.16b}, v8.16b
+ eor v10.16b, v6.16b, v9.16b
+ tbl v5.16b, {v15.16b}, v8.16b
+ eor v11.16b, v7.16b, v9.16b
+ tbl v6.16b, {v10.16b}, v8.16b
+ tbl v7.16b, {v11.16b}, v8.16b
+
+ bitslice v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11
+
+ sub rounds, rounds, #1
+ b .Ldec_sbox
+
+.Ldec_loop:
+ shift_rows v0, v1, v2, v3, v4, v5, v6, v7, v24
+.Ldec_sbox:
+ inv_sbox v0, v1, v2, v3, v4, v5, v6, v7, v8, v9, v10, v11, v12, \
+ v13, v14, v15
+ subs rounds, rounds, #1
+ b.cc .Ldec_done
+
+ dec_next_rk
+
+ add_round_key v0, v1, v6, v4, v2, v7, v3, v5
+
+ inv_mix_cols v0, v1, v6, v4, v2, v7, v3, v5, v8, v9, v10, v11, v12, \
+ v13, v14, v15
+
+ b.ne .Ldec_loop
+ ldr q24, ISRM0
+ b .Ldec_loop
+.Ldec_done:
+ ldr q12, [bskey, #-16] // last round key
+
+ bitslice v0, v1, v6, v4, v2, v7, v3, v5, v8, v9, v10, v11
+
+ eor v0.16b, v0.16b, v12.16b
+ eor v1.16b, v1.16b, v12.16b
+ eor v6.16b, v6.16b, v12.16b
+ eor v4.16b, v4.16b, v12.16b
+ eor v2.16b, v2.16b, v12.16b
+ eor v7.16b, v7.16b, v12.16b
+ eor v3.16b, v3.16b, v12.16b
+ eor v5.16b, v5.16b, v12.16b
+ ret
+ENDPROC(aesbs_decrypt8)
+
+ /*
+ * aesbs_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+ * int blocks)
+ * aesbs_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+ * int blocks)
+ */
+ .macro __ecb_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+
+99: mov x5, #1
+ lsl x5, x5, x4
+ subs w4, w4, #8
+ csel x4, x4, xzr, pl
+ csel x5, x5, xzr, mi
+
+ ld1 {v0.16b}, [x1], #16
+ tbnz x5, #1, 0f
+ ld1 {v1.16b}, [x1], #16
+ tbnz x5, #2, 0f
+ ld1 {v2.16b}, [x1], #16
+ tbnz x5, #3, 0f
+ ld1 {v3.16b}, [x1], #16
+ tbnz x5, #4, 0f
+ ld1 {v4.16b}, [x1], #16
+ tbnz x5, #5, 0f
+ ld1 {v5.16b}, [x1], #16
+ tbnz x5, #6, 0f
+ ld1 {v6.16b}, [x1], #16
+ tbnz x5, #7, 0f
+ ld1 {v7.16b}, [x1], #16
+
+0: mov bskey, x2
+ mov rounds, x3
+ bl \do8
+
+ st1 {\o0\().16b}, [x0], #16
+ tbnz x5, #1, 1f
+ st1 {\o1\().16b}, [x0], #16
+ tbnz x5, #2, 1f
+ st1 {\o2\().16b}, [x0], #16
+ tbnz x5, #3, 1f
+ st1 {\o3\().16b}, [x0], #16
+ tbnz x5, #4, 1f
+ st1 {\o4\().16b}, [x0], #16
+ tbnz x5, #5, 1f
+ st1 {\o5\().16b}, [x0], #16
+ tbnz x5, #6, 1f
+ st1 {\o6\().16b}, [x0], #16
+ tbnz x5, #7, 1f
+ st1 {\o7\().16b}, [x0], #16
+
+ cbnz x4, 99b
+
+1: ldp x29, x30, [sp], #16
+ ret
+ .endm
+
+ .align 4
+ENTRY(aesbs_ecb_encrypt)
+ __ecb_crypt aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
+ENDPROC(aesbs_ecb_encrypt)
+
+ .align 4
+ENTRY(aesbs_ecb_decrypt)
+ __ecb_crypt aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
+ENDPROC(aesbs_ecb_decrypt)
+
+ .macro next_tweak, out, in, const, tmp
+ sshr \tmp\().2d, \in\().2d, #63
+ and \tmp\().16b, \tmp\().16b, \const\().16b
+ add \out\().2d, \in\().2d, \in\().2d
+ ext \tmp\().16b, \tmp\().16b, \tmp\().16b, #8
+ eor \out\().16b, \out\().16b, \tmp\().16b
+ .endm
+
+ .align 4
+.Lxts_mul_x:
+CPU_LE( .quad 1, 0x87 )
+CPU_BE( .quad 0x87, 1 )
+
+ /*
+ * aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+ * int blocks, u8 iv[])
+ * aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[], int rounds,
+ * int blocks, u8 iv[])
+ */
+__xts_crypt8:
+ mov x6, #1
+ lsl x6, x6, x4
+ subs w4, w4, #8
+ csel x4, x4, xzr, pl
+ csel x6, x6, xzr, mi
+
+ ld1 {v0.16b}, [x1], #16
+ next_tweak v26, v25, v30, v31
+ eor v0.16b, v0.16b, v25.16b
+ tbnz x6, #1, 0f
+
+ ld1 {v1.16b}, [x1], #16
+ next_tweak v27, v26, v30, v31
+ eor v1.16b, v1.16b, v26.16b
+ tbnz x6, #2, 0f
+
+ ld1 {v2.16b}, [x1], #16
+ next_tweak v28, v27, v30, v31
+ eor v2.16b, v2.16b, v27.16b
+ tbnz x6, #3, 0f
+
+ ld1 {v3.16b}, [x1], #16
+ next_tweak v29, v28, v30, v31
+ eor v3.16b, v3.16b, v28.16b
+ tbnz x6, #4, 0f
+
+ ld1 {v4.16b}, [x1], #16
+ str q29, [sp, #16]
+ eor v4.16b, v4.16b, v29.16b
+ next_tweak v29, v29, v30, v31
+ tbnz x6, #5, 0f
+
+ ld1 {v5.16b}, [x1], #16
+ str q29, [sp, #32]
+ eor v5.16b, v5.16b, v29.16b
+ next_tweak v29, v29, v30, v31
+ tbnz x6, #6, 0f
+
+ ld1 {v6.16b}, [x1], #16
+ str q29, [sp, #48]
+ eor v6.16b, v6.16b, v29.16b
+ next_tweak v29, v29, v30, v31
+ tbnz x6, #7, 0f
+
+ ld1 {v7.16b}, [x1], #16
+ str q29, [sp, #64]
+ eor v7.16b, v7.16b, v29.16b
+ next_tweak v29, v29, v30, v31
+
+0: mov bskey, x2
+ mov rounds, x3
+ br x7
+ENDPROC(__xts_crypt8)
+
+ .macro __xts_crypt, do8, o0, o1, o2, o3, o4, o5, o6, o7
+ stp x29, x30, [sp, #-80]!
+ mov x29, sp
+
+ ldr q30, .Lxts_mul_x
+ ld1 {v25.16b}, [x5]
+
+99: adr x7, \do8
+ bl __xts_crypt8
+
+ ldp q16, q17, [sp, #16]
+ ldp q18, q19, [sp, #48]
+
+ eor \o0\().16b, \o0\().16b, v25.16b
+ eor \o1\().16b, \o1\().16b, v26.16b
+ eor \o2\().16b, \o2\().16b, v27.16b
+ eor \o3\().16b, \o3\().16b, v28.16b
+
+ st1 {\o0\().16b}, [x0], #16
+ mov v25.16b, v26.16b
+ tbnz x6, #1, 1f
+ st1 {\o1\().16b}, [x0], #16
+ mov v25.16b, v27.16b
+ tbnz x6, #2, 1f
+ st1 {\o2\().16b}, [x0], #16
+ mov v25.16b, v28.16b
+ tbnz x6, #3, 1f
+ st1 {\o3\().16b}, [x0], #16
+ mov v25.16b, v29.16b
+ tbnz x6, #4, 1f
+
+ eor \o4\().16b, \o4\().16b, v16.16b
+ eor \o5\().16b, \o5\().16b, v17.16b
+ eor \o6\().16b, \o6\().16b, v18.16b
+ eor \o7\().16b, \o7\().16b, v19.16b
+
+ st1 {\o4\().16b}, [x0], #16
+ tbnz x6, #5, 1f
+ st1 {\o5\().16b}, [x0], #16
+ tbnz x6, #6, 1f
+ st1 {\o6\().16b}, [x0], #16
+ tbnz x6, #7, 1f
+ st1 {\o7\().16b}, [x0], #16
+
+ cbnz x4, 99b
+
+1: st1 {v25.16b}, [x5]
+ ldp x29, x30, [sp], #80
+ ret
+ .endm
+
+ENTRY(aesbs_xts_encrypt)
+ __xts_crypt aesbs_encrypt8, v0, v1, v4, v6, v3, v7, v2, v5
+ENDPROC(aesbs_xts_encrypt)
+
+ENTRY(aesbs_xts_decrypt)
+ __xts_crypt aesbs_decrypt8, v0, v1, v6, v4, v2, v7, v3, v5
+ENDPROC(aesbs_xts_decrypt)
+
+ .macro next_ctr, v
+ mov \v\().d[1], x8
+ mov \v\().d[0], x7
+ adds x8, x8, #1
+ adc x7, x7, xzr
+ rev64 \v\().16b, \v\().16b
+ .endm
+
+ /*
+ * aesbs_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
+ * int rounds, int blocks, u8 iv[], bool final)
+ */
+ENTRY(aesbs_ctr_encrypt)
+ stp x29, x30, [sp, #-16]!
+ mov x29, sp
+
+ add x4, x4, x6 // do one extra block if final
+
+ ldp x7, x8, [x5]
+ ld1 {v0.16b}, [x5]
+CPU_LE( rev x7, x7 )
+CPU_LE( rev x8, x8 )
+ adds x8, x8, #1
+ adc x7, x7, xzr
+
+99: mov x9, #1
+ lsl x9, x9, x4
+ subs w4, w4, #8
+ csel x4, x4, xzr, pl
+ csel x9, x9, xzr, le
+
+ tbnz x9, #1, 0f
+
+ next_ctr v1
+ tbnz x9, #2, 0f
+
+ next_ctr v2
+ tbnz x9, #3, 0f
+
+ next_ctr v3
+ tbnz x9, #4, 0f
+
+ next_ctr v4
+ tbnz x9, #5, 0f
+
+ next_ctr v5
+ tbnz x9, #6, 0f
+
+ next_ctr v6
+ tbnz x9, #7, 0f
+
+ next_ctr v7
+
+0: mov bskey, x2
+ mov rounds, x3
+ bl aesbs_encrypt8
+
+ lsr x9, x9, x6 // disregard the final block
+ tbnz x9, #0, 0f
+
+ ld1 {v8.16b}, [x1], #16
+ eor v0.16b, v0.16b, v8.16b
+ st1 {v0.16b}, [x0], #16
+ tbnz x9, #1, 1f
+
+ ld1 {v9.16b}, [x1], #16
+ eor v1.16b, v1.16b, v9.16b
+ st1 {v1.16b}, [x0], #16
+ tbnz x9, #2, 2f
+
+ ld1 {v10.16b}, [x1], #16
+ eor v4.16b, v4.16b, v10.16b
+ st1 {v4.16b}, [x0], #16
+ tbnz x9, #3, 3f
+
+ ld1 {v11.16b}, [x1], #16
+ eor v6.16b, v6.16b, v11.16b
+ st1 {v6.16b}, [x0], #16
+ tbnz x9, #4, 4f
+
+ ld1 {v12.16b}, [x1], #16
+ eor v3.16b, v3.16b, v12.16b
+ st1 {v3.16b}, [x0], #16
+ tbnz x9, #5, 5f
+
+ ld1 {v13.16b}, [x1], #16
+ eor v7.16b, v7.16b, v13.16b
+ st1 {v7.16b}, [x0], #16
+ tbnz x9, #6, 6f
+
+ ld1 {v14.16b}, [x1], #16
+ eor v2.16b, v2.16b, v14.16b
+ st1 {v2.16b}, [x0], #16
+ tbnz x9, #7, 7f
+
+ ld1 {v15.16b}, [x1], #16
+ eor v5.16b, v5.16b, v15.16b
+ st1 {v5.16b}, [x0], #16
+
+ next_ctr v0
+ cbnz x4, 99b
+
+0: st1 {v0.16b}, [x5]
+8: ldp x29, x30, [sp], #16
+ ret
+
+ /*
+ * If we are handling the tail of the input (x6 == 1), return the
+ * final keystream block back to the caller via the IV buffer.
+ */
+1: cbz x6, 8b
+ st1 {v1.16b}, [x5]
+ b 8b
+2: cbz x6, 8b
+ st1 {v4.16b}, [x5]
+ b 8b
+3: cbz x6, 8b
+ st1 {v6.16b}, [x5]
+ b 8b
+4: cbz x6, 8b
+ st1 {v3.16b}, [x5]
+ b 8b
+5: cbz x6, 8b
+ st1 {v7.16b}, [x5]
+ b 8b
+6: cbz x6, 8b
+ st1 {v2.16b}, [x5]
+ b 8b
+7: cbz x6, 8b
+ st1 {v5.16b}, [x5]
+ b 8b
+ENDPROC(aesbs_ctr_encrypt)
diff --git a/arch/arm64/crypto/aes-neonbs-glue.c b/arch/arm64/crypto/aes-neonbs-glue.c
new file mode 100644
index 000000000000..57982172563c
--- /dev/null
+++ b/arch/arm64/crypto/aes-neonbs-glue.c
@@ -0,0 +1,300 @@
+/*
+ * Bit sliced AES using NEON instructions
+ *
+ * Copyright (C) 2016 Linaro Ltd <ard.biesheuvel@linaro.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <asm/neon.h>
+#include <crypto/aes.h>
+#include <crypto/internal/simd.h>
+#include <crypto/internal/skcipher.h>
+#include <crypto/xts.h>
+#include <linux/module.h>
+
+MODULE_AUTHOR("Ard Biesheuvel <ard.biesheuvel@linaro.org>");
+MODULE_LICENSE("GPL v2");
+
+asmlinkage void aesbs_ecb_encrypt(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks);
+asmlinkage void aesbs_ecb_decrypt(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks);
+
+asmlinkage void aesbs_xts_encrypt(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks, u8 iv[]);
+asmlinkage void aesbs_xts_decrypt(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks, u8 iv[]);
+
+asmlinkage void aesbs_ctr_encrypt(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks, u8 iv[], bool final);
+
+asmlinkage void aesbs_convert_key(u8 out[], u32 const rk[], int rounds);
+
+struct aesbs_key {
+ u8 key[13 * (8 * AES_BLOCK_SIZE) + 32];
+};
+
+struct aesbs_ctx {
+ struct aesbs_key bskey;
+ int rounds;
+};
+
+struct aesbs_xts_ctx {
+ struct aesbs_key bskey;
+ struct crypto_cipher *tweak_tfm;
+ int rounds;
+};
+
+static int aesbs_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct crypto_aes_ctx rk;
+ int err;
+
+ err = crypto_aes_expand_key(&rk, in_key, key_len);
+ if (err)
+ return err;
+
+ ctx->rounds = 6 + key_len / 4;
+
+ kernel_neon_begin();
+ aesbs_convert_key(ctx->bskey.key, rk.key_enc, ctx->rounds);
+ kernel_neon_end();
+
+ return 0;
+}
+
+static int xts_init(struct crypto_skcipher *tfm)
+{
+ struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+ ctx->tweak_tfm = crypto_alloc_cipher("aes", 0, CRYPTO_ALG_ASYNC);
+ if (IS_ERR(ctx->tweak_tfm))
+ return PTR_ERR(ctx->tweak_tfm);
+
+ return 0;
+}
+
+static void xts_exit(struct crypto_skcipher *tfm)
+{
+ struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+
+ crypto_free_cipher(ctx->tweak_tfm);
+}
+
+static int aesbs_xts_setkey(struct crypto_skcipher *tfm, const u8 *in_key,
+ unsigned int key_len)
+{
+ struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct crypto_aes_ctx rk;
+ int err;
+
+ err = xts_verify_key(tfm, in_key, key_len);
+ if (err)
+ return err;
+
+ err = crypto_cipher_setkey(ctx->tweak_tfm, in_key + key_len / 2,
+ key_len / 2);
+ if (err)
+ return err;
+
+ err = crypto_aes_expand_key(&rk, in_key, key_len / 2);
+ if (err)
+ return err;
+
+ ctx->rounds = 6 + key_len / 8;
+
+ kernel_neon_begin();
+ aesbs_convert_key(ctx->bskey.key, rk.key_enc, ctx->rounds);
+ kernel_neon_end();
+
+ return 0;
+}
+
+static int __ecb_crypt(struct skcipher_request *req,
+ void (*fn)(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ kernel_neon_begin();
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+ if (walk.nbytes < walk.total)
+ blocks = round_down(blocks,
+ walk.chunksize / AES_BLOCK_SIZE);
+
+ fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->bskey.key,
+ ctx->rounds, blocks);
+ err = skcipher_walk_done(&walk,
+ walk.nbytes - blocks * AES_BLOCK_SIZE);
+ }
+ kernel_neon_end();
+
+ return err;
+}
+
+static int ecb_encrypt(struct skcipher_request *req)
+{
+ return __ecb_crypt(req, aesbs_ecb_encrypt);
+}
+
+static int ecb_decrypt(struct skcipher_request *req)
+{
+ return __ecb_crypt(req, aesbs_ecb_decrypt);
+}
+
+static int __xts_crypt(struct skcipher_request *req,
+ void (*fn)(u8 out[], u8 const in[], u8 const rk[],
+ int rounds, int blocks, u8 iv[]))
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aesbs_xts_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ crypto_cipher_encrypt_one(ctx->tweak_tfm, walk.iv, walk.iv);
+
+ kernel_neon_begin();
+ while (walk.nbytes >= AES_BLOCK_SIZE) {
+ unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+
+ if (walk.nbytes < walk.total)
+ blocks = round_down(blocks,
+ walk.chunksize / AES_BLOCK_SIZE);
+
+ fn(walk.dst.virt.addr, walk.src.virt.addr, ctx->bskey.key,
+ ctx->rounds, blocks, walk.iv);
+ err = skcipher_walk_done(&walk,
+ walk.nbytes - blocks * AES_BLOCK_SIZE);
+ }
+ kernel_neon_end();
+
+ return err;
+}
+
+static int xts_encrypt(struct skcipher_request *req)
+{
+ return __xts_crypt(req, aesbs_xts_encrypt);
+}
+
+static int xts_decrypt(struct skcipher_request *req)
+{
+ return __xts_crypt(req, aesbs_xts_decrypt);
+}
+
+static int ctr_encrypt(struct skcipher_request *req)
+{
+ struct crypto_skcipher *tfm = crypto_skcipher_reqtfm(req);
+ struct aesbs_ctx *ctx = crypto_skcipher_ctx(tfm);
+ struct skcipher_walk walk;
+ int err;
+
+ err = skcipher_walk_virt(&walk, req, true);
+
+ kernel_neon_begin();
+ while (walk.nbytes > 0) {
+ unsigned int blocks = walk.nbytes / AES_BLOCK_SIZE;
+ bool final = (walk.total % AES_BLOCK_SIZE) != 0;
+
+ if (walk.nbytes < walk.total) {
+ blocks = round_down(blocks,
+ walk.chunksize / AES_BLOCK_SIZE);
+ final = false;
+ }
+
+ aesbs_ctr_encrypt(walk.dst.virt.addr, walk.src.virt.addr,
+ ctx->bskey.key, ctx->rounds, blocks, walk.iv,
+ final);
+
+ if (final) {
+ u8 *dst = walk.dst.virt.addr + blocks * AES_BLOCK_SIZE;
+ u8 *src = walk.src.virt.addr + blocks * AES_BLOCK_SIZE;
+
+ if (dst != src)
+ memcpy(dst, src, walk.total % AES_BLOCK_SIZE);
+ crypto_xor(dst, walk.iv, walk.total % AES_BLOCK_SIZE);
+
+ err = skcipher_walk_done(&walk, 0);
+ break;
+ }
+ err = skcipher_walk_done(&walk,
+ walk.nbytes - blocks * AES_BLOCK_SIZE);
+ }
+ kernel_neon_end();
+
+ return err;
+}
+
+static struct skcipher_alg aes_algs[] = { {
+ .base.cra_name = "ecb(aes)",
+ .base.cra_driver_name = "ecb-aes-neonbs",
+ .base.cra_priority = 200,
+ .base.cra_blocksize = AES_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct aesbs_ctx),
+ .base.cra_module = THIS_MODULE,
+
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
+ .setkey = aesbs_setkey,
+ .encrypt = ecb_encrypt,
+ .decrypt = ecb_decrypt,
+}, {
+ .base.cra_name = "xts(aes)",
+ .base.cra_driver_name = "xts-aes-neonbs",
+ .base.cra_priority = 200,
+ .base.cra_blocksize = AES_BLOCK_SIZE,
+ .base.cra_ctxsize = sizeof(struct aesbs_xts_ctx),
+ .base.cra_module = THIS_MODULE,
+
+ .min_keysize = 2 * AES_MIN_KEY_SIZE,
+ .max_keysize = 2 * AES_MAX_KEY_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .setkey = aesbs_xts_setkey,
+ .encrypt = xts_encrypt,
+ .decrypt = xts_decrypt,
+ .init = xts_init,
+ .exit = xts_exit,
+}, {
+ .base.cra_name = "ctr(aes)",
+ .base.cra_driver_name = "ctr-aes-neonbs",
+ .base.cra_priority = 200,
+ .base.cra_blocksize = 1,
+ .base.cra_ctxsize = sizeof(struct aesbs_ctx),
+ .base.cra_module = THIS_MODULE,
+
+ .min_keysize = AES_MIN_KEY_SIZE,
+ .max_keysize = AES_MAX_KEY_SIZE,
+ .chunksize = 8 * AES_BLOCK_SIZE,
+ .ivsize = AES_BLOCK_SIZE,
+ .setkey = aesbs_setkey,
+ .encrypt = ctr_encrypt,
+ .decrypt = ctr_encrypt,
+} };
+
+static int __init aes_init(void)
+{
+ return crypto_register_skciphers(aes_algs, ARRAY_SIZE(aes_algs));
+}
+
+static void aes_exit(void)
+{
+ crypto_unregister_skciphers(aes_algs, ARRAY_SIZE(aes_algs));
+}
+
+module_init(aes_init);
+module_exit(aes_exit);
--
2.7.4
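
For readers following the XTS part of the patch above: the `next_tweak` macro, together with the `.Lxts_mul_x` constant (1 and 0x87), appears to multiply the 128-bit tweak by x in GF(2^128). A minimal userland sketch of that step is below, assuming the tweak is held as two little-endian 64-bit halves. The names `struct xts_tweak` and `xts_next_tweak` are made up for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Hypothetical type: a 128-bit XTS tweak as two little-endian 64-bit halves. */
struct xts_tweak {
	uint64_t lo;	/* bits 0..63 */
	uint64_t hi;	/* bits 64..127 */
};

/*
 * Multiply the tweak by x in GF(2^128), reducing modulo
 * x^128 + x^7 + x^2 + x + 1 (hence the 0x87 constant).
 */
static struct xts_tweak xts_next_tweak(struct xts_tweak t)
{
	struct xts_tweak out;
	uint64_t carry_lo = t.lo >> 63;	/* bit that moves from lo into hi */
	uint64_t carry_hi = t.hi >> 63;	/* bit shifted out of the top */

	out.lo = (t.lo << 1) ^ (carry_hi ? 0x87 : 0);
	out.hi = (t.hi << 1) | carry_lo;
	return out;
}
```

The NEON version gets the same effect without branches: it turns each half's top bit into an all-ones mask, ANDs that with the constant, swaps the two halves, and XORs the result into the doubled value.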
* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Andy Lutomirski @ 2016-12-12 18:34 UTC (permalink / raw)
To: Eric Biggers
Cc: linux-crypto, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel-hardening@lists.openwall.com, Herbert Xu,
Andrew Lutomirski, Stephan Mueller
In-Reply-To: <20161209230851.GB64048@google.com>
On Fri, Dec 9, 2016 at 3:08 PM, Eric Biggers <ebiggers3@gmail.com> wrote:
> In the 4.9 kernel, virtually-mapped stacks will be supported and enabled by
> default on x86_64. This has been exposing a number of problems in which
> on-stack buffers are being passed into the crypto API, which to support crypto
> accelerators operates on 'struct page' rather than on virtual memory.
Here's my status.
> drivers/crypto/bfin_crc.c:351
> drivers/crypto/qce/sha.c:299
> drivers/crypto/sahara.c:973,988
> drivers/crypto/talitos.c:1910
> drivers/crypto/qce/sha.c:325
I have a patch to make these depend on !VMAP_STACK.
> drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
> drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124
> drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
> drivers/crypto/ccp/ccp-crypto-aes.c:94
According to Herbert, these are fine. I'm personally less convinced
since I'm very confused as to what "async" means in the crypto code,
but I'm going to leave these alone.
>
> And these other places do crypto operations on buffers clearly on the stack:
>
> drivers/usb/wusbcore/crypto.c:264
> security/keys/encrypted-keys/encrypted.c:500
I have a patch.
> drivers/net/wireless/intersil/orinoco/mic.c:72
I have a patch to convert this to, drumroll please:
priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
CRYPTO_ALG_ASYNC);
Herbert, I'm at a loss as to what a "shash" that's "ASYNC" even means.
> net/ceph/crypto.c:182
This:
size_t zero_padding = (0x10 - (src_len & 0x0f));
is an amazing line of code...
But this driver uses cbc and wants to do synchronous crypto, and I
don't think that the crypto API supports real synchronous crypto using
CBC, so I'm going to let someone else fix this.
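
For reference, here is a small userland sketch of what the padding expression quoted above evaluates to. The function name `zero_padding_len` is made up for illustration. Only the arithmetic is taken from the quoted line.

```c
#include <stddef.h>

/*
 * Padding length as computed by the quoted expression.
 * The result is always in the range 1..16: a length that is
 * already a multiple of 16 still gets a full 16 bytes.
 */
static size_t zero_padding_len(size_t src_len)
{
	return 0x10 - (src_len & 0x0f);
}
```

So a 16-byte input is padded to 32 bytes rather than left alone, and the padding length is never zero.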
> net/rxrpc/rxkad.c:737,1000
Herbert, can you fix this?
> fs/cifs/smbencrypt.c:96
I have a patch.
My pile is here:
https://git.kernel.org/cgit/linux/kernel/git/luto/linux.git/log/?h=crypto
I'll send out the patches soon.
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
* Re: Remaining crypto API regressions with CONFIG_VMAP_STACK
From: Gary R Hook @ 2016-12-12 18:45 UTC (permalink / raw)
To: Andy Lutomirski, Eric Biggers
Cc: linux-crypto, linux-kernel@vger.kernel.org, linux-mm@kvack.org,
kernel-hardening@lists.openwall.com, Herbert Xu,
Andrew Lutomirski, Stephan Mueller
In-Reply-To: <CALCETrWfa5VJQNu3XjeFhF0cDFWF+M-dPwsT_7dzO5YSxsneGg@mail.gmail.com>
On 12/12/2016 12:34 PM, Andy Lutomirski wrote:
<...snip...>
>
> I have a patch to make these depend on !VMAP_STACK.
>
>> drivers/crypto/ccp/ccp-crypto-aes-cmac.c:105,119,142
>> drivers/crypto/ccp/ccp-crypto-sha.c:95,109,124
>> drivers/crypto/ccp/ccp-crypto-aes-xts.c:162
>> drivers/crypto/ccp/ccp-crypto-aes.c:94
>
> According to Herbert, these are fine. I'm personally less convinced
> since I'm very confused as to what "async" means in the crypto code,
> but I'm going to leave these alone.
I went back through the code, and AFAICT every argument to sg_init_one() in
the above-cited files is a buffer that is part of the request context. Which
is allocated by the crypto framework, and therefore will never be on the
stack.
Right?
I don't (as yet) see a need for any patch to these. Someone correct me
if I'm missing something.
<...snip...>
--
This is my day job. Follow me at:
IG/Twitter/Facebook: @grhookphoto
IG/Twitter/Facebook: @grhphotographer
* [PATCH] wusbcore: Fix one more crypto-on-the-stack bug
From: Andy Lutomirski @ 2016-12-12 20:52 UTC (permalink / raw)
To: linux-kernel, linux-usb, gregkh
Cc: Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller,
Andy Lutomirski
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it. This doesn't work with virtual
stacks. Make the buffer static to fix it.
Cc: stable@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
drivers/usb/wusbcore/crypto.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
index 79451f7ef1b7..a7e007a0cd49 100644
--- a/drivers/usb/wusbcore/crypto.c
+++ b/drivers/usb/wusbcore/crypto.c
@@ -216,7 +216,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
struct scatterlist sg[4], sg_dst;
void *dst_buf;
size_t dst_size;
- const u8 bzero[16] = { 0 };
+ static const u8 bzero[16] = { 0 };
u8 iv[crypto_skcipher_ivsize(tfm_cbc)];
size_t zero_padding;
--
2.9.3
* [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs
From: Andy Lutomirski @ 2016-12-12 20:53 UTC (permalink / raw)
To: linux-kernel, linux-usb, dhowells, keyrings
Cc: Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller,
Andy Lutomirski
In-Reply-To: <8c273c9c41f51b34bb3115086f1d776895580637.1481575835.git.luto@kernel.org>
The driver put a constant buffer of all zeros on the stack and
pointed a scatterlist entry at it in two places. This doesn't work
with virtual stacks. Use a static 16-byte buffer of zeros instead.
Cc: stable@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
security/keys/encrypted-keys/encrypted.c | 9 +++------
1 file changed, 3 insertions(+), 6 deletions(-)
diff --git a/security/keys/encrypted-keys/encrypted.c b/security/keys/encrypted-keys/encrypted.c
index 17a06105ccb6..fab2fb864002 100644
--- a/security/keys/encrypted-keys/encrypted.c
+++ b/security/keys/encrypted-keys/encrypted.c
@@ -46,6 +46,7 @@ static const char key_format_default[] = "default";
static const char key_format_ecryptfs[] = "ecryptfs";
static unsigned int ivsize;
static int blksize;
+static const char zero_pad[16] = {0};
#define KEY_TRUSTED_PREFIX_LEN (sizeof (KEY_TRUSTED_PREFIX) - 1)
#define KEY_USER_PREFIX_LEN (sizeof (KEY_USER_PREFIX) - 1)
@@ -481,7 +482,6 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
unsigned int padlen;
- char pad[16];
int ret;
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -493,11 +493,10 @@ static int derived_key_encrypt(struct encrypted_key_payload *epayload,
goto out;
dump_decrypted_data(epayload);
- memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 2);
sg_set_buf(&sg_in[0], epayload->decrypted_data,
epayload->decrypted_datalen);
- sg_set_buf(&sg_in[1], pad, padlen);
+ sg_set_buf(&sg_in[1], zero_pad, padlen);
sg_init_table(sg_out, 1);
sg_set_buf(sg_out, epayload->encrypted_data, encrypted_datalen);
@@ -584,7 +583,6 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
struct skcipher_request *req;
unsigned int encrypted_datalen;
u8 iv[AES_BLOCK_SIZE];
- char pad[16];
int ret;
encrypted_datalen = roundup(epayload->decrypted_datalen, blksize);
@@ -594,13 +592,12 @@ static int derived_key_decrypt(struct encrypted_key_payload *epayload,
goto out;
dump_encrypted_data(epayload, encrypted_datalen);
- memset(pad, 0, sizeof pad);
sg_init_table(sg_in, 1);
sg_init_table(sg_out, 2);
sg_set_buf(sg_in, epayload->encrypted_data, encrypted_datalen);
sg_set_buf(&sg_out[0], epayload->decrypted_data,
epayload->decrypted_datalen);
- sg_set_buf(&sg_out[1], pad, sizeof pad);
+ sg_set_buf(&sg_out[1], zero_pad, sizeof zero_pad);
memcpy(iv, epayload->iv, sizeof(iv));
skcipher_request_set_crypt(req, sg_in, sg_out, encrypted_datalen, iv);
--
2.9.3
* [PATCH] cifs: Fix smbencrypt() to stop pointing a scatterlist at the stack
From: Andy Lutomirski @ 2016-12-12 20:54 UTC (permalink / raw)
To: linux-kernel, linux-usb, sfrench
Cc: Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller,
linux-cifs, Andy Lutomirski
In-Reply-To: <8c273c9c41f51b34bb3115086f1d776895580637.1481575835.git.luto@kernel.org>
smbencrypt() points a scatterlist to the stack, which breaks if
CONFIG_VMAP_STACK=y.
Fix it by switching to crypto_cipher_encrypt_one(). The new code
should be considerably faster as an added benefit.
This code is nearly identical to some code that Eric Biggers
suggested.
Cc: stable@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
Compile-tested only.
fs/cifs/smbencrypt.c | 40 ++++++++--------------------------------
1 file changed, 8 insertions(+), 32 deletions(-)
diff --git a/fs/cifs/smbencrypt.c b/fs/cifs/smbencrypt.c
index 699b7868108f..c12bffefa3c9 100644
--- a/fs/cifs/smbencrypt.c
+++ b/fs/cifs/smbencrypt.c
@@ -23,7 +23,7 @@
Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
*/
-#include <crypto/skcipher.h>
+#include <linux/crypto.h>
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/fs.h>
@@ -69,46 +69,22 @@ str_to_key(unsigned char *str, unsigned char *key)
static int
smbhash(unsigned char *out, const unsigned char *in, unsigned char *key)
{
- int rc;
unsigned char key2[8];
- struct crypto_skcipher *tfm_des;
- struct scatterlist sgin, sgout;
- struct skcipher_request *req;
+ struct crypto_cipher *tfm_des;
str_to_key(key, key2);
- tfm_des = crypto_alloc_skcipher("ecb(des)", 0, CRYPTO_ALG_ASYNC);
+ tfm_des = crypto_alloc_cipher("des", 0, 0);
if (IS_ERR(tfm_des)) {
- rc = PTR_ERR(tfm_des);
- cifs_dbg(VFS, "could not allocate des crypto API\n");
- goto smbhash_err;
- }
-
- req = skcipher_request_alloc(tfm_des, GFP_KERNEL);
- if (!req) {
- rc = -ENOMEM;
cifs_dbg(VFS, "could not allocate des crypto API\n");
- goto smbhash_free_skcipher;
+ return PTR_ERR(tfm_des);
}
- crypto_skcipher_setkey(tfm_des, key2, 8);
-
- sg_init_one(&sgin, in, 8);
- sg_init_one(&sgout, out, 8);
+ crypto_cipher_setkey(tfm_des, key2, 8);
+ crypto_cipher_encrypt_one(tfm_des, out, in);
+ crypto_free_cipher(tfm_des);
- skcipher_request_set_callback(req, 0, NULL, NULL);
- skcipher_request_set_crypt(req, &sgin, &sgout, 8, NULL);
-
- rc = crypto_skcipher_encrypt(req);
- if (rc)
- cifs_dbg(VFS, "could not encrypt crypt key rc: %d\n", rc);
-
- skcipher_request_free(req);
-
-smbhash_free_skcipher:
- crypto_free_skcipher(tfm_des);
-smbhash_err:
- return rc;
+ return 0;
}
static int
--
2.9.3
* [PATCH] crypto: Make a few drivers depend on !VMAP_STACK
From: Andy Lutomirski @ 2016-12-12 20:55 UTC (permalink / raw)
To: linux-kernel, linux-usb
Cc: Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller,
Andy Lutomirski
In-Reply-To: <8c273c9c41f51b34bb3115086f1d776895580637.1481575835.git.luto@kernel.org>
Eric Biggers found several crypto drivers that point scatterlists at
the stack. These drivers should never load on x86, but, for future
safety, make them depend on !VMAP_STACK.
No -stable backport should be needed as no released kernel
configuration should be affected.
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
drivers/crypto/Kconfig | 8 ++++----
1 file changed, 4 insertions(+), 4 deletions(-)
diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 4d2b81f2b223..481e67e54ffd 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -245,7 +245,7 @@ config CRYPTO_DEV_TALITOS
select CRYPTO_BLKCIPHER
select CRYPTO_HASH
select HW_RANDOM
- depends on FSL_SOC
+ depends on FSL_SOC && !VMAP_STACK
help
Say 'Y' here to use the Freescale Security Engine (SEC)
to offload cryptographic algorithm computation.
@@ -357,7 +357,7 @@ config CRYPTO_DEV_PICOXCELL
config CRYPTO_DEV_SAHARA
tristate "Support for SAHARA crypto accelerator"
- depends on ARCH_MXC && OF
+ depends on ARCH_MXC && OF && !VMAP_STACK
select CRYPTO_BLKCIPHER
select CRYPTO_AES
select CRYPTO_ECB
@@ -410,7 +410,7 @@ endif # if CRYPTO_DEV_UX500
config CRYPTO_DEV_BFIN_CRC
tristate "Support for Blackfin CRC hardware"
- depends on BF60x
+ depends on BF60x && !VMAP_STACK
help
Newer Blackfin processors have CRC hardware. Select this if you
want to use the Blackfin CRC module.
@@ -487,7 +487,7 @@ source "drivers/crypto/qat/Kconfig"
config CRYPTO_DEV_QCE
tristate "Qualcomm crypto engine accelerator"
- depends on (ARCH_QCOM || COMPILE_TEST) && HAS_DMA && HAS_IOMEM
+ depends on (ARCH_QCOM || COMPILE_TEST) && HAS_DMA && HAS_IOMEM && !VMAP_STACK
select CRYPTO_AES
select CRYPTO_DES
select CRYPTO_ECB
--
2.9.3
* [PATCH] orinoco: Use shash instead of ahash for MIC calculations
From: Andy Lutomirski @ 2016-12-12 20:55 UTC (permalink / raw)
To: linux-kernel, linux-usb, linux-wireless
Cc: Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller,
Andy Lutomirski
In-Reply-To: <8c273c9c41f51b34bb3115086f1d776895580637.1481575835.git.luto@kernel.org>
Eric Biggers pointed out that the orinoco driver pointed scatterlists
at the stack.
Fix it by switching from ahash to shash. The result should be
simpler, faster, and more correct.
Cc: stable@vger.kernel.org # 4.9 only
Reported-by: Eric Biggers <ebiggers3@gmail.com>
Signed-off-by: Andy Lutomirski <luto@kernel.org>
---
Compile-tested only.
drivers/net/wireless/intersil/orinoco/mic.c | 44 +++++++++++++++----------
drivers/net/wireless/intersil/orinoco/mic.h | 3 +-
drivers/net/wireless/intersil/orinoco/orinoco.h | 4 +--
3 files changed, 30 insertions(+), 21 deletions(-)
diff --git a/drivers/net/wireless/intersil/orinoco/mic.c b/drivers/net/wireless/intersil/orinoco/mic.c
index bc7397d709d3..08bc7822f820 100644
--- a/drivers/net/wireless/intersil/orinoco/mic.c
+++ b/drivers/net/wireless/intersil/orinoco/mic.c
@@ -16,7 +16,7 @@
/********************************************************************/
int orinoco_mic_init(struct orinoco_private *priv)
{
- priv->tx_tfm_mic = crypto_alloc_ahash("michael_mic", 0,
+ priv->tx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->tx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
@@ -25,7 +25,7 @@ int orinoco_mic_init(struct orinoco_private *priv)
return -ENOMEM;
}
- priv->rx_tfm_mic = crypto_alloc_ahash("michael_mic", 0,
+ priv->rx_tfm_mic = crypto_alloc_shash("michael_mic", 0,
CRYPTO_ALG_ASYNC);
if (IS_ERR(priv->rx_tfm_mic)) {
printk(KERN_DEBUG "orinoco_mic_init: could not allocate "
@@ -40,17 +40,16 @@ int orinoco_mic_init(struct orinoco_private *priv)
void orinoco_mic_free(struct orinoco_private *priv)
{
if (priv->tx_tfm_mic)
- crypto_free_ahash(priv->tx_tfm_mic);
+ crypto_free_shash(priv->tx_tfm_mic);
if (priv->rx_tfm_mic)
- crypto_free_ahash(priv->rx_tfm_mic);
+ crypto_free_shash(priv->rx_tfm_mic);
}
-int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
+int orinoco_mic(struct crypto_shash *tfm_michael, u8 *key,
u8 *da, u8 *sa, u8 priority,
u8 *data, size_t data_len, u8 *mic)
{
- AHASH_REQUEST_ON_STACK(req, tfm_michael);
- struct scatterlist sg[2];
+ SHASH_DESC_ON_STACK(desc, tfm_michael);
u8 hdr[ETH_HLEN + 2]; /* size of header + padding */
int err;
@@ -67,18 +66,27 @@ int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
hdr[ETH_ALEN * 2 + 2] = 0;
hdr[ETH_ALEN * 2 + 3] = 0;
- /* Use scatter gather to MIC header and data in one go */
- sg_init_table(sg, 2);
- sg_set_buf(&sg[0], hdr, sizeof(hdr));
- sg_set_buf(&sg[1], data, data_len);
+ desc->tfm = tfm_michael;
+ desc->flags = 0;
- if (crypto_ahash_setkey(tfm_michael, key, MIC_KEYLEN))
- return -1;
+ err = crypto_shash_setkey(tfm_michael, key, MIC_KEYLEN);
+ if (err)
+ return err;
+
+ err = crypto_shash_init(desc);
+ if (err)
+ return err;
+
+ err = crypto_shash_update(desc, hdr, sizeof(hdr));
+ if (err)
+ return err;
+
+ err = crypto_shash_update(desc, data, data_len);
+ if (err)
+ return err;
+
+ err = crypto_shash_final(desc, mic);
+ shash_desc_zero(desc);
- ahash_request_set_tfm(req, tfm_michael);
- ahash_request_set_callback(req, 0, NULL, NULL);
- ahash_request_set_crypt(req, sg, mic, data_len + sizeof(hdr));
- err = crypto_ahash_digest(req);
- ahash_request_zero(req);
return err;
}
diff --git a/drivers/net/wireless/intersil/orinoco/mic.h b/drivers/net/wireless/intersil/orinoco/mic.h
index ce731d05cc98..e8724e889219 100644
--- a/drivers/net/wireless/intersil/orinoco/mic.h
+++ b/drivers/net/wireless/intersil/orinoco/mic.h
@@ -6,6 +6,7 @@
#define _ORINOCO_MIC_H_
#include <linux/types.h>
+#include <crypto/hash.h>
#define MICHAEL_MIC_LEN 8
@@ -15,7 +16,7 @@ struct crypto_ahash;
int orinoco_mic_init(struct orinoco_private *priv);
void orinoco_mic_free(struct orinoco_private *priv);
-int orinoco_mic(struct crypto_ahash *tfm_michael, u8 *key,
+int orinoco_mic(struct crypto_shash *tfm_michael, u8 *key,
u8 *da, u8 *sa, u8 priority,
u8 *data, size_t data_len, u8 *mic);
diff --git a/drivers/net/wireless/intersil/orinoco/orinoco.h b/drivers/net/wireless/intersil/orinoco/orinoco.h
index 2f0c84b1c440..5fa1c3e3713f 100644
--- a/drivers/net/wireless/intersil/orinoco/orinoco.h
+++ b/drivers/net/wireless/intersil/orinoco/orinoco.h
@@ -152,8 +152,8 @@ struct orinoco_private {
u8 *wpa_ie;
int wpa_ie_len;
- struct crypto_ahash *rx_tfm_mic;
- struct crypto_ahash *tx_tfm_mic;
+ struct crypto_shash *rx_tfm_mic;
+ struct crypto_shash *tx_tfm_mic;
unsigned int wpa_enabled:1;
unsigned int tkip_cm_active:1;
--
2.9.3
^ permalink raw reply related
* Re: [PATCH v2] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-12 21:17 UTC (permalink / raw)
To: Eric Biggers
Cc: kernel-hardening, LKML, Linux Crypto Mailing List, Linus Torvalds,
George Spelvin, Scott Bauer, Andi Kleen, Andy Lutomirski, Greg KH,
Jean-Philippe Aumasson, Daniel J . Bernstein
In-Reply-To: <20161212054229.GA31382@zzz>
Hey Eric,
Lots of good points; thanks for the review. Responses are inline below.
On Mon, Dec 12, 2016 at 6:42 AM, Eric Biggers <ebiggers3@gmail.com> wrote:
> Maybe add to the help text for CONFIG_TEST_HASH that it now tests siphash too?
Good call. Will do.
> This assumes the key and message buffers are aligned to __alignof__(u64).
> Unless that's going to be a clearly documented requirement for callers, you
> should use get_unaligned_le64() instead. And you can pass a 'u8 *' directly to
> get_unaligned_le64(), no need for a helper function.
I had thought about that briefly, but just sort of figured most people
were passing in aligned variables... but that's a pretty bad
assumption to make especially for 64-bit alignment. I'll switch to
using the get_unaligned functions.
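The change agreed on here can be illustrated in userspace. This is a minimal sketch, assuming a hypothetical helper `sketch_load_le64()` that stands in for the kernel's get_unaligned_le64(). It assembles the value one byte at a time, so it works at any address and yields the same number on little- and big-endian CPUs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for get_unaligned_le64(): read 8 bytes as a
 * little-endian u64 without ever dereferencing a u64 pointer.
 */
static uint64_t sketch_load_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	assert(p != NULL);
	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];	/* p[7] ends up in the top byte */
	return v;
}
```

Passing `buf + 1` of a byte array is fine with this helper, whereas casting that pointer to `u64 *` would depend on the alignment assumption raised in the review.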
[As a side note, I wonder if crypto/chacha20_generic.c should be using
the unaligned functions instead too, at least for the iv reading...]
> It makes sense for this to return a u64, but that means the cpu_to_le64() is
> wrong, since u64 indicates CPU endianness. It should just return 'b'.
At first I was very opposed to making this change, since by returning
a value with an explicit byte order, you can cast to u8 and have
uniform indexed byte access across platforms. But of course this
doesn't make any sense, since it's returning a u64, and it makes all
other bitwise operations non-uniform anyway. I checked with JP
(co-creator of siphash, CC'd) and he confirmed your suspicion that it
was just to make the test vector comparison easier and for some
byte-wise uniformity, but that it's not strictly necessary. So, I've
removed that last cpu_to_le64, and I've also refactored those test
vectors to be written as ULL literals, so that a simple == integer
comparison will work across platforms.
> Can you mention in a comment where the test vectors came from?
Sure, will do.
> If you make the output really be CPU-endian like I'm suggesting then this will
> need to be something like:
>
> if (out != get_unaligned_le64(test_vectors[i])) {
>
> Or else make the test vectors be an array of u64.
Yep, I wound up doing that.
Thanks Eric! Will submit a v3 soon if nobody else has comments.
Jason
^ permalink raw reply
* Re: [PATCH v2] siphash: add cryptographically secure hashtable function
From: Linus Torvalds @ 2016-12-12 21:37 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andi Kleen, Andy Lutomirski, Greg KH, Jean-Philippe Aumasson,
Daniel J . Bernstein
In-Reply-To: <CAHmME9qSW1U3dU+VjV8UBz=XOMfpbTkOCyrz74VnQTNcJW_FUw@mail.gmail.com>
On Sun, Dec 11, 2016 at 9:48 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> I modified the test to hash data of size 0 through 7 repeatedly
> 100000000 times, and benchmarked that a few times on a Skylake laptop.
> The `load_unaligned_zeropad & bytemask_from_count` version was
> consistently 7% slower.
>
> I then modified it again to simply hash a 4 byte constant repeatedly
> 1000000000 times. The `load_unaligned_zeropad & bytemask_from_count`
> version was around 6% faster. I tried again with a 7 byte constant and
> got more or less a similar result.
>
> Then I tried with a 1 byte constant, and found that the
> `load_unaligned_zeropad & bytemask_from_count` version was slower.
>
> So, it would seem that between the `if (left)` and the `switch
> (left)`, there's the same number of branches.
Interesting.
For the dcache code (which is where that trick comes from), we used to
have a loop (rather than the duff's device thing), and it performed
badly due to the consistently badly predicted branch of the loop. But
I never compared it against the duff's device version.
I guess you could try to just remove the "if (left)" test entirely, if
it is at least partly the mispredict. It should do the right thing
even with a zero count, and it might schedule the code better. Code
size _should_ be better with the byte mask model (which won't matter
in the hot loop example, since it will all be cached, possibly even in
the uop cache for really tight benchmark loops).
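The byte mask model can be sketched in userspace as follows. The names `sketch_bytemask()` and `sketch_tail_masked()` are hypothetical, the mask is written in its little-endian form, and the caller is assumed to guarantee 8 readable bytes (the kernel's load_unaligned_zeropad() handles the page-crossing case itself, which this sketch does not attempt).

```c
#include <assert.h>
#include <stdint.h>

/* Mask with the low `count` bytes set; count == 0 gives an all-zero mask. */
static uint64_t sketch_bytemask(unsigned int count)
{
	assert(count < 8);
	return ~(~UINT64_C(0) << (count * 8));
}

/*
 * Load a full 8-byte word (little-endian) and keep only the first `left`
 * bytes. There is no branch on `left`: a zero count simply masks
 * everything away.
 */
static uint64_t sketch_tail_masked(const uint8_t *padded, unsigned int left)
{
	uint64_t word = 0;
	int i;

	for (i = 7; i >= 0; i--)
		word = (word << 8) | padded[i];
	return word & sketch_bytemask(left);
}
```

Because `sketch_bytemask(0)` is zero, the result for an empty tail is zero without a separate `if (left)` test, which is the property referred to above.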
Linus
^ permalink raw reply
* Re: [PATCH v2] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-12 21:44 UTC (permalink / raw)
To: Linus Torvalds
Cc: kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andi Kleen, Andy Lutomirski, Greg KH, Jean-Philippe Aumasson,
Daniel J . Bernstein
Hi Linus,
> I guess you could try to just remove the "if (left)" test entirely, if
> it is at least partly the mispredict. It should do the right thing
> even with a zero count, and it might schedule the code better. Code
> size _should_ be better with the byte mask model (which won't matter
> in the hot loop example, since it will all be cached, possibly even in
> the uop cache for really tight benchmark loops).
Originally I had just forgotten the `if (left)`, and had the same
sub-par benchmarks. In the v3 revision that I'm working on at the
moment, I'm using your dcache trick for cases 3,5,6,7 and
short-circuiting cases 1,2,4 to just directly access those bytes as
integers. For the 32-bit case, I do something similar, but built
inside of the duff's device. This should give optimal performance for
the most popular use cases, which involve hashing "some stuff" plus a
leftover u16 (port number?) or u32 (ipv4 addr?).
#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
switch (left) {
case 0: break;
case 1: b |= data[0]; break;
case 2: b |= get_unaligned_le16(data); break;
case 4: b |= get_unaligned_le32(data); break;
default:
b |= le64_to_cpu(load_unaligned_zeropad(data) &
bytemask_from_count(left));
break;
}
#else
switch (left) {
case 7: b |= ((u64)data[6]) << 48;
case 6: b |= ((u64)data[5]) << 40;
case 5: b |= ((u64)data[4]) << 32;
case 4: b |= get_unaligned_le32(data); break;
case 3: b |= ((u64)data[2]) << 16;
case 2: b |= get_unaligned_le16(data); break;
case 1: b |= data[0];
}
#endif
It seems like this might be best of all worlds?
Jason
^ permalink raw reply
* Re: [PATCH] wusbcore: Fix one more crypto-on-the-stack bug
From: Greg KH @ 2016-12-12 21:44 UTC (permalink / raw)
To: Andy Lutomirski
Cc: linux-kernel, linux-usb, Eric Biggers, linux-crypto, Herbert Xu,
Stephan Mueller
In-Reply-To: <8c273c9c41f51b34bb3115086f1d776895580637.1481575835.git.luto@kernel.org>
On Mon, Dec 12, 2016 at 12:52:45PM -0800, Andy Lutomirski wrote:
> The driver put a constant buffer of all zeros on the stack and
> pointed a scatterlist entry at it. This doesn't work with virtual
> stacks. Make the buffer static to fix it.
>
> Cc: stable@vger.kernel.org # 4.9 only
> Reported-by: Eric Biggers <ebiggers3@gmail.com>
> Signed-off-by: Andy Lutomirski <luto@kernel.org>
> ---
> drivers/usb/wusbcore/crypto.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
> index 79451f7ef1b7..a7e007a0cd49 100644
> --- a/drivers/usb/wusbcore/crypto.c
> +++ b/drivers/usb/wusbcore/crypto.c
> @@ -216,7 +216,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
> struct scatterlist sg[4], sg_dst;
> void *dst_buf;
> size_t dst_size;
> - const u8 bzero[16] = { 0 };
> + static const u8 bzero[16] = { 0 };
Hm, can static memory handle DMA? That's a requirement of the USB
stack, does this data later end up being sent down to a USB host
controller?
thanks,
greg k-h
^ permalink raw reply
* Re: [PATCH v2] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-12 21:57 UTC (permalink / raw)
To: Linus Torvalds
Cc: kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andi Kleen, Andy Lutomirski, Greg KH, Jean-Philippe Aumasson,
Daniel J . Bernstein
In-Reply-To: <CAHmME9o3otY8oKW1TGDWM23j4yz3PVvZViuwmfJ+szpWbm2BfA@mail.gmail.com>
On Mon, Dec 12, 2016 at 10:44 PM, Jason A. Donenfeld <Jason@zx2c4.com> wrote:
> #if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
> switch (left) {
> case 0: break;
> case 1: b |= data[0]; break;
> case 2: b |= get_unaligned_le16(data); break;
> case 4: b |= get_unaligned_le32(data); break;
> default:
> b |= le64_to_cpu(load_unaligned_zeropad(data) &
> bytemask_from_count(left));
> break;
> }
> #else
> switch (left) {
> case 7: b |= ((u64)data[6]) << 48;
> case 6: b |= ((u64)data[5]) << 40;
> case 5: b |= ((u64)data[4]) << 32;
> case 4: b |= get_unaligned_le32(data); break;
> case 3: b |= ((u64)data[2]) << 16;
> case 2: b |= get_unaligned_le16(data); break;
> case 1: b |= data[0];
> }
> #endif
As it turns out, perhaps unsurprisingly, the code generation here is
really not nice, resulting in many branches instead of a computed
jump. I'll submit v3 with just a branch-less load_unaligned_zeropad
for the 64-bit/dcache case and the duff's device for the other case.
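For the non-dcache path, the fall-through switch can be checked against a plain byte loop. This is a userspace sketch with hypothetical names (`sketch_tail_switch()`, `sketch_tail_loop()`, and small little-endian load helpers standing in for get_unaligned_le16()/get_unaligned_le32()); it only demonstrates that both forms build the same value.

```c
#include <assert.h>
#include <stdint.h>

static uint64_t sketch_le16(const uint8_t *p)
{
	return (uint64_t)p[0] | ((uint64_t)p[1] << 8);
}

static uint64_t sketch_le32(const uint8_t *p)
{
	return sketch_le16(p) | ((uint64_t)p[2] << 16) | ((uint64_t)p[3] << 24);
}

/* Fall-through switch: high bytes first, then one 32- or 16-bit load. */
static uint64_t sketch_tail_switch(const uint8_t *data, unsigned int left)
{
	uint64_t b = 0;

	assert(left < 8);
	switch (left) {
	case 7: b |= (uint64_t)data[6] << 48;	/* fall through */
	case 6: b |= (uint64_t)data[5] << 40;	/* fall through */
	case 5: b |= (uint64_t)data[4] << 32;	/* fall through */
	case 4: b |= sketch_le32(data); break;
	case 3: b |= (uint64_t)data[2] << 16;	/* fall through */
	case 2: b |= sketch_le16(data); break;
	case 1: b |= data[0]; break;
	}
	return b;
}

/* Straightforward reference: one byte per iteration. */
static uint64_t sketch_tail_loop(const uint8_t *data, unsigned int left)
{
	uint64_t b = 0;
	unsigned int i;

	for (i = 0; i < left; i++)
		b |= (uint64_t)data[i] << (8 * i);
	return b;
}
```

The loop version is the easiest to read; the switch trades that for fewer, wider loads when 2 or 4 bytes are left over.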
^ permalink raw reply
* Re: [PATCH v6 2/2] crypto: add virtio-crypto driver
From: Michael S. Tsirkin @ 2016-12-12 22:05 UTC (permalink / raw)
To: Herbert Xu
Cc: Gonglei (Arei), linux-kernel@vger.kernel.org,
qemu-devel@nongnu.org, virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org,
linux-crypto@vger.kernel.org, Luonengjun, stefanha@redhat.com,
Huangweidong (C), Wubin (H), xin.zeng@intel.com, Claudio Fontana,
pasic@linux.vnet.ibm.com, davem@davemloft.net,
Zhoujian (jay, Euler)
In-Reply-To: <20161212105407.GA3033@gondor.apana.org.au>
On Mon, Dec 12, 2016 at 06:54:07PM +0800, Herbert Xu wrote:
> On Mon, Dec 12, 2016 at 06:25:12AM +0000, Gonglei (Arei) wrote:
> > Hi, Michael & Herbert
> >
> > Because the virtio-crypto device emulation had been in QEMU 2.8,
> > would you please merge the virtio-crypto driver for 4.10 if no other
> > comments? If so, Michael pls ack and/or review the patch, then
> > Herbert will take it (I asked him last week). Thank you!
> >
> > Ps: Note on 4.10 merge window timing from Linus
> > https://lkml.org/lkml/2016/12/7/506
> >
> > Dec 23rd is the deadline for 4.10 merge window.
>
> Sorry but it's too late for 4.10. It needed to have been in my
> tree before the merge window opened to make it for this cycle.
>
> Cheers,
Objections to me merging this? I'm preparing my tree right now.
> --
> Email: Herbert Xu <herbert@gondor.apana.org.au>
> Home Page: http://gondor.apana.org.au/~herbert/
> PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply
* [PATCH v3] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-12 22:18 UTC (permalink / raw)
To: Linus Torvalds, kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andi Kleen, Andy Lutomirski, Greg KH, Eric Biggers
Cc: Jason A. Donenfeld, Jean-Philippe Aumasson, Daniel J . Bernstein
In-Reply-To: <CA+55aFymjmEPNx8ZwhxtiE=nPG_5gbkzUQhdRAwTareuNcV=tA@mail.gmail.com>
SipHash is a 64-bit keyed hash function that is actually a
cryptographically secure PRF, like HMAC. Except SipHash is super fast,
and is meant to be used as a hashtable keyed lookup function.
SipHash isn't just some new trendy hash function. It's been around for a
while, and there really isn't anything that comes remotely close to
being useful in the way SipHash is. With that said, why do we need this?
There are a variety of attacks known as "hashtable poisoning" in which an
attacker forms some data such that the hash of that data will be the
same, and then proceeds to fill up all entries of a hashbucket. This is
a realistic and well-known denial-of-service vector.
Linux developers already seem to be aware that this is an issue, and
various places that use hash tables in, say, a network context, use a
non-cryptographically secure function (usually jhash) and then try to
twiddle with the key on a time basis (or in many cases just do nothing
and hope that nobody notices). While this is an admirable attempt at
solving the problem, it doesn't actually fix it. SipHash fixes it.
(It fixes it in such a sound way that you could even build a stream
cipher out of SipHash that would resist modern cryptanalysis.)
There are a modicum of places in the kernel that are vulnerable to
hashtable poisoning attacks, either via userspace vectors or network
vectors, and there's not a reliable mechanism inside the kernel at the
moment to fix it. The first step toward fixing these issues is actually
getting a secure primitive into the kernel for developers to use. Then
we can, bit by bit, port things over to it as deemed appropriate.
Dozens of languages are already using this internally for their hash
tables. Some of the BSDs already use this in their kernels. SipHash is
a widely known high-speed solution to a widely known problem, and it's
time we catch up.
Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
Cc: Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
Cc: Daniel J. Bernstein <djb@cr.yp.to>
---
Changes from v2->v3:
- The unaligned helpers are now used for reading from u8* arrays.
- Linus' trick with load_unaligned_zeropad has been implemented for
64-bit/dcache platforms.
- Non 64-bit/dcache platforms now use a more optimized duff's device
for shortcutting certain sized left-overs.
- The Kconfig help text for the test now mentions siphash.
- The function now returns a native-endian byte sequence inside a
u64, which is more correct. As well, the test vectors are now
represented as u64 literals, rather than byte sequences.
- The origin of the test vectors is now inside a comment.
include/linux/siphash.h | 20 +++++++++++++
lib/Kconfig.debug | 6 ++--
lib/Makefile | 5 ++--
lib/siphash.c | 75 +++++++++++++++++++++++++++++++++++++++++++++++++
lib/test_siphash.c | 74 ++++++++++++++++++++++++++++++++++++++++++++++++
5 files changed, 175 insertions(+), 5 deletions(-)
create mode 100644 include/linux/siphash.h
create mode 100644 lib/siphash.c
create mode 100644 lib/test_siphash.c
diff --git a/include/linux/siphash.h b/include/linux/siphash.h
new file mode 100644
index 000000000000..6623b3090645
--- /dev/null
+++ b/include/linux/siphash.h
@@ -0,0 +1,20 @@
+/* Copyright (C) 2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#ifndef _LINUX_SIPHASH_H
+#define _LINUX_SIPHASH_H
+
+#include <linux/types.h>
+
+enum siphash_lengths {
+ SIPHASH24_KEY_LEN = 16
+};
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN]);
+
+#endif /* _LINUX_SIPHASH_H */
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index a6c8db1d62f6..2a1797704b41 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1823,9 +1823,9 @@ config TEST_HASH
tristate "Perform selftest on hash functions"
default n
help
- Enable this option to test the kernel's integer (<linux/hash,h>)
- and string (<linux/stringhash.h>) hash functions on boot
- (or module load).
+ Enable this option to test the kernel's integer (<linux/hash.h>),
+ string (<linux/stringhash.h>), and siphash (<linux/siphash.h>)
+ hash functions on boot (or module load).
This is intended to help people writing architecture-specific
optimized versions. If unsure, say N.
diff --git a/lib/Makefile b/lib/Makefile
index 50144a3aeebd..71d398b04a74 100644
--- a/lib/Makefile
+++ b/lib/Makefile
@@ -22,7 +22,8 @@ lib-y := ctype.o string.o vsprintf.o cmdline.o \
sha1.o chacha20.o md5.o irq_regs.o argv_split.o \
flex_proportions.o ratelimit.o show_mem.o \
is_single_threaded.o plist.o decompress.o kobject_uevent.o \
- earlycpio.o seq_buf.o nmi_backtrace.o nodemask.o win_minmax.o
+ earlycpio.o seq_buf.o siphash.o \
+ nmi_backtrace.o nodemask.o win_minmax.o
lib-$(CONFIG_MMU) += ioremap.o
lib-$(CONFIG_SMP) += cpumask.o
@@ -44,7 +45,7 @@ obj-$(CONFIG_TEST_HEXDUMP) += test_hexdump.o
obj-y += kstrtox.o
obj-$(CONFIG_TEST_BPF) += test_bpf.o
obj-$(CONFIG_TEST_FIRMWARE) += test_firmware.o
-obj-$(CONFIG_TEST_HASH) += test_hash.o
+obj-$(CONFIG_TEST_HASH) += test_hash.o test_siphash.o
obj-$(CONFIG_TEST_KASAN) += test_kasan.o
obj-$(CONFIG_TEST_KSTRTOX) += test-kstrtox.o
obj-$(CONFIG_TEST_LKM) += test_module.o
diff --git a/lib/siphash.c b/lib/siphash.c
new file mode 100644
index 000000000000..b259a3295c50
--- /dev/null
+++ b/lib/siphash.c
@@ -0,0 +1,75 @@
+/* Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ * Copyright (C) 2012-2014 Jean-Philippe Aumasson <jeanphilippe.aumasson@gmail.com>
+ * Copyright (C) 2012-2014 Daniel J. Bernstein <djb@cr.yp.to>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <asm/unaligned.h>
+
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+#include <linux/dcache.h>
+#include <asm/word-at-a-time.h>
+#endif
+
+#define SIPROUND \
+ do { \
+ v0 += v1; v1 = rol64(v1, 13); v1 ^= v0; v0 = rol64(v0, 32); \
+ v2 += v3; v3 = rol64(v3, 16); v3 ^= v2; \
+ v0 += v3; v3 = rol64(v3, 21); v3 ^= v0; \
+ v2 += v1; v1 = rol64(v1, 17); v1 ^= v2; v2 = rol64(v2, 32); \
+ } while(0)
+
+u64 siphash24(const u8 *data, size_t len, const u8 key[SIPHASH24_KEY_LEN])
+{
+ u64 v0 = 0x736f6d6570736575ULL;
+ u64 v1 = 0x646f72616e646f6dULL;
+ u64 v2 = 0x6c7967656e657261ULL;
+ u64 v3 = 0x7465646279746573ULL;
+ u64 b = ((u64)len) << 56;
+ u64 k0 = get_unaligned_le64(key);
+ u64 k1 = get_unaligned_le64(key + sizeof(u64));
+ u64 m;
+ const u8 *end = data + len - (len % sizeof(u64));
+ const u8 left = len & (sizeof(u64) - 1);
+ v3 ^= k1;
+ v2 ^= k0;
+ v1 ^= k1;
+ v0 ^= k0;
+ for (; data != end; data += sizeof(u64)) {
+ m = get_unaligned_le64(data);
+ v3 ^= m;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= m;
+ }
+#if defined(CONFIG_DCACHE_WORD_ACCESS) && BITS_PER_LONG == 64
+ b |= le64_to_cpu(load_unaligned_zeropad(data) & bytemask_from_count(left));
+#else
+ switch (left) {
+ case 7: b |= ((u64)data[6]) << 48;
+ case 6: b |= ((u64)data[5]) << 40;
+ case 5: b |= ((u64)data[4]) << 32;
+ case 4: b |= get_unaligned_le32(data); break;
+ case 3: b |= ((u64)data[2]) << 16;
+ case 2: b |= get_unaligned_le16(data); break;
+ case 1: b |= data[0];
+ }
+#endif
+ v3 ^= b;
+ SIPROUND;
+ SIPROUND;
+ v0 ^= b;
+ v2 ^= 0xff;
+ SIPROUND;
+ SIPROUND;
+ SIPROUND;
+ SIPROUND;
+ return (v0 ^ v1) ^ (v2 ^ v3);
+}
+EXPORT_SYMBOL(siphash24);
diff --git a/lib/test_siphash.c b/lib/test_siphash.c
new file mode 100644
index 000000000000..336298aaa33b
--- /dev/null
+++ b/lib/test_siphash.c
@@ -0,0 +1,74 @@
+/* Test cases for siphash.c
+ *
+ * Copyright (C) 2015-2016 Jason A. Donenfeld <Jason@zx2c4.com>
+ *
+ * This file is provided under a dual BSD/GPLv2 license.
+ *
+ * SipHash: a fast short-input PRF
+ * https://131002.net/siphash/
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/siphash.h>
+#include <linux/kernel.h>
+#include <linux/string.h>
+#include <linux/errno.h>
+#include <linux/module.h>
+
+/* Test vectors taken from official reference source available at:
+ * https://131002.net/siphash/siphash24.c
+ */
+static const u64 test_vectors[64] = {
+ 0x726fdb47dd0e0e31ULL, 0x74f839c593dc67fdULL, 0x0d6c8009d9a94f5aULL,
+ 0x85676696d7fb7e2dULL, 0xcf2794e0277187b7ULL, 0x18765564cd99a68dULL,
+ 0xcbc9466e58fee3ceULL, 0xab0200f58b01d137ULL, 0x93f5f5799a932462ULL,
+ 0x9e0082df0ba9e4b0ULL, 0x7a5dbbc594ddb9f3ULL, 0xf4b32f46226bada7ULL,
+ 0x751e8fbc860ee5fbULL, 0x14ea5627c0843d90ULL, 0xf723ca908e7af2eeULL,
+ 0xa129ca6149be45e5ULL, 0x3f2acc7f57c29bdbULL, 0x699ae9f52cbe4794ULL,
+ 0x4bc1b3f0968dd39cULL, 0xbb6dc91da77961bdULL, 0xbed65cf21aa2ee98ULL,
+ 0xd0f2cbb02e3b67c7ULL, 0x93536795e3a33e88ULL, 0xa80c038ccd5ccec8ULL,
+ 0xb8ad50c6f649af94ULL, 0xbce192de8a85b8eaULL, 0x17d835b85bbb15f3ULL,
+ 0x2f2e6163076bcfadULL, 0xde4daaaca71dc9a5ULL, 0xa6a2506687956571ULL,
+ 0xad87a3535c49ef28ULL, 0x32d892fad841c342ULL, 0x7127512f72f27cceULL,
+ 0xa7f32346f95978e3ULL, 0x12e0b01abb051238ULL, 0x15e034d40fa197aeULL,
+ 0x314dffbe0815a3b4ULL, 0x027990f029623981ULL, 0xcadcd4e59ef40c4dULL,
+ 0x9abfd8766a33735cULL, 0x0e3ea96b5304a7d0ULL, 0xad0c42d6fc585992ULL,
+ 0x187306c89bc215a9ULL, 0xd4a60abcf3792b95ULL, 0xf935451de4f21df2ULL,
+ 0xa9538f0419755787ULL, 0xdb9acddff56ca510ULL, 0xd06c98cd5c0975ebULL,
+ 0xe612a3cb9ecba951ULL, 0xc766e62cfcadaf96ULL, 0xee64435a9752fe72ULL,
+ 0xa192d576b245165aULL, 0x0a8787bf8ecb74b2ULL, 0x81b3e73d20b49b6fULL,
+ 0x7fa8220ba3b2eceaULL, 0x245731c13ca42499ULL, 0xb78dbfaf3a8d83bdULL,
+ 0xea1ad565322a1a0bULL, 0x60e61c23a3795013ULL, 0x6606d7e446282b93ULL,
+ 0x6ca4ecb15c5f91e1ULL, 0x9f626da15c9625f3ULL, 0xe51b38608ef25f57ULL,
+ 0x958a324ceb064572ULL
+};
+
+static int __init siphash_test_init(void)
+{
+ u8 in[64], k[16], i;
+ int ret = 0;
+
+ for (i = 0; i < 16; ++i)
+ k[i] = i;
+ for (i = 0; i < 64; ++i) {
+ in[i] = i;
+ if (siphash24(in, i, k) != test_vectors[i]) {
+ pr_info("self-test %u: FAIL\n", i + 1);
+ ret = -EINVAL;
+ }
+ }
+ if (!ret)
+ pr_info("self-tests: pass\n");
+ return ret;
+}
+
+static void __exit siphash_test_exit(void)
+{
+}
+
+module_init(siphash_test_init);
+module_exit(siphash_test_exit);
+
+MODULE_AUTHOR("Jason A. Donenfeld <Jason@zx2c4.com>");
+MODULE_LICENSE("Dual BSD/GPL");
--
2.11.0
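The algorithm in lib/siphash.c above can be exercised outside the kernel. The following is a userspace sketch, not the patch itself: `sketch_siphash24()` and `sketch_reference_input_hash()` are hypothetical names, the tail is read with a plain byte loop rather than either of the patch's two paths, and the expected values are taken from entries 0, 8 and 15 of the test_vectors table in lib/test_siphash.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static uint64_t sketch_rol64(uint64_t x, unsigned int r)
{
	return (x << r) | (x >> (64 - r));	/* r is never 0 or 64 here */
}

static uint64_t sketch_le64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 7; i >= 0; i--)
		v = (v << 8) | p[i];
	return v;
}

#define SKETCH_SIPROUND do { \
	v0 += v1; v1 = sketch_rol64(v1, 13); v1 ^= v0; v0 = sketch_rol64(v0, 32); \
	v2 += v3; v3 = sketch_rol64(v3, 16); v3 ^= v2; \
	v0 += v3; v3 = sketch_rol64(v3, 21); v3 ^= v0; \
	v2 += v1; v1 = sketch_rol64(v1, 17); v1 ^= v2; v2 = sketch_rol64(v2, 32); \
} while (0)

static uint64_t sketch_siphash24(const uint8_t *data, size_t len,
				 const uint8_t key[16])
{
	uint64_t k0 = sketch_le64(key), k1 = sketch_le64(key + 8);
	uint64_t v0 = 0x736f6d6570736575ULL ^ k0;
	uint64_t v1 = 0x646f72616e646f6dULL ^ k1;
	uint64_t v2 = 0x6c7967656e657261ULL ^ k0;
	uint64_t v3 = 0x7465646279746573ULL ^ k1;
	uint64_t b = (uint64_t)len << 56;	/* length goes in the top byte */
	const uint8_t *end = data + (len & ~(size_t)7);
	size_t left = len & 7, i;

	for (; data != end; data += 8) {	/* full 8-byte words */
		uint64_t m = sketch_le64(data);

		v3 ^= m;
		SKETCH_SIPROUND;
		SKETCH_SIPROUND;
		v0 ^= m;
	}
	for (i = 0; i < left; i++)		/* 0..7 leftover bytes */
		b |= (uint64_t)data[i] << (8 * i);
	v3 ^= b;
	SKETCH_SIPROUND;
	SKETCH_SIPROUND;
	v0 ^= b;
	v2 ^= 0xff;
	SKETCH_SIPROUND;
	SKETCH_SIPROUND;
	SKETCH_SIPROUND;
	SKETCH_SIPROUND;
	return (v0 ^ v1) ^ (v2 ^ v3);
}

/* Same inputs as the self-test: key 0..15, message bytes 0..len-1. */
static uint64_t sketch_reference_input_hash(size_t len)
{
	uint8_t in[64], key[16];
	size_t i;

	assert(len <= sizeof(in));
	for (i = 0; i < sizeof(key); i++)
		key[i] = (uint8_t)i;
	for (i = 0; i < sizeof(in); i++)
		in[i] = (uint8_t)i;
	return sketch_siphash24(in, len, key);
}
```

Since the result is an ordinary integer rather than a byte string, comparing `sketch_reference_input_hash(0)` with the literal `0x726fdb47dd0e0e31ULL` is a plain `==` on any platform, which is the point of storing the vectors as u64 literals.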
^ permalink raw reply related
* Re: [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs
From: David Howells @ 2016-12-12 22:28 UTC (permalink / raw)
To: Andy Lutomirski
Cc: dhowells, linux-kernel, linux-usb, keyrings, Eric Biggers,
linux-crypto, Herbert Xu, Stephan Mueller
In-Reply-To: <e958f214e8885968be8045ffde813ac339b81178.1481575835.git.luto@kernel.org>
Andy Lutomirski <luto@kernel.org> wrote:
> +static const char zero_pad[16] = {0};
Isn't there a global page of zeros or something that we can share? Also, you
shouldn't explicitly initialise it so that it stays in .bss.
> - sg_set_buf(&sg_out[1], pad, sizeof pad);
> + sg_set_buf(&sg_out[1], zero_pad, sizeof zero_pad);
Can you put brackets on the sizeof?
Thanks,
David
^ permalink raw reply
* Re: [PATCH v3] siphash: add cryptographically secure hashtable function
From: Andi Kleen @ 2016-12-12 23:01 UTC (permalink / raw)
To: Jason A. Donenfeld
Cc: Linus Torvalds, kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andy Lutomirski, Greg KH, Eric Biggers, Jean-Philippe Aumasson,
Daniel J . Bernstein
In-Reply-To: <20161212221832.10653-1-Jason@zx2c4.com>
> Dozens of languages are already using this internally for their hash
> tables. Some of the BSDs already use this in their kernels. SipHash is
> a widely known high-speed solution to a widely known problem, and it's
> time we catch-up.
It would be nice if the network code could be converted to use siphash
for the secure sequence numbers. Right now it pulls in a lot of code
for bigger secure hashes just for that, which is a problem for tiny
kernels.
-Andi
^ permalink raw reply
* Re: [PATCH v3] siphash: add cryptographically secure hashtable function
From: Jason A. Donenfeld @ 2016-12-12 23:04 UTC (permalink / raw)
To: Andi Kleen
Cc: Linus Torvalds, kernel-hardening@lists.openwall.com, LKML,
Linux Crypto Mailing List, George Spelvin, Scott Bauer,
Andy Lutomirski, Greg KH, Eric Biggers, Jean-Philippe Aumasson,
Daniel J . Bernstein
On Tue, Dec 13, 2016 at 12:01 AM, Andi Kleen <ak@linux.intel.com> wrote:
> It would be nice if the network code could be converted to use siphash
> for the secure sequence numbers. Right now it pulls in a lot of code
> for bigger secure hashes just for that, which is a problem for tiny
> kernels.
Indeed this would be a great first candidate. There are lots of places
where MD5 (!!) is pulled in for this sort of thing, when SipHash could
be a faster and leaner replacement (and arguably more secure than
rusty MD5).
^ permalink raw reply
* Re: [PATCH] wusbcore: Fix one more crypto-on-the-stack bug
From: Andy Lutomirski @ 2016-12-12 23:57 UTC (permalink / raw)
To: Greg KH
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, USB list,
Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller
In-Reply-To: <20161212214447.GA12142@kroah.com>
On Mon, Dec 12, 2016 at 1:44 PM, Greg KH <gregkh@linuxfoundation.org> wrote:
> On Mon, Dec 12, 2016 at 12:52:45PM -0800, Andy Lutomirski wrote:
>> The driver put a constant buffer of all zeros on the stack and
>> pointed a scatterlist entry at it. This doesn't work with virtual
>> stacks. Make the buffer static to fix it.
>>
>> Cc: stable@vger.kernel.org # 4.9 only
>> Reported-by: Eric Biggers <ebiggers3@gmail.com>
>> Signed-off-by: Andy Lutomirski <luto@kernel.org>
>> ---
>> drivers/usb/wusbcore/crypto.c | 2 +-
>> 1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/drivers/usb/wusbcore/crypto.c b/drivers/usb/wusbcore/crypto.c
>> index 79451f7ef1b7..a7e007a0cd49 100644
>> --- a/drivers/usb/wusbcore/crypto.c
>> +++ b/drivers/usb/wusbcore/crypto.c
>> @@ -216,7 +216,7 @@ static int wusb_ccm_mac(struct crypto_skcipher *tfm_cbc,
>> struct scatterlist sg[4], sg_dst;
>> void *dst_buf;
>> size_t dst_size;
>> - const u8 bzero[16] = { 0 };
>> + static const u8 bzero[16] = { 0 };
>
> Hm, can static memory handle DMA? That's a requirement of the USB
> stack, does this data later end up being sent down to a USB host
> controller?
I think it doesn't, but I'll switch it to use empty_zero_page instead.
--Andy
^ permalink raw reply
* Re: [PATCH] keys/encrypted: Fix two crypto-on-the-stack bugs
From: Andy Lutomirski @ 2016-12-13 0:32 UTC (permalink / raw)
To: David Howells
Cc: Andy Lutomirski, linux-kernel@vger.kernel.org, USB list, keyrings,
Eric Biggers, linux-crypto, Herbert Xu, Stephan Mueller
In-Reply-To: <5944.1481581706@warthog.procyon.org.uk>
On Mon, Dec 12, 2016 at 2:28 PM, David Howells <dhowells@redhat.com> wrote:
> Andy Lutomirski <luto@kernel.org> wrote:
>
>> +static const char zero_pad[16] = {0};
>
> Isn't there a global page of zeros or something that we can share? Also, you
> shouldn't explicitly initialise it so that it stays in .bss.
This is a double-edged sword. It seems that omitting the
initialization causes it to go in .bss, which isn't read only. I have
no idea why initializing makes a difference at all -- the IMO sensible
behavior would be to put it in .rodata as NOBITS either way.
But I'll use empty_zero_page.
>
>> - sg_set_buf(&sg_out[1], pad, sizeof pad);
>> + sg_set_buf(&sg_out[1], zero_pad, sizeof zero_pad);
>
> Can you put brackets on the sizeof?
Will do for v2.
^ permalink raw reply
* RE: [PATCH v6 2/2] crypto: add virtio-crypto driver
From: Gonglei (Arei) @ 2016-12-13 1:13 UTC (permalink / raw)
To: Michael S. Tsirkin, Herbert Xu
Cc: linux-kernel@vger.kernel.org, qemu-devel@nongnu.org,
virtio-dev@lists.oasis-open.org,
virtualization@lists.linux-foundation.org,
linux-crypto@vger.kernel.org, Luonengjun, stefanha@redhat.com,
Huangweidong (C), Wubin (H), xin.zeng@intel.com, Claudio Fontana,
pasic@linux.vnet.ibm.com, davem@davemloft.net,
Zhoujian (jay, Euler), Hanweidong (Randy)
In-Reply-To: <20161212234941-mutt-send-email-mst@kernel.org>
>
> Subject: Re: [PATCH v6 2/2] crypto: add virtio-crypto driver
>
> On Mon, Dec 12, 2016 at 06:54:07PM +0800, Herbert Xu wrote:
> > On Mon, Dec 12, 2016 at 06:25:12AM +0000, Gonglei (Arei) wrote:
> > > Hi, Michael & Herbert
> > >
> > > Because the virtio-crypto device emulation had been in QEMU 2.8,
> > > would you please merge the virtio-crypto driver for 4.10 if no other
> > > comments? If so, Michael pls ack and/or review the patch, then
> > > Herbert will take it (I asked him last week). Thank you!
> > >
> > > Ps: Note on 4.10 merge window timing from Linus
> > > https://lkml.org/lkml/2016/12/7/506
> > >
> > > Dec 23rd is the deadline for 4.10 merge window.
> >
> > Sorry but it's too late for 4.10. It needed to have been in my
> > tree before the merge window opened to make it for this cycle.
> >
> > Cheers,
>
>
> Objections to me merging this? I'm preparing my tree right now.
>
That would be great, since the 4.11 merge window opens
at least three months later.
Do you agree with this, Herbert? Thanks.
Regards,
-Gonglei
^ permalink raw reply