* [PATCH 0/6] Multibuffer hashing take two
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
Multibuffer hashing was a constant sore point while it was part of the
kernel. It was very buggy and unnecessarily complex. Finally
it was removed when it had been broken for a while without anyone
noticing.
Peace reigned in its absence, until Eric Biggers made a proposal
for its comeback :)
Link: https://lore.kernel.org/all/20240415213719.120673-1-ebiggers@kernel.org/
The issue is that the SHA algorithm (and possibly others) is
inherently not parallelisable. Therefore the only way to exploit
parallelism on modern CPUs is to hash multiple independent streams
of data.
Eric's proposal is a simple interface bolted onto shash that takes
two streams of data of identical length. I thought the limit of
two was too restrictive, and Eric addressed that in his latest version:
Link: https://lore.kernel.org/all/20241001153718.111665-2-ebiggers@kernel.org/
However, I still disliked the addition of this to shash as it meant
that users would have to spend extra effort in order to accumulate
and maintain multiple streams of data.
My preference is to use ahash as the basis of multibuffer, because
its request object interface is perfectly suited to chaining.
The ahash interface is almost universally hated because of its
use of the SG list. So to sweeten the deal I have added virtual
address support to ahash, thus rendering the shash interface
redundant.
Note that ahash can already be used synchronously by asking for
sync-only algorithms. Thus there is no need to handle callbacks
and such *if* you don't want to.
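For example, a sketch of allocating such a synchronous-only transform
(masking out CRYPTO_ALG_ASYNC is the usual way to ask for one) would be:

	struct crypto_ahash *tfm;

	/* Only match implementations that never return -EINPROGRESS. */
	tfm = crypto_alloc_ahash("sha256", 0, CRYPTO_ALG_ASYNC);
	if (IS_ERR(tfm))
		return PTR_ERR(tfm);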
This patch series introduces two additions to the ahash interface.
First of all, request chaining is added so that an arbitrary number
of requests can be submitted in one go. Incidentally, this also
reduces the cost of indirect calls by amortisation.
It then adds virtual address support to ahash. This allows the
user to supply a virtual address as the input instead of an SG
list.
This memory is assumed not to be DMA-capable, so it is always copied
before it's passed to an existing ahash driver. New drivers can
elect to take virtual addresses directly. Of course existing shash
algorithms are able to take virtual addresses without any copying.
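To illustrate how the two pieces fit together, here is a rough usage
sketch built on the helpers added later in this series (tfm is an
already-allocated crypto_ahash; src[], out[] and len[] are placeholder
buffers; error handling is elided):

	struct ahash_request *req[8];
	DECLARE_CRYPTO_WAIT(wait);
	int i, err;

	for (i = 0; i < 8; i++) {
		req[i] = ahash_request_alloc(tfm, GFP_KERNEL);

		if (i)
			/* Attach to the chain headed by req[0]. */
			ahash_request_chain(req[i], req[0]);
		else
			/* The head request carries the completion callback. */
			ahash_reqchain_init(req[i], 0, crypto_req_done, &wait);

		/* Each request hashes its own independent buffer. */
		ahash_request_set_virt(req[i], src[i], out[i], len[i]);
	}

	/* Submit the whole chain in one go via the head request. */
	err = crypto_wait_req(crypto_ahash_digest(req[0]), &wait);

	/* Per-request results are then available via ahash_request_err(). */
	for (i = 0; i < 8; i++)
		if (ahash_request_err(req[i]))
			err = ahash_request_err(req[i]);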
The final patch resurrects the old SHA2 AVX2 multibuffer code as
a proof of concept that this API works. The result shows that with
a full complement of 8 requests, this API is able to achieve parity
with the more modern but single-threaded SHA-NI code.
Herbert Xu (6):
crypto: ahash - Only save callback and data in ahash_save_req
crypto: hash - Add request chaining API
crypto: tcrypt - Restore multibuffer ahash tests
crypto: ahash - Add virtual address support
crypto: ahash - Set default reqsize from ahash_alg
crypto: x86/sha2 - Restore multibuffer AVX2 support
arch/x86/crypto/Makefile | 2 +-
arch/x86/crypto/sha256_mb_mgr_datastruct.S | 304 +++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 523 ++++++++++++++++--
arch/x86/crypto/sha256_x8_avx2.S | 596 +++++++++++++++++++++
crypto/ahash.c | 566 ++++++++++++++++---
crypto/tcrypt.c | 227 ++++++++
include/crypto/algapi.h | 10 +
include/crypto/hash.h | 68 ++-
include/crypto/internal/hash.h | 17 +-
include/linux/crypto.h | 26 +
10 files changed, 2209 insertions(+), 130 deletions(-)
create mode 100644 arch/x86/crypto/sha256_mb_mgr_datastruct.S
create mode 100644 arch/x86/crypto/sha256_x8_avx2.S
--
2.39.5
* [PATCH 1/6] crypto: ahash - Only save callback and data in ahash_save_req
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
As unaligned operations are supported by the underlying algorithm,
ahash_save_req and ahash_restore_req can be greatly simplified to
only preserve the callback and data.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
crypto/ahash.c | 97 ++++++++++++++++---------------------------
include/crypto/hash.h | 3 --
2 files changed, 35 insertions(+), 65 deletions(-)
diff --git a/crypto/ahash.c b/crypto/ahash.c
index bcd9de009a91..c8e7327c6949 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -27,6 +27,12 @@
#define CRYPTO_ALG_TYPE_AHASH_MASK 0x0000000e
+struct ahash_save_req_state {
+ struct ahash_request *req;
+ crypto_completion_t compl;
+ void *data;
+};
+
/*
* For an ahash tfm that is using an shash algorithm (instead of an ahash
* algorithm), this returns the underlying shash tfm.
@@ -262,67 +268,34 @@ int crypto_ahash_init(struct ahash_request *req)
}
EXPORT_SYMBOL_GPL(crypto_ahash_init);
-static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt,
- bool has_state)
+static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt)
{
- struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- unsigned int ds = crypto_ahash_digestsize(tfm);
- struct ahash_request *subreq;
- unsigned int subreq_size;
- unsigned int reqsize;
- u8 *result;
+ struct ahash_save_req_state *state;
gfp_t gfp;
u32 flags;
- subreq_size = sizeof(*subreq);
- reqsize = crypto_ahash_reqsize(tfm);
- reqsize = ALIGN(reqsize, crypto_tfm_ctx_alignment());
- subreq_size += reqsize;
- subreq_size += ds;
-
flags = ahash_request_flags(req);
gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC;
- subreq = kmalloc(subreq_size, gfp);
- if (!subreq)
+ state = kmalloc(sizeof(*state), gfp);
+ if (!state)
return -ENOMEM;
- ahash_request_set_tfm(subreq, tfm);
- ahash_request_set_callback(subreq, flags, cplt, req);
-
- result = (u8 *)(subreq + 1) + reqsize;
-
- ahash_request_set_crypt(subreq, req->src, result, req->nbytes);
-
- if (has_state) {
- void *state;
-
- state = kmalloc(crypto_ahash_statesize(tfm), gfp);
- if (!state) {
- kfree(subreq);
- return -ENOMEM;
- }
-
- crypto_ahash_export(req, state);
- crypto_ahash_import(subreq, state);
- kfree_sensitive(state);
- }
-
- req->priv = subreq;
+ state->compl = req->base.complete;
+ state->data = req->base.data;
+ req->base.complete = cplt;
+ req->base.data = state;
+ state->req = req;
return 0;
}
-static void ahash_restore_req(struct ahash_request *req, int err)
+static void ahash_restore_req(struct ahash_request *req)
{
- struct ahash_request *subreq = req->priv;
+ struct ahash_save_req_state *state = req->base.data;
- if (!err)
- memcpy(req->result, subreq->result,
- crypto_ahash_digestsize(crypto_ahash_reqtfm(req)));
-
- req->priv = NULL;
-
- kfree_sensitive(subreq);
+ req->base.complete = state->compl;
+ req->base.data = state->data;
+ kfree(state);
}
int crypto_ahash_update(struct ahash_request *req)
@@ -374,51 +347,51 @@ EXPORT_SYMBOL_GPL(crypto_ahash_digest);
static void ahash_def_finup_done2(void *data, int err)
{
- struct ahash_request *areq = data;
+ struct ahash_save_req_state *state = data;
+ struct ahash_request *areq = state->req;
if (err == -EINPROGRESS)
return;
- ahash_restore_req(areq, err);
-
+ ahash_restore_req(areq);
ahash_request_complete(areq, err);
}
static int ahash_def_finup_finish1(struct ahash_request *req, int err)
{
- struct ahash_request *subreq = req->priv;
-
if (err)
goto out;
- subreq->base.complete = ahash_def_finup_done2;
+ req->base.complete = ahash_def_finup_done2;
- err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(subreq);
+ err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(req);
if (err == -EINPROGRESS || err == -EBUSY)
return err;
out:
- ahash_restore_req(req, err);
+ ahash_restore_req(req);
return err;
}
static void ahash_def_finup_done1(void *data, int err)
{
- struct ahash_request *areq = data;
- struct ahash_request *subreq;
+ struct ahash_save_req_state *state0 = data;
+ struct ahash_save_req_state state;
+ struct ahash_request *areq;
+ state = *state0;
+ areq = state.req;
if (err == -EINPROGRESS)
goto out;
- subreq = areq->priv;
- subreq->base.flags &= CRYPTO_TFM_REQ_MAY_BACKLOG;
+ areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
err = ahash_def_finup_finish1(areq, err);
if (err == -EINPROGRESS || err == -EBUSY)
return;
out:
- ahash_request_complete(areq, err);
+ state.compl(state.data, err);
}
static int ahash_def_finup(struct ahash_request *req)
@@ -426,11 +399,11 @@ static int ahash_def_finup(struct ahash_request *req)
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
int err;
- err = ahash_save_req(req, ahash_def_finup_done1, true);
+ err = ahash_save_req(req, ahash_def_finup_done1);
if (err)
return err;
- err = crypto_ahash_alg(tfm)->update(req->priv);
+ err = crypto_ahash_alg(tfm)->update(req);
if (err == -EINPROGRESS || err == -EBUSY)
return err;
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 2d5ea9f9ff43..9c1f8ca59a77 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -55,9 +55,6 @@ struct ahash_request {
struct scatterlist *src;
u8 *result;
- /* This field may only be used by the ahash API code. */
- void *priv;
-
void *__ctx[] CRYPTO_MINALIGN_ATTR;
};
--
2.39.5
* [PATCH 2/6] crypto: hash - Add request chaining API
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
This adds request chaining to the ahash interface. Request chaining
allows multiple requests to be submitted in one shot. An algorithm
can elect to receive chained requests by setting the flag
CRYPTO_ALG_REQ_CHAIN. If this bit is not set, the API will break
up chained requests and submit them one-by-one.
A new err field is added to struct crypto_async_request to record
the return value for each individual request.
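For reference, a driver that can consume whole chains natively would
advertise this in its algorithm declaration; a hypothetical sketch (all
example_* names are made up) might look like the following, while
drivers that omit the flag keep seeing one request per call:

	static struct ahash_alg example_mb_alg = {
		.init	= example_init,
		.update	= example_update,
		.finup	= example_finup,
		.digest	= example_digest,
		.halg = {
			.digestsize = 32,
			.statesize  = sizeof(struct example_state),
			.base = {
				.cra_name	 = "sha256",
				.cra_driver_name = "sha256-example-mb",
				.cra_priority	 = 300,
				/* Receive chained requests in a single call. */
				.cra_flags	 = CRYPTO_ALG_REQ_CHAIN,
				.cra_blocksize	 = 64,
				.cra_module	 = THIS_MODULE,
			},
		},
	};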
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
crypto/ahash.c | 259 +++++++++++++++++++++++++++++----
include/crypto/algapi.h | 10 ++
include/crypto/hash.h | 25 ++++
include/crypto/internal/hash.h | 10 ++
include/linux/crypto.h | 26 ++++
5 files changed, 304 insertions(+), 26 deletions(-)
diff --git a/crypto/ahash.c b/crypto/ahash.c
index c8e7327c6949..e1ee18deca67 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -28,11 +28,19 @@
#define CRYPTO_ALG_TYPE_AHASH_MASK 0x0000000e
struct ahash_save_req_state {
- struct ahash_request *req;
+ struct list_head head;
+ struct ahash_request *req0;
+ struct ahash_request *cur;
+ int (*op)(struct ahash_request *req);
crypto_completion_t compl;
void *data;
};
+static void ahash_reqchain_done(void *data, int err);
+static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt);
+static void ahash_restore_req(struct ahash_request *req);
+static int ahash_def_finup(struct ahash_request *req);
+
/*
* For an ahash tfm that is using an shash algorithm (instead of an ahash
* algorithm), this returns the underlying shash tfm.
@@ -256,24 +264,150 @@ int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key,
}
EXPORT_SYMBOL_GPL(crypto_ahash_setkey);
+static int ahash_reqchain_finish(struct ahash_save_req_state *state,
+ int err, u32 mask)
+{
+ struct ahash_request *req0 = state->req0;
+ struct ahash_request *req = state->cur;
+ struct ahash_request *n;
+
+ req->base.err = err;
+
+ if (req == req0)
+ INIT_LIST_HEAD(&req->base.list);
+ else
+ list_add_tail(&req->base.list, &req0->base.list);
+
+ list_for_each_entry_safe(req, n, &state->head, base.list) {
+ list_del_init(&req->base.list);
+
+ req->base.flags &= mask;
+ req->base.complete = ahash_reqchain_done;
+ req->base.data = state;
+ state->cur = req;
+ err = state->op(req);
+
+ if (err == -EINPROGRESS) {
+ if (!list_empty(&state->head))
+ err = -EBUSY;
+ goto out;
+ }
+
+ if (err == -EBUSY)
+ goto out;
+
+ req->base.err = err;
+ list_add_tail(&req->base.list, &req0->base.list);
+ }
+
+ ahash_restore_req(req0);
+
+out:
+ return err;
+}
+
+static void ahash_reqchain_done(void *data, int err)
+{
+ struct ahash_save_req_state *state = data;
+ crypto_completion_t compl = state->compl;
+
+ data = state->data;
+
+ if (err == -EINPROGRESS) {
+ if (!list_empty(&state->head))
+ return;
+ goto notify;
+ }
+
+ err = ahash_reqchain_finish(state, err, CRYPTO_TFM_REQ_MAY_BACKLOG);
+ if (err == -EBUSY)
+ return;
+
+notify:
+ compl(data, err);
+}
+
+static int ahash_do_req_chain(struct ahash_request *req,
+ int (*op)(struct ahash_request *req))
+{
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+ struct ahash_save_req_state *state;
+ struct ahash_save_req_state state0;
+ int err;
+
+ if (!ahash_request_chained(req) || list_empty(&req->base.list) ||
+ crypto_ahash_req_chain(tfm))
+ return op(req);
+
+ state = &state0;
+
+ if (ahash_is_async(tfm)) {
+ err = ahash_save_req(req, ahash_reqchain_done);
+ if (err) {
+ struct ahash_request *r2;
+
+ req->base.err = err;
+ list_for_each_entry(r2, &req->base.list, base.list)
+ r2->base.err = err;
+
+ return err;
+ }
+
+ state = req->base.data;
+ }
+
+ state->op = op;
+ state->cur = req;
+ INIT_LIST_HEAD(&state->head);
+ list_splice(&req->base.list, &state->head);
+
+ err = op(req);
+ if (err == -EBUSY || err == -EINPROGRESS)
+ return -EBUSY;
+
+ return ahash_reqchain_finish(state, err, ~0);
+}
+
int crypto_ahash_init(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- if (likely(tfm->using_shash))
- return crypto_shash_init(prepare_shash_desc(req, tfm));
+ if (likely(tfm->using_shash)) {
+ struct ahash_request *r2;
+ int err;
+
+ err = crypto_shash_init(prepare_shash_desc(req, tfm));
+ req->base.err = err;
+ if (!ahash_request_chained(req))
+ return err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ struct shash_desc *desc;
+
+ desc = prepare_shash_desc(r2, tfm);
+ r2->base.err = crypto_shash_init(desc);
+ }
+
+ return 0;
+ }
+
if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
- return crypto_ahash_alg(tfm)->init(req);
+
+ return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->init);
}
EXPORT_SYMBOL_GPL(crypto_ahash_init);
static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt)
{
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
struct ahash_save_req_state *state;
gfp_t gfp;
u32 flags;
+ if (!ahash_is_async(tfm))
+ return 0;
+
flags = ahash_request_flags(req);
gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ? GFP_KERNEL : GFP_ATOMIC;
state = kmalloc(sizeof(*state), gfp);
@@ -284,14 +418,20 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt)
state->data = req->base.data;
req->base.complete = cplt;
req->base.data = state;
- state->req = req;
+ state->req0 = req;
return 0;
}
static void ahash_restore_req(struct ahash_request *req)
{
- struct ahash_save_req_state *state = req->base.data;
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+ struct ahash_save_req_state *state;
+
+ if (!ahash_is_async(tfm))
+ return;
+
+ state = req->base.data;
req->base.complete = state->compl;
req->base.data = state->data;
@@ -302,10 +442,26 @@ int crypto_ahash_update(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- if (likely(tfm->using_shash))
- return shash_ahash_update(req, ahash_request_ctx(req));
+ if (likely(tfm->using_shash)) {
+ struct ahash_request *r2;
+ int err;
- return crypto_ahash_alg(tfm)->update(req);
+ err = shash_ahash_update(req, ahash_request_ctx(req));
+ req->base.err = err;
+ if (!ahash_request_chained(req))
+ return err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ struct shash_desc *desc;
+
+ desc = ahash_request_ctx(r2);
+ r2->base.err = shash_ahash_update(r2, desc);
+ }
+
+ return 0;
+ }
+
+ return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->update);
}
EXPORT_SYMBOL_GPL(crypto_ahash_update);
@@ -313,10 +469,26 @@ int crypto_ahash_final(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- if (likely(tfm->using_shash))
- return crypto_shash_final(ahash_request_ctx(req), req->result);
+ if (likely(tfm->using_shash)) {
+ struct ahash_request *r2;
+ int err;
- return crypto_ahash_alg(tfm)->final(req);
+ err = crypto_shash_final(ahash_request_ctx(req), req->result);
+ req->base.err = err;
+ if (!ahash_request_chained(req))
+ return err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ struct shash_desc *desc;
+
+ desc = ahash_request_ctx(r2);
+ r2->base.err = crypto_shash_final(desc, r2->result);
+ }
+
+ return 0;
+ }
+
+ return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->final);
}
EXPORT_SYMBOL_GPL(crypto_ahash_final);
@@ -324,10 +496,29 @@ int crypto_ahash_finup(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- if (likely(tfm->using_shash))
- return shash_ahash_finup(req, ahash_request_ctx(req));
+ if (likely(tfm->using_shash)) {
+ struct ahash_request *r2;
+ int err;
- return crypto_ahash_alg(tfm)->finup(req);
+ err = shash_ahash_finup(req, ahash_request_ctx(req));
+ req->base.err = err;
+ if (!ahash_request_chained(req))
+ return err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ struct shash_desc *desc;
+
+ desc = ahash_request_ctx(r2);
+ r2->base.err = shash_ahash_finup(r2, desc);
+ }
+
+ return 0;
+ }
+
+ if (!crypto_ahash_alg(tfm)->finup)
+ return ahash_def_finup(req);
+
+ return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->finup);
}
EXPORT_SYMBOL_GPL(crypto_ahash_finup);
@@ -335,20 +526,36 @@ int crypto_ahash_digest(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- if (likely(tfm->using_shash))
- return shash_ahash_digest(req, prepare_shash_desc(req, tfm));
+ if (likely(tfm->using_shash)) {
+ struct ahash_request *r2;
+ int err;
+
+ err = shash_ahash_digest(req, prepare_shash_desc(req, tfm));
+ req->base.err = err;
+ if (!ahash_request_chained(req))
+ return err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ struct shash_desc *desc;
+
+ desc = prepare_shash_desc(r2, tfm);
+ r2->base.err = shash_ahash_digest(r2, desc);
+ }
+
+ return 0;
+ }
if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
- return crypto_ahash_alg(tfm)->digest(req);
+ return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->digest);
}
EXPORT_SYMBOL_GPL(crypto_ahash_digest);
static void ahash_def_finup_done2(void *data, int err)
{
struct ahash_save_req_state *state = data;
- struct ahash_request *areq = state->req;
+ struct ahash_request *areq = state->req0;
if (err == -EINPROGRESS)
return;
@@ -359,12 +566,15 @@ static void ahash_def_finup_done2(void *data, int err)
static int ahash_def_finup_finish1(struct ahash_request *req, int err)
{
+ struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+
if (err)
goto out;
- req->base.complete = ahash_def_finup_done2;
+ if (ahash_is_async(tfm))
+ req->base.complete = ahash_def_finup_done2;
- err = crypto_ahash_alg(crypto_ahash_reqtfm(req))->final(req);
+ err = crypto_ahash_final(req);
if (err == -EINPROGRESS || err == -EBUSY)
return err;
@@ -380,7 +590,7 @@ static void ahash_def_finup_done1(void *data, int err)
struct ahash_request *areq;
state = *state0;
- areq = state.req;
+ areq = state.req0;
if (err == -EINPROGRESS)
goto out;
@@ -396,14 +606,13 @@ static void ahash_def_finup_done1(void *data, int err)
static int ahash_def_finup(struct ahash_request *req)
{
- struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
int err;
err = ahash_save_req(req, ahash_def_finup_done1);
if (err)
return err;
- err = crypto_ahash_alg(tfm)->update(req);
+ err = crypto_ahash_update(req);
if (err == -EINPROGRESS || err == -EBUSY)
return err;
@@ -618,8 +827,6 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
base->cra_type = &crypto_ahash_type;
base->cra_flags |= CRYPTO_ALG_TYPE_AHASH;
- if (!alg->finup)
- alg->finup = ahash_def_finup;
if (!alg->setkey)
alg->setkey = ahash_nosetkey;
diff --git a/include/crypto/algapi.h b/include/crypto/algapi.h
index 156de41ca760..c5df380c7d08 100644
--- a/include/crypto/algapi.h
+++ b/include/crypto/algapi.h
@@ -271,4 +271,14 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm)
return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
}
+static inline bool crypto_request_chained(struct crypto_async_request *req)
+{
+ return req->flags & CRYPTO_TFM_REQ_CHAIN;
+}
+
+static inline bool crypto_tfm_req_chain(struct crypto_tfm *tfm)
+{
+ return tfm->__crt_alg->cra_flags & CRYPTO_ALG_REQ_CHAIN;
+}
+
#endif /* _CRYPTO_ALGAPI_H */
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 9c1f8ca59a77..de5e5dcd0c95 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -621,6 +621,7 @@ static inline void ahash_request_set_callback(struct ahash_request *req,
{
req->base.complete = compl;
req->base.data = data;
+ flags &= ~CRYPTO_TFM_REQ_CHAIN;
req->base.flags = flags;
}
@@ -646,6 +647,20 @@ static inline void ahash_request_set_crypt(struct ahash_request *req,
req->result = result;
}
+static inline void ahash_reqchain_init(struct ahash_request *req,
+ u32 flags, crypto_completion_t compl,
+ void *data)
+{
+ ahash_request_set_callback(req, flags, compl, data);
+ crypto_reqchain_init(&req->base);
+}
+
+static inline void ahash_request_chain(struct ahash_request *req,
+ struct ahash_request *head)
+{
+ crypto_request_chain(&req->base, &head->base);
+}
+
/**
* DOC: Synchronous Message Digest API
*
@@ -947,4 +962,14 @@ static inline void shash_desc_zero(struct shash_desc *desc)
sizeof(*desc) + crypto_shash_descsize(desc->tfm));
}
+static inline int ahash_request_err(struct ahash_request *req)
+{
+ return req->base.err;
+}
+
+static inline bool ahash_is_async(struct crypto_ahash *tfm)
+{
+ return crypto_tfm_is_async(&tfm->base);
+}
+
#endif /* _CRYPTO_HASH_H */
diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h
index 58967593b6b4..81542a48587e 100644
--- a/include/crypto/internal/hash.h
+++ b/include/crypto/internal/hash.h
@@ -270,5 +270,15 @@ static inline struct crypto_shash *__crypto_shash_cast(struct crypto_tfm *tfm)
return container_of(tfm, struct crypto_shash, base);
}
+static inline bool ahash_request_chained(struct ahash_request *req)
+{
+ return crypto_request_chained(&req->base);
+}
+
+static inline bool crypto_ahash_req_chain(struct crypto_ahash *tfm)
+{
+ return crypto_tfm_req_chain(&tfm->base);
+}
+
#endif /* _CRYPTO_INTERNAL_HASH_H */
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index b164da5e129e..6126c57b8452 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -13,6 +13,8 @@
#define _LINUX_CRYPTO_H
#include <linux/completion.h>
+#include <linux/errno.h>
+#include <linux/list.h>
#include <linux/refcount.h>
#include <linux/slab.h>
#include <linux/types.h>
@@ -124,6 +126,9 @@
*/
#define CRYPTO_ALG_FIPS_INTERNAL 0x00020000
+/* Set if the algorithm supports request chains. */
+#define CRYPTO_ALG_REQ_CHAIN 0x00040000
+
/*
* Transform masks and values (for crt_flags).
*/
@@ -133,6 +138,7 @@
#define CRYPTO_TFM_REQ_FORBID_WEAK_KEYS 0x00000100
#define CRYPTO_TFM_REQ_MAY_SLEEP 0x00000200
#define CRYPTO_TFM_REQ_MAY_BACKLOG 0x00000400
+#define CRYPTO_TFM_REQ_CHAIN 0x00000800
/*
* Miscellaneous stuff.
@@ -174,6 +180,7 @@ struct crypto_async_request {
struct crypto_tfm *tfm;
u32 flags;
+ int err;
};
/**
@@ -540,5 +547,24 @@ int crypto_comp_decompress(struct crypto_comp *tfm,
const u8 *src, unsigned int slen,
u8 *dst, unsigned int *dlen);
+static inline void crypto_reqchain_init(struct crypto_async_request *req)
+{
+ req->err = -EINPROGRESS;
+ req->flags |= CRYPTO_TFM_REQ_CHAIN;
+ INIT_LIST_HEAD(&req->list);
+}
+
+static inline void crypto_request_chain(struct crypto_async_request *req,
+ struct crypto_async_request *head)
+{
+ req->err = -EINPROGRESS;
+ list_add_tail(&req->list, &head->list);
+}
+
+static inline bool crypto_tfm_is_async(struct crypto_tfm *tfm)
+{
+ return tfm->__crt_alg->cra_flags & CRYPTO_ALG_ASYNC;
+}
+
#endif /* _LINUX_CRYPTO_H */
--
2.39.5
* [PATCH 3/6] crypto: tcrypt - Restore multibuffer ahash tests
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
This patch is a revert of commit 388ac25efc8ce3bf9768ce7bf24268d6fac285d5.
As multibuffer ahash is coming back in the form of request chaining,
restore the multibuffer ahash tests using the new interface.
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
crypto/tcrypt.c | 227 ++++++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 227 insertions(+)
diff --git a/crypto/tcrypt.c b/crypto/tcrypt.c
index e9e7dceb606e..f996e5e2c83a 100644
--- a/crypto/tcrypt.c
+++ b/crypto/tcrypt.c
@@ -716,6 +716,203 @@ static inline int do_one_ahash_op(struct ahash_request *req, int ret)
return crypto_wait_req(ret, wait);
}
+struct test_mb_ahash_data {
+ struct scatterlist sg[XBUFSIZE];
+ char result[64];
+ struct ahash_request *req;
+ struct crypto_wait wait;
+ char *xbuf[XBUFSIZE];
+};
+
+static inline int do_mult_ahash_op(struct test_mb_ahash_data *data, u32 num_mb,
+ int *rc)
+{
+ int i, err;
+
+ /* Fire up a bunch of concurrent requests */
+ err = crypto_ahash_digest(data[0].req);
+
+ /* Wait for all requests to finish */
+ err = crypto_wait_req(err, &data[0].wait);
+
+ for (i = 0; i < num_mb; i++) {
+ rc[i] = ahash_request_err(data[i].req);
+ if (rc[i]) {
+ pr_info("concurrent request %d error %d\n", i, rc[i]);
+ err = rc[i];
+ }
+ }
+
+ return err;
+}
+
+static int test_mb_ahash_jiffies(struct test_mb_ahash_data *data, int blen,
+ int secs, u32 num_mb)
+{
+ unsigned long start, end;
+ int bcount;
+ int ret = 0;
+ int *rc;
+
+ rc = kcalloc(num_mb, sizeof(*rc), GFP_KERNEL);
+ if (!rc)
+ return -ENOMEM;
+
+ for (start = jiffies, end = start + secs * HZ, bcount = 0;
+ time_before(jiffies, end); bcount++) {
+ ret = do_mult_ahash_op(data, num_mb, rc);
+ if (ret)
+ goto out;
+ }
+
+ pr_cont("%d operations in %d seconds (%llu bytes)\n",
+ bcount * num_mb, secs, (u64)bcount * blen * num_mb);
+
+out:
+ kfree(rc);
+ return ret;
+}
+
+static int test_mb_ahash_cycles(struct test_mb_ahash_data *data, int blen,
+ u32 num_mb)
+{
+ unsigned long cycles = 0;
+ int ret = 0;
+ int i;
+ int *rc;
+
+ rc = kcalloc(num_mb, sizeof(*rc), GFP_KERNEL);
+ if (!rc)
+ return -ENOMEM;
+
+ /* Warm-up run. */
+ for (i = 0; i < 4; i++) {
+ ret = do_mult_ahash_op(data, num_mb, rc);
+ if (ret)
+ goto out;
+ }
+
+ /* The real thing. */
+ for (i = 0; i < 8; i++) {
+ cycles_t start, end;
+
+ start = get_cycles();
+ ret = do_mult_ahash_op(data, num_mb, rc);
+ end = get_cycles();
+
+ if (ret)
+ goto out;
+
+ cycles += end - start;
+ }
+
+ pr_cont("1 operation in %lu cycles (%d bytes)\n",
+ (cycles + 4) / (8 * num_mb), blen);
+
+out:
+ kfree(rc);
+ return ret;
+}
+
+static void test_mb_ahash_speed(const char *algo, unsigned int secs,
+ struct hash_speed *speed, u32 num_mb)
+{
+ struct test_mb_ahash_data *data;
+ struct crypto_ahash *tfm;
+ unsigned int i, j, k;
+ int ret;
+
+ data = kcalloc(num_mb, sizeof(*data), GFP_KERNEL);
+ if (!data)
+ return;
+
+ tfm = crypto_alloc_ahash(algo, 0, 0);
+ if (IS_ERR(tfm)) {
+ pr_err("failed to load transform for %s: %ld\n",
+ algo, PTR_ERR(tfm));
+ goto free_data;
+ }
+
+ for (i = 0; i < num_mb; ++i) {
+ if (testmgr_alloc_buf(data[i].xbuf))
+ goto out;
+
+ crypto_init_wait(&data[i].wait);
+
+ data[i].req = ahash_request_alloc(tfm, GFP_KERNEL);
+ if (!data[i].req) {
+ pr_err("alg: hash: Failed to allocate request for %s\n",
+ algo);
+ goto out;
+ }
+
+ if (i)
+ ahash_request_chain(data[i].req, data[0].req);
+ else
+ ahash_reqchain_init(data[i].req, 0, crypto_req_done,
+ &data[i].wait);
+
+ sg_init_table(data[i].sg, XBUFSIZE);
+ for (j = 0; j < XBUFSIZE; j++) {
+ sg_set_buf(data[i].sg + j, data[i].xbuf[j], PAGE_SIZE);
+ memset(data[i].xbuf[j], 0xff, PAGE_SIZE);
+ }
+ }
+
+ pr_info("\ntesting speed of multibuffer %s (%s)\n", algo,
+ get_driver_name(crypto_ahash, tfm));
+
+ for (i = 0; speed[i].blen != 0; i++) {
+ /* For some reason this only tests digests. */
+ if (speed[i].blen != speed[i].plen)
+ continue;
+
+ if (speed[i].blen > XBUFSIZE * PAGE_SIZE) {
+ pr_err("template (%u) too big for tvmem (%lu)\n",
+ speed[i].blen, XBUFSIZE * PAGE_SIZE);
+ goto out;
+ }
+
+ if (klen)
+ crypto_ahash_setkey(tfm, tvmem[0], klen);
+
+ for (k = 0; k < num_mb; k++)
+ ahash_request_set_crypt(data[k].req, data[k].sg,
+ data[k].result, speed[i].blen);
+
+ pr_info("test%3u "
+ "(%5u byte blocks,%5u bytes per update,%4u updates): ",
+ i, speed[i].blen, speed[i].plen,
+ speed[i].blen / speed[i].plen);
+
+ if (secs) {
+ ret = test_mb_ahash_jiffies(data, speed[i].blen, secs,
+ num_mb);
+ cond_resched();
+ } else {
+ ret = test_mb_ahash_cycles(data, speed[i].blen, num_mb);
+ }
+
+
+ if (ret) {
+ pr_err("At least one hashing failed ret=%d\n", ret);
+ break;
+ }
+ }
+
+out:
+ for (k = 0; k < num_mb; ++k)
+ ahash_request_free(data[k].req);
+
+ for (k = 0; k < num_mb; ++k)
+ testmgr_free_buf(data[k].xbuf);
+
+ crypto_free_ahash(tfm);
+
+free_data:
+ kfree(data);
+}
+
static int test_ahash_jiffies_digest(struct ahash_request *req, int blen,
char *out, int secs)
{
@@ -2395,6 +2592,36 @@ static int do_test(const char *alg, u32 type, u32 mask, int m, u32 num_mb)
test_ahash_speed("sm3", sec, generic_hash_speed_template);
if (mode > 400 && mode < 500) break;
fallthrough;
+ case 450:
+ test_mb_ahash_speed("sha1", sec, generic_hash_speed_template,
+ num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 451:
+ test_mb_ahash_speed("sha256", sec, generic_hash_speed_template,
+ num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 452:
+ test_mb_ahash_speed("sha512", sec, generic_hash_speed_template,
+ num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 453:
+ test_mb_ahash_speed("sm3", sec, generic_hash_speed_template,
+ num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 454:
+ test_mb_ahash_speed("streebog256", sec,
+ generic_hash_speed_template, num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
+ case 455:
+ test_mb_ahash_speed("streebog512", sec,
+ generic_hash_speed_template, num_mb);
+ if (mode > 400 && mode < 500) break;
+ fallthrough;
case 499:
break;
--
2.39.5
* [PATCH 4/6] crypto: ahash - Add virtual address support
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
This patch adds virtual address support to ahash. Virtual addresses
were previously only supported through shash. The user may choose
to use virtual addresses with ahash by calling ahash_request_set_virt
instead of ahash_request_set_crypt.
The API will take care of translating this to an SG list if necessary,
unless the algorithm declares that it supports chaining. Therefore
in order for an ahash algorithm to support chaining, it must also
support virtual addresses directly.
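As a minimal caller-side sketch (data and len are placeholders and tfm
is an already-allocated crypto_ahash), the only difference from the
SG-list case is the set_virt call; if the driver cannot take virtual
addresses the API bounces the data through a page-backed SG list:

	struct ahash_request *req;
	DECLARE_CRYPTO_WAIT(wait);
	u8 digest[SHA256_DIGEST_SIZE];
	int err;

	req = ahash_request_alloc(tfm, GFP_KERNEL);
	if (!req)
		return -ENOMEM;

	ahash_request_set_callback(req, CRYPTO_TFM_REQ_MAY_SLEEP,
				   crypto_req_done, &wait);

	/* Kernel virtual address instead of a scatterlist. */
	ahash_request_set_virt(req, data, digest, len);

	err = crypto_wait_req(crypto_ahash_digest(req), &wait);
	ahash_request_free(req);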
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
crypto/ahash.c | 282 +++++++++++++++++++++++++++++----
include/crypto/hash.h | 39 ++++-
include/crypto/internal/hash.h | 7 +-
include/linux/crypto.h | 2 +-
4 files changed, 294 insertions(+), 36 deletions(-)
diff --git a/crypto/ahash.c b/crypto/ahash.c
index e1ee18deca67..b1c468797990 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -34,11 +34,17 @@ struct ahash_save_req_state {
int (*op)(struct ahash_request *req);
crypto_completion_t compl;
void *data;
+ struct scatterlist sg;
+ const u8 *src;
+ u8 *page;
+ unsigned int offset;
+ unsigned int nbytes;
};
static void ahash_reqchain_done(void *data, int err);
static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt);
-static void ahash_restore_req(struct ahash_request *req);
+static void ahash_restore_req(struct ahash_save_req_state *state);
+static void ahash_def_finup_done1(void *data, int err);
static int ahash_def_finup(struct ahash_request *req);
/*
@@ -100,6 +106,10 @@ int shash_ahash_digest(struct ahash_request *req, struct shash_desc *desc)
unsigned int offset;
int err;
+ if (ahash_request_isvirt(req))
+ return crypto_shash_digest(desc, req->svirt, nbytes,
+ req->result);
+
if (nbytes &&
(sg = req->src, offset = sg->offset,
nbytes <= min(sg->length, ((unsigned int)(PAGE_SIZE)) - offset))) {
@@ -182,6 +192,9 @@ static int hash_walk_new_entry(struct crypto_hash_walk *walk)
int crypto_hash_walk_done(struct crypto_hash_walk *walk, int err)
{
+ if ((walk->flags & CRYPTO_AHASH_REQ_VIRT))
+ return err;
+
walk->data -= walk->offset;
kunmap_local(walk->data);
@@ -209,14 +222,20 @@ int crypto_hash_walk_first(struct ahash_request *req,
struct crypto_hash_walk *walk)
{
walk->total = req->nbytes;
+ walk->entrylen = 0;
- if (!walk->total) {
- walk->entrylen = 0;
+ if (!walk->total)
return 0;
+
+ walk->flags = req->base.flags;
+
+ if (ahash_request_isvirt(req)) {
+ walk->data = req->svirt;
+ walk->total = 0;
+ return req->nbytes;
}
walk->sg = req->src;
- walk->flags = req->base.flags;
return hash_walk_new_entry(walk);
}
@@ -264,20 +283,85 @@ int crypto_ahash_setkey(struct crypto_ahash *tfm, const u8 *key,
}
EXPORT_SYMBOL_GPL(crypto_ahash_setkey);
+static bool ahash_request_hasvirt(struct ahash_request *req)
+{
+ struct ahash_request *r2;
+
+ if (ahash_request_isvirt(req))
+ return true;
+
+ if (!ahash_request_chained(req))
+ return false;
+
+ list_for_each_entry(r2, &req->base.list, base.list)
+ if (ahash_request_isvirt(r2))
+ return true;
+
+ return false;
+}
+
+static int ahash_reqchain_virt(struct ahash_save_req_state *state,
+ int err, u32 mask)
+{
+ struct ahash_request *req = state->cur;
+
+ for (;;) {
+ unsigned len = state->nbytes;
+
+ req->base.err = err;
+
+ if (!state->offset)
+ break;
+
+ if (state->offset == len || err) {
+ u8 *result = req->result;
+
+ ahash_request_set_virt(req, state->src, result, len);
+ state->offset = 0;
+ break;
+ }
+
+ len -= state->offset;
+
+ len = min(PAGE_SIZE, len);
+ memcpy(state->page, state->src + state->offset, len);
+ state->offset += len;
+ req->nbytes = len;
+
+ err = state->op(req);
+ if (err == -EINPROGRESS) {
+ if (!list_empty(&state->head) ||
+ state->offset < state->nbytes)
+ err = -EBUSY;
+ break;
+ }
+
+ if (err == -EBUSY)
+ break;
+ }
+
+ return err;
+}
+
static int ahash_reqchain_finish(struct ahash_save_req_state *state,
int err, u32 mask)
{
struct ahash_request *req0 = state->req0;
struct ahash_request *req = state->cur;
+ struct crypto_ahash *tfm;
struct ahash_request *n;
+ bool update;
- req->base.err = err;
+ err = ahash_reqchain_virt(state, err, mask);
if (req == req0)
INIT_LIST_HEAD(&req->base.list);
else
list_add_tail(&req->base.list, &req0->base.list);
+ tfm = crypto_ahash_reqtfm(req);
+ update = state->op == crypto_ahash_alg(tfm)->update;
+
list_for_each_entry_safe(req, n, &state->head, base.list) {
list_del_init(&req->base.list);
@@ -285,10 +369,27 @@ static int ahash_reqchain_finish(struct ahash_save_req_state *state,
req->base.complete = ahash_reqchain_done;
req->base.data = state;
state->cur = req;
+
+ if (update && ahash_request_isvirt(req) && req->nbytes) {
+ unsigned len = req->nbytes;
+ u8 *result = req->result;
+
+ state->src = req->svirt;
+ state->nbytes = len;
+
+ len = min(PAGE_SIZE, len);
+
+ memcpy(state->page, req->svirt, len);
+ state->offset = len;
+
+ ahash_request_set_crypt(req, &state->sg, result, len);
+ }
+
err = state->op(req);
if (err == -EINPROGRESS) {
- if (!list_empty(&state->head))
+ if (!list_empty(&state->head) ||
+ state->offset < state->nbytes)
err = -EBUSY;
goto out;
}
@@ -296,11 +397,14 @@ static int ahash_reqchain_finish(struct ahash_save_req_state *state,
if (err == -EBUSY)
goto out;
- req->base.err = err;
+ err = ahash_reqchain_virt(state, err, mask);
+ if (err == -EINPROGRESS || err == -EBUSY)
+ goto out;
+
list_add_tail(&req->base.list, &req0->base.list);
}
- ahash_restore_req(req0);
+ ahash_restore_req(state);
out:
return err;
@@ -314,7 +418,7 @@ static void ahash_reqchain_done(void *data, int err)
data = state->data;
if (err == -EINPROGRESS) {
- if (!list_empty(&state->head))
+ if (!list_empty(&state->head) || state->offset < state->nbytes)
return;
goto notify;
}
@@ -331,41 +435,84 @@ static int ahash_do_req_chain(struct ahash_request *req,
int (*op)(struct ahash_request *req))
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+ bool update = op == crypto_ahash_alg(tfm)->update;
struct ahash_save_req_state *state;
struct ahash_save_req_state state0;
+ struct ahash_request *r2;
+ u8 *page = NULL;
int err;
- if (!ahash_request_chained(req) || list_empty(&req->base.list) ||
- crypto_ahash_req_chain(tfm))
+ if (crypto_ahash_req_chain(tfm) ||
+ ((!ahash_request_chained(req) || list_empty(&req->base.list)) &&
+ (!update || !ahash_request_isvirt(req))))
return op(req);
- state = &state0;
+ if (update && ahash_request_hasvirt(req)) {
+ gfp_t gfp;
+ u32 flags;
+ flags = ahash_request_flags(req);
+ gfp = (flags & CRYPTO_TFM_REQ_MAY_SLEEP) ?
+ GFP_KERNEL : GFP_ATOMIC;
+ page = (void *)__get_free_page(gfp);
+ err = -ENOMEM;
+ if (!page)
+ goto out_set_chain;
+ }
+
+ state = &state0;
if (ahash_is_async(tfm)) {
err = ahash_save_req(req, ahash_reqchain_done);
- if (err) {
- struct ahash_request *r2;
-
- req->base.err = err;
- list_for_each_entry(r2, &req->base.list, base.list)
- r2->base.err = err;
-
- return err;
- }
+ if (err)
+ goto out_free_page;
state = req->base.data;
}
state->op = op;
state->cur = req;
+ state->page = page;
+ state->offset = 0;
+ state->nbytes = 0;
INIT_LIST_HEAD(&state->head);
list_splice(&req->base.list, &state->head);
+ if (page)
+ sg_init_one(&state->sg, page, PAGE_SIZE);
+
+ if (update && ahash_request_isvirt(req) && req->nbytes) {
+ unsigned len = req->nbytes;
+ u8 *result = req->result;
+
+ state->src = req->svirt;
+ state->nbytes = len;
+
+ len = min(PAGE_SIZE, len);
+
+ memcpy(page, req->svirt, len);
+ state->offset = len;
+
+ ahash_request_set_crypt(req, &state->sg, result, len);
+ }
+
err = op(req);
if (err == -EBUSY || err == -EINPROGRESS)
return -EBUSY;
return ahash_reqchain_finish(state, err, ~0);
+
+out_free_page:
+ if (page) {
+ memset(page, 0, PAGE_SIZE);
+ free_page((unsigned long)page);
+ }
+
+out_set_chain:
+ req->base.err = err;
+ list_for_each_entry(r2, &req->base.list, base.list)
+ r2->base.err = err;
+
+ return err;
}
int crypto_ahash_init(struct ahash_request *req)
@@ -419,15 +566,19 @@ static int ahash_save_req(struct ahash_request *req, crypto_completion_t cplt)
req->base.complete = cplt;
req->base.data = state;
state->req0 = req;
+ state->page = NULL;
return 0;
}
-static void ahash_restore_req(struct ahash_request *req)
+static void ahash_restore_req(struct ahash_save_req_state *state)
{
- struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
- struct ahash_save_req_state *state;
+ struct ahash_request *req = state->req0;
+ struct crypto_ahash *tfm;
+ free_page((unsigned long)state->page);
+
+ tfm = crypto_ahash_reqtfm(req);
if (!ahash_is_async(tfm))
return;
@@ -515,13 +666,74 @@ int crypto_ahash_finup(struct ahash_request *req)
return 0;
}
- if (!crypto_ahash_alg(tfm)->finup)
+ if (!crypto_ahash_alg(tfm)->finup ||
+ (!crypto_ahash_req_chain(tfm) && ahash_request_hasvirt(req)))
return ahash_def_finup(req);
return ahash_do_req_chain(req, crypto_ahash_alg(tfm)->finup);
}
EXPORT_SYMBOL_GPL(crypto_ahash_finup);
+static int ahash_def_digest_finish(struct ahash_save_req_state *state, int err)
+{
+ struct ahash_request *req = state->req0;
+ struct crypto_ahash *tfm;
+
+ if (err)
+ goto out;
+
+ tfm = crypto_ahash_reqtfm(req);
+ if (ahash_is_async(tfm))
+ req->base.complete = ahash_def_finup_done1;
+
+ err = crypto_ahash_update(req);
+ if (err == -EINPROGRESS || err == -EBUSY)
+ return err;
+
+out:
+ ahash_restore_req(state);
+ return err;
+}
+
+static void ahash_def_digest_done(void *data, int err)
+{
+ struct ahash_save_req_state *state0 = data;
+ struct ahash_save_req_state state;
+ struct ahash_request *areq;
+
+ state = *state0;
+ areq = state.req0;
+ if (err == -EINPROGRESS)
+ goto out;
+
+ areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
+
+ err = ahash_def_digest_finish(state0, err);
+ if (err == -EINPROGRESS || err == -EBUSY)
+ return;
+
+out:
+ state.compl(state.data, err);
+}
+
+static int ahash_def_digest(struct ahash_request *req)
+{
+ struct ahash_save_req_state *state;
+ int err;
+
+ err = ahash_save_req(req, ahash_def_digest_done);
+ if (err)
+ return err;
+
+ state = req->base.data;
+
+ err = crypto_ahash_init(req);
+ if (err == -EINPROGRESS || err == -EBUSY)
+ return err;
+
+ return ahash_def_digest_finish(state, err);
+}
+
int crypto_ahash_digest(struct ahash_request *req)
{
struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
@@ -545,6 +757,9 @@ int crypto_ahash_digest(struct ahash_request *req)
return 0;
}
+ if (!crypto_ahash_req_chain(tfm) && ahash_request_hasvirt(req))
+ return ahash_def_digest(req);
+
if (crypto_ahash_get_flags(tfm) & CRYPTO_TFM_NEED_KEY)
return -ENOKEY;
@@ -560,17 +775,19 @@ static void ahash_def_finup_done2(void *data, int err)
if (err == -EINPROGRESS)
return;
- ahash_restore_req(areq);
+ ahash_restore_req(state);
ahash_request_complete(areq, err);
}
-static int ahash_def_finup_finish1(struct ahash_request *req, int err)
+static int ahash_def_finup_finish1(struct ahash_save_req_state *state, int err)
{
- struct crypto_ahash *tfm = crypto_ahash_reqtfm(req);
+ struct ahash_request *req = state->req0;
+ struct crypto_ahash *tfm;
if (err)
goto out;
+ tfm = crypto_ahash_reqtfm(req);
if (ahash_is_async(tfm))
req->base.complete = ahash_def_finup_done2;
@@ -579,7 +796,7 @@ static int ahash_def_finup_finish1(struct ahash_request *req, int err)
return err;
out:
- ahash_restore_req(req);
+ ahash_restore_req(state);
return err;
}
@@ -596,7 +813,7 @@ static void ahash_def_finup_done1(void *data, int err)
areq->base.flags &= ~CRYPTO_TFM_REQ_MAY_SLEEP;
- err = ahash_def_finup_finish1(areq, err);
+ err = ahash_def_finup_finish1(state0, err);
if (err == -EINPROGRESS || err == -EBUSY)
return;
@@ -606,17 +823,20 @@ static void ahash_def_finup_done1(void *data, int err)
static int ahash_def_finup(struct ahash_request *req)
{
+ struct ahash_save_req_state *state;
int err;
err = ahash_save_req(req, ahash_def_finup_done1);
if (err)
return err;
+ state = req->base.data;
+
err = crypto_ahash_update(req);
if (err == -EINPROGRESS || err == -EBUSY)
return err;
- return ahash_def_finup_finish1(req, err);
+ return ahash_def_finup_finish1(state, err);
}
int crypto_ahash_export(struct ahash_request *req, void *out)
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index de5e5dcd0c95..0cdadd48d068 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -12,6 +12,9 @@
#include <linux/crypto.h>
#include <linux/string.h>
+/* Set this bit for virtual address instead of SG list. */
+#define CRYPTO_AHASH_REQ_VIRT 0x00000001
+
struct crypto_ahash;
/**
@@ -52,7 +55,10 @@ struct ahash_request {
struct crypto_async_request base;
unsigned int nbytes;
- struct scatterlist *src;
+ union {
+ struct scatterlist *src;
+ const u8 *svirt;
+ };
u8 *result;
void *__ctx[] CRYPTO_MINALIGN_ATTR;
@@ -619,10 +625,13 @@ static inline void ahash_request_set_callback(struct ahash_request *req,
crypto_completion_t compl,
void *data)
{
+ u32 keep = CRYPTO_TFM_REQ_CHAIN | CRYPTO_AHASH_REQ_VIRT;
+
req->base.complete = compl;
req->base.data = data;
- flags &= ~CRYPTO_TFM_REQ_CHAIN;
- req->base.flags = flags;
+ flags &= ~keep;
+ req->base.flags &= ~keep;
+ req->base.flags |= flags;
}
/**
@@ -645,6 +654,30 @@ static inline void ahash_request_set_crypt(struct ahash_request *req,
req->src = src;
req->nbytes = nbytes;
req->result = result;
+ req->base.flags &= ~CRYPTO_AHASH_REQ_VIRT;
+}
+
+/**
+ * ahash_request_set_virt() - set virtual address data buffers
+ * @req: ahash_request handle to be updated
+ * @src: source virtual address
+ * @result: buffer that is filled with the message digest -- the caller must
+ * ensure that the buffer has sufficient space by, for example, calling
+ * crypto_ahash_digestsize()
+ * @nbytes: number of bytes to process from the source virtual address
+ *
+ * By using this call, the caller references the source virtual address.
+ * The source virtual address points to the data the message digest is to
+ * be calculated for.
+ */
+static inline void ahash_request_set_virt(struct ahash_request *req,
+ const u8 *src, u8 *result,
+ unsigned int nbytes)
+{
+ req->svirt = src;
+ req->nbytes = nbytes;
+ req->result = result;
+ req->base.flags |= CRYPTO_AHASH_REQ_VIRT;
}
static inline void ahash_reqchain_init(struct ahash_request *req,
diff --git a/include/crypto/internal/hash.h b/include/crypto/internal/hash.h
index 81542a48587e..195d6aeeede3 100644
--- a/include/crypto/internal/hash.h
+++ b/include/crypto/internal/hash.h
@@ -15,7 +15,7 @@ struct ahash_request;
struct scatterlist;
struct crypto_hash_walk {
- char *data;
+ const char *data;
unsigned int offset;
unsigned int flags;
@@ -275,6 +275,11 @@ static inline bool ahash_request_chained(struct ahash_request *req)
return crypto_request_chained(&req->base);
}
+static inline bool ahash_request_isvirt(struct ahash_request *req)
+{
+ return req->base.flags & CRYPTO_AHASH_REQ_VIRT;
+}
+
static inline bool crypto_ahash_req_chain(struct crypto_ahash *tfm)
{
return crypto_tfm_req_chain(&tfm->base);
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 6126c57b8452..55fd77658b3e 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -126,7 +126,7 @@
*/
#define CRYPTO_ALG_FIPS_INTERNAL 0x00020000
-/* Set if the algorithm supports request chains. */
+/* Set if the algorithm supports request chains and virtual addresses. */
#define CRYPTO_ALG_REQ_CHAIN 0x00040000
/*
--
2.39.5
* [PATCH 5/6] crypto: ahash - Set default reqsize from ahash_alg
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
Add a reqsize field to struct ahash_alg and use it to set the
default reqsize so that algorithms with a static reqsize are
not forced to create an init_tfm function.
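As a hypothetical sketch (the example_* names are made up), such an
algorithm would now simply declare the size; note that
ahash_prepare_alg() rejects a non-zero reqsize smaller than
halg.statesize:

	static struct ahash_alg example_alg = {
		.init		= example_init,
		.update		= example_update,
		.finup		= example_finup,
		.digest		= example_digest,
		/* Static per-request context; no init_tfm hook required. */
		.reqsize	= sizeof(struct example_reqctx),
		.halg.digestsize = 32,
		.halg.statesize	 = sizeof(struct example_state),
	};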
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
crypto/ahash.c | 4 ++++
include/crypto/hash.h | 3 +++
2 files changed, 7 insertions(+)
diff --git a/crypto/ahash.c b/crypto/ahash.c
index b1c468797990..52dd45d6aeab 100644
--- a/crypto/ahash.c
+++ b/crypto/ahash.c
@@ -875,6 +875,7 @@ static int crypto_ahash_init_tfm(struct crypto_tfm *tfm)
struct ahash_alg *alg = crypto_ahash_alg(hash);
crypto_ahash_set_statesize(hash, alg->halg.statesize);
+ crypto_ahash_set_reqsize(hash, alg->reqsize);
if (tfm->__crt_alg->cra_type == &crypto_shash_type)
return crypto_init_ahash_using_shash(tfm);
@@ -1040,6 +1041,9 @@ static int ahash_prepare_alg(struct ahash_alg *alg)
if (alg->halg.statesize == 0)
return -EINVAL;
+ if (alg->reqsize && alg->reqsize < alg->halg.statesize)
+ return -EINVAL;
+
err = hash_prepare_alg(&alg->halg);
if (err)
return err;
diff --git a/include/crypto/hash.h b/include/crypto/hash.h
index 0cdadd48d068..db232070a317 100644
--- a/include/crypto/hash.h
+++ b/include/crypto/hash.h
@@ -135,6 +135,7 @@ struct ahash_request {
* This is a counterpart to @init_tfm, used to remove
* various changes set in @init_tfm.
* @clone_tfm: Copy transform into new object, may allocate memory.
+ * @reqsize: Size of the request context.
* @halg: see struct hash_alg_common
*/
struct ahash_alg {
@@ -151,6 +152,8 @@ struct ahash_alg {
void (*exit_tfm)(struct crypto_ahash *tfm);
int (*clone_tfm)(struct crypto_ahash *dst, struct crypto_ahash *src);
+ unsigned int reqsize;
+
struct hash_alg_common halg;
};
--
2.39.5
* [PATCH 6/6] crypto: x86/sha2 - Restore multibuffer AVX2 support
From: Herbert Xu @ 2024-10-27 9:45 UTC (permalink / raw)
To: Linux Crypto Mailing List
Cc: Eric Biggers, Ard Biesheuvel, Megha Dey, Tim Chen
Resurrect the old multibuffer AVX2 code removed by commit ab8085c130ed
("crypto: x86 - remove SHA multibuffer routines and mcryptd") using
the new request chaining interface.
This is purely a proof of concept and only meant to illustrate the
utility of the new API rather than a serious attempt at improving
performance.
However, it is interesting to note that with x8 multibuffer the
performance of AVX2 is on par with SHA-NI.
testing speed of multibuffer sha256 (sha256-avx2)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 1 operation in 184 cycles (16 bytes)
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 1 operation in 165 cycles (64 bytes)
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 1 operation in 444 cycles (256 bytes)
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 1 operation in 1549 cycles (1024 bytes)
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 1 operation in 3060 cycles (2048 bytes)
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 1 operation in 5983 cycles (4096 bytes)
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 1 operation in 11980 cycles (8192 bytes)
tcrypt: testing speed of async sha256 (sha256-avx2)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 475 cycles/operation, 29 cycles/byte
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 780 cycles/operation, 12 cycles/byte
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 1872 cycles/operation, 7 cycles/byte
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 5416 cycles/operation, 5 cycles/byte
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 10339 cycles/operation, 5 cycles/byte
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 20214 cycles/operation, 4 cycles/byte
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 40042 cycles/operation, 4 cycles/byte
tcrypt: testing speed of async sha256-ni (sha256-ni)
tcrypt: test 0 ( 16 byte blocks, 16 bytes per update, 1 updates): 207 cycles/operation, 12 cycles/byte
tcrypt: test 2 ( 64 byte blocks, 64 bytes per update, 1 updates): 299 cycles/operation, 4 cycles/byte
tcrypt: test 5 ( 256 byte blocks, 256 bytes per update, 1 updates): 543 cycles/operation, 2 cycles/byte
tcrypt: test 8 ( 1024 byte blocks, 1024 bytes per update, 1 updates): 1523 cycles/operation, 1 cycles/byte
tcrypt: test 12 ( 2048 byte blocks, 2048 bytes per update, 1 updates): 2835 cycles/operation, 1 cycles/byte
tcrypt: test 16 ( 4096 byte blocks, 4096 bytes per update, 1 updates): 5459 cycles/operation, 1 cycles/byte
tcrypt: test 21 ( 8192 byte blocks, 8192 bytes per update, 1 updates): 10724 cycles/operation, 1 cycles/byte
Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au>
---
arch/x86/crypto/Makefile | 2 +-
arch/x86/crypto/sha256_mb_mgr_datastruct.S | 304 +++++++++++
arch/x86/crypto/sha256_ssse3_glue.c | 523 ++++++++++++++++--
arch/x86/crypto/sha256_x8_avx2.S | 596 +++++++++++++++++++++
4 files changed, 1382 insertions(+), 43 deletions(-)
create mode 100644 arch/x86/crypto/sha256_mb_mgr_datastruct.S
create mode 100644 arch/x86/crypto/sha256_x8_avx2.S
diff --git a/arch/x86/crypto/Makefile b/arch/x86/crypto/Makefile
index 53b4a277809e..e60abbfb6467 100644
--- a/arch/x86/crypto/Makefile
+++ b/arch/x86/crypto/Makefile
@@ -60,7 +60,7 @@ sha1-ssse3-y := sha1_avx2_x86_64_asm.o sha1_ssse3_asm.o sha1_ssse3_glue.o
sha1-ssse3-$(CONFIG_AS_SHA1_NI) += sha1_ni_asm.o
obj-$(CONFIG_CRYPTO_SHA256_SSSE3) += sha256-ssse3.o
-sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o sha256_ssse3_glue.o
+sha256-ssse3-y := sha256-ssse3-asm.o sha256-avx-asm.o sha256-avx2-asm.o sha256_ssse3_glue.o sha256_x8_avx2.o
sha256-ssse3-$(CONFIG_AS_SHA256_NI) += sha256_ni_asm.o
obj-$(CONFIG_CRYPTO_SHA512_SSSE3) += sha512-ssse3.o
diff --git a/arch/x86/crypto/sha256_mb_mgr_datastruct.S b/arch/x86/crypto/sha256_mb_mgr_datastruct.S
new file mode 100644
index 000000000000..5c377bac21d0
--- /dev/null
+++ b/arch/x86/crypto/sha256_mb_mgr_datastruct.S
@@ -0,0 +1,304 @@
+/*
+ * Header file for multi buffer SHA256 algorithm data structure
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * Megha Dey <megha.dey@linux.intel.com>
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+# Macros for defining data structures
+
+# Usage example
+
+#START_FIELDS # JOB_AES
+### name size align
+#FIELD _plaintext, 8, 8 # pointer to plaintext
+#FIELD _ciphertext, 8, 8 # pointer to ciphertext
+#FIELD _IV, 16, 8 # IV
+#FIELD _keys, 8, 8 # pointer to keys
+#FIELD _len, 4, 4 # length in bytes
+#FIELD _status, 4, 4 # status enumeration
+#FIELD _user_data, 8, 8 # pointer to user data
+#UNION _union, size1, align1, \
+# size2, align2, \
+# size3, align3, \
+# ...
+#END_FIELDS
+#%assign _JOB_AES_size _FIELD_OFFSET
+#%assign _JOB_AES_align _STRUCT_ALIGN
+
+#########################################################################
+
+# Alternate "struc-like" syntax:
+# STRUCT job_aes2
+# RES_Q .plaintext, 1
+# RES_Q .ciphertext, 1
+# RES_DQ .IV, 1
+# RES_B .nested, _JOB_AES_SIZE, _JOB_AES_ALIGN
+# RES_U .union, size1, align1, \
+# size2, align2, \
+# ...
+# ENDSTRUCT
+# # Following only needed if nesting
+# %assign job_aes2_size _FIELD_OFFSET
+# %assign job_aes2_align _STRUCT_ALIGN
+#
+# RES_* macros take a name, a count and an optional alignment.
+# The count is in terms of the base size of the macro, and the
+# default alignment is the base size.
+# The macros are:
+# Macro Base size
+# RES_B 1
+# RES_W 2
+# RES_D 4
+# RES_Q 8
+# RES_DQ 16
+# RES_Y 32
+# RES_Z 64
+#
+# RES_U defines a union. Its arguments are a name and two or more
+# pairs of "size, alignment"
+#
+# The two assigns are only needed if this structure is being nested
+# within another. Even if the assigns are not done, one can still use
+# STRUCT_NAME_size as the size of the structure.
+#
+# Note that for nesting, you still need to assign to STRUCT_NAME_size.
+#
+# The differences between this and using "struc" directly are that each
+# type is implicitly aligned to its natural length (although this can be
+# over-ridden with an explicit third parameter), and that the structure
+# is padded at the end to its overall alignment.
+#
+
+#########################################################################
+
+#ifndef _DATASTRUCT_ASM_
+#define _DATASTRUCT_ASM_
+
+#define SZ8 8*SHA256_DIGEST_WORD_SIZE
+#define ROUNDS 64*SZ8
+#define PTR_SZ 8
+#define SHA256_DIGEST_WORD_SIZE 4
+#define MAX_SHA256_LANES 8
+#define SHA256_DIGEST_WORDS 8
+#define SHA256_DIGEST_ROW_SIZE (MAX_SHA256_LANES * SHA256_DIGEST_WORD_SIZE)
+#define SHA256_DIGEST_SIZE (SHA256_DIGEST_ROW_SIZE * SHA256_DIGEST_WORDS)
+#define SHA256_BLK_SZ 64
+
+# START_FIELDS
+.macro START_FIELDS
+ _FIELD_OFFSET = 0
+ _STRUCT_ALIGN = 0
+.endm
+
+# FIELD name size align
+.macro FIELD name size align
+ _FIELD_OFFSET = (_FIELD_OFFSET + (\align) - 1) & (~ ((\align)-1))
+ \name = _FIELD_OFFSET
+ _FIELD_OFFSET = _FIELD_OFFSET + (\size)
+.if (\align > _STRUCT_ALIGN)
+ _STRUCT_ALIGN = \align
+.endif
+.endm
+
+# END_FIELDS
+.macro END_FIELDS
+ _FIELD_OFFSET = (_FIELD_OFFSET + _STRUCT_ALIGN-1) & (~ (_STRUCT_ALIGN-1))
+.endm
+
+########################################################################
+
+.macro STRUCT p1
+START_FIELDS
+.struc \p1
+.endm
+
+.macro ENDSTRUCT
+ tmp = _FIELD_OFFSET
+ END_FIELDS
+ tmp = (_FIELD_OFFSET - %%tmp)
+.if (tmp > 0)
+ .lcomm tmp
+.endif
+.endstruc
+.endm
+
+## RES_int name size align
+.macro RES_int p1 p2 p3
+ name = \p1
+ size = \p2
+ align = .\p3
+
+ _FIELD_OFFSET = (_FIELD_OFFSET + (align) - 1) & (~ ((align)-1))
+.align align
+.lcomm name size
+ _FIELD_OFFSET = _FIELD_OFFSET + (size)
+.if (align > _STRUCT_ALIGN)
+ _STRUCT_ALIGN = align
+.endif
+.endm
+
+# macro RES_B name, size [, align]
+.macro RES_B _name, _size, _align=1
+RES_int _name _size _align
+.endm
+
+# macro RES_W name, size [, align]
+.macro RES_W _name, _size, _align=2
+RES_int _name 2*(_size) _align
+.endm
+
+# macro RES_D name, size [, align]
+.macro RES_D _name, _size, _align=4
+RES_int _name 4*(_size) _align
+.endm
+
+# macro RES_Q name, size [, align]
+.macro RES_Q _name, _size, _align=8
+RES_int _name 8*(_size) _align
+.endm
+
+# macro RES_DQ name, size [, align]
+.macro RES_DQ _name, _size, _align=16
+RES_int _name 16*(_size) _align
+.endm
+
+# macro RES_Y name, size [, align]
+.macro RES_Y _name, _size, _align=32
+RES_int _name 32*(_size) _align
+.endm
+
+# macro RES_Z name, size [, align]
+.macro RES_Z _name, _size, _align=64
+RES_int _name 64*(_size) _align
+.endm
+
+#endif
+
+
+########################################################################
+#### Define SHA256 Out Of Order Data Structures
+########################################################################
+
+START_FIELDS # LANE_DATA
+### name size align
+FIELD _job_in_lane, 8, 8 # pointer to job object
+END_FIELDS
+
+ _LANE_DATA_size = _FIELD_OFFSET
+ _LANE_DATA_align = _STRUCT_ALIGN
+
+########################################################################
+
+START_FIELDS # SHA256_ARGS_X4
+### name size align
+FIELD _digest, 4*8*8, 4 # transposed digest
+FIELD _data_ptr, 8*8, 8 # array of pointers to data
+END_FIELDS
+
+ _SHA256_ARGS_X4_size = _FIELD_OFFSET
+ _SHA256_ARGS_X4_align = _STRUCT_ALIGN
+ _SHA256_ARGS_X8_size = _FIELD_OFFSET
+ _SHA256_ARGS_X8_align = _STRUCT_ALIGN
+
+#######################################################################
+
+START_FIELDS # MB_MGR
+### name size align
+FIELD _args, _SHA256_ARGS_X4_size, _SHA256_ARGS_X4_align
+FIELD _lens, 4*8, 8
+FIELD _unused_lanes, 8, 8
+FIELD _ldata, _LANE_DATA_size*8, _LANE_DATA_align
+END_FIELDS
+
+ _MB_MGR_size = _FIELD_OFFSET
+ _MB_MGR_align = _STRUCT_ALIGN
+
+_args_digest = _args + _digest
+_args_data_ptr = _args + _data_ptr
+
+#######################################################################
+
+START_FIELDS #STACK_FRAME
+### name size align
+FIELD _data, 16*SZ8, 1 # stack space for the message schedule (W)
+FIELD _digest, 8*SZ8, 1 # saved copy of the digest
+FIELD _ytmp, 4*SZ8, 1
+FIELD _rsp, 8, 1
+END_FIELDS
+
+ _STACK_FRAME_size = _FIELD_OFFSET
+ _STACK_FRAME_align = _STRUCT_ALIGN
+
+#######################################################################
+
+########################################################################
+#### Define constants
+########################################################################
+
+#define STS_UNKNOWN 0
+#define STS_BEING_PROCESSED 1
+#define STS_COMPLETED 2
+
+########################################################################
+#### Define JOB_SHA256 structure
+########################################################################
+
+START_FIELDS # JOB_SHA256
+
+### name size align
+FIELD _buffer, 8, 8 # pointer to buffer
+FIELD _len, 8, 8 # length in bytes
+FIELD _result_digest, 8*4, 32 # Digest (output)
+FIELD _status, 4, 4
+FIELD _user_data, 8, 8
+END_FIELDS
+
+ _JOB_SHA256_size = _FIELD_OFFSET
+ _JOB_SHA256_align = _STRUCT_ALIGN
diff --git a/arch/x86/crypto/sha256_ssse3_glue.c b/arch/x86/crypto/sha256_ssse3_glue.c
index e04a43d9f7d5..78055bd78b31 100644
--- a/arch/x86/crypto/sha256_ssse3_glue.c
+++ b/arch/x86/crypto/sha256_ssse3_glue.c
@@ -41,8 +41,24 @@
#include <asm/cpu_device_id.h>
#include <asm/simd.h>
+struct sha256_x8_mbctx {
+ u32 state[8][8];
+ const u8 *input[8];
+};
+
+struct sha256_reqctx {
+ struct sha256_state state;
+ struct crypto_hash_walk walk;
+ const u8 *input;
+ unsigned int total;
+ unsigned int next;
+};
+
asmlinkage void sha256_transform_ssse3(struct sha256_state *state,
const u8 *data, int blocks);
+asmlinkage void sha256_transform_rorx(struct sha256_state *state,
+ const u8 *data, int blocks);
+asmlinkage void sha256_x8_avx2(struct sha256_x8_mbctx *mbctx, int blocks);
static const struct x86_cpu_id module_cpu_ids[] = {
#ifdef CONFIG_AS_SHA256_NI
@@ -55,14 +71,69 @@ static const struct x86_cpu_id module_cpu_ids[] = {
};
MODULE_DEVICE_TABLE(x86cpu, module_cpu_ids);
-static int _sha256_update(struct shash_desc *desc, const u8 *data,
- unsigned int len, sha256_block_fn *sha256_xform)
+static int sha256_import(struct ahash_request *req, const void *in)
{
- struct sha256_state *sctx = shash_desc_ctx(desc);
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ memcpy(&rctx->state, in, sizeof(rctx->state));
+ return 0;
+}
+
+static int sha256_export(struct ahash_request *req, void *out)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+
+ memcpy(out, &rctx->state, sizeof(rctx->state));
+ return 0;
+}
+
+static int sha256_ahash_init(struct ahash_request *req)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct ahash_request *r2;
+
+ sha256_init(&rctx->state);
+
+ if (!ahash_request_chained(req))
+ return 0;
+
+ req->base.err = 0;
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ r2->base.err = 0;
+ rctx = ahash_request_ctx(r2);
+ sha256_init(&rctx->state);
+ }
+
+ return 0;
+}
+
+static int sha224_ahash_init(struct ahash_request *req)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct ahash_request *r2;
+
+ sha224_init(&rctx->state);
+
+ if (!ahash_request_chained(req))
+ return 0;
+
+ req->base.err = 0;
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ rctx = ahash_request_ctx(r2);
+ sha224_init(&rctx->state);
+ }
+
+ return 0;
+}
+
+static void __sha256_update(struct sha256_state *sctx, const u8 *data,
+ unsigned int len, sha256_block_fn *sha256_xform)
+{
if (!crypto_simd_usable() ||
- (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE)
- return crypto_sha256_update(desc, data, len);
+ (sctx->count % SHA256_BLOCK_SIZE) + len < SHA256_BLOCK_SIZE) {
+ sha256_update(sctx, data, len);
+ return;
+ }
/*
* Make sure struct sha256_state begins directly with the SHA256
@@ -71,25 +142,97 @@ static int _sha256_update(struct shash_desc *desc, const u8 *data,
BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
kernel_fpu_begin();
- sha256_base_do_update(desc, data, len, sha256_xform);
+ lib_sha256_base_do_update(sctx, data, len, sha256_xform);
+ kernel_fpu_end();
+}
+
+static int _sha256_update(struct shash_desc *desc, const u8 *data,
+ unsigned int len, sha256_block_fn *sha256_xform)
+{
+ __sha256_update(shash_desc_ctx(desc), data, len, sha256_xform);
+ return 0;
+}
+
+static int sha256_ahash_update(struct ahash_request *req,
+ sha256_block_fn *sha256_xform)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct crypto_hash_walk *walk = &rctx->walk;
+ struct sha256_state *state = &rctx->state;
+ int nbytes;
+
+ /*
+ * Make sure struct sha256_state begins directly with the SHA256
+ * 256-bit internal state, as this is what the asm functions expect.
+ */
+ BUILD_BUG_ON(offsetof(struct sha256_state, state) != 0);
+
+ for (nbytes = crypto_hash_walk_first(req, walk); nbytes > 0;
+ nbytes = crypto_hash_walk_done(walk, 0))
+ __sha256_update(state, walk->data, nbytes, sha256_transform_rorx);
+
+ return nbytes;
+}
+
+static void _sha256_finup(struct sha256_state *state, const u8 *data,
+ unsigned int len, u8 *out, unsigned int ds,
+ sha256_block_fn *sha256_xform)
+{
+ if (!crypto_simd_usable()) {
+ sha256_update(state, data, len);
+ if (ds == SHA224_DIGEST_SIZE)
+ sha224_final(state, out);
+ else
+ sha256_final(state, out);
+ return;
+ }
+
+ kernel_fpu_begin();
+ if (len)
+ lib_sha256_base_do_update(state, data, len, sha256_xform);
+ lib_sha256_base_do_finalize(state, sha256_xform);
kernel_fpu_end();
- return 0;
+ lib_sha256_base_finish(state, out, ds);
+}
+
+static int sha256_ahash_finup(struct ahash_request *req,
+ sha256_block_fn *sha256_xform)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct crypto_hash_walk *walk = &rctx->walk;
+ struct sha256_state *state = &rctx->state;
+ unsigned int ds;
+ int nbytes;
+
+ ds = crypto_ahash_digestsize(crypto_ahash_reqtfm(req));
+ if (!req->nbytes) {
+ _sha256_finup(state, NULL, 0, req->result,
+ ds, sha256_transform_rorx);
+ return 0;
+ }
+
+ for (nbytes = crypto_hash_walk_first(req, walk); nbytes > 0;
+ nbytes = crypto_hash_walk_done(walk, 0)) {
+ if (crypto_hash_walk_last(walk)) {
+ _sha256_finup(state, walk->data, nbytes, req->result,
+ ds, sha256_transform_rorx);
+ continue;
+ }
+
+ __sha256_update(state, walk->data, nbytes, sha256_transform_rorx);
+ }
+
+ return nbytes;
}
static int sha256_finup(struct shash_desc *desc, const u8 *data,
unsigned int len, u8 *out, sha256_block_fn *sha256_xform)
{
- if (!crypto_simd_usable())
- return crypto_sha256_finup(desc, data, len, out);
+ unsigned int ds = crypto_shash_digestsize(desc->tfm);
- kernel_fpu_begin();
- if (len)
- sha256_base_do_update(desc, data, len, sha256_xform);
- sha256_base_do_finalize(desc, sha256_xform);
- kernel_fpu_end();
-
- return sha256_base_finish(desc, out);
+ _sha256_finup(shash_desc_ctx(desc), data, len, out, ds, sha256_xform);
+ return 0;
}
static int sha256_ssse3_update(struct shash_desc *desc, const u8 *data,
@@ -247,61 +390,357 @@ static void unregister_sha256_avx(void)
ARRAY_SIZE(sha256_avx_algs));
}
-asmlinkage void sha256_transform_rorx(struct sha256_state *state,
- const u8 *data, int blocks);
-
-static int sha256_avx2_update(struct shash_desc *desc, const u8 *data,
- unsigned int len)
+static int sha256_pad2(struct ahash_request *req)
{
- return _sha256_update(desc, data, len, sha256_transform_rorx);
+ const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct sha256_state *state = &rctx->state;
+ unsigned int partial = state->count;
+ __be64 *bits;
+
+ if (rctx->total)
+ return 0;
+
+ rctx->total = 1;
+
+ partial %= SHA256_BLOCK_SIZE;
+ memset(state->buf + partial, 0, bit_offset - partial);
+ bits = (__be64 *)(state->buf + bit_offset);
+ *bits = cpu_to_be64(state->count << 3);
+
+ return SHA256_BLOCK_SIZE;
}
-static int sha256_avx2_finup(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out)
+static int sha256_pad1(struct ahash_request *req, bool final)
{
- return sha256_finup(desc, data, len, out, sha256_transform_rorx);
+ const int bit_offset = SHA256_BLOCK_SIZE - sizeof(__be64);
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct sha256_state *state = &rctx->state;
+ unsigned int partial = state->count;
+
+ if (!final)
+ return 0;
+
+ rctx->total = 0;
+ rctx->input = state->buf;
+
+ partial %= SHA256_BLOCK_SIZE;
+ state->buf[partial++] = 0x80;
+
+ if (partial > bit_offset) {
+ memset(state->buf + partial, 0, SHA256_BLOCK_SIZE - partial);
+ return SHA256_BLOCK_SIZE;
+ }
+
+ return sha256_pad2(req);
}
-static int sha256_avx2_final(struct shash_desc *desc, u8 *out)
+static int sha256_mb_start(struct ahash_request *req, bool final)
{
- return sha256_avx2_finup(desc, NULL, 0, out);
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct sha256_state *state = &rctx->state;
+ unsigned int partial;
+ int nbytes;
+
+ nbytes = crypto_hash_walk_first(req, &rctx->walk);
+ if (!nbytes)
+ return sha256_pad1(req, final);
+
+ rctx->input = rctx->walk.data;
+
+ partial = state->count % SHA256_BLOCK_SIZE;
+ while (partial + nbytes < SHA256_BLOCK_SIZE) {
+ memcpy(state->buf + partial, rctx->input, nbytes);
+ state->count += nbytes;
+ partial += nbytes;
+
+ nbytes = crypto_hash_walk_done(&rctx->walk, 0);
+ if (!nbytes)
+ return sha256_pad1(req, final);
+
+ rctx->input = rctx->walk.data;
+ }
+
+ rctx->total = nbytes;
+ if (nbytes == 1) {
+ rctx->total = 0;
+ state->count++;
+ }
+
+ if (partial) {
+ unsigned int offset = SHA256_BLOCK_SIZE - partial;
+
+ memcpy(state->buf + partial, rctx->input, offset);
+ rctx->input = state->buf;
+
+ return SHA256_BLOCK_SIZE;
+ }
+
+ return nbytes;
}
-static int sha256_avx2_digest(struct shash_desc *desc, const u8 *data,
- unsigned int len, u8 *out)
+static int sha256_mb_next(struct ahash_request *req, unsigned int len,
+ bool final)
{
- return sha256_base_init(desc) ?:
- sha256_avx2_finup(desc, data, len, out);
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct sha256_state *state = &rctx->state;
+ unsigned int partial;
+
+ if (rctx->input != state->buf) {
+ rctx->input += len;
+ rctx->total -= len;
+ state->count += len;
+ } else if (rctx->total > 1) {
+ unsigned int offset;
+
+ offset = SHA256_BLOCK_SIZE - state->count % SHA256_BLOCK_SIZE;
+ rctx->input = rctx->walk.data + offset;
+ rctx->total -= offset;
+ state->count += offset;
+ } else
+ return sha256_pad2(req);
+
+ partial = 0;
+ while (partial + rctx->total < SHA256_BLOCK_SIZE) {
+ memcpy(state->buf + partial, rctx->input, rctx->total);
+ state->count += rctx->total;
+ partial += rctx->total;
+
+ rctx->total = crypto_hash_walk_done(&rctx->walk, 0);
+ if (!rctx->total)
+ return sha256_pad1(req, final);
+
+ rctx->input = rctx->walk.data;
+ }
+
+ return rctx->total;
}
-static struct shash_alg sha256_avx2_algs[] = { {
- .digestsize = SHA256_DIGEST_SIZE,
- .init = sha256_base_init,
+static void sha256_update_x8x1(struct list_head *list,
+ struct ahash_request *reqs[8], bool final)
+{
+ struct sha256_x8_mbctx mbctx;
+ unsigned int len = 0;
+ u32 *states[8];
+ int i = 0;
+
+ do {
+ struct sha256_reqctx *rctx = ahash_request_ctx(reqs[i]);
+ unsigned int nbytes;
+
+ nbytes = rctx->next;
+ if (!i || nbytes < len)
+ len = nbytes;
+
+ states[i] = rctx->state.state;
+ mbctx.input[i] = rctx->input;
+ } while (++i < 8 && reqs[i]);
+
+ for (; i < 8; i++) {
+ mbctx.input[i] = mbctx.input[0];
+ }
+
+ for (i = 0; i < 8; i++) {
+ int j;
+
+ for (j = 0; j < 8; j++)
+ mbctx.state[i][j] = states[j][i];
+ }
+
+ sha256_x8_avx2(&mbctx, len / SHA256_BLOCK_SIZE);
+
+ for (i = 0; i < 8; i++) {
+ int j;
+
+ for (j = 0; j < 8; j++)
+ states[i][j] = mbctx.state[j][i];
+ }
+
+ i = 0;
+ do {
+ struct sha256_reqctx *rctx = ahash_request_ctx(reqs[i]);
+
+ rctx->next = sha256_mb_next(reqs[i], len, final);
+
+ if (rctx->next) {
+ if (++i >= 8)
+ break;
+ continue;
+ }
+
+ if (i == 7 || !reqs[i + 1]) {
+ struct ahash_request *r2 = reqs[i];
+
+ reqs[i] = NULL;
+ do {
+ while (!list_is_last(&r2->base.list, list)) {
+ r2 = list_next_entry(r2, base.list);
+ r2->base.err = 0;
+
+ rctx = ahash_request_ctx(r2);
+ rctx->next = sha256_mb_start(r2, final);
+ if (rctx->next) {
+ reqs[i] = r2;
+ break;
+ }
+ }
+ } while (reqs[i] && ++i < 8);
+
+ break;
+ }
+
+ memmove(reqs + i, reqs + i + 1, sizeof(reqs[0]) * (7 - i));
+ reqs[7] = NULL;
+ } while (reqs[i]);
+}
+
+static void sha256_update_x8(struct list_head *list,
+ struct ahash_request *reqs[8],
+ bool final)
+{
+ do {
+ sha256_update_x8x1(list, reqs, final);
+ } while (reqs[0]);
+}
+
+static void sha256_chain(struct ahash_request *req, bool final)
+{
+ struct sha256_reqctx *rctx = ahash_request_ctx(req);
+ struct ahash_request *reqs[8];
+ struct ahash_request *r2;
+ int i;
+
+ req->base.err = 0;
+ reqs[0] = req;
+ rctx->next = sha256_mb_start(req, final);
+ i = !!rctx->next;
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ r2->base.err = 0;
+
+ rctx = ahash_request_ctx(r2);
+ rctx->next = sha256_mb_start(r2, final);
+ if (!rctx->next)
+ continue;
+
+ reqs[i++] = r2;
+ if (i < 8)
+ continue;
+
+ sha256_update_x8(&req->base.list, reqs, final);
+ i = 0;
+ }
+
+ if (i) {
+ memset(reqs + i, 0, sizeof(reqs[0]) * (8 - i));
+ sha256_update_x8(&req->base.list, reqs, final);
+ }
+
+ return;
+}
+
+static int sha256_avx2_update(struct ahash_request *req)
+{
+ struct ahash_request *r2;
+ int err;
+
+ if (ahash_request_chained(req) && crypto_simd_usable()) {
+ sha256_chain(req, false);
+ return 0;
+ }
+
+ err = sha256_ahash_update(req, sha256_transform_rorx);
+ if (!ahash_request_chained(req))
+ return err;
+
+ req->base.err = err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ err = sha256_ahash_update(r2, sha256_transform_rorx);
+ r2->base.err = err;
+ }
+
+ return 0;
+}
+
+static int sha256_avx2_finup(struct ahash_request *req)
+{
+ struct ahash_request *r2;
+ int err;
+
+ if (ahash_request_chained(req) && crypto_simd_usable()) {
+ sha256_chain(req, true);
+ return 0;
+ }
+
+ err = sha256_ahash_finup(req, sha256_transform_rorx);
+ if (!ahash_request_chained(req))
+ return err;
+
+ req->base.err = err;
+
+ list_for_each_entry(r2, &req->base.list, base.list) {
+ err = sha256_ahash_finup(r2, sha256_transform_rorx);
+ r2->base.err = err;
+ }
+
+ return 0;
+}
+
+static int sha256_avx2_final(struct ahash_request *req)
+{
+ req->nbytes = 0;
+ return sha256_avx2_finup(req);
+}
+
+static int sha256_avx2_digest(struct ahash_request *req)
+{
+ return sha256_ahash_init(req) ?:
+ sha256_avx2_finup(req);
+}
+
+static int sha224_avx2_digest(struct ahash_request *req)
+{
+ return sha224_ahash_init(req) ?:
+ sha256_avx2_finup(req);
+}
+
+static struct ahash_alg sha256_avx2_algs[] = { {
+ .halg.digestsize = SHA256_DIGEST_SIZE,
+ .halg.statesize = sizeof(struct sha256_state),
+ .reqsize = sizeof(struct sha256_reqctx),
+ .init = sha256_ahash_init,
.update = sha256_avx2_update,
.final = sha256_avx2_final,
.finup = sha256_avx2_finup,
.digest = sha256_avx2_digest,
- .descsize = sizeof(struct sha256_state),
- .base = {
+ .import = sha256_import,
+ .export = sha256_export,
+ .halg.base = {
.cra_name = "sha256",
.cra_driver_name = "sha256-avx2",
.cra_priority = 170,
.cra_blocksize = SHA256_BLOCK_SIZE,
.cra_module = THIS_MODULE,
+ .cra_flags = CRYPTO_ALG_REQ_CHAIN,
}
}, {
- .digestsize = SHA224_DIGEST_SIZE,
- .init = sha224_base_init,
+ .halg.digestsize = SHA224_DIGEST_SIZE,
+ .halg.statesize = sizeof(struct sha256_state),
+ .reqsize = sizeof(struct sha256_reqctx),
+ .init = sha224_ahash_init,
.update = sha256_avx2_update,
.final = sha256_avx2_final,
.finup = sha256_avx2_finup,
- .descsize = sizeof(struct sha256_state),
- .base = {
+ .digest = sha224_avx2_digest,
+ .import = sha256_import,
+ .export = sha256_export,
+ .halg.base = {
.cra_name = "sha224",
.cra_driver_name = "sha224-avx2",
.cra_priority = 170,
.cra_blocksize = SHA224_BLOCK_SIZE,
.cra_module = THIS_MODULE,
+ .cra_flags = CRYPTO_ALG_REQ_CHAIN,
}
} };
@@ -317,7 +756,7 @@ static bool avx2_usable(void)
static int register_sha256_avx2(void)
{
if (avx2_usable())
- return crypto_register_shashes(sha256_avx2_algs,
+ return crypto_register_ahashes(sha256_avx2_algs,
ARRAY_SIZE(sha256_avx2_algs));
return 0;
}
@@ -325,7 +764,7 @@ static int register_sha256_avx2(void)
static void unregister_sha256_avx2(void)
{
if (avx2_usable())
- crypto_unregister_shashes(sha256_avx2_algs,
+ crypto_unregister_ahashes(sha256_avx2_algs,
ARRAY_SIZE(sha256_avx2_algs));
}
diff --git a/arch/x86/crypto/sha256_x8_avx2.S b/arch/x86/crypto/sha256_x8_avx2.S
new file mode 100644
index 000000000000..deb891b458c8
--- /dev/null
+++ b/arch/x86/crypto/sha256_x8_avx2.S
@@ -0,0 +1,596 @@
+/*
+ * Multi-buffer SHA256 algorithm hash compute routine
+ *
+ * This file is provided under a dual BSD/GPLv2 license. When using or
+ * redistributing this file, you may do so under either license.
+ *
+ * GPL LICENSE SUMMARY
+ *
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of version 2 of the GNU General Public License as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * General Public License for more details.
+ *
+ * Contact Information:
+ * Megha Dey <megha.dey@linux.intel.com>
+ *
+ * BSD LICENSE
+ *
+ * Copyright(c) 2016 Intel Corporation.
+ *
+ * Redistribution and use in source and binary forms, with or without
+ * modification, are permitted provided that the following conditions
+ * are met:
+ *
+ * * Redistributions of source code must retain the above copyright
+ * notice, this list of conditions and the following disclaimer.
+ * * Redistributions in binary form must reproduce the above copyright
+ * notice, this list of conditions and the following disclaimer in
+ * the documentation and/or other materials provided with the
+ * distribution.
+ * * Neither the name of Intel Corporation nor the names of its
+ * contributors may be used to endorse or promote products derived
+ * from this software without specific prior written permission.
+ *
+ * THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ * "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ * LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ * A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ * OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ * LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ * DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ * THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ * (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ * OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+ */
+
+#include <asm/frame.h>
+#include <linux/cfi_types.h>
+#include <linux/linkage.h>
+#include "sha256_mb_mgr_datastruct.S"
+
+## code to compute 8-way (x8) SHA256 using 256-bit AVX2 vectors
+## outer calling routine takes care of save and restore of XMM registers
+## Logic designed/laid out by JDG
+
+## Function clobbers: rax, rcx, rdx, rbx, rsi, rdi, r9-r15; %ymm0-15
+## Linux clobbers: rax rbx rcx rdx rsi r9 r10 r11 r12 r13 r14 r15
+## Linux preserves: rdi rbp r8
+##
+## clobbers %ymm0-15
+
+arg1 = %rdi
+arg2 = %rsi
+reg3 = %rcx
+reg4 = %rdx
+
+# Common definitions
+STATE = arg1
+INP_SIZE = arg2
+
+IDX = %rax
+ROUND = %rbx
+TBL = reg3
+
+inp0 = %r9
+inp1 = %r10
+inp2 = %r11
+inp3 = %r12
+inp4 = %r13
+inp5 = %r14
+inp6 = %r15
+inp7 = reg4
+
+a = %ymm0
+b = %ymm1
+c = %ymm2
+d = %ymm3
+e = %ymm4
+f = %ymm5
+g = %ymm6
+h = %ymm7
+
+T1 = %ymm8
+
+a0 = %ymm12
+a1 = %ymm13
+a2 = %ymm14
+TMP = %ymm15
+TMP0 = %ymm6
+TMP1 = %ymm7
+
+TT0 = %ymm8
+TT1 = %ymm9
+TT2 = %ymm10
+TT3 = %ymm11
+TT4 = %ymm12
+TT5 = %ymm13
+TT6 = %ymm14
+TT7 = %ymm15
+
+# Define stack usage
+
+# The stack pointer is explicitly aligned to 32 bytes below (and $~0x1F),
+# so FRAMESZ only needs to be large enough for the local variables
+
+#define FRAMESZ 0x388
+
+#define VMOVPS vmovups
+
+# TRANSPOSE8 r0, r1, r2, r3, r4, r5, r6, r7, t0, t1
+# "transpose" data in {r0...r7} using temps {t0...t1}
+# Input looks like: {r0 r1 r2 r3 r4 r5 r6 r7}
+# r0 = {a7 a6 a5 a4 a3 a2 a1 a0}
+# r1 = {b7 b6 b5 b4 b3 b2 b1 b0}
+# r2 = {c7 c6 c5 c4 c3 c2 c1 c0}
+# r3 = {d7 d6 d5 d4 d3 d2 d1 d0}
+# r4 = {e7 e6 e5 e4 e3 e2 e1 e0}
+# r5 = {f7 f6 f5 f4 f3 f2 f1 f0}
+# r6 = {g7 g6 g5 g4 g3 g2 g1 g0}
+# r7 = {h7 h6 h5 h4 h3 h2 h1 h0}
+#
+# Output looks like: {r0 r1 r2 r3 r4 r5 r6 r7}
+# r0 = {h0 g0 f0 e0 d0 c0 b0 a0}
+# r1 = {h1 g1 f1 e1 d1 c1 b1 a1}
+# r2 = {h2 g2 f2 e2 d2 c2 b2 a2}
+# r3 = {h3 g3 f3 e3 d3 c3 b3 a3}
+# r4 = {h4 g4 f4 e4 d4 c4 b4 a4}
+# r5 = {h5 g5 f5 e5 d5 c5 b5 a5}
+# r6 = {h6 g6 f6 e6 d6 c6 b6 a6}
+# r7 = {h7 g7 f7 e7 d7 c7 b7 a7}
+#
+
+.macro TRANSPOSE8 r0 r1 r2 r3 r4 r5 r6 r7 t0 t1
+ # process top half (r0..r3) {a...d}
+ vshufps $0x44, \r1, \r0, \t0 # t0 = {b5 b4 a5 a4 b1 b0 a1 a0}
+ vshufps $0xEE, \r1, \r0, \r0 # r0 = {b7 b6 a7 a6 b3 b2 a3 a2}
+ vshufps $0x44, \r3, \r2, \t1 # t1 = {d5 d4 c5 c4 d1 d0 c1 c0}
+ vshufps $0xEE, \r3, \r2, \r2 # r2 = {d7 d6 c7 c6 d3 d2 c3 c2}
+ vshufps $0xDD, \t1, \t0, \r3 # r3 = {d5 c5 b5 a5 d1 c1 b1 a1}
+ vshufps $0x88, \r2, \r0, \r1 # r1 = {d6 c6 b6 a6 d2 c2 b2 a2}
+ vshufps $0xDD, \r2, \r0, \r0 # r0 = {d7 c7 b7 a7 d3 c3 b3 a3}
+ vshufps $0x88, \t1, \t0, \t0 # t0 = {d4 c4 b4 a4 d0 c0 b0 a0}
+
+ # use r2 in place of t0
+ # process bottom half (r4..r7) {e...h}
+ vshufps $0x44, \r5, \r4, \r2 # r2 = {f5 f4 e5 e4 f1 f0 e1 e0}
+ vshufps $0xEE, \r5, \r4, \r4 # r4 = {f7 f6 e7 e6 f3 f2 e3 e2}
+ vshufps $0x44, \r7, \r6, \t1 # t1 = {h5 h4 g5 g4 h1 h0 g1 g0}
+ vshufps $0xEE, \r7, \r6, \r6 # r6 = {h7 h6 g7 g6 h3 h2 g3 g2}
+ vshufps $0xDD, \t1, \r2, \r7 # r7 = {h5 g5 f5 e5 h1 g1 f1 e1}
+ vshufps $0x88, \r6, \r4, \r5 # r5 = {h6 g6 f6 e6 h2 g2 f2 e2}
+ vshufps $0xDD, \r6, \r4, \r4 # r4 = {h7 g7 f7 e7 h3 g3 f3 e3}
+ vshufps $0x88, \t1, \r2, \t1 # t1 = {h4 g4 f4 e4 h0 g0 f0 e0}
+
+ vperm2f128 $0x13, \r1, \r5, \r6 # h6...a6
+ vperm2f128 $0x02, \r1, \r5, \r2 # h2...a2
+ vperm2f128 $0x13, \r3, \r7, \r5 # h5...a5
+ vperm2f128 $0x02, \r3, \r7, \r1 # h1...a1
+ vperm2f128 $0x13, \r0, \r4, \r7 # h7...a7
+ vperm2f128 $0x02, \r0, \r4, \r3 # h3...a3
+ vperm2f128 $0x13, \t0, \t1, \r4 # h4...a4
+ vperm2f128 $0x02, \t0, \t1, \r0 # h0...a0
+
+.endm
+
+.macro ROTATE_ARGS
+TMP_ = h
+h = g
+g = f
+f = e
+e = d
+d = c
+c = b
+b = a
+a = TMP_
+.endm
+
+.macro _PRORD reg imm tmp
+ vpslld $(32-\imm),\reg,\tmp
+ vpsrld $\imm,\reg, \reg
+ vpor \tmp,\reg, \reg
+.endm
+
+# PRORD_nd reg, imm, tmp, src
+.macro _PRORD_nd reg imm tmp src
+ vpslld $(32-\imm), \src, \tmp
+ vpsrld $\imm, \src, \reg
+ vpor \tmp, \reg, \reg
+.endm
+
+# PRORD dst/src, amt
+.macro PRORD reg imm
+ _PRORD \reg,\imm,TMP
+.endm
+
+# PRORD_nd dst, src, amt
+.macro PRORD_nd reg tmp imm
+ _PRORD_nd \reg, \imm, TMP, \tmp
+.endm
+
+# arguments passed implicitly in preprocessor symbols i, a...h
+.macro ROUND_00_15 _T1 i
+ PRORD_nd a0,e,5 # sig1: a0 = (e >> 5)
+
+ vpxor g, f, a2 # ch: a2 = f^g
+ vpand e,a2, a2 # ch: a2 = (f^g)&e
+ vpxor g, a2, a2 # a2 = ch
+
+ PRORD_nd a1,e,25 # sig1: a1 = (e >> 25)
+
+ vmovdqu \_T1,(SZ8*(\i & 0xf))(%rsp)
+ vpaddd (TBL,ROUND,1), \_T1, \_T1 # T1 = W + K
+ vpxor e,a0, a0 # sig1: a0 = e ^ (e >> 5)
+ PRORD a0, 6 # sig1: a0 = (e >> 6) ^ (e >> 11)
+ vpaddd a2, h, h # h = h + ch
+ PRORD_nd a2,a,11 # sig0: a2 = (a >> 11)
+ vpaddd \_T1,h, h # h = h + ch + W + K
+ vpxor a1, a0, a0 # a0 = sigma1
+ PRORD_nd a1,a,22 # sig0: a1 = (a >> 22)
+ vpxor c, a, \_T1 # maj: T1 = a^c
+ add $SZ8, ROUND # ROUND++
+ vpand b, \_T1, \_T1 # maj: T1 = (a^c)&b
+ vpaddd a0, h, h
+ vpaddd h, d, d
+ vpxor a, a2, a2 # sig0: a2 = a ^ (a >> 11)
+ PRORD a2,2 # sig0: a2 = (a >> 2) ^ (a >> 13)
+ vpxor a1, a2, a2 # a2 = sig0
+ vpand c, a, a1 # maj: a1 = a&c
+ vpor \_T1, a1, a1 # a1 = maj
+ vpaddd a1, h, h # h = h + ch + W + K + maj
+ vpaddd a2, h, h # h = h + ch + W + K + maj + sigma0
+ ROTATE_ARGS
+.endm
+
+# arguments passed implicitly in preprocessor symbols i, a...h
+.macro ROUND_16_XX _T1 i
+ vmovdqu (SZ8*((\i-15)&0xf))(%rsp), \_T1
+ vmovdqu (SZ8*((\i-2)&0xf))(%rsp), a1
+ vmovdqu \_T1, a0
+ PRORD \_T1,11
+ vmovdqu a1, a2
+ PRORD a1,2
+ vpxor a0, \_T1, \_T1
+ PRORD \_T1, 7
+ vpxor a2, a1, a1
+ PRORD a1, 17
+ vpsrld $3, a0, a0
+ vpxor a0, \_T1, \_T1
+ vpsrld $10, a2, a2
+ vpxor a2, a1, a1
+ vpaddd (SZ8*((\i-16)&0xf))(%rsp), \_T1, \_T1
+ vpaddd (SZ8*((\i-7)&0xf))(%rsp), a1, a1
+ vpaddd a1, \_T1, \_T1
+
+ ROUND_00_15 \_T1,\i
+.endm
+
+# void sha256_x8_avx2(struct sha256_x8_mbctx *mbctx, int blocks);
+#
+# arg 1 : mbctx : pointer to the transposed digest rows followed by the
+# eight input data pointers (struct sha256_x8_mbctx in the glue code)
+# arg 2 : blocks : size of input in blocks
+ # save rsp, allocate 32-byte aligned for local variables
+SYM_FUNC_START(sha256_x8_avx2)
+ # save callee-saved clobbered registers to comply with C function ABI
+ push %r12
+ push %r13
+ push %r14
+ push %r15
+
+ push %rbp
+ mov %rsp, %rbp
+
+ sub $FRAMESZ, %rsp
+ and $~0x1F, %rsp
+
+ # Load the pre-transposed incoming digest.
+ vmovdqu 0*SHA256_DIGEST_ROW_SIZE(STATE),a
+ vmovdqu 1*SHA256_DIGEST_ROW_SIZE(STATE),b
+ vmovdqu 2*SHA256_DIGEST_ROW_SIZE(STATE),c
+ vmovdqu 3*SHA256_DIGEST_ROW_SIZE(STATE),d
+ vmovdqu 4*SHA256_DIGEST_ROW_SIZE(STATE),e
+ vmovdqu 5*SHA256_DIGEST_ROW_SIZE(STATE),f
+ vmovdqu 6*SHA256_DIGEST_ROW_SIZE(STATE),g
+ vmovdqu 7*SHA256_DIGEST_ROW_SIZE(STATE),h
+
+ lea K256_8(%rip),TBL
+
+ # load the address of each of the 4 message lanes
+ # getting ready to transpose input onto stack
+ mov _args_data_ptr+0*PTR_SZ(STATE),inp0
+ mov _args_data_ptr+1*PTR_SZ(STATE),inp1
+ mov _args_data_ptr+2*PTR_SZ(STATE),inp2
+ mov _args_data_ptr+3*PTR_SZ(STATE),inp3
+ mov _args_data_ptr+4*PTR_SZ(STATE),inp4
+ mov _args_data_ptr+5*PTR_SZ(STATE),inp5
+ mov _args_data_ptr+6*PTR_SZ(STATE),inp6
+ mov _args_data_ptr+7*PTR_SZ(STATE),inp7
+
+ xor IDX, IDX
+lloop:
+ xor ROUND, ROUND
+
+ # save old digest
+ vmovdqu a, _digest(%rsp)
+ vmovdqu b, _digest+1*SZ8(%rsp)
+ vmovdqu c, _digest+2*SZ8(%rsp)
+ vmovdqu d, _digest+3*SZ8(%rsp)
+ vmovdqu e, _digest+4*SZ8(%rsp)
+ vmovdqu f, _digest+5*SZ8(%rsp)
+ vmovdqu g, _digest+6*SZ8(%rsp)
+ vmovdqu h, _digest+7*SZ8(%rsp)
+ i = 0
+.rep 2
+ VMOVPS i*32(inp0, IDX), TT0
+ VMOVPS i*32(inp1, IDX), TT1
+ VMOVPS i*32(inp2, IDX), TT2
+ VMOVPS i*32(inp3, IDX), TT3
+ VMOVPS i*32(inp4, IDX), TT4
+ VMOVPS i*32(inp5, IDX), TT5
+ VMOVPS i*32(inp6, IDX), TT6
+ VMOVPS i*32(inp7, IDX), TT7
+ vmovdqu g, _ytmp(%rsp)
+ vmovdqu h, _ytmp+1*SZ8(%rsp)
+ TRANSPOSE8 TT0, TT1, TT2, TT3, TT4, TT5, TT6, TT7, TMP0, TMP1
+ vmovdqu PSHUFFLE_BYTE_FLIP_MASK(%rip), TMP1
+ vmovdqu _ytmp(%rsp), g
+ vpshufb TMP1, TT0, TT0
+ vpshufb TMP1, TT1, TT1
+ vpshufb TMP1, TT2, TT2
+ vpshufb TMP1, TT3, TT3
+ vpshufb TMP1, TT4, TT4
+ vpshufb TMP1, TT5, TT5
+ vpshufb TMP1, TT6, TT6
+ vpshufb TMP1, TT7, TT7
+ vmovdqu _ytmp+1*SZ8(%rsp), h
+ vmovdqu TT4, _ytmp(%rsp)
+ vmovdqu TT5, _ytmp+1*SZ8(%rsp)
+ vmovdqu TT6, _ytmp+2*SZ8(%rsp)
+ vmovdqu TT7, _ytmp+3*SZ8(%rsp)
+ ROUND_00_15 TT0,(i*8+0)
+ vmovdqu _ytmp(%rsp), TT0
+ ROUND_00_15 TT1,(i*8+1)
+ vmovdqu _ytmp+1*SZ8(%rsp), TT1
+ ROUND_00_15 TT2,(i*8+2)
+ vmovdqu _ytmp+2*SZ8(%rsp), TT2
+ ROUND_00_15 TT3,(i*8+3)
+ vmovdqu _ytmp+3*SZ8(%rsp), TT3
+ ROUND_00_15 TT0,(i*8+4)
+ ROUND_00_15 TT1,(i*8+5)
+ ROUND_00_15 TT2,(i*8+6)
+ ROUND_00_15 TT3,(i*8+7)
+ i = (i+1)
+.endr
+ add $64, IDX
+ i = (i*8)
+
+ jmp Lrounds_16_xx
+.align 16
+Lrounds_16_xx:
+.rep 16
+ ROUND_16_XX T1, i
+ i = (i+1)
+.endr
+
+ cmp $ROUNDS,ROUND
+ jb Lrounds_16_xx
+
+ # add old digest
+ vpaddd _digest+0*SZ8(%rsp), a, a
+ vpaddd _digest+1*SZ8(%rsp), b, b
+ vpaddd _digest+2*SZ8(%rsp), c, c
+ vpaddd _digest+3*SZ8(%rsp), d, d
+ vpaddd _digest+4*SZ8(%rsp), e, e
+ vpaddd _digest+5*SZ8(%rsp), f, f
+ vpaddd _digest+6*SZ8(%rsp), g, g
+ vpaddd _digest+7*SZ8(%rsp), h, h
+
+ sub $1, INP_SIZE # unit is blocks
+ jne lloop
+
+ # write back to memory (state object) the transposed digest
+ vmovdqu a, 0*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu b, 1*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu c, 2*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu d, 3*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu e, 4*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu f, 5*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu g, 6*SHA256_DIGEST_ROW_SIZE(STATE)
+ vmovdqu h, 7*SHA256_DIGEST_ROW_SIZE(STATE)
+
+ # update input pointers
+ add IDX, inp0
+ mov inp0, _args_data_ptr+0*8(STATE)
+ add IDX, inp1
+ mov inp1, _args_data_ptr+1*8(STATE)
+ add IDX, inp2
+ mov inp2, _args_data_ptr+2*8(STATE)
+ add IDX, inp3
+ mov inp3, _args_data_ptr+3*8(STATE)
+ add IDX, inp4
+ mov inp4, _args_data_ptr+4*8(STATE)
+ add IDX, inp5
+ mov inp5, _args_data_ptr+5*8(STATE)
+ add IDX, inp6
+ mov inp6, _args_data_ptr+6*8(STATE)
+ add IDX, inp7
+ mov inp7, _args_data_ptr+7*8(STATE)
+
+ # Postamble
+ mov %rbp, %rsp
+ pop %rbp
+
+ # restore callee-saved clobbered registers
+ pop %r15
+ pop %r14
+ pop %r13
+ pop %r12
+
+ RET
+SYM_FUNC_END(sha256_x8_avx2)
+
+.section .rodata.K256_8, "a", @progbits
+.align 64
+K256_8:
+ .octa 0x428a2f98428a2f98428a2f98428a2f98
+ .octa 0x428a2f98428a2f98428a2f98428a2f98
+ .octa 0x71374491713744917137449171374491
+ .octa 0x71374491713744917137449171374491
+ .octa 0xb5c0fbcfb5c0fbcfb5c0fbcfb5c0fbcf
+ .octa 0xb5c0fbcfb5c0fbcfb5c0fbcfb5c0fbcf
+ .octa 0xe9b5dba5e9b5dba5e9b5dba5e9b5dba5
+ .octa 0xe9b5dba5e9b5dba5e9b5dba5e9b5dba5
+ .octa 0x3956c25b3956c25b3956c25b3956c25b
+ .octa 0x3956c25b3956c25b3956c25b3956c25b
+ .octa 0x59f111f159f111f159f111f159f111f1
+ .octa 0x59f111f159f111f159f111f159f111f1
+ .octa 0x923f82a4923f82a4923f82a4923f82a4
+ .octa 0x923f82a4923f82a4923f82a4923f82a4
+ .octa 0xab1c5ed5ab1c5ed5ab1c5ed5ab1c5ed5
+ .octa 0xab1c5ed5ab1c5ed5ab1c5ed5ab1c5ed5
+ .octa 0xd807aa98d807aa98d807aa98d807aa98
+ .octa 0xd807aa98d807aa98d807aa98d807aa98
+ .octa 0x12835b0112835b0112835b0112835b01
+ .octa 0x12835b0112835b0112835b0112835b01
+ .octa 0x243185be243185be243185be243185be
+ .octa 0x243185be243185be243185be243185be
+ .octa 0x550c7dc3550c7dc3550c7dc3550c7dc3
+ .octa 0x550c7dc3550c7dc3550c7dc3550c7dc3
+ .octa 0x72be5d7472be5d7472be5d7472be5d74
+ .octa 0x72be5d7472be5d7472be5d7472be5d74
+ .octa 0x80deb1fe80deb1fe80deb1fe80deb1fe
+ .octa 0x80deb1fe80deb1fe80deb1fe80deb1fe
+ .octa 0x9bdc06a79bdc06a79bdc06a79bdc06a7
+ .octa 0x9bdc06a79bdc06a79bdc06a79bdc06a7
+ .octa 0xc19bf174c19bf174c19bf174c19bf174
+ .octa 0xc19bf174c19bf174c19bf174c19bf174
+ .octa 0xe49b69c1e49b69c1e49b69c1e49b69c1
+ .octa 0xe49b69c1e49b69c1e49b69c1e49b69c1
+ .octa 0xefbe4786efbe4786efbe4786efbe4786
+ .octa 0xefbe4786efbe4786efbe4786efbe4786
+ .octa 0x0fc19dc60fc19dc60fc19dc60fc19dc6
+ .octa 0x0fc19dc60fc19dc60fc19dc60fc19dc6
+ .octa 0x240ca1cc240ca1cc240ca1cc240ca1cc
+ .octa 0x240ca1cc240ca1cc240ca1cc240ca1cc
+ .octa 0x2de92c6f2de92c6f2de92c6f2de92c6f
+ .octa 0x2de92c6f2de92c6f2de92c6f2de92c6f
+ .octa 0x4a7484aa4a7484aa4a7484aa4a7484aa
+ .octa 0x4a7484aa4a7484aa4a7484aa4a7484aa
+ .octa 0x5cb0a9dc5cb0a9dc5cb0a9dc5cb0a9dc
+ .octa 0x5cb0a9dc5cb0a9dc5cb0a9dc5cb0a9dc
+ .octa 0x76f988da76f988da76f988da76f988da
+ .octa 0x76f988da76f988da76f988da76f988da
+ .octa 0x983e5152983e5152983e5152983e5152
+ .octa 0x983e5152983e5152983e5152983e5152
+ .octa 0xa831c66da831c66da831c66da831c66d
+ .octa 0xa831c66da831c66da831c66da831c66d
+ .octa 0xb00327c8b00327c8b00327c8b00327c8
+ .octa 0xb00327c8b00327c8b00327c8b00327c8
+ .octa 0xbf597fc7bf597fc7bf597fc7bf597fc7
+ .octa 0xbf597fc7bf597fc7bf597fc7bf597fc7
+ .octa 0xc6e00bf3c6e00bf3c6e00bf3c6e00bf3
+ .octa 0xc6e00bf3c6e00bf3c6e00bf3c6e00bf3
+ .octa 0xd5a79147d5a79147d5a79147d5a79147
+ .octa 0xd5a79147d5a79147d5a79147d5a79147
+ .octa 0x06ca635106ca635106ca635106ca6351
+ .octa 0x06ca635106ca635106ca635106ca6351
+ .octa 0x14292967142929671429296714292967
+ .octa 0x14292967142929671429296714292967
+ .octa 0x27b70a8527b70a8527b70a8527b70a85
+ .octa 0x27b70a8527b70a8527b70a8527b70a85
+ .octa 0x2e1b21382e1b21382e1b21382e1b2138
+ .octa 0x2e1b21382e1b21382e1b21382e1b2138
+ .octa 0x4d2c6dfc4d2c6dfc4d2c6dfc4d2c6dfc
+ .octa 0x4d2c6dfc4d2c6dfc4d2c6dfc4d2c6dfc
+ .octa 0x53380d1353380d1353380d1353380d13
+ .octa 0x53380d1353380d1353380d1353380d13
+ .octa 0x650a7354650a7354650a7354650a7354
+ .octa 0x650a7354650a7354650a7354650a7354
+ .octa 0x766a0abb766a0abb766a0abb766a0abb
+ .octa 0x766a0abb766a0abb766a0abb766a0abb
+ .octa 0x81c2c92e81c2c92e81c2c92e81c2c92e
+ .octa 0x81c2c92e81c2c92e81c2c92e81c2c92e
+ .octa 0x92722c8592722c8592722c8592722c85
+ .octa 0x92722c8592722c8592722c8592722c85
+ .octa 0xa2bfe8a1a2bfe8a1a2bfe8a1a2bfe8a1
+ .octa 0xa2bfe8a1a2bfe8a1a2bfe8a1a2bfe8a1
+ .octa 0xa81a664ba81a664ba81a664ba81a664b
+ .octa 0xa81a664ba81a664ba81a664ba81a664b
+ .octa 0xc24b8b70c24b8b70c24b8b70c24b8b70
+ .octa 0xc24b8b70c24b8b70c24b8b70c24b8b70
+ .octa 0xc76c51a3c76c51a3c76c51a3c76c51a3
+ .octa 0xc76c51a3c76c51a3c76c51a3c76c51a3
+ .octa 0xd192e819d192e819d192e819d192e819
+ .octa 0xd192e819d192e819d192e819d192e819
+ .octa 0xd6990624d6990624d6990624d6990624
+ .octa 0xd6990624d6990624d6990624d6990624
+ .octa 0xf40e3585f40e3585f40e3585f40e3585
+ .octa 0xf40e3585f40e3585f40e3585f40e3585
+ .octa 0x106aa070106aa070106aa070106aa070
+ .octa 0x106aa070106aa070106aa070106aa070
+ .octa 0x19a4c11619a4c11619a4c11619a4c116
+ .octa 0x19a4c11619a4c11619a4c11619a4c116
+ .octa 0x1e376c081e376c081e376c081e376c08
+ .octa 0x1e376c081e376c081e376c081e376c08
+ .octa 0x2748774c2748774c2748774c2748774c
+ .octa 0x2748774c2748774c2748774c2748774c
+ .octa 0x34b0bcb534b0bcb534b0bcb534b0bcb5
+ .octa 0x34b0bcb534b0bcb534b0bcb534b0bcb5
+ .octa 0x391c0cb3391c0cb3391c0cb3391c0cb3
+ .octa 0x391c0cb3391c0cb3391c0cb3391c0cb3
+ .octa 0x4ed8aa4a4ed8aa4a4ed8aa4a4ed8aa4a
+ .octa 0x4ed8aa4a4ed8aa4a4ed8aa4a4ed8aa4a
+ .octa 0x5b9cca4f5b9cca4f5b9cca4f5b9cca4f
+ .octa 0x5b9cca4f5b9cca4f5b9cca4f5b9cca4f
+ .octa 0x682e6ff3682e6ff3682e6ff3682e6ff3
+ .octa 0x682e6ff3682e6ff3682e6ff3682e6ff3
+ .octa 0x748f82ee748f82ee748f82ee748f82ee
+ .octa 0x748f82ee748f82ee748f82ee748f82ee
+ .octa 0x78a5636f78a5636f78a5636f78a5636f
+ .octa 0x78a5636f78a5636f78a5636f78a5636f
+ .octa 0x84c8781484c8781484c8781484c87814
+ .octa 0x84c8781484c8781484c8781484c87814
+ .octa 0x8cc702088cc702088cc702088cc70208
+ .octa 0x8cc702088cc702088cc702088cc70208
+ .octa 0x90befffa90befffa90befffa90befffa
+ .octa 0x90befffa90befffa90befffa90befffa
+ .octa 0xa4506ceba4506ceba4506ceba4506ceb
+ .octa 0xa4506ceba4506ceba4506ceba4506ceb
+ .octa 0xbef9a3f7bef9a3f7bef9a3f7bef9a3f7
+ .octa 0xbef9a3f7bef9a3f7bef9a3f7bef9a3f7
+ .octa 0xc67178f2c67178f2c67178f2c67178f2
+ .octa 0xc67178f2c67178f2c67178f2c67178f2
+
+.section .rodata.cst32.PSHUFFLE_BYTE_FLIP_MASK, "aM", @progbits, 32
+.align 32
+PSHUFFLE_BYTE_FLIP_MASK:
+.octa 0x0c0d0e0f08090a0b0405060700010203
+.octa 0x0c0d0e0f08090a0b0405060700010203
+
+.section .rodata.cst256.K256, "aM", @progbits, 256
+.align 64
+.global K256
+K256:
+ .int 0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5
+ .int 0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5
+ .int 0xd807aa98,0x12835b01,0x243185be,0x550c7dc3
+ .int 0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174
+ .int 0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc
+ .int 0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da
+ .int 0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7
+ .int 0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967
+ .int 0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13
+ .int 0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85
+ .int 0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3
+ .int 0xd192e819,0xd6990624,0xf40e3585,0x106aa070
+ .int 0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5
+ .int 0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3
+ .int 0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208
+ .int 0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
--
2.39.5
^ permalink raw reply related [flat|nested] 10+ messages in thread
* Re: [PATCH 0/6] Multibuffer hashing take two
2024-10-27 9:45 [PATCH 0/6] Multibuffer hashing take two Herbert Xu
` (5 preceding siblings ...)
2024-10-27 9:45 ` [PATCH 6/6] crypto: x86/sha2 - Restore multibuffer AVX2 support Herbert Xu
@ 2024-10-28 19:00 ` Eric Biggers
2024-10-29 4:33 ` Herbert Xu
6 siblings, 1 reply; 10+ messages in thread
From: Eric Biggers @ 2024-10-28 19:00 UTC (permalink / raw)
To: Herbert Xu; +Cc: Linux Crypto Mailing List, Ard Biesheuvel, Megha Dey, Tim Chen
Hi Herbert,
On Sun, Oct 27, 2024 at 05:45:28PM +0800, Herbert Xu wrote:
> Multibuffer hashing was a constant sore while it was part of the
> kernel. It was very buggy and unnecessarily complex. Finally
> it was removed when it had been broken for a while without anyone
> noticing.
>
> Peace reigned in its absence, until Eric Biggers made a proposal
> for its comeback :)
>
> Link: https://lore.kernel.org/all/20240415213719.120673-1-ebiggers@kernel.org/
>
> The issue is that the SHA algorithm (and possibly others) is
> inherently not parallelisable. Therefore the only way to exploit
> parallelism on modern CPUs is to hash multiple indendent streams
> of data.
>
> Eric's proposal is a simple interface bolted onto shash that takes
> two streams of data of identical length. I thought the limitation
> of two was too small, and Eric addressed that in his latest version:
>
> Link: https://lore.kernel.org/all/20241001153718.111665-2-ebiggers@kernel.org/
>
> However, I still disliked the addition of this to shash as it meant
> that users would have to spend extra effort in order to accumulate
> and maintain multiple streams of data.
>
> My preference is to use ahash as the basis of multibuffer, because
> its request object interface is perfectly suited to chaining.
As I've explained before, I think your proposed approach is much too complex,
inefficient, and broken compared to my much simpler patchset
https://lore.kernel.org/linux-crypto/20241001153718.111665-1-ebiggers@kernel.org/.
If this is really going to be the API for multibuffer hashing, then I'm not very
interested in using or contributing to it.
The much larger positive diffstat in your patchset alone should speak for
itself, especially when it doesn't include essential pieces that were included
in my smaller patchset, such as self-tests, dm-verity and fs-verity support,
SHA-NI and ARMv8 CE support. Note that due to the complexity of your API, it
would require far more updates to the tests in order to cover all the new edge
cases. This patchset also removes the shash support for sha256-avx2, which is
not realistic, as there are still ~100 users of shash in the kernel.
You say that your API is needed so that users don't need to "spend extra effort
in order to accumulate and maintain multiple streams of data." That's
incorrect, though. The users, e.g. {dm,fs}-verity, will need to do that anyway
even with your API. I think this would have been clear if you had tried to
update them to use your API.
With this patchset I am also seeing random crashes in the x86 sha256 glue code,
and all multibuffer SHA256 hashes come back as all-zeroes. Bugs like this were
predictable, of course. There's a high amount of complexity inherent in the
ahash request chaining approach, both at the API layer and in the per-algorithm
glue code. It will be hard to get everything right. And I am just submitting 8
equal-length requests, so I haven't even tried any of the edge cases that your
proposed API allows, like submitting requests that aren't synced up properly.
I don't think it's worth the time for me to try to debug and fix your code and
add the missing pieces, when we could just choose a much simpler design that
would result in far fewer bugs. Especially for cryptographic code, choosing
sound designs that minimize the number of bugs should be the highest priority.
I understand that you're trying to contribute something useful, and perhaps
solve a wider set of problems than I set out to solve. The reality, though, is
that this patchset is creating more problems than it's solving. Compared to my
patchset, it makes things harder, more error-prone, and less efficient for the
users who actually want the multibuffer hashing, and likewise for wiring it up
to the low-level algorithms. It also doesn't bring us meaningfully closer to
applying multibuffer crypto in other applications like IPsec where it will be
very difficult to apply, is irrelevant for the most common algorithm used, and
would at best provide less benefit than other easier-to-implement optimizations.
The virtual address support in ahash is a somewhat nice addition, but it
could go in independently, and ultimately it seems not that useful, given that
users could just use shash or lib/crypto/ instead. shash will still have less
overhead, and lib/crypto/ even less, despite ahash getting slightly better.
Also, users who actually care about old-school style crypto accelerators need
pages + async processing anyway for optimal performance, especially given this
patchset's proposed approach of handling virtual addresses using a bounce page.
If you're really interested in the AVX2 multibuffer SHA256 for some reason, I'd
be willing to clean up that assembly code and wire it up to the much simpler API
that I proposed. Despite the favorable microbenchmark result, this would be of
limited use, for various reasons that I've explained before. But it could be
done if desired, and it would be much simpler than what you have.
- Eric
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 0/6] Multibuffer hashing take two
2024-10-28 19:00 ` [PATCH 0/6] Multibuffer hashing take two Eric Biggers
@ 2024-10-29 4:33 ` Herbert Xu
0 siblings, 0 replies; 10+ messages in thread
From: Herbert Xu @ 2024-10-29 4:33 UTC (permalink / raw)
To: Eric Biggers
Cc: Linux Crypto Mailing List, Ard Biesheuvel, Megha Dey, Tim Chen
On Mon, Oct 28, 2024 at 12:00:45PM -0700, Eric Biggers wrote:
>
> You say that your API is needed so that users don't need to "spend extra effort
> in order to accumulate and maintain multiple streams of data." That's
> incorrect, though. The users, e.g. {dm,fs}-verity, will need to do that anyway
> even with your API. I think this would have been clear if you had tried to
> update them to use your API.
It's a lot easier once you switch them back to dynamically allocated
requests instead of having them on the stack. Storing the hash state
on the stack of course limits your ability to aggregate the hash
operations since each one is hundreds of bytes long.
We could also introduce SYNC_AHASH_REQ_ON_STACK like we do for
skcipher but I think we should move away from that for the cases
where aggregation makes sense.
Note that when I say switching back to ahash, I'm not talking about
switching back to an asynchronous API. If you're happy with the
synchronous offerings then you're totally free to use ahash with
synchronous-only implementations, just like skcipher.
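Concretely, that usage pattern is just the following (rough, untested
sketch; it relies only on existing ahash calls, nothing added by this
series):

	/*
	 * Rough sketch: synchronous ahash with a dynamically allocated
	 * request (needs <crypto/hash.h>).  Masking out CRYPTO_ALG_ASYNC
	 * guarantees ->digest() completes inline, so no callback handling
	 * is needed.
	 */
	static int hash_one_sg(struct scatterlist *sg, unsigned int len,
			       u8 *digest)
	{
		struct crypto_ahash *tfm;
		struct ahash_request *req;
		int err;

		tfm = crypto_alloc_ahash("sha256", 0, CRYPTO_ALG_ASYNC);
		if (IS_ERR(tfm))
			return PTR_ERR(tfm);

		req = ahash_request_alloc(tfm, GFP_KERNEL);
		if (!req) {
			err = -ENOMEM;
			goto out_free_tfm;
		}

		ahash_request_set_callback(req, 0, NULL, NULL);
		ahash_request_set_crypt(req, sg, digest, len);

		err = crypto_ahash_digest(req);

		ahash_request_free(req);
	out_free_tfm:
		crypto_free_ahash(tfm);
		return err;
	}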
> With this patchset I am also seeing random crashes in the x86 sha256 glue code,
> and all multibuffer SHA256 hashes come back as all-zeroes. Bugs like this were
Guilty as charged. I haven't tested this at all apart from timing
the speed.
However, adding a proper test is trivial. We already have the
ahash_requests ready to go so they just have to be chained together
and submitted en masse.
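Something along these lines should do (untested sketch; patch 2/6 adds
the actual chaining helper, which isn't quoted above, so this simply
links the extra requests onto the lead request's base.list, which is how
the glue code walks the chain, and reads the per-request status back
from base.err):

	/*
	 * Untested sketch of a multibuffer test: chain seven extra
	 * requests behind reqs[0] and submit them with one call.  Each
	 * request is assumed to already carry its own src/result via
	 * ahash_request_set_crypt().  The direct list manipulation is an
	 * assumption based on how the glue code iterates req->base.list;
	 * the real chaining helper comes from patch 2/6.
	 */
	static int mb_digest_all(struct ahash_request *reqs[8])
	{
		int err, i;

		INIT_LIST_HEAD(&reqs[0]->base.list);
		for (i = 1; i < 8; i++)
			list_add_tail(&reqs[i]->base.list,
				      &reqs[0]->base.list);

		/* One call processes the whole chain. */
		err = crypto_ahash_digest(reqs[0]);
		if (err)
			return err;

		for (i = 0; i < 8; i++)
			if (reqs[i]->base.err)
				return reqs[i]->base.err;

		return 0;
	}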
> If you're really interested in the AVX2 multibuffer SHA256 for some reason, I'd
> be willing to clean up that assembly code and wire it up to the much simpler API
> that I proposed. Despite the favorable microbenchmark result, this would be of
> limited use, for various reasons that I've explained before. But it could be
> done if desired, and it would be much simpler than what you have.
No, I have zero interest in AVX2. I simply picked that because
it was already in the kernel git history and I wasn't certain
whether my CPU is recent enough to see much of a benefit from
your SHA-NI code.
Cheers,
--
Email: Herbert Xu <herbert@gondor.apana.org.au>
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
^ permalink raw reply [flat|nested] 10+ messages in thread
* Re: [PATCH 4/6] crypto: ahash - Add virtual address support
2024-10-27 9:45 ` [PATCH 4/6] crypto: ahash - Add virtual address support Herbert Xu
@ 2024-11-05 2:08 ` kernel test robot
0 siblings, 0 replies; 10+ messages in thread
From: kernel test robot @ 2024-11-05 2:08 UTC (permalink / raw)
To: Herbert Xu
Cc: oe-lkp, lkp, linux-crypto, Eric Biggers, Ard Biesheuvel,
Megha Dey, Tim Chen, oliver.sang
Hello,
kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_crypto/internal.h" on:
commit: 1e1d7cc33bd78b992ab90e28b02d7a2feef96538 ("[PATCH 4/6] crypto: ahash - Add virtual address support")
url: https://github.com/intel-lab-lkp/linux/commits/Herbert-Xu/crypto-ahash-Only-save-callback-and-data-in-ahash_save_req/20241027-174811
base: https://git.kernel.org/cgit/linux/kernel/git/herbert/cryptodev-2.6.git master
patch link: https://lore.kernel.org/all/bffef4bab1bf250bd64a3d02de53eb1fd047a96e.1730021644.git.herbert@gondor.apana.org.au/
patch subject: [PATCH 4/6] crypto: ahash - Add virtual address support
in testcase: kernel-selftests-bpf
version:
with following parameters:
group: net/netfilter
test: nft_flowtable.sh
config: x86_64-rhel-8.3-bpf
compiler: gcc-12
test machine: 36 threads 1 sockets Intel(R) Core(TM) i9-10980XE CPU @ 3.00GHz (Cascade Lake) with 32G memory
(please refer to attached dmesg/kmsg for entire log/backtrace)
If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202411050927.673246d5-lkp@intel.com
kern :err : [ 43.366785] BUG: sleeping function called from invalid context at crypto/internal.h:189
kern :err : [ 43.367576] in_atomic(): 1, irqs_disabled(): 0, non_block: 0, pid: 2600, name: socat
kern :err : [ 43.368344] preempt_count: 101, expected: 0
kern :err : [ 43.368804] RCU nest depth: 1, expected: 0
kern :warn : [ 43.369258] CPU: 6 UID: 0 PID: 2600 Comm: socat Tainted: G S 6.12.0-rc1-00092-g1e1d7cc33bd7 #1
kern :warn : [ 43.370202] Tainted: [S]=CPU_OUT_OF_SPEC
kern :warn : [ 43.370639] Hardware name: Gigabyte Technology Co., Ltd. X299 UD4 Pro/X299 UD4 Pro-CF, BIOS F8a 04/27/2021
kern :warn : [ 43.371552] Call Trace:
kern :warn : [ 43.371871] <IRQ>
kern :warn : [ 43.372155] dump_stack_lvl (lib/dump_stack.c:123)
kern :warn : [ 43.372574] __might_resched (kernel/sched/core.c:8632)
kern :warn : [ 43.373014] crypto_hash_walk_done (include/linux/sched.h:2031 crypto/internal.h:189 crypto/ahash.c:201)
kern :warn : [ 43.373483] shash_ahash_finup (crypto/ahash.c:96)
kern :warn : [ 43.373929] crypto_ahash_digest (crypto/ahash.c:747)
kern :warn : [ 43.374390] crypto_authenc_genicv (crypto/authenc.c:151) authenc
kern :warn : [ 43.374929] esp_output_tail (net/ipv4/esp4.c:627) esp4
kern :warn : [ 43.375417] esp_output (net/ipv4/esp4.c:701) esp4
kern :warn : [ 43.375869] xfrm_output_one (net/xfrm/xfrm_output.c:553)
kern :warn : [ 43.376310] xfrm_output_resume (net/xfrm/xfrm_output.c:588)
kern :warn : [ 43.376763] ? __pfx_csum_partial_ext (include/net/checksum.h:120)
kern :warn : [ 43.377252] ? __pfx_csum_block_add_ext (net/core/skbuff.c:103)
kern :warn : [ 43.377754] ? skb_checksum_help (net/core/dev.c:3346)
kern :warn : [ 43.378215] __netif_receive_skb_one_core (net/core/dev.c:5662 (discriminator 4))
kern :warn : [ 43.378732] process_backlog (include/linux/rcupdate.h:882 net/core/dev.c:6108)
kern :warn : [ 43.379163] __napi_poll+0x28/0x1c0
kern :warn : [ 43.379652] net_rx_action (net/core/dev.c:6842 net/core/dev.c:6962)
kern :warn : [ 43.380078] handle_softirqs (kernel/softirq.c:554)
kern :warn : [ 43.380518] do_softirq (kernel/softirq.c:455 kernel/softirq.c:442)
kern :warn : [ 43.380907] </IRQ>
kern :warn : [ 43.381196] <TASK>
kern :warn : [ 43.381485] __local_bh_enable_ip (kernel/softirq.c:382)
kern :warn : [ 43.381945] tcp_sendmsg (net/ipv4/tcp.c:1361)
kern :warn : [ 43.382342] sock_write_iter (net/socket.c:729 net/socket.c:744 net/socket.c:1165)
kern :warn : [ 43.382781] ? sock_recvmsg (net/socket.c:1051 net/socket.c:1073)
kern :warn : [ 43.383199] vfs_write (fs/read_write.c:590 fs/read_write.c:683)
kern :warn : [ 43.383598] ksys_write (include/linux/file.h:83 fs/read_write.c:739)
kern :warn : [ 43.383987] do_syscall_64 (arch/x86/entry/common.c:52 arch/x86/entry/common.c:83)
kern :warn : [ 43.384406] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:130)
kern :warn : [ 43.384937] RIP: 0033:0x7f39ddcef240
kern :warn : [ 43.385348] Code: 40 00 48 8b 15 c1 9b 0d 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b7 0f 1f 00 80 3d a1 23 0e 00 00 74 17 b8 01 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 58 c3 0f 1f 80 00 00 00 00 48 83 ec 28 48 89
All code
========
0: 40 00 48 8b rex add %cl,-0x75(%rax)
4: 15 c1 9b 0d 00 adc $0xd9bc1,%eax
9: f7 d8 neg %eax
b: 64 89 02 mov %eax,%fs:(%rdx)
e: 48 c7 c0 ff ff ff ff mov $0xffffffffffffffff,%rax
15: eb b7 jmp 0xffffffffffffffce
17: 0f 1f 00 nopl (%rax)
1a: 80 3d a1 23 0e 00 00 cmpb $0x0,0xe23a1(%rip) # 0xe23c2
21: 74 17 je 0x3a
23: b8 01 00 00 00 mov $0x1,%eax
28: 0f 05 syscall
2a:* 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax <-- trapping instruction
30: 77 58 ja 0x8a
32: c3 ret
33: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
3a: 48 83 ec 28 sub $0x28,%rsp
3e: 48 rex.W
3f: 89 .byte 0x89
Code starting with the faulting instruction
===========================================
0: 48 3d 00 f0 ff ff cmp $0xfffffffffffff000,%rax
6: 77 58 ja 0x60
8: c3 ret
9: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
10: 48 83 ec 28 sub $0x28,%rsp
14: 48 rex.W
15: 89 .byte 0x89
kern :warn : [ 43.387012] RSP: 002b:00007fff1b6b3288 EFLAGS: 00000202 ORIG_RAX: 0000000000000001
kern :warn : [ 43.387756] RAX: ffffffffffffffda RBX: 0000000000000007 RCX: 00007f39ddcef240
kern :warn : [ 43.388465] RDX: 0000000000002000 RSI: 000055ba398a7000 RDI: 0000000000000007
kern :warn : [ 43.389173] RBP: 000055ba398a7000 R08: 0000000000002000 R09: 0000000000000000
kern :warn : [ 43.389881] R10: 00007f39ddc104f0 R11: 0000000000000202 R12: 0000000000002000
kern :warn : [ 43.390590] R13: 0000000000000007 R14: 0000000000002000 R15: 000055ba398a7000
kern :warn : [ 43.391300] </TASK>
The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20241105/202411050927.673246d5-lkp@intel.com
--
0-DAY CI Kernel Test Service
https://github.com/intel/lkp-tests/wiki
^ permalink raw reply [flat|nested] 10+ messages in thread
end of thread, other threads:[~2024-11-05 2:08 UTC | newest]
Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-10-27 9:45 [PATCH 0/6] Multibuffer hashing take two Herbert Xu
2024-10-27 9:45 ` [PATCH 1/6] crypto: ahash - Only save callback and data in ahash_save_req Herbert Xu
2024-10-27 9:45 ` [PATCH 2/6] crypto: hash - Add request chaining API Herbert Xu
2024-10-27 9:45 ` [PATCH 3/6] crypto: tcrypt - Restore multibuffer ahash tests Herbert Xu
2024-10-27 9:45 ` [PATCH 4/6] crypto: ahash - Add virtual address support Herbert Xu
2024-11-05 2:08 ` kernel test robot
2024-10-27 9:45 ` [PATCH 5/6] crypto: ahash - Set default reqsize from ahash_alg Herbert Xu
2024-10-27 9:45 ` [PATCH 6/6] crypto: x86/sha2 - Restore multibuffer AVX2 support Herbert Xu
2024-10-28 19:00 ` [PATCH 0/6] Multibuffer hashing take two Eric Biggers
2024-10-29 4:33 ` Herbert Xu
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).