* [PATCH net-next v2] net/tls: Add support for async decryption of tls records
@ 2018-08-29 9:56 Vakul Garg
2018-09-01 1:00 ` David Miller
2018-09-14 19:39 ` John Fastabend
0 siblings, 2 replies; 6+ messages in thread
From: Vakul Garg @ 2018-08-29 9:56 UTC (permalink / raw)
To: netdev; +Cc: borisp, aviadye, davejwatson, davem, Vakul Garg
When TLS records are decrypted using asynchronous accelerators such as the
NXP CAAM engine, the crypto APIs return -EINPROGRESS. Presently, on
getting -EINPROGRESS, TLS record processing stops until the crypto
accelerator finishes and returns the result. This incurs a context
switch and is not an efficient way of using the crypto accelerators.
Crypto accelerators work efficiently when they are queued with multiple
crypto jobs without having to wait for the previous ones to complete.

The patch submits multiple crypto requests without waiting for previous
ones to complete. This has been implemented for records which are
decrypted in zero-copy mode. At the end of recvmsg(), we wait for all
the asynchronous decryption requests to complete.

The references to records which have been sent for async decryption are
dropped. For cases where record decryption is not possible in zero-copy
mode, asynchronous decryption is not used and we wait for the decryption
crypto API to complete.

For crypto requests executing in async fashion, the memory for the
aead_request, sglists, skb etc. is freed from the decryption completion
handler. The decryption completion handler wakes up the sleeping user
context when recvmsg() flags that it has finished sending all the
decryption requests and there are no more decryption requests pending to
be completed.
Signed-off-by: Vakul Garg <vakul.garg@nxp.com>
Reviewed-by: Dave Watson <davejwatson@fb.com>
---
Changes since v1:
- Simplified recvmsg() so as to drop the reference to the skb in case
  it was submitted for async decryption.
- Modified tls_sw_advance_skb() to handle the case when the input skb
  is NULL.
include/net/tls.h | 6 +++
net/tls/tls_sw.c | 134 ++++++++++++++++++++++++++++++++++++++++++++++++------
2 files changed, 127 insertions(+), 13 deletions(-)
diff --git a/include/net/tls.h b/include/net/tls.h
index d5c683e8bb22..cd0a65bd92f9 100644
--- a/include/net/tls.h
+++ b/include/net/tls.h
@@ -124,6 +124,12 @@ struct tls_sw_context_rx {
struct sk_buff *recv_pkt;
u8 control;
bool decrypted;
+ atomic_t decrypt_pending;
+ bool async_notify;
+};
+
+struct decrypt_req_ctx {
+ struct sock *sk;
};
struct tls_record_info {
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 52fbe727d7c1..9503e5a4c27e 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -43,12 +43,50 @@
#define MAX_IV_SIZE TLS_CIPHER_AES_GCM_128_IV_SIZE
+static void tls_decrypt_done(struct crypto_async_request *req, int err)
+{
+ struct aead_request *aead_req = (struct aead_request *)req;
+ struct decrypt_req_ctx *req_ctx =
+ (struct decrypt_req_ctx *)(aead_req + 1);
+
+ struct scatterlist *sgout = aead_req->dst;
+
+ struct tls_context *tls_ctx = tls_get_ctx(req_ctx->sk);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+ int pending = atomic_dec_return(&ctx->decrypt_pending);
+ struct scatterlist *sg;
+ unsigned int pages;
+
+ /* Propagate if there was an err */
+ if (err) {
+ ctx->async_wait.err = err;
+ tls_err_abort(req_ctx->sk, err);
+ }
+
+ /* Release the skb, pages and memory allocated for crypto req */
+ kfree_skb(req->data);
+
+ /* Skip the first S/G entry as it points to AAD */
+ for_each_sg(sg_next(sgout), sg, UINT_MAX, pages) {
+ if (!sg)
+ break;
+ put_page(sg_page(sg));
+ }
+
+ kfree(aead_req);
+
+ if (!pending && READ_ONCE(ctx->async_notify))
+ complete(&ctx->async_wait.completion);
+}
+
static int tls_do_decryption(struct sock *sk,
+ struct sk_buff *skb,
struct scatterlist *sgin,
struct scatterlist *sgout,
char *iv_recv,
size_t data_len,
- struct aead_request *aead_req)
+ struct aead_request *aead_req,
+ bool async)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
@@ -59,10 +97,34 @@ static int tls_do_decryption(struct sock *sk,
aead_request_set_crypt(aead_req, sgin, sgout,
data_len + tls_ctx->rx.tag_size,
(u8 *)iv_recv);
- aead_request_set_callback(aead_req, CRYPTO_TFM_REQ_MAY_BACKLOG,
- crypto_req_done, &ctx->async_wait);
- ret = crypto_wait_req(crypto_aead_decrypt(aead_req), &ctx->async_wait);
+ if (async) {
+ struct decrypt_req_ctx *req_ctx;
+
+ req_ctx = (struct decrypt_req_ctx *)(aead_req + 1);
+ req_ctx->sk = sk;
+
+ aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ tls_decrypt_done, skb);
+ atomic_inc(&ctx->decrypt_pending);
+ } else {
+ aead_request_set_callback(aead_req,
+ CRYPTO_TFM_REQ_MAY_BACKLOG,
+ crypto_req_done, &ctx->async_wait);
+ }
+
+ ret = crypto_aead_decrypt(aead_req);
+ if (ret == -EINPROGRESS) {
+ if (async)
+ return ret;
+
+ ret = crypto_wait_req(ret, &ctx->async_wait);
+ }
+
+ if (async)
+ atomic_dec(&ctx->decrypt_pending);
+
return ret;
}
@@ -763,7 +825,10 @@ static int decrypt_internal(struct sock *sk, struct sk_buff *skb,
}
/* Prepare and submit AEAD request */
- err = tls_do_decryption(sk, sgin, sgout, iv, data_len, aead_req);
+ err = tls_do_decryption(sk, skb, sgin, sgout, iv,
+ data_len, aead_req, *zc);
+ if (err == -EINPROGRESS)
+ return err;
/* Release the pages in case iov was mapped to pages */
for (; pages > 0; pages--)
@@ -788,8 +853,12 @@ static int decrypt_skb_update(struct sock *sk, struct sk_buff *skb,
#endif
if (!ctx->decrypted) {
err = decrypt_internal(sk, skb, dest, NULL, chunk, zc);
- if (err < 0)
+ if (err < 0) {
+ if (err == -EINPROGRESS)
+ tls_advance_record_sn(sk, &tls_ctx->rx);
+
return err;
+ }
} else {
*zc = false;
}
@@ -817,18 +886,20 @@ static bool tls_sw_advance_skb(struct sock *sk, struct sk_buff *skb,
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
- struct strp_msg *rxm = strp_msg(skb);
- if (len < rxm->full_len) {
- rxm->offset += len;
- rxm->full_len -= len;
+ if (skb) {
+ struct strp_msg *rxm = strp_msg(skb);
- return false;
+ if (len < rxm->full_len) {
+ rxm->offset += len;
+ rxm->full_len -= len;
+ return false;
+ }
+ kfree_skb(skb);
}
/* Finished with message */
ctx->recv_pkt = NULL;
- kfree_skb(skb);
__strp_unpause(&ctx->strp);
return true;
@@ -851,6 +922,7 @@ int tls_sw_recvmsg(struct sock *sk,
int target, err = 0;
long timeo;
bool is_kvec = msg->msg_iter.type & ITER_KVEC;
+ int num_async = 0;
flags |= nonblock;
@@ -863,6 +935,7 @@ int tls_sw_recvmsg(struct sock *sk,
timeo = sock_rcvtimeo(sk, flags & MSG_DONTWAIT);
do {
bool zc = false;
+ bool async = false;
int chunk = 0;
skb = tls_wait_data(sk, flags, timeo, &err);
@@ -870,6 +943,7 @@ int tls_sw_recvmsg(struct sock *sk,
goto recv_end;
rxm = strp_msg(skb);
+
if (!cmsg) {
int cerr;
@@ -896,26 +970,39 @@ int tls_sw_recvmsg(struct sock *sk,
err = decrypt_skb_update(sk, skb, &msg->msg_iter,
&chunk, &zc);
- if (err < 0) {
+ if (err < 0 && err != -EINPROGRESS) {
tls_err_abort(sk, EBADMSG);
goto recv_end;
}
+
+ if (err == -EINPROGRESS) {
+ async = true;
+ num_async++;
+ goto pick_next_record;
+ }
+
ctx->decrypted = true;
}
if (!zc) {
chunk = min_t(unsigned int, rxm->full_len, len);
+
err = skb_copy_datagram_msg(skb, rxm->offset, msg,
chunk);
if (err < 0)
goto recv_end;
}
+pick_next_record:
copied += chunk;
len -= chunk;
if (likely(!(flags & MSG_PEEK))) {
u8 control = ctx->control;
+ /* For async, drop current skb reference */
+ if (async)
+ skb = NULL;
+
if (tls_sw_advance_skb(sk, skb, chunk)) {
/* Return full control message to
* userspace before trying to parse
@@ -924,14 +1011,33 @@ int tls_sw_recvmsg(struct sock *sk,
msg->msg_flags |= MSG_EOR;
if (control != TLS_RECORD_TYPE_DATA)
goto recv_end;
+ } else {
+ break;
}
}
+
/* If we have a new message from strparser, continue now. */
if (copied >= target && !ctx->recv_pkt)
break;
} while (len);
recv_end:
+ if (num_async) {
+ /* Wait for all previously submitted records to be decrypted */
+ smp_store_mb(ctx->async_notify, true);
+ if (atomic_read(&ctx->decrypt_pending)) {
+ err = crypto_wait_req(-EINPROGRESS, &ctx->async_wait);
+ if (err) {
+ /* one of async decrypt failed */
+ tls_err_abort(sk, err);
+ copied = 0;
+ }
+ } else {
+ reinit_completion(&ctx->async_wait.completion);
+ }
+ WRITE_ONCE(ctx->async_notify, false);
+ }
+
release_sock(sk);
return copied ? : err;
}
@@ -1271,6 +1377,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
goto free_aead;
if (sw_ctx_rx) {
+ (*aead)->reqsize = sizeof(struct decrypt_req_ctx);
+
/* Set up strparser */
memset(&cb, 0, sizeof(cb));
cb.rcv_msg = tls_queue;
--
2.13.6
^ permalink raw reply related [flat|nested] 6+ messages in thread
* Re: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
2018-08-29 9:56 [PATCH net-next v2] net/tls: Add support for async decryption of tls records Vakul Garg
@ 2018-09-01 1:00 ` David Miller
2018-09-02 2:28 ` Vakul Garg
2018-09-14 19:39 ` John Fastabend
1 sibling, 1 reply; 6+ messages in thread
From: David Miller @ 2018-09-01 1:00 UTC (permalink / raw)
To: vakul.garg; +Cc: netdev, borisp, aviadye, davejwatson
From: Vakul Garg <vakul.garg@nxp.com>
Date: Wed, 29 Aug 2018 15:26:55 +0530
> [...]
Applied.
* RE: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
2018-09-01 1:00 ` David Miller
@ 2018-09-02 2:28 ` Vakul Garg
2018-09-02 2:53 ` David Miller
0 siblings, 1 reply; 6+ messages in thread
From: Vakul Garg @ 2018-09-02 2:28 UTC (permalink / raw)
To: David Miller
Cc: netdev@vger.kernel.org, borisp@mellanox.com, aviadye@mellanox.com,
davejwatson@fb.com
> -----Original Message-----
> From: David Miller <davem@davemloft.net>
> Sent: Saturday, September 1, 2018 6:31 AM
> To: Vakul Garg <vakul.garg@nxp.com>
> Cc: netdev@vger.kernel.org; borisp@mellanox.com;
> aviadye@mellanox.com; davejwatson@fb.com
> Subject: Re: [PATCH net-next v2] net/tls: Add support for async decryption of
> tls records
>
> From: Vakul Garg <vakul.garg@nxp.com>
> Date: Wed, 29 Aug 2018 15:26:55 +0530
>
> > [...]
>
> Applied.
I do not find this patch in the tree yet.
Can you please check? Thanks and regards.
* Re: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
2018-09-02 2:28 ` Vakul Garg
@ 2018-09-02 2:53 ` David Miller
0 siblings, 0 replies; 6+ messages in thread
From: David Miller @ 2018-09-02 2:53 UTC (permalink / raw)
To: vakul.garg; +Cc: netdev, borisp, aviadye, davejwatson
From: Vakul Garg <vakul.garg@nxp.com>
Date: Sun, 2 Sep 2018 02:28:00 +0000
> I do not find this patch in tree yet.
> Can you please check? Thanks and Regards.
The perils of working on two different machines :-)
It should be there now, sorry about that.
* Re: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
2018-08-29 9:56 [PATCH net-next v2] net/tls: Add support for async decryption of tls records Vakul Garg
2018-09-01 1:00 ` David Miller
@ 2018-09-14 19:39 ` John Fastabend
2018-09-15 12:11 ` Vakul Garg
1 sibling, 1 reply; 6+ messages in thread
From: John Fastabend @ 2018-09-14 19:39 UTC (permalink / raw)
To: Vakul Garg, netdev; +Cc: borisp, aviadye, davejwatson, davem
On 08/29/2018 02:56 AM, Vakul Garg wrote:
> [...]
> @@ -1271,6 +1377,8 @@ int tls_set_sw_offload(struct sock *sk, struct tls_context *ctx, int tx)
> goto free_aead;
>
> if (sw_ctx_rx) {
> + (*aead)->reqsize = sizeof(struct decrypt_req_ctx);
> +
This is not valid and may cause a GPF or, in the best case, only a
KASAN warning. 'reqsize' should probably not be mangled outside the
internal crypto APIs, but the real reason is that reqsize is used
to determine how much space is needed at the end of the aead_request
for the crypto layer's private ctx use in encrypt/decrypt. After this
patch, when we submit an aead_request the crypto layer will think it
has room for its private structs at the end, but now only 8B will
be there, and the crypto layer will happily memset some arbitrary
memory for you, amongst other things.
Anyway, testing a fix now; will post shortly.
Thanks,
John
* RE: [PATCH net-next v2] net/tls: Add support for async decryption of tls records
2018-09-14 19:39 ` John Fastabend
@ 2018-09-15 12:11 ` Vakul Garg
0 siblings, 0 replies; 6+ messages in thread
From: Vakul Garg @ 2018-09-15 12:11 UTC (permalink / raw)
To: John Fastabend, netdev@vger.kernel.org
Cc: borisp@mellanox.com, aviadye@mellanox.com, davejwatson@fb.com,
davem@davemloft.net
> -----Original Message-----
> From: John Fastabend <john.fastabend@gmail.com>
> Sent: Saturday, September 15, 2018 1:10 AM
> To: Vakul Garg <vakul.garg@nxp.com>; netdev@vger.kernel.org
> Cc: borisp@mellanox.com; aviadye@mellanox.com; davejwatson@fb.com;
> davem@davemloft.net
> Subject: Re: [PATCH net-next v2] net/tls: Add support for async decryption of
> tls records
>
> On 08/29/2018 02:56 AM, Vakul Garg wrote:
> > [...]
>
> [...]
>
>
> > @@ -1271,6 +1377,8 @@ int tls_set_sw_offload(struct sock *sk, struct
> tls_context *ctx, int tx)
> > goto free_aead;
> >
> > if (sw_ctx_rx) {
> > + (*aead)->reqsize = sizeof(struct decrypt_req_ctx);
> > +
>
> This is not valid and may cause GPF or best case only a KASAN warning.
> 'reqsize' should probably not be mangled outside the internal crypto APIs but
> the real reason is the reqsize is used to determine how much space is needed
> at the end of the aead_request for crypto private ctx use in encrypt/decrypt.
> After this patch when we submit an aead_request the crypto layer will think it
> has room for its private structs at the end but now only 8B will be there and
> crypto layer will happily memset some arbitrary memory for you amongst
> other things.
>
> Anyways testing a fix now will post shortly.
I am aware that the space in question after the aead_request is used
internally by the crypto API. My plan was to inflate this space by an
extra sizeof(struct decrypt_req_ctx):

	(*aead)->reqsize += sizeof(struct decrypt_req_ctx);

Somehow, I missed it. But your change looks like a better solution.
Thanks for submitting the fix.
>
> Thanks,
> John