* [PATCH 1/5] net/tls: handle MSG_EOR for tls_sw TX flow
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
@ 2023-07-03 9:04 ` Hannes Reinecke
2023-07-03 9:04 ` [PATCH 2/5] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
` (4 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 9:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
tls_sw_sendmsg() already handles MSG_MORE, but bails
out on MSG_EOR.
Seeing that MSG_EOR is essentially the opposite of
MSG_MORE, handle MSG_EOR by treating it as the negation
of MSG_MORE, and return an error if MSG_EOR is specified
together with MSG_MORE.
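As an illustration only (not part of the patch), a minimal userspace
sketch of the resulting semantics, assuming 'fd' is a connected socket
with kTLS TX configured and 'buf'/'len' hold the payload:

    #include <sys/socket.h>
    #include <errno.h>
    #include <stdio.h>

    /* MSG_EOR alone is now accepted and simply finishes the current
     * record, just as if MSG_MORE had not been set before. */
    if (send(fd, buf, len, MSG_EOR) < 0)
            perror("send(MSG_EOR)");

    /* MSG_MORE together with MSG_EOR is contradictory and is rejected;
     * send() fails with errno set to EINVAL. */
    if (send(fd, buf, len, MSG_MORE | MSG_EOR) < 0 && errno == EINVAL)
            fprintf(stderr, "MSG_MORE|MSG_EOR rejected\n");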
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
net/tls/tls_sw.c | 5 ++++-
1 file changed, 4 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 53f944e6d8ef..9aef45e870a5 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -984,6 +984,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
int ret = 0;
int pending;
+ if (!eor && (msg->msg_flags & MSG_EOR))
+ return -EINVAL;
+
if (unlikely(msg->msg_controllen)) {
ret = tls_process_cmsg(sk, msg, &record_type);
if (ret) {
@@ -1193,7 +1196,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
int ret;
if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
- MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+ MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | MSG_EOR |
MSG_SENDPAGE_NOPOLICY))
return -EOPNOTSUPP;
--
2.35.3
* [PATCH 2/5] net/tls: handle MSG_EOR for tls_device TX flow
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
2023-07-03 9:04 ` [PATCH 1/5] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
@ 2023-07-03 9:04 ` Hannes Reinecke
2023-07-03 9:04 ` [PATCH 3/5] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
` (3 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 9:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
tls_push_data() already handles MSG_MORE, but bails out on MSG_EOR.
Seeing that MSG_EOR is essentially the opposite of MSG_MORE,
handle MSG_EOR by treating it as the absence of MSG_MORE,
and return an error when both flags are set.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
net/tls/tls_device.c | 6 +++++-
1 file changed, 5 insertions(+), 1 deletion(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index 2021fe557e50..5df18f696d7f 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -441,9 +441,13 @@ static int tls_push_data(struct sock *sk,
long timeo;
if (flags &
- ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SPLICE_PAGES))
+ ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+ MSG_SPLICE_PAGES | MSG_EOR))
return -EOPNOTSUPP;
+ if ((flags & (MSG_MORE | MSG_EOR)) == (MSG_MORE | MSG_EOR))
+ return -EINVAL;
+
if (unlikely(sk->sk_err))
return -sk->sk_err;
--
2.35.3
* [PATCH 3/5] selftests/net/tls: add test for MSG_EOR
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
2023-07-03 9:04 ` [PATCH 1/5] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
2023-07-03 9:04 ` [PATCH 2/5] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
@ 2023-07-03 9:04 ` Hannes Reinecke
2023-07-03 9:04 ` [PATCH 4/5] net/tls: split tls_rx_reader_lock Hannes Reinecke
` (2 subsequent siblings)
5 siblings, 0 replies; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 9:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
As the previous patches modify the MSG_EOR handling behaviour
for TLS, add a test for it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
tools/testing/selftests/net/tls.c | 11 +++++++++++
1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index a3c57004344c..4b63708c6a81 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -486,6 +486,17 @@ TEST_F(tls, msg_more_unsent)
EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_DONTWAIT), -1);
}
+TEST_F(tls, msg_eor)
+{
+ char const *test_str = "test_read";
+ int send_len = 10;
+ char buf[10];
+
+ EXPECT_EQ(send(self->fd, test_str, send_len, MSG_EOR), send_len);
+ EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_WAITALL), send_len);
+ EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+}
+
TEST_F(tls, sendmsg_single)
{
struct msghdr msg;
--
2.35.3
* [PATCH 4/5] net/tls: split tls_rx_reader_lock
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
` (2 preceding siblings ...)
2023-07-03 9:04 ` [PATCH 3/5] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
@ 2023-07-03 9:04 ` Hannes Reinecke
2023-07-03 10:06 ` Sagi Grimberg
2023-07-03 9:04 ` [PATCH 5/5] net/tls: implement ->read_sock() Hannes Reinecke
2023-07-03 10:08 ` [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Sagi Grimberg
5 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 9:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and
the actual locking part.
With that we can use the tls_rx_reader_lock in situations where
the socket is already locked.
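As an illustration only (not part of the patch), a rough sketch of the
intended usage from a hypothetical caller that already runs with the
socket lock held, e.g. a ->read_sock() implementation; the function
name 'example_locked_reader' is made up for this sketch:

    /* tls_rx_reader_lock()/unlock() keep their old semantics:
     * lock_sock() plus acquire, and release plus release_sock().
     * A caller that already holds the socket lock only uses the
     * acquire/release part:
     */
    static int example_locked_reader(struct sock *sk,
                                     struct tls_sw_context_rx *ctx)
    {
            int err;

            /* socket lock is already held, so no lock_sock() here */
            err = tls_rx_reader_acquire(sk, ctx, true);
            if (err < 0)
                    return err;

            /* ... consume records ... */

            tls_rx_reader_release(sk, ctx);
            return 0;
    }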
Suggested-by: Sagi Grimberg <sagi@grimberg.me>
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
net/tls/tls_sw.c | 38 ++++++++++++++++++++++----------------
1 file changed, 22 insertions(+), 16 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 9aef45e870a5..d0636ea13009 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1848,13 +1848,10 @@ tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
return sk_flush_backlog(sk);
}
-static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
- bool nonblock)
+static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
+ bool nonblock)
{
long timeo;
- int err;
-
- lock_sock(sk);
timeo = sock_rcvtimeo(sk, nonblock);
@@ -1868,26 +1865,30 @@ static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
!READ_ONCE(ctx->reader_present), &wait);
remove_wait_queue(&ctx->wq, &wait);
- if (timeo <= 0) {
- err = -EAGAIN;
- goto err_unlock;
- }
- if (signal_pending(current)) {
- err = sock_intr_errno(timeo);
- goto err_unlock;
- }
+ if (timeo <= 0)
+ return -EAGAIN;
+ if (signal_pending(current))
+ return sock_intr_errno(timeo);
}
WRITE_ONCE(ctx->reader_present, 1);
return 0;
+}
-err_unlock:
- release_sock(sk);
+static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
+ bool nonblock)
+{
+ int err;
+
+ lock_sock(sk);
+ err = tls_rx_reader_acquire(sk, ctx, nonblock);
+ if (err)
+ release_sock(sk);
return err;
}
-static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+static void tls_rx_reader_release(struct sock *sk, struct tls_sw_context_rx *ctx)
{
if (unlikely(ctx->reader_contended)) {
if (wq_has_sleeper(&ctx->wq))
@@ -1899,6 +1900,11 @@ static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
}
WRITE_ONCE(ctx->reader_present, 0);
+}
+
+static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+{
+ tls_rx_reader_release(sk, ctx);
release_sock(sk);
}
--
2.35.3
* Re: [PATCH 4/5] net/tls: split tls_rx_reader_lock
2023-07-03 9:04 ` [PATCH 4/5] net/tls: split tls_rx_reader_lock Hannes Reinecke
@ 2023-07-03 10:06 ` Sagi Grimberg
2023-07-03 10:21 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 10:06 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 12:04, Hannes Reinecke wrote:
> Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and
> the actual locking part.
> With that we can use the tls_rx_reader_lock in situations where
> the socket is already locked.
>
> Suggested-by: Sagi Grimberg <sagi@grimberg.me>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
> net/tls/tls_sw.c | 38 ++++++++++++++++++++++----------------
> 1 file changed, 22 insertions(+), 16 deletions(-)
>
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 9aef45e870a5..d0636ea13009 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -1848,13 +1848,10 @@ tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
> return sk_flush_backlog(sk);
> }
>
> -static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
> - bool nonblock)
> +static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
> + bool nonblock)
Nit: I still think tls_rx_reader_enter/tls_rx_reader_exit are more
appropriate names.
* Re: [PATCH 4/5] net/tls: split tls_rx_reader_lock
2023-07-03 10:06 ` Sagi Grimberg
@ 2023-07-03 10:21 ` Hannes Reinecke
2023-07-05 20:55 ` Jakub Kicinski
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 10:21 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 12:06, Sagi Grimberg wrote:
>
>
> On 7/3/23 12:04, Hannes Reinecke wrote:
>> Split tls_rx_reader_{lock,unlock} into an 'acquire/release' and
>> the actual locking part.
>> With that we can use the tls_rx_reader_lock in situations where
>> the socket is already locked.
>>
>> Suggested-by: Sagi Grimberg <sagi@grimberg.me>
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>> net/tls/tls_sw.c | 38 ++++++++++++++++++++++----------------
>> 1 file changed, 22 insertions(+), 16 deletions(-)
>>
>> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
>> index 9aef45e870a5..d0636ea13009 100644
>> --- a/net/tls/tls_sw.c
>> +++ b/net/tls/tls_sw.c
>> @@ -1848,13 +1848,10 @@ tls_read_flush_backlog(struct sock *sk, struct
>> tls_prot_info *prot,
>> return sk_flush_backlog(sk);
>> }
>> -static int tls_rx_reader_lock(struct sock *sk, struct
>> tls_sw_context_rx *ctx,
>> - bool nonblock)
>> +static int tls_rx_reader_acquire(struct sock *sk, struct
>> tls_sw_context_rx *ctx,
>> + bool nonblock)
>
> Nit: I still think tls_rx_reader_enter/tls_rx_reader_exit are more
> appropriate names.
I don't mind either way, but I've interpreted the comments from Jakub
that he'd like this naming better.
Jakub?
Cheers,
Hannes
* Re: [PATCH 4/5] net/tls: split tls_rx_reader_lock
2023-07-03 10:21 ` Hannes Reinecke
@ 2023-07-05 20:55 ` Jakub Kicinski
0 siblings, 0 replies; 24+ messages in thread
From: Jakub Kicinski @ 2023-07-05 20:55 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Sagi Grimberg, Keith Busch, Christoph Hellwig, linux-nvme,
Eric Dumazet, Paolo Abeni, netdev
On Mon, 3 Jul 2023 12:21:32 +0200 Hannes Reinecke wrote:
> > Nit: I still think tls_rx_reader_enter/tls_rx_reader_exit are more
> > appropriate names.
>
> I don't mind either way, but I've interpreted the comments from Jakub
> that he'd like this naming better.
>
> Jakub?
Slight preference for enter/leave (leave rather than exit).
* [PATCH 5/5] net/tls: implement ->read_sock()
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
` (3 preceding siblings ...)
2023-07-03 9:04 ` [PATCH 4/5] net/tls: split tls_rx_reader_lock Hannes Reinecke
@ 2023-07-03 9:04 ` Hannes Reinecke
2023-07-04 7:56 ` Paolo Abeni
2023-07-03 10:08 ` [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Sagi Grimberg
5 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 9:04 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke,
Boris Pismenny
Implement ->read_sock() function for use with nvme-tcp.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
---
net/tls/tls.h | 2 ++
net/tls/tls_main.c | 2 ++
net/tls/tls_sw.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
3 files changed, 82 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index 86cef1c68e03..7e4d45537deb 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -110,6 +110,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
struct pipe_inode_info *pipe,
size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t read_actor);
int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
void tls_device_splice_eof(struct socket *sock);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index b6896126bb92..7dbb8cd8f809 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -962,10 +962,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
ops[TLS_BASE][TLS_SW ] = ops[TLS_BASE][TLS_BASE];
ops[TLS_BASE][TLS_SW ].splice_read = tls_sw_splice_read;
ops[TLS_BASE][TLS_SW ].poll = tls_sk_poll;
+ ops[TLS_BASE][TLS_SW ].read_sock = tls_sw_read_sock;
ops[TLS_SW ][TLS_SW ] = ops[TLS_SW ][TLS_BASE];
ops[TLS_SW ][TLS_SW ].splice_read = tls_sw_splice_read;
ops[TLS_SW ][TLS_SW ].poll = tls_sk_poll;
+ ops[TLS_SW ][TLS_SW ].read_sock = tls_sw_read_sock;
#ifdef CONFIG_TLS_DEVICE
ops[TLS_HW ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index d0636ea13009..dbf1c8a71f61 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2202,6 +2202,84 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
goto splice_read_end;
}
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+ sk_read_actor_t read_actor)
+{
+ struct tls_context *tls_ctx = tls_get_ctx(sk);
+ struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+ struct strp_msg *rxm = NULL;
+ struct tls_msg *tlm;
+ struct sk_buff *skb;
+ ssize_t copied = 0;
+ int err, used;
+
+ err = tls_rx_reader_acquire(sk, ctx, true);
+ if (err < 0)
+ return err;
+ if (!skb_queue_empty(&ctx->rx_list)) {
+ skb = __skb_dequeue(&ctx->rx_list);
+ } else {
+ struct tls_decrypt_arg darg;
+
+ err = tls_rx_rec_wait(sk, NULL, true, true);
+ if (err <= 0) {
+ tls_rx_reader_release(sk, ctx);
+ return err;
+ }
+
+ memset(&darg.inargs, 0, sizeof(darg.inargs));
+
+ err = tls_rx_one_record(sk, NULL, &darg);
+ if (err < 0) {
+ tls_err_abort(sk, -EBADMSG);
+ tls_rx_reader_release(sk, ctx);
+ return err;
+ }
+
+ tls_rx_rec_done(ctx);
+ skb = darg.skb;
+ }
+
+ do {
+ rxm = strp_msg(skb);
+ tlm = tls_msg(skb);
+
+ /* read_sock does not support reading control messages */
+ if (tlm->control != TLS_RECORD_TYPE_DATA) {
+ err = -EINVAL;
+ goto read_sock_requeue;
+ }
+
+ used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+ if (used <= 0) {
+ err = used;
+ goto read_sock_end;
+ }
+
+ copied += used;
+ if (used < rxm->full_len) {
+ rxm->offset += used;
+ rxm->full_len -= used;
+ if (!desc->count)
+ goto read_sock_requeue;
+ } else {
+ consume_skb(skb);
+ if (desc->count && !skb_queue_empty(&ctx->rx_list))
+ skb = __skb_dequeue(&ctx->rx_list);
+ else
+ skb = NULL;
+ }
+ } while (skb);
+
+read_sock_end:
+ tls_rx_reader_release(sk, ctx);
+ return copied ? : err;
+
+read_sock_requeue:
+ __skb_queue_head(&ctx->rx_list, skb);
+ goto read_sock_end;
+}
+
bool tls_sw_sock_is_readable(struct sock *sk)
{
struct tls_context *tls_ctx = tls_get_ctx(sk);
--
2.35.3
* Re: [PATCH 5/5] net/tls: implement ->read_sock()
2023-07-03 9:04 ` [PATCH 5/5] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-07-04 7:56 ` Paolo Abeni
0 siblings, 0 replies; 24+ messages in thread
From: Paolo Abeni @ 2023-07-04 7:56 UTC (permalink / raw)
To: Hannes Reinecke, Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, netdev, Boris Pismenny
On Mon, 2023-07-03 at 11:04 +0200, Hannes Reinecke wrote:
> Implement ->read_sock() function for use with nvme-tcp.
>
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Cc: Boris Pismenny <boris.pismenny@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: netdev@vger.kernel.org
> ---
> net/tls/tls.h | 2 ++
> net/tls/tls_main.c | 2 ++
> net/tls/tls_sw.c | 78 ++++++++++++++++++++++++++++++++++++++++++++++
> 3 files changed, 82 insertions(+)
>
> diff --git a/net/tls/tls.h b/net/tls/tls.h
> index 86cef1c68e03..7e4d45537deb 100644
> --- a/net/tls/tls.h
> +++ b/net/tls/tls.h
> @@ -110,6 +110,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
> ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> struct pipe_inode_info *pipe,
> size_t len, unsigned int flags);
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t read_actor);
>
> int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
> void tls_device_splice_eof(struct socket *sock);
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index b6896126bb92..7dbb8cd8f809 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -962,10 +962,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
> ops[TLS_BASE][TLS_SW ] = ops[TLS_BASE][TLS_BASE];
> ops[TLS_BASE][TLS_SW ].splice_read = tls_sw_splice_read;
> ops[TLS_BASE][TLS_SW ].poll = tls_sk_poll;
> + ops[TLS_BASE][TLS_SW ].read_sock = tls_sw_read_sock;
>
> ops[TLS_SW ][TLS_SW ] = ops[TLS_SW ][TLS_BASE];
> ops[TLS_SW ][TLS_SW ].splice_read = tls_sw_splice_read;
> ops[TLS_SW ][TLS_SW ].poll = tls_sk_poll;
> + ops[TLS_SW ][TLS_SW ].read_sock = tls_sw_read_sock;
>
> #ifdef CONFIG_TLS_DEVICE
> ops[TLS_HW ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index d0636ea13009..dbf1c8a71f61 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2202,6 +2202,84 @@ ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
> goto splice_read_end;
> }
>
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> + sk_read_actor_t read_actor)
> +{
> + struct tls_context *tls_ctx = tls_get_ctx(sk);
> + struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> + struct strp_msg *rxm = NULL;
> + struct tls_msg *tlm;
> + struct sk_buff *skb;
> + ssize_t copied = 0;
> + int err, used;
> +
> + err = tls_rx_reader_acquire(sk, ctx, true);
> + if (err < 0)
> + return err;
> + if (!skb_queue_empty(&ctx->rx_list)) {
> + skb = __skb_dequeue(&ctx->rx_list);
> + } else {
> + struct tls_decrypt_arg darg;
> +
> + err = tls_rx_rec_wait(sk, NULL, true, true);
> + if (err <= 0) {
> + tls_rx_reader_release(sk, ctx);
> + return err;
You can replace the 2 lines above with:
goto read_sock_end;
> + }
> +
> + memset(&darg.inargs, 0, sizeof(darg.inargs));
> +
> + err = tls_rx_one_record(sk, NULL, &darg);
> + if (err < 0) {
> + tls_err_abort(sk, -EBADMSG);
> + tls_rx_reader_release(sk, ctx);
> + return err;
Same here.
> + }
> +
> + tls_rx_rec_done(ctx);
> + skb = darg.skb;
> + }
> +
> + do {
> + rxm = strp_msg(skb);
> + tlm = tls_msg(skb);
> +
> + /* read_sock does not support reading control messages */
> + if (tlm->control != TLS_RECORD_TYPE_DATA) {
> + err = -EINVAL;
> + goto read_sock_requeue;
> + }
> +
> + used = read_actor(desc, skb, rxm->offset, rxm->full_len);
> + if (used <= 0) {
> + err = used;
> + goto read_sock_end;
> + }
> +
> + copied += used;
> + if (used < rxm->full_len) {
> + rxm->offset += used;
> + rxm->full_len -= used;
> + if (!desc->count)
> + goto read_sock_requeue;
> + } else {
> + consume_skb(skb);
> + if (desc->count && !skb_queue_empty(&ctx->rx_list))
> + skb = __skb_dequeue(&ctx->rx_list);
> + else
> + skb = NULL;
> + }
> + } while (skb);
> +
> +read_sock_end:
> + tls_rx_reader_release(sk, ctx);
> + return copied ? : err;
WRT the return value, I think you should look at tcp_read_sock() as the
reference. The above LGTM. Some ->read_sock() callers ignore the
return value due to the specific 'read_actor' callback used.
WRT the deadlock you see, try to run your tests with lockdep enabled,
it should provide valuable information on the deadlock cause, if any.
Cheers,
Paolo
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 9:04 [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
` (4 preceding siblings ...)
2023-07-03 9:04 ` [PATCH 5/5] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-07-03 10:08 ` Sagi Grimberg
2023-07-03 10:20 ` Hannes Reinecke
5 siblings, 1 reply; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 10:08 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 12:04, Hannes Reinecke wrote:
> Hi all,
>
> here are some small fixes to get NVMe-over-TLS up and running.
> The first three are just minor modifications to have MSG_EOR handled
> for TLS (and adding a test for it), but the last two implement the
> ->read_sock() callback for tls_sw and that, I guess, could do with
> some reviews.
> It does work with my NVMe-TLS test harness, but what do I know :-)
Hannes, have you tested nvme/tls with the MSG_SPLICE_PAGES series from
david?
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 10:08 ` [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS Sagi Grimberg
@ 2023-07-03 10:20 ` Hannes Reinecke
2023-07-03 12:13 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 10:20 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 12:08, Sagi Grimberg wrote:
>
>
> On 7/3/23 12:04, Hannes Reinecke wrote:
>> Hi all,
>>
>> here are some small fixes to get NVMe-over-TLS up and running.
>> The first three are just minor modifications to have MSG_EOR handled
>> for TLS (and adding a test for it), but the last two implement the
>> ->read_sock() callback for tls_sw and that, I guess, could do with
>> some reviews.
>> It does work with my NVMe-TLS test harness, but what do I know :-)
>
> Hannes, have you tested nvme/tls with the MSG_SPLICE_PAGES series from
> david?
Yes. This patchset has been tested on top of current linus' HEAD, which
includes the MSG_SPLICE_PAGES series (hence the absence of patch hunks
for ->sendpage() and friends).
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 10:20 ` Hannes Reinecke
@ 2023-07-03 12:13 ` Hannes Reinecke
2023-07-03 12:19 ` Hannes Reinecke
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 12:13 UTC (permalink / raw)
To: Sagi Grimberg
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 12:20, Hannes Reinecke wrote:
> On 7/3/23 12:08, Sagi Grimberg wrote:
>>
>>
>> On 7/3/23 12:04, Hannes Reinecke wrote:
>>> Hi all,
>>>
>>> here are some small fixes to get NVMe-over-TLS up and running.
>>> The first three are just minor modifications to have MSG_EOR handled
>>> for TLS (and adding a test for it), but the last two implement the
>>> ->read_sock() callback for tls_sw and that, I guess, could do with
>>> some reviews.
>>> It does work with my NVMe-TLS test harness, but what do I know :-)
>>
>> Hannes, have you tested nvme/tls with the MSG_SPLICE_PAGES series from
>> david?
>
> Yes. This patchset has been tested on top of current linus' HEAD, which
> includes the MSG_SPLICE_PAGES series (hence the absence of patch hunks
> for ->sendpage() and friends).
>
Well, of course, spoke too soon.
'discover' and 'connect' works, but when I'm trying to transfer data
(eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
sock_sendmsg() as it's trying to access invalid pages :-(
Debugging to find out if it's my or Davids fault.
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:13 ` Hannes Reinecke
@ 2023-07-03 12:19 ` Hannes Reinecke
2023-07-03 12:26 ` David Howells
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 12:19 UTC (permalink / raw)
To: Sagi Grimberg, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 14:13, Hannes Reinecke wrote:
> On 7/3/23 12:20, Hannes Reinecke wrote:
>> On 7/3/23 12:08, Sagi Grimberg wrote:
>>>
>>>
>>> On 7/3/23 12:04, Hannes Reinecke wrote:
>>>> Hi all,
>>>>
>>>> here are some small fixes to get NVMe-over-TLS up and running.
>>>> The first three are just minor modifications to have MSG_EOR handled
>>>> for TLS (and adding a test for it), but the last two implement the
>>>> ->read_sock() callback for tls_sw and that, I guess, could do with
>>>> some reviews.
>>>> It does work with my NVMe-TLS test harness, but what do I know :-)
>>>
>>> Hannes, have you tested nvme/tls with the MSG_SPLICE_PAGES series from
>>> david?
>>
>> Yes. This patchset has been tested on top of current linus' HEAD,
>> which includes the MSG_SPLICE_PAGES series (hence the absence of patch
>> hunks for ->sendpage() and friends).
>>
> Well, of course, spoke too soon.
> 'discover' and 'connect' works, but when I'm trying to transfer data
> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
> sock_sendmsg() as it's trying to access invalid pages :-(
>
> Debugging to find out if it's my or Davids fault.
>
Hehe.
Wasn't me. Crashes even without TLS.
Sagi, does current linus' HEAD (I've tested with e55e5df193d2) work for
you (traddr tcp 127.0.0.1)?
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:19 ` Hannes Reinecke
@ 2023-07-03 12:26 ` David Howells
2023-07-03 12:33 ` Sagi Grimberg
2023-07-03 12:35 ` Hannes Reinecke
0 siblings, 2 replies; 24+ messages in thread
From: David Howells @ 2023-07-03 12:26 UTC (permalink / raw)
To: Hannes Reinecke
Cc: dhowells, Sagi Grimberg, Keith Busch, Christoph Hellwig,
linux-nvme, Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev
Hannes Reinecke <hare@suse.de> wrote:
> > 'discover' and 'connect' works, but when I'm trying to transfer data
> > (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
> > sock_sendmsg() as it's trying to access invalid pages :-(
Can you be more specific about the crash?
David
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:26 ` David Howells
@ 2023-07-03 12:33 ` Sagi Grimberg
2023-07-03 13:15 ` Hannes Reinecke
2023-07-03 12:35 ` Hannes Reinecke
1 sibling, 1 reply; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 12:33 UTC (permalink / raw)
To: David Howells, Hannes Reinecke
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
> Hannes Reinecke <hare@suse.de> wrote:
>
>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>> sock_sendmsg() as it's trying to access invalid pages :-(
>
> Can you be more specific about the crash?
Hannes,
See:
[PATCH net] nvme-tcp: Fix comma-related oops
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:33 ` Sagi Grimberg
@ 2023-07-03 13:15 ` Hannes Reinecke
2023-07-03 13:42 ` Sagi Grimberg
0 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 13:15 UTC (permalink / raw)
To: Sagi Grimberg, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 14:33, Sagi Grimberg wrote:
>
>> Hannes Reinecke <hare@suse.de> wrote:
>>
>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>
>> Can you be more specific about the crash?
>
> Hannes,
>
> See:
> [PATCH net] nvme-tcp: Fix comma-related oops
Ah, right. That solves _that_ issue.
But now I'm deadlocking on the tls_rx_reader_lock() (patched as to your
suggestion). Investigating.
But it brought up yet another can of worms: what _exactly_ is the return
value of ->read_sock()?
There are currently two conflicting use-cases:
-> Ignore the return value, and assume errors etc are signalled
via 'desc.error'.
net/strparser/strparser.c
drivers/infiniband/sw/siw
drivers/scsi/iscsi_tcp.c
-> use the return value of ->read_sock(), ignoring 'desc.error':
drivers/nvme/host/tcp.c
net/ipv4/tcp.c
So which one is it?
Needless to say, implementations following the second style do not
set 'desc.error', causing any errors there to be ignored for callers
from the first style...
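To illustrate (simplified pseudo-callers, not actual driver code;
'handle_error' stands in for whatever the caller does with a failure):

    /* first style (strparser, siw, iscsi_tcp): return value ignored,
     * errors expected in desc.error */
    desc.error = 0;
    sock->ops->read_sock(sk, &desc, recv_actor);
    if (desc.error)
            handle_error(desc.error);

    /* second style (nvme-tcp, tcp_read_sock()): return value is the
     * number of bytes consumed or a negative error, desc.error unused */
    consumed = sock->ops->read_sock(sk, &desc, recv_actor);
    if (consumed < 0)
            handle_error(consumed);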
Jakub?
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 13:15 ` Hannes Reinecke
@ 2023-07-03 13:42 ` Sagi Grimberg
2023-07-03 13:46 ` Hannes Reinecke
2023-07-03 13:57 ` Hannes Reinecke
0 siblings, 2 replies; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 13:42 UTC (permalink / raw)
To: Hannes Reinecke, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
>>> Hannes Reinecke <hare@suse.de> wrote:
>>>
>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>
>>> Can you be more specific about the crash?
>>
>> Hannes,
>>
>> See:
>> [PATCH net] nvme-tcp: Fix comma-related oops
>
> Ah, right. That solves _that_ issue.
>
> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to your
> suggestion). Investigating.
Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
does not interpret this as a transient status and simply returns from
io_work?
> But it brought up yet another can of worms: what _exactly_ is the return
> value of ->read_sock()?
>
> There are currently two conflicting use-cases:
> -> Ignore the return value, and assume errors etc are signalled
> via 'desc.error'.
> net/strparser/strparser.c
> drivers/infiniband/sw/siw
> drivers/scsi/iscsi_tcp.c
> -> use the return value of ->read_sock(), ignoring 'desc.error':
> drivers/nvme/host/tcp.c
> net/ipv4/tcp.c
> So which one is it?
> Needless to say, implementations following the second style do not
> set 'desc.error', causing any errors there to be ignored for callers
> from the first style...
I don't think ignoring the return value of read_sock makes sense because
it can fail outside of the recv_actor failures.
But to be on the safe side, perhaps you can both return an error and set
desc.error?
> Jakub?
>
> Cheers,
>
> Hannes
>
>
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 13:42 ` Sagi Grimberg
@ 2023-07-03 13:46 ` Hannes Reinecke
2023-07-03 14:01 ` Sagi Grimberg
2023-07-03 13:57 ` Hannes Reinecke
1 sibling, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 13:46 UTC (permalink / raw)
To: Sagi Grimberg, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 15:42, Sagi Grimberg wrote:
>
>>>> Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>>
>>>> Can you be more specific about the crash?
>>>
>>> Hannes,
>>>
>>> See:
>>> [PATCH net] nvme-tcp: Fix comma-related oops
>>
>> Ah, right. That solves _that_ issue.
>>
>> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to
>> your suggestion). Investigating.
>
> Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
> does not interpret this as a transient status and simply returns from
> io_work?
>
>> But it brought up yet another can of worms: what _exactly_ is the
>> return value of ->read_sock()?
>>
>> There are currently two conflicting use-cases:
>> -> Ignore the return value, and assume errors etc are signalled
>> via 'desc.error'.
>> net/strparser/strparser.c
>> drivers/infiniband/sw/siw
>> drivers/scsi/iscsi_tcp.c
>> -> use the return value of ->read_sock(), ignoring 'desc.error':
>> drivers/nvme/host/tcp.c
>> net/ipv4/tcp.c
>> So which one is it?
>> Needless to say, implementations following the second style do not
>> set 'desc.error', causing any errors there to be ignored for callers
>> from the first style...
>
> I don't think ignoring the return value of read_sock makes sense because
> it can fail outside of the recv_actor failures.
>
Oh, but it's not read_actor which is expected to set desc.error.
Have a look at 'strp_read_sock()':
/* sk should be locked here, so okay to do read_sock */
sock->ops->read_sock(strp->sk, &desc, strp_recv);
desc.error = strp->cb.read_sock_done(strp, desc.error);
it's the ->read_sock() callback which is expected to set desc.error.
> But to be on the safe side, perhaps you can both return an error and set
> desc.error?
>
But why? We can easily make ->read_sock() a void function, then it's
obvious that you can't check the return value.
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 13:46 ` Hannes Reinecke
@ 2023-07-03 14:01 ` Sagi Grimberg
0 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 14:01 UTC (permalink / raw)
To: Hannes Reinecke, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
>>>>> Hannes Reinecke <hare@suse.de> wrote:
>>>>>
>>>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>>>
>>>>> Can you be more specific about the crash?
>>>>
>>>> Hannes,
>>>>
>>>> See:
>>>> [PATCH net] nvme-tcp: Fix comma-related oops
>>>
>>> Ah, right. That solves _that_ issue.
>>>
>>> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to
>>> your suggestion). Investigating.
>>
>> Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
>> does not interpret this as a transient status and simply returns from
>> io_work?
>>
>>> But it brought up yet another can of worms: what _exactly_ is the
>>> return value of ->read_sock()?
>>>
>>> There are currently two conflicting use-cases:
>>> -> Ignore the return value, and assume errors etc are signalled
>>> via 'desc.error'.
>>> net/strparser/strparser.c
>>> drivers/infiniband/sw/siw
>>> drivers/scsi/iscsi_tcp.c
>>> -> use the return value of ->read_sock(), ignoring 'desc.error':
>>> drivers/nvme/host/tcp.c
>>> net/ipv4/tcp.c
>>> So which one is it?
>>> Needless to say, implementations following the second style do not
>>> set 'desc.error', causing any errors there to be ignored for callers
>>> from the first style...
>>
>> I don't think ignoring the return value of read_sock makes sense because
>> it can fail outside of the recv_actor failures.
>>
> Oh, but it's not read_actor which is expected to set desc.error.
> Have a look at 'strp_read_sock()':
>
> /* sk should be locked here, so okay to do read_sock */
> sock->ops->read_sock(strp->sk, &desc, strp_recv);
>
> desc.error = strp->cb.read_sock_done(strp, desc.error);
>
> it's the ->read_sock() callback which is expected to set desc.error.
Then it is completely up to the consumer how it wants to interpret the
error.
>> But to be on the safe side, perhaps you can both return an error and set
>> desc.error?
>>
> But why? We can easily make ->read_sock() a void function, then it's
> obvious that you can't check the return value.
but it returns the consumed byte count, where would this info go?
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 13:42 ` Sagi Grimberg
2023-07-03 13:46 ` Hannes Reinecke
@ 2023-07-03 13:57 ` Hannes Reinecke
2023-07-03 14:10 ` Sagi Grimberg
1 sibling, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 13:57 UTC (permalink / raw)
To: Sagi Grimberg, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 15:42, Sagi Grimberg wrote:
>
>>>> Hannes Reinecke <hare@suse.de> wrote:
>>>>
>>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>>
>>>> Can you be more specific about the crash?
>>>
>>> Hannes,
>>>
>>> See:
>>> [PATCH net] nvme-tcp: Fix comma-related oops
>>
>> Ah, right. That solves _that_ issue.
>>
>> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to
>> your suggestion). Investigating.
>
> Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
> does not interpret this as a transient status and simply returns from
> io_work?
>
Unfortunately, yes.
static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
                                 bool nonblock)
{
        long timeo;

        timeo = sock_rcvtimeo(sk, nonblock);

        while (unlikely(ctx->reader_present)) {
                DEFINE_WAIT_FUNC(wait, woken_wake_function);

                ctx->reader_contended = 1;

                add_wait_queue(&ctx->wq, &wait);
                sk_wait_event(sk, &timeo,
                              !READ_ONCE(ctx->reader_present), &wait);

and sk_wait_event() does:

#define sk_wait_event(__sk, __timeo, __condition, __wait)              \
        ({      int __rc;                                               \
                __sk->sk_wait_pending++;                                \
                release_sock(__sk);                                     \
                __rc = __condition;                                     \
                if (!__rc) {                                            \
                        *(__timeo) = wait_woken(__wait,                 \
                                                TASK_INTERRUPTIBLE,     \
                                                *(__timeo));            \
                }                                                       \
                sched_annotate_sleep();                                 \
                lock_sock(__sk);                                        \
                __sk->sk_wait_pending--;                                \
                __rc = __condition;                                     \
                __rc;                                                   \
        })
so not calling 'lock_sock()' in tls_rx_reader_acquire() helps only _so_
much, we're still deadlocking.
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 13:57 ` Hannes Reinecke
@ 2023-07-03 14:10 ` Sagi Grimberg
0 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2023-07-03 14:10 UTC (permalink / raw)
To: Hannes Reinecke, David Howells
Cc: Keith Busch, Christoph Hellwig, linux-nvme, Jakub Kicinski,
Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 16:57, Hannes Reinecke wrote:
> On 7/3/23 15:42, Sagi Grimberg wrote:
>>
>>>>> Hannes Reinecke <hare@suse.de> wrote:
>>>>>
>>>>>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>>>>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>>>>>> sock_sendmsg() as it's trying to access invalid pages :-(
>>>>>
>>>>> Can you be more specific about the crash?
>>>>
>>>> Hannes,
>>>>
>>>> See:
>>>> [PATCH net] nvme-tcp: Fix comma-related oops
>>>
>>> Ah, right. That solves _that_ issue.
>>>
>>> But now I'm deadlocking on the tls_rx_reader_lock() (patched as to
>>> your suggestion). Investigating.
>>
>> Are you sure it is a deadlock? or maybe you returned EAGAIN and nvme-tcp
>> does not interpret this as a transient status and simply returns from
>> io_work?
>>
> Unfortunately, yes.
>
> static int tls_rx_reader_acquire(struct sock *sk, struct
> tls_sw_context_rx *ctx,
> bool nonblock)
> {
> long timeo;
>
> timeo = sock_rcvtimeo(sk, nonblock);
>
> while (unlikely(ctx->reader_present)) {
> DEFINE_WAIT_FUNC(wait, woken_wake_function);
>
> ctx->reader_contended = 1;
>
> add_wait_queue(&ctx->wq, &wait);
> sk_wait_event(sk, &timeo,
> !READ_ONCE(ctx->reader_present), &wait);
>
> and sk_wait_event() does:
> #define sk_wait_event(__sk, __timeo, __condition, __wait) \
> ({ int __rc; \
> __sk->sk_wait_pending++; \
> release_sock(__sk); \
> __rc = __condition; \
> if (!__rc) { \
> *(__timeo) = wait_woken(__wait, \
> TASK_INTERRUPTIBLE, \
> *(__timeo)); \
> } \
> sched_annotate_sleep(); \
> lock_sock(__sk); \
> __sk->sk_wait_pending--; \
> __rc = __condition; \
> __rc; \
> })
>
> so not calling 'lock_sock()' in tls_tx_reader_acquire() helps only _so_
> much, we're still deadlocking.
That still is legal assuming that sock lock is taken prior to
sk_wait_event...
What are the blocked threads from sysrq-trigger?
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:26 ` David Howells
2023-07-03 12:33 ` Sagi Grimberg
@ 2023-07-03 12:35 ` Hannes Reinecke
2023-07-03 12:45 ` David Howells
1 sibling, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2023-07-03 12:35 UTC (permalink / raw)
To: David Howells
Cc: Sagi Grimberg, Keith Busch, Christoph Hellwig, linux-nvme,
Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev
On 7/3/23 14:26, David Howells wrote:
> Hannes Reinecke <hare@suse.de> wrote:
>
>>> 'discover' and 'connect' works, but when I'm trying to transfer data
>>> (eg by doing a 'mkfs.xfs') the whole thing crashes horribly in
>>> sock_sendmsg() as it's trying to access invalid pages :-(
>
> Can you be more specific about the crash?
>
[ 1804.190109] nvme nvme0: new ctrl: NQN "nqn.test", addr 127.0.0.1:4420
[ 1814.346751] BUG: unable to handle page fault for address:
ffffffffc4e166cc
[ 1814.347980] #PF: supervisor read access in kernel mode
[ 1814.348864] #PF: error_code(0x0000) - not-present page
[ 1814.349727] PGD 2b239067 P4D 2b239067 PUD 2b23b067 PMD 0
[ 1814.350630] Oops: 0000 [#1] PREEMPT SMP NOPTI
[ 1814.351369] CPU: 0 PID: 2309 Comm: mkfs.xfs Kdump: loaded Tainted: G
E 6.4.0-54-default+ #296
[ 1814.353021] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
0.0.0 02/06/2015
[ 1814.354298] RIP: 0010:skb_splice_from_iter+0xda/0x310
[ 1814.355171] Code: 0f 8e 2f 02 00 00 4c 8d 6c 24 30 48 8b 54 24 28 4c
89 f8 4d 89 f7 4d 89 ee 49 89 c5 49 8b 1e bd 00
10 00 00 48 29 d5 4c 39 fd <48> 8b 43 08 49 0f 47 ef a8 01 0f 85 c9 01
00 00 66 90 48 89 d8 48
[ 1814.358314] RSP: 0018:ffffa90f00db7778 EFLAGS: 00010283
[ 1814.359202] RAX: ffff8941ad156600 RBX: ffffffffc4e166c4 RCX:
0000000000000000
[ 1814.360426] RDX: 0000000000000fff RSI: 0000000000000000 RDI:
0000000000000000
[ 1814.361613] RBP: 0000000000000001 R08: ffffa90f00db7958 R09:
00000000ac6df68e
[ 1814.362797] R10: 0000000000020000 R11: ffffffffac9873c0 R12:
0000000000000000
[ 1814.363998] R13: ffff8941ad156600 R14: ffffa90f00db77a8 R15:
0000000000001000
[ 1814.365175] FS: 00007f5f0ea63780(0000) GS:ffff8941fb400000(0000)
knlGS:0000000000000000
[ 1814.366493] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1814.367456] CR2: ffffffffc4e166cc CR3: 0000000005e68000 CR4:
0000000000350ef0
[ 1814.368671] Call Trace:
[ 1814.369092] <TASK>
[ 1814.369463] ? __die_body+0x1a/0x60
[ 1814.370060] ? page_fault_oops+0x151/0x470
[ 1814.370746] ? search_bpf_extables+0x65/0x70
[ 1814.371471] ? fixup_exception+0x22/0x310
[ 1814.372162] ? exc_page_fault+0xb1/0x150
[ 1814.372822] ? asm_exc_page_fault+0x22/0x30
[ 1814.373521] ? kmem_cache_alloc_node+0x1c0/0x320
[ 1814.374294] ? skb_splice_from_iter+0xda/0x310
[ 1814.375036] ? skb_splice_from_iter+0xa9/0x310
[ 1814.375777] tcp_sendmsg_locked+0x763/0xce0
[ 1814.376488] ? tcp_sendmsg_locked+0x936/0xce0
[ 1814.377225] tcp_sendmsg+0x27/0x40
[ 1814.377797] sock_sendmsg+0x8b/0xa0
[ 1814.378386] nvme_tcp_try_send_data+0x131/0x400 [nvme_tcp]
[ 1814.379297] ? nvme_tcp_try_send_cmd_pdu+0x164/0x290 [nvme_tcp]
[ 1814.380302] ? __kernel_text_address+0xe/0x40
[ 1814.381037] ? __pfx_stack_trace_consume_entry+0x10/0x10
[ 1814.381909] ? arch_stack_walk+0xa1/0xf0
[ 1814.382565] nvme_tcp_try_send+0x197/0x2c0 [nvme_tcp]
[ 1814.383407] ? __rq_qos_issue+0x24/0x40
[ 1814.384069] nvme_tcp_queue_rq+0x38b/0x3c0 [nvme_tcp]
[ 1814.384907] __blk_mq_issue_directly+0x3e/0xe0
[ 1814.385648] blk_mq_try_issue_directly+0x6c/0xa0
[ 1814.386411] blk_mq_submit_bio+0x52b/0x650
[ 1814.387095] __submit_bio+0xd8/0x150
[ 1814.387697] submit_bio_noacct_nocheck+0x151/0x360
[ 1814.388503] ? submit_bio_wait+0x54/0xc0
[ 1814.389157] submit_bio_wait+0x54/0xc0
[ 1814.389780] __blkdev_direct_IO_simple+0x124/0x280
My suspicion is here:
        while (true) {
                struct bio_vec bvec;
                struct msghdr msg = {
                        .msg_flags = MSG_DONTWAIT | MSG_SPLICE_PAGES,
                };
                struct page *page = nvme_tcp_req_cur_page(req);
                size_t offset = nvme_tcp_req_cur_offset(req);
                size_t len = nvme_tcp_req_cur_length(req);
                bool last = nvme_tcp_pdu_last_send(req, len);
                int req_data_sent = req->data_sent;
                int ret;

                if (!last || queue->data_digest ||
                    nvme_tcp_queue_more(queue))
                        msg.msg_flags |= MSG_MORE;

                if (!sendpage_ok(page))
                        msg.msg_flags &= ~MSG_SPLICE_PAGES,

                bvec_set_page(&bvec, page, len, offset);
                iov_iter_bvec(&msg.msg_iter, ITER_SOURCE, &bvec, 1, len);
                ret = sock_sendmsg(queue->sock, &msg);
Thing is, 'req' already has an iter (that's what nvme_tcp_req_cur_page()
extracts from).
Maybe a missing kmap() somewhere?
But the entire thing is borderline pedantic.
'req' already has an iter, so we should be using it directly and not
create another iter which just replicates the existing one.
Cheers,
Hannes
* Re: [PATCHv6 0/5] net/tls: fixes for NVMe-over-TLS
2023-07-03 12:35 ` Hannes Reinecke
@ 2023-07-03 12:45 ` David Howells
0 siblings, 0 replies; 24+ messages in thread
From: David Howells @ 2023-07-03 12:45 UTC (permalink / raw)
To: Hannes Reinecke
Cc: dhowells, Sagi Grimberg, Keith Busch, Christoph Hellwig,
linux-nvme, Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev
Hannes Reinecke <hare@suse.de> wrote:
> msg.msg_flags &= ~MSG_SPLICE_PAGES,
Ah, yes - as Sagi pointed out there's a patch for that comma there.
David