* [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-14  6:22 [PATCHv4 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
@ 2023-06-14  6:22 ` Hannes Reinecke
  2023-06-17  6:28   ` Jakub Kicinski
  2023-06-17 14:08   ` Simon Horman
  0 siblings, 2 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-14  6:22 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski, netdev,
	Hannes Reinecke, Boris Pismenny
Implement ->read_sock() function for use with nvme-tcp.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
---
 net/tls/tls.h      |  2 ++
 net/tls/tls_main.c |  2 ++
 net/tls/tls_sw.c   | 71 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 75 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index d002c3af1966..ba55cd5c4913 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -114,6 +114,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor);
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_device_splice_eof(struct socket *sock);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7b9c83dd7de2..1a062a8c6d33 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -963,10 +963,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_BASE][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
 	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_SW  ][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 47eeff4d7d10..f0e0a0afb8c9 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2231,6 +2231,77 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	goto splice_read_end;
 }
 
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor)
+{
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	ssize_t copied = 0;
+	int err, used;
+
+	if (!skb_queue_empty(&ctx->rx_list)) {
+		skb = __skb_dequeue(&ctx->rx_list);
+	} else {
+		struct tls_decrypt_arg darg;
+
+		err = tls_rx_rec_wait(sk, NULL, true, true);
+		if (err <= 0)
+			return err;
+
+		memset(&darg.inargs, 0, sizeof(darg.inargs));
+
+		err = tls_rx_one_record(sk, NULL, &darg);
+		if (err < 0) {
+			tls_err_abort(sk, -EBADMSG);
+			return err;
+		}
+
+		tls_rx_rec_done(ctx);
+		skb = darg.skb;
+	}
+
+	do {
+		rxm = strp_msg(skb);
+		tlm = tls_msg(skb);
+
+		/* read_sock does not support reading control messages */
+		if (tlm->control != TLS_RECORD_TYPE_DATA) {
+			err = -EINVAL;
+			goto read_sock_requeue;
+		}
+
+		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+		if (used <= 0) {
+			err = used;
+			goto read_sock_end;
+		}
+
+		copied += used;
+		if (used < rxm->full_len) {
+			rxm->offset += used;
+			rxm->full_len -= used;
+			if (!desc->count)
+				goto read_sock_requeue;
+		} else {
+			consume_skb(skb);
+			if (desc->count && !skb_queue_empty(&ctx->rx_list))
+				skb = __skb_dequeue(&ctx->rx_list);
+			else
+				skb = NULL;
+		}
+	} while (skb);
+
+read_sock_end:
+	return copied ? : err;
+
+read_sock_requeue:
+	__skb_queue_head(&ctx->rx_list, skb);
+	goto read_sock_end;
+}
+
 bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-- 
2.35.3
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-14  6:22 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-06-17  6:28   ` Jakub Kicinski
  2023-06-17 14:08   ` Simon Horman
  1 sibling, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2023-06-17  6:28 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme, netdev,
	Boris Pismenny
On Wed, 14 Jun 2023 08:22:12 +0200 Hannes Reinecke wrote:
> Implement ->read_sock() function for use with nvme-tcp.
please take tls_rx_reader_lock() :S
-- 
pw-bot: cr
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-14  6:22 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
  2023-06-17  6:28   ` Jakub Kicinski
@ 2023-06-17 14:08   ` Simon Horman
  2023-06-19  8:16     ` Dan Carpenter
  1 sibling, 1 reply; 18+ messages in thread
From: Simon Horman @ 2023-06-17 14:08 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Jakub Kicinski, netdev, Boris Pismenny, Dan Carpenter
+ Dan Carpenter
On Wed, Jun 14, 2023 at 08:22:12AM +0200, Hannes Reinecke wrote:
...
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 47eeff4d7d10..f0e0a0afb8c9 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2231,6 +2231,77 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>  	goto splice_read_end;
>  }
>  
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		     sk_read_actor_t read_actor)
> +{
> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> +	struct strp_msg *rxm = NULL;
> +	struct tls_msg *tlm;
> +	struct sk_buff *skb;
> +	ssize_t copied = 0;
> +	int err, used;
> +
> +	if (!skb_queue_empty(&ctx->rx_list)) {
> +		skb = __skb_dequeue(&ctx->rx_list);
> +	} else {
> +		struct tls_decrypt_arg darg;
> +
> +		err = tls_rx_rec_wait(sk, NULL, true, true);
> +		if (err <= 0)
> +			return err;
> +
> +		memset(&darg.inargs, 0, sizeof(darg.inargs));
> +
> +		err = tls_rx_one_record(sk, NULL, &darg);
> +		if (err < 0) {
> +			tls_err_abort(sk, -EBADMSG);
> +			return err;
> +		}
> +
> +		tls_rx_rec_done(ctx);
> +		skb = darg.skb;
> +	}
> +
> +	do {
> +		rxm = strp_msg(skb);
> +		tlm = tls_msg(skb);
> +
> +		/* read_sock does not support reading control messages */
> +		if (tlm->control != TLS_RECORD_TYPE_DATA) {
> +			err = -EINVAL;
> +			goto read_sock_requeue;
> +		}
> +
> +		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
> +		if (used <= 0) {
> +			err = used;
> +			goto read_sock_end;
> +		}
> +
> +		copied += used;
> +		if (used < rxm->full_len) {
> +			rxm->offset += used;
> +			rxm->full_len -= used;
> +			if (!desc->count)
> +				goto read_sock_requeue;
> +		} else {
> +			consume_skb(skb);
> +			if (desc->count && !skb_queue_empty(&ctx->rx_list))
> +				skb = __skb_dequeue(&ctx->rx_list);
> +			else
> +				skb = NULL;
> +		}
> +	} while (skb);
> +
> +read_sock_end:
> +	return copied ? : err;
Hi Hannes,
I'm of two minds about raising this or not, but in any case here I am
doing so.
Both gcc-12 [-Wmaybe-uninitialized] and Smatch warn that err may be
used uninitialised on the line above.
My own analysis is that it cannot occur: I think it is always the case
that either copied is non-zero or err is initialised. But still
the warning is there. And in future it may create noise that
crowds out real problems.
It also seems to imply that the path is somewhat complex,
and hard to analyse: certainly it took my small brain a while.
So I do wonder if there is value in ensuring err is always set to
something appropriate, perhaps to -EINVAL above the do loop.
I guess that in the end I decided it was best to put this thinking in the
open. And let you decide.
> +
> +read_sock_requeue:
> +	__skb_queue_head(&ctx->rx_list, skb);
> +	goto read_sock_end;
> +}
> +
>  bool tls_sw_sock_is_readable(struct sock *sk)
>  {
>  	struct tls_context *tls_ctx = tls_get_ctx(sk);
> -- 
> 2.35.3
> 
> 
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-17 14:08   ` Simon Horman
@ 2023-06-19  8:16     ` Dan Carpenter
  0 siblings, 0 replies; 18+ messages in thread
From: Dan Carpenter @ 2023-06-19  8:16 UTC (permalink / raw)
  To: Simon Horman
  Cc: Hannes Reinecke, Christoph Hellwig, Sagi Grimberg, Keith Busch,
	linux-nvme, Jakub Kicinski, netdev, Boris Pismenny
On Sat, Jun 17, 2023 at 04:08:08PM +0200, Simon Horman wrote:
> + Dan Carpenter
> 
> On Wed, Jun 14, 2023 at 08:22:12AM +0200, Hannes Reinecke wrote:
> 
> ...
> 
> > +
> > +read_sock_end:
> > +	return copied ? : err;
> 
> Hi Hannes,
> 
> I'm of two minds about raising this or not, but in any case here I am
> doing so.
> 
> Both gcc-12 [-Wmaybe-uninitialized] and Smatch warn that err may be
> used uninitialised on the line above.
> 
> My own analysis is that it cannot occur: I think it is always the case
> that either copied is non-zero or err is initialised.
Hm...  Yeah.  This is a bug in how Smatch handles:
	return copied ? : err;
Smatch wants every return to have a simple literal or variable.  So if
the return is complicated it gets changed into:
	fake_variable = copied ? : err;
	return fake_variable;
Smatch translates this to:
	if (!(fake_variable = copied))
		fake_variable = err;
[ Here fake_variable doesn't have side effects, but under other
  circumstances this transformation could cause double-evaluation
  side-effect bugs.  So that's another bug in Smatch. ]
Then Smatch parses the fake_variable = copied condition as:
	fake_variable = copied;
	if (fake_variable) {
The problem is that the type of fake_variable is based on the return
type of tls_sw_read_sock(), so it is 32-bit while copied is 64-bit (of unknown
value).  So Smatch says, "Just because copied is non-zero doesn't mean
fake_variable is non-zero because the value might get truncated when
we cast away the high 32 bits."
This is not a serious proposal, but just to demonstrate that this is
what happens: there are two ways to silence this warning.  Change the
type of tls_sw_read_sock() to long, or change the code to:
	if (copied)
		return copied;
	return err;
Probably the right thing is to create a second fake_copied variable
based on typeof(copied).
	fake_copied = copied;
	if (fake_copied)
		fake_return_variable = fake_copied;
	else
		fake_return_variable = err;
It's a doable thing.  Plus now there are no double-evaluation
side-effect bugs.  I have written this code and it silences the warning, but I'll
test it out tonight to see what breaks.
regards,
dan carpenter
* [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS
@ 2023-06-20 10:28 Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 1/4] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
                   ` (3 more replies)
  0 siblings, 4 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-20 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
Hi all,
here are some small fixes to get NVMe-over-TLS up and running.
The first three are just minor modifications to have MSG_EOR handled
for TLS (and add a test for it), but the last implements the
->read_sock() callback for tls_sw, which I guess could do with some
reviews.
It does work with my NVMe-TLS test harness, but what do I know :-)
As usual, comments and reviews are welcome.
Changes to the original submission:
- Add a testcase for MSG_EOR handling
Changes to v2:
- Bail out on conflicting message flags
- Rework flag handling
Changes to v3:
- Return -EINVAL on conflicting flags
- Rebase on top of net-next
Changes to v4:
- Add tls_rx_reader_lock() to read_sock
- Add MSG_EOR handling to tls_sw_sendpage()
Hannes Reinecke (4):
  net/tls: handle MSG_EOR for tls_sw TX flow
  net/tls: handle MSG_EOR for tls_device TX flow
  selftests/net/tls: add test for MSG_EOR
  net/tls: implement ->read_sock()
 net/tls/tls.h                     |  2 +
 net/tls/tls_device.c              | 25 +++++++--
 net/tls/tls_main.c                |  2 +
 net/tls/tls_sw.c                  | 87 +++++++++++++++++++++++++++++--
 tools/testing/selftests/net/tls.c | 11 ++++
 5 files changed, 119 insertions(+), 8 deletions(-)
-- 
2.35.3
* [PATCH 1/4] net/tls: handle MSG_EOR for tls_sw TX flow
  2023-06-20 10:28 [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
@ 2023-06-20 10:28 ` Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 2/4] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-20 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
tls_sw_sendmsg() / tls_do_sw_sendpage() already handle
MSG_MORE / MSG_SENDPAGE_NOTLAST, but bail out on MSG_EOR.
Seeing that MSG_EOR is basically the opposite of
MSG_MORE / MSG_SENDPAGE_NOTLAST, this patch adds handling for
MSG_EOR by treating it as the negation of MSG_MORE, and
errors out if MSG_EOR is specified together with either
MSG_MORE or MSG_SENDPAGE_NOTLAST.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_sw.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 319f61590d2c..97379e34c997 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -984,6 +984,9 @@ static int tls_sw_sendmsg_locked(struct sock *sk, struct msghdr *msg,
 	int ret = 0;
 	int pending;
 
+	if (!eor && (msg->msg_flags & MSG_EOR))
+		return -EINVAL;
+
 	if (unlikely(msg->msg_controllen)) {
 		ret = tls_process_cmsg(sk, msg, &record_type);
 		if (ret) {
@@ -1193,7 +1196,7 @@ int tls_sw_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	int ret;
 
 	if (msg->msg_flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
-			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES |
+			       MSG_CMSG_COMPAT | MSG_SPLICE_PAGES | MSG_EOR |
 			       MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
 
@@ -1287,7 +1290,7 @@ int tls_sw_sendpage_locked(struct sock *sk, struct page *page,
 	struct bio_vec bvec;
 	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_EOR |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY |
 		      MSG_NO_SHARED_FRAGS))
 		return -EOPNOTSUPP;
@@ -1305,7 +1308,7 @@ int tls_sw_sendpage(struct sock *sk, struct page *page,
 	struct bio_vec bvec;
 	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
-	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL |
+	if (flags & ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_EOR |
 		      MSG_SENDPAGE_NOTLAST | MSG_SENDPAGE_NOPOLICY))
 		return -EOPNOTSUPP;
 	if (flags & MSG_SENDPAGE_NOTLAST)
-- 
2.35.3
* [PATCH 2/4] net/tls: handle MSG_EOR for tls_device TX flow
  2023-06-20 10:28 [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 1/4] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
@ 2023-06-20 10:28 ` Hannes Reinecke
  2023-06-20 17:12   ` Sabrina Dubroca
  2023-06-20 10:28 ` [PATCH 3/4] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
  3 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-20 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
tls_push_data() already handles MSG_MORE / MSG_SENDPAGE_NOTLAST,
but bails out on MSG_EOR.
Seeing that MSG_EOR is basically the opposite of
MSG_MORE / MSG_SENDPAGE_NOTLAST, this patch adds handling for
MSG_EOR by treating it as the absence of MSG_MORE.
Consequently, we return an error when both are set.
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
Signed-off-by: Hannes Reinecke <hare@suse.de>
---
 net/tls/tls_device.c | 25 ++++++++++++++++++++-----
 1 file changed, 20 insertions(+), 5 deletions(-)
diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
index b82770f68807..ebefd148ecf5 100644
--- a/net/tls/tls_device.c
+++ b/net/tls/tls_device.c
@@ -440,11 +440,6 @@ static int tls_push_data(struct sock *sk,
 	int copy, rc = 0;
 	long timeo;
 
-	if (flags &
-	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
-	      MSG_SPLICE_PAGES))
-		return -EOPNOTSUPP;
-
 	if (unlikely(sk->sk_err))
 		return -sk->sk_err;
 
@@ -536,6 +531,10 @@ static int tls_push_data(struct sock *sk,
 				more = true;
 				break;
 			}
+			if (flags & MSG_EOR) {
+				more = false;
+				break;
+			}
 
 			done = true;
 		}
@@ -582,6 +581,14 @@ int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 	if (!tls_ctx->zerocopy_sendfile)
 		msg->msg_flags &= ~MSG_SPLICE_PAGES;
 
+	if (msg->msg_flags &
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+	      MSG_SPLICE_PAGES | MSG_EOR))
+		return -EOPNOTSUPP;
+
+	if ((msg->msg_flags & (MSG_MORE | MSG_EOR)) == (MSG_MORE | MSG_EOR))
+		return -EOPNOTSUPP;
+
 	mutex_lock(&tls_ctx->tx_lock);
 	lock_sock(sk);
 
@@ -627,9 +634,17 @@ int tls_device_sendpage(struct sock *sk, struct page *page,
 	struct bio_vec bvec;
 	struct msghdr msg = { .msg_flags = flags | MSG_SPLICE_PAGES, };
 
+	if (flags &
+	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
+	      MSG_SPLICE_PAGES | MSG_EOR))
+		return -EOPNOTSUPP;
+
 	if (flags & MSG_SENDPAGE_NOTLAST)
 		msg.msg_flags |= MSG_MORE;
 
+	if ((msg.msg_flags & (MSG_MORE | MSG_EOR)) == (MSG_MORE | MSG_EOR))
+		return -EINVAL;
+
 	if (flags & MSG_OOB)
 		return -EOPNOTSUPP;
 
-- 
2.35.3
* [PATCH 3/4] selftests/net/tls: add test for MSG_EOR
  2023-06-20 10:28 [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 1/4] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 2/4] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
@ 2023-06-20 10:28 ` Hannes Reinecke
  2023-06-20 10:28 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
  3 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-20 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke
As the preceding patches modify the behaviour of TLS with respect to
MSG_EOR handling, we should have a test for it.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
---
 tools/testing/selftests/net/tls.c | 11 +++++++++++
 1 file changed, 11 insertions(+)
diff --git a/tools/testing/selftests/net/tls.c b/tools/testing/selftests/net/tls.c
index eccea9845c65..e8540fa5b310 100644
--- a/tools/testing/selftests/net/tls.c
+++ b/tools/testing/selftests/net/tls.c
@@ -477,6 +477,17 @@ TEST_F(tls, msg_more_unsent)
 	EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_DONTWAIT), -1);
 }
 
+TEST_F(tls, msg_eor)
+{
+	char const *test_str = "test_read";
+	int send_len = 10;
+	char buf[10];
+
+	EXPECT_EQ(send(self->fd, test_str, send_len, MSG_EOR), send_len);
+	EXPECT_EQ(recv(self->cfd, buf, send_len, MSG_WAITALL), send_len);
+	EXPECT_EQ(memcmp(buf, test_str, send_len), 0);
+}
+
 TEST_F(tls, sendmsg_single)
 {
 	struct msghdr msg;
-- 
2.35.3
* [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-20 10:28 [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
                   ` (2 preceding siblings ...)
  2023-06-20 10:28 ` [PATCH 3/4] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
@ 2023-06-20 10:28 ` Hannes Reinecke
  2023-06-20 13:21   ` Sagi Grimberg
  3 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-20 10:28 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Sagi Grimberg, Keith Busch, linux-nvme, Jakub Kicinski,
	Eric Dumazet, Paolo Abeni, netdev, Hannes Reinecke,
	Boris Pismenny
Implement ->read_sock() function for use with nvme-tcp.
Signed-off-by: Hannes Reinecke <hare@suse.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Cc: Boris Pismenny <boris.pismenny@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: netdev@vger.kernel.org
---
 net/tls/tls.h      |  2 ++
 net/tls/tls_main.c |  2 ++
 net/tls/tls_sw.c   | 78 ++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 82 insertions(+)
diff --git a/net/tls/tls.h b/net/tls/tls.h
index d002c3af1966..ba55cd5c4913 100644
--- a/net/tls/tls.h
+++ b/net/tls/tls.h
@@ -114,6 +114,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
 ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
 			   struct pipe_inode_info *pipe,
 			   size_t len, unsigned int flags);
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor);
 
 int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
 void tls_device_splice_eof(struct socket *sock);
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 7b9c83dd7de2..1a062a8c6d33 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -963,10 +963,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
 	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
 	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_BASE][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
 	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
 	ops[TLS_SW  ][TLS_SW  ].poll		= tls_sk_poll;
+	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
 
 #ifdef CONFIG_TLS_DEVICE
 	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 97379e34c997..e918c98bbeb2 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -2231,6 +2231,84 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
 	goto splice_read_end;
 }
 
+int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
+		     sk_read_actor_t read_actor)
+{
+	struct tls_context *tls_ctx = tls_get_ctx(sk);
+	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
+	struct strp_msg *rxm = NULL;
+	struct tls_msg *tlm;
+	struct sk_buff *skb;
+	ssize_t copied = 0;
+	int err, used;
+
+	err = tls_rx_reader_lock(sk, ctx, true);
+	if (err < 0)
+		return err;
+	if (!skb_queue_empty(&ctx->rx_list)) {
+		skb = __skb_dequeue(&ctx->rx_list);
+	} else {
+		struct tls_decrypt_arg darg;
+
+		err = tls_rx_rec_wait(sk, NULL, true, true);
+		if (err <= 0) {
+			tls_rx_reader_unlock(sk, ctx);
+			return err;
+		}
+
+		memset(&darg.inargs, 0, sizeof(darg.inargs));
+
+		err = tls_rx_one_record(sk, NULL, &darg);
+		if (err < 0) {
+			tls_err_abort(sk, -EBADMSG);
+			tls_rx_reader_unlock(sk, ctx);
+			return err;
+		}
+
+		tls_rx_rec_done(ctx);
+		skb = darg.skb;
+	}
+
+	do {
+		rxm = strp_msg(skb);
+		tlm = tls_msg(skb);
+
+		/* read_sock does not support reading control messages */
+		if (tlm->control != TLS_RECORD_TYPE_DATA) {
+			err = -EINVAL;
+			goto read_sock_requeue;
+		}
+
+		used = read_actor(desc, skb, rxm->offset, rxm->full_len);
+		if (used <= 0) {
+			err = used;
+			goto read_sock_end;
+		}
+
+		copied += used;
+		if (used < rxm->full_len) {
+			rxm->offset += used;
+			rxm->full_len -= used;
+			if (!desc->count)
+				goto read_sock_requeue;
+		} else {
+			consume_skb(skb);
+			if (desc->count && !skb_queue_empty(&ctx->rx_list))
+				skb = __skb_dequeue(&ctx->rx_list);
+			else
+				skb = NULL;
+		}
+	} while (skb);
+
+read_sock_end:
+	tls_rx_reader_unlock(sk, ctx);
+	return copied ? : err;
+
+read_sock_requeue:
+	__skb_queue_head(&ctx->rx_list, skb);
+	goto read_sock_end;
+}
+
 bool tls_sw_sock_is_readable(struct sock *sk)
 {
 	struct tls_context *tls_ctx = tls_get_ctx(sk);
-- 
2.35.3
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-20 10:28 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
@ 2023-06-20 13:21   ` Sagi Grimberg
  2023-06-20 17:08     ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2023-06-20 13:21 UTC (permalink / raw)
  To: Hannes Reinecke, Christoph Hellwig
  Cc: Keith Busch, linux-nvme, Jakub Kicinski, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny
> Implement ->read_sock() function for use with nvme-tcp.
> 
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
> Cc: Boris Pismenny <boris.pismenny@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: netdev@vger.kernel.org
> ---
>   net/tls/tls.h      |  2 ++
>   net/tls/tls_main.c |  2 ++
>   net/tls/tls_sw.c   | 78 ++++++++++++++++++++++++++++++++++++++++++++++
>   3 files changed, 82 insertions(+)
> 
> diff --git a/net/tls/tls.h b/net/tls/tls.h
> index d002c3af1966..ba55cd5c4913 100644
> --- a/net/tls/tls.h
> +++ b/net/tls/tls.h
> @@ -114,6 +114,8 @@ bool tls_sw_sock_is_readable(struct sock *sk);
>   ssize_t tls_sw_splice_read(struct socket *sock, loff_t *ppos,
>   			   struct pipe_inode_info *pipe,
>   			   size_t len, unsigned int flags);
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		     sk_read_actor_t read_actor);
>   
>   int tls_device_sendmsg(struct sock *sk, struct msghdr *msg, size_t size);
>   void tls_device_splice_eof(struct socket *sock);
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 7b9c83dd7de2..1a062a8c6d33 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -963,10 +963,12 @@ static void build_proto_ops(struct proto_ops ops[TLS_NUM_CONFIG][TLS_NUM_CONFIG]
>   	ops[TLS_BASE][TLS_SW  ] = ops[TLS_BASE][TLS_BASE];
>   	ops[TLS_BASE][TLS_SW  ].splice_read	= tls_sw_splice_read;
>   	ops[TLS_BASE][TLS_SW  ].poll		= tls_sk_poll;
> +	ops[TLS_BASE][TLS_SW  ].read_sock	= tls_sw_read_sock;
>   
>   	ops[TLS_SW  ][TLS_SW  ] = ops[TLS_SW  ][TLS_BASE];
>   	ops[TLS_SW  ][TLS_SW  ].splice_read	= tls_sw_splice_read;
>   	ops[TLS_SW  ][TLS_SW  ].poll		= tls_sk_poll;
> +	ops[TLS_SW  ][TLS_SW  ].read_sock	= tls_sw_read_sock;
>   
>   #ifdef CONFIG_TLS_DEVICE
>   	ops[TLS_HW  ][TLS_BASE] = ops[TLS_BASE][TLS_BASE];
> diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
> index 97379e34c997..e918c98bbeb2 100644
> --- a/net/tls/tls_sw.c
> +++ b/net/tls/tls_sw.c
> @@ -2231,6 +2231,84 @@ ssize_t tls_sw_splice_read(struct socket *sock,  loff_t *ppos,
>   	goto splice_read_end;
>   }
>   
> +int tls_sw_read_sock(struct sock *sk, read_descriptor_t *desc,
> +		     sk_read_actor_t read_actor)
> +{
> +	struct tls_context *tls_ctx = tls_get_ctx(sk);
> +	struct tls_sw_context_rx *ctx = tls_sw_ctx_rx(tls_ctx);
> +	struct strp_msg *rxm = NULL;
> +	struct tls_msg *tlm;
> +	struct sk_buff *skb;
> +	ssize_t copied = 0;
> +	int err, used;
> +
> +	err = tls_rx_reader_lock(sk, ctx, true);
> +	if (err < 0)
> +		return err;
Unlike recvmsg or splice_read, the caller of read_sock is assumed to
have the socket locked, and tls_rx_reader_lock also calls lock_sock,
how is this not a deadlock?
I'm not exactly clear why the lock is needed here or what is the subtle
distinction between tls_rx_reader_lock and what lock_sock provides.
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-20 13:21   ` Sagi Grimberg
@ 2023-06-20 17:08     ` Jakub Kicinski
  2023-06-21  6:44       ` Hannes Reinecke
  0 siblings, 1 reply; 18+ messages in thread
From: Jakub Kicinski @ 2023-06-20 17:08 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Hannes Reinecke, Christoph Hellwig, Keith Busch, linux-nvme,
	Eric Dumazet, Paolo Abeni, netdev, Boris Pismenny
On Tue, 20 Jun 2023 16:21:22 +0300 Sagi Grimberg wrote:
> > +	err = tls_rx_reader_lock(sk, ctx, true);
> > +	if (err < 0)
> > +		return err;  
> 
> Unlike recvmsg or splice_read, the caller of read_sock is assumed to
> have the socket locked, and tls_rx_reader_lock also calls lock_sock,
> how is this not a deadlock?
Yeah :|
> I'm not exactly clear why the lock is needed here or what is the subtle
> distinction between tls_rx_reader_lock and what lock_sock provides.
It's a bit of a workaround for the consistency of the data stream.
There's a bunch of state in the TLS ULP, and waiting for mem or data
releases and re-takes the socket lock. So to stop annoying
corner-case races I slapped a lock around all of the reader.
IMHO depending on the socket lock for anything non-trivial and outside
of the socket itself is a bad idea in general.
The immediate need at the time was that if you did a read() and someone
else did a peek() at the same time from a stream of A B C D you may read
A D B C.
* Re: [PATCH 2/4] net/tls: handle MSG_EOR for tls_device TX flow
  2023-06-20 10:28 ` [PATCH 2/4] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
@ 2023-06-20 17:12   ` Sabrina Dubroca
  2023-06-21  6:09     ` Hannes Reinecke
  0 siblings, 1 reply; 18+ messages in thread
From: Sabrina Dubroca @ 2023-06-20 17:12 UTC (permalink / raw)
  To: Hannes Reinecke
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev
2023-06-20, 12:28:54 +0200, Hannes Reinecke wrote:
> tls_push_data() MSG_MORE / MSG_SENDPAGE_NOTLAST, but bails
> out on MSG_EOR.
> But seeing that MSG_EOR is basically the opposite of
> MSG_MORE / MSG_SENDPAGE_NOTLAST this patch adds handling
> MSG_EOR by treating it as the absence of MSG_MORE.
> Consequently we should return an error when both are set.
> 
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: netdev@vger.kernel.org
> Signed-off-by: Hannes Reinecke <hare@suse.de>
> ---
>  net/tls/tls_device.c | 25 ++++++++++++++++++++-----
>  1 file changed, 20 insertions(+), 5 deletions(-)
> 
> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
> index b82770f68807..ebefd148ecf5 100644
> --- a/net/tls/tls_device.c
> +++ b/net/tls/tls_device.c
> @@ -440,11 +440,6 @@ static int tls_push_data(struct sock *sk,
>  	int copy, rc = 0;
>  	long timeo;
>  
> -	if (flags &
> -	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
> -	      MSG_SPLICE_PAGES))
> -		return -EOPNOTSUPP;
> -
>  	if (unlikely(sk->sk_err))
>  		return -sk->sk_err;
>  
> @@ -536,6 +531,10 @@ static int tls_push_data(struct sock *sk,
>  				more = true;
>  				break;
>  			}
> +			if (flags & MSG_EOR) {
> +				more = false;
> +				break;
Why the break here? We don't want to close and push the record in that
case? (the "if (done || ...)" block just below)
> +			}
>  
>  			done = true;
>  		}
Thanks,
-- 
Sabrina
* Re: [PATCH 2/4] net/tls: handle MSG_EOR for tls_device TX flow
  2023-06-20 17:12   ` Sabrina Dubroca
@ 2023-06-21  6:09     ` Hannes Reinecke
  0 siblings, 0 replies; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-21  6:09 UTC (permalink / raw)
  To: Sabrina Dubroca
  Cc: Christoph Hellwig, Sagi Grimberg, Keith Busch, linux-nvme,
	Jakub Kicinski, Eric Dumazet, Paolo Abeni, netdev
On 6/20/23 19:12, Sabrina Dubroca wrote:
> 2023-06-20, 12:28:54 +0200, Hannes Reinecke wrote:
>> tls_push_data() handles MSG_MORE / MSG_SENDPAGE_NOTLAST, but bails
>> out on MSG_EOR.
>> But seeing that MSG_EOR is basically the opposite of
>> MSG_MORE / MSG_SENDPAGE_NOTLAST this patch adds handling
>> MSG_EOR by treating it as the absence of MSG_MORE.
>> Consequently we should return an error when both are set.
>>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: netdev@vger.kernel.org
>> Signed-off-by: Hannes Reinecke <hare@suse.de>
>> ---
>>   net/tls/tls_device.c | 25 ++++++++++++++++++++-----
>>   1 file changed, 20 insertions(+), 5 deletions(-)
>>
>> diff --git a/net/tls/tls_device.c b/net/tls/tls_device.c
>> index b82770f68807..ebefd148ecf5 100644
>> --- a/net/tls/tls_device.c
>> +++ b/net/tls/tls_device.c
>> @@ -440,11 +440,6 @@ static int tls_push_data(struct sock *sk,
>>   	int copy, rc = 0;
>>   	long timeo;
>>   
>> -	if (flags &
>> -	    ~(MSG_MORE | MSG_DONTWAIT | MSG_NOSIGNAL | MSG_SENDPAGE_NOTLAST |
>> -	      MSG_SPLICE_PAGES))
>> -		return -EOPNOTSUPP;
>> -
>>   	if (unlikely(sk->sk_err))
>>   		return -sk->sk_err;
>>   
>> @@ -536,6 +531,10 @@ static int tls_push_data(struct sock *sk,
>>   				more = true;
>>   				break;
>>   			}
>> +			if (flags & MSG_EOR) {
>> +				more = false;
>> +				break;
> 
> Why the break here? We don't want to close and push the record in that
> case? (the "if (done || ...)" block just below)
> 
Ah, yes, you are correct. Will be fixing it.
Cheers,
Hannes
-- 
Dr. Hannes Reinecke                Kernel Storage Architect
hare@suse.de                              +49 911 74053 688
SUSE Software Solutions GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), Geschäftsführer: Ivo Totev, Andrew
Myers, Andrew McDonald, Martje Boudien Moerman
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-20 17:08     ` Jakub Kicinski
@ 2023-06-21  6:44       ` Hannes Reinecke
  2023-06-21  8:39         ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-21  6:44 UTC (permalink / raw)
  To: Jakub Kicinski, Sagi Grimberg
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny
On 6/20/23 19:08, Jakub Kicinski wrote:
> On Tue, 20 Jun 2023 16:21:22 +0300 Sagi Grimberg wrote:
>>> +	err = tls_rx_reader_lock(sk, ctx, true);
>>> +	if (err < 0)
>>> +		return err;
>>
>> Unlike recvmsg or splice_read, the caller of read_sock is assumed to
>> have the socket locked, and tls_rx_reader_lock also calls lock_sock,
>> how is this not a deadlock?
> 
> Yeah :|
> 
>> I'm not exactly clear why the lock is needed here or what is the subtle
>> distinction between tls_rx_reader_lock and what lock_sock provides.
> 
> It's a bit of a workaround for the consistency of the data stream.
> There's a bunch of state in the TLS ULP, and waiting for mem or data
> releases and re-takes the socket lock. So to stop the annoying
> corner-case races I slapped a lock around all of the reader.
> 
> IMHO depending on the socket lock for anything non-trivial and outside
> of the socket itself is a bad idea in general.
> 
> The immediate need at the time was that if you did a read() and someone
> else did a peek() at the same time from a stream of A B C D you may read
> A D B C.
Leaving me ever so confused.
read_sock() is a generic interface; we cannot require a protocol-specific
lock before calling it.
What to do now?
Drop the tls_rx_reader_lock from read_sock() again?
Cheers,
Hannes
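Jakub's A B C D scenario above can be modeled with a deterministic toy (purely illustrative, no real sockets or TLS records): reader 1 pulls a record, drops the lock to wait, and reader 2 slips in, so neither observes a contiguous prefix of the stream.

```c
#include <string.h>

/* Toy model of the interleaving Jakub describes.  pull_record()
 * stands in for decrypting the next TLS record; the fixed "schedule"
 * below mimics reader 1 releasing the socket lock while waiting for
 * data, which lets reader 2 run in the middle of reader 1's read. */
static const char stream[] = "ABCD";
static int pos;

static char pull_record(void)
{
	return stream[pos++];
}

static void interleave(char *r1, char *r2)
{
	int i = 0, j = 0;

	pos = 0;
	r1[i++] = pull_record();	/* reader 1 gets 'A', then waits */
	r2[j++] = pull_record();	/* reader 2 sneaks in: 'B'       */
	r2[j++] = pull_record();	/* reader 2: 'C'                 */
	r1[i++] = pull_record();	/* reader 1 resumes: 'D'         */
	r1[i] = '\0';
	r2[j] = '\0';
}
```

The per-context reader lock closes exactly this window: a second reader cannot enter until the first one releases it.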
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-21  6:44       ` Hannes Reinecke
@ 2023-06-21  8:39         ` Sagi Grimberg
  2023-06-21  9:08           ` Hannes Reinecke
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2023-06-21  8:39 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny
>> On Tue, 20 Jun 2023 16:21:22 +0300 Sagi Grimberg wrote:
>>>> +    err = tls_rx_reader_lock(sk, ctx, true);
>>>> +    if (err < 0)
>>>> +        return err;
>>>
>>> Unlike recvmsg or splice_read, the caller of read_sock is assumed to
>>> have the socket locked, and tls_rx_reader_lock also calls lock_sock,
>>> how is this not a deadlock?
>>
>> Yeah :|
>>
>>> I'm not exactly clear why the lock is needed here or what is the subtle
>>> distinction between tls_rx_reader_lock and what lock_sock provides.
>>
>> It's a bit of a workaround for the consistency of the data stream.
>> There's bunch of state in the TLS ULP and waiting for mem or data
>> releases and re-takes the socket lock. So to stop the flow annoying
>> corner case races I slapped a lock around all of the reader.
>>
>> IMHO depending on the socket lock for anything non-trivial and outside
>> of the socket itself is a bad idea in general.
>>
>> The immediate need at the time was that if you did a read() and someone
>> else did a peek() at the same time from a stream of A B C D you may read
>> A D B C.
> 
> Leaving me ever so confused.
> 
> read_sock() is a generic interface; we cannot require a protocol 
> specific lock before calling it.
> 
> What to do now?
> Drop the tls_rx_reader_lock from read_sock() again?
Probably just need to synchronize the readers by splitting that from
tls_rx_reader_lock:
--
diff --git a/net/tls/tls_sw.c b/net/tls/tls_sw.c
index 53f944e6d8ef..53404c3fdcc6 100644
--- a/net/tls/tls_sw.c
+++ b/net/tls/tls_sw.c
@@ -1845,13 +1845,10 @@ tls_read_flush_backlog(struct sock *sk, struct tls_prot_info *prot,
         return sk_flush_backlog(sk);
 }

-static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
-                             bool nonblock)
+static int tls_rx_reader_acquire(struct sock *sk, struct tls_sw_context_rx *ctx,
+                                bool nonblock)
 {
         long timeo;
-       int err;
-
-       lock_sock(sk);

         timeo = sock_rcvtimeo(sk, nonblock);

@@ -1865,26 +1862,30 @@ static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
                               !READ_ONCE(ctx->reader_present), &wait);
                 remove_wait_queue(&ctx->wq, &wait);

-               if (timeo <= 0) {
-                       err = -EAGAIN;
-                       goto err_unlock;
-               }
-               if (signal_pending(current)) {
-                       err = sock_intr_errno(timeo);
-                       goto err_unlock;
-               }
+               if (timeo <= 0)
+                       return -EAGAIN;
+               if (signal_pending(current))
+                       return sock_intr_errno(timeo);
         }

         WRITE_ONCE(ctx->reader_present, 1);

         return 0;
+}

-err_unlock:
-       release_sock(sk);
+static int tls_rx_reader_lock(struct sock *sk, struct tls_sw_context_rx *ctx,
+                             bool nonblock)
+{
+       int err;
+
+       lock_sock(sk);
+       err = tls_rx_reader_acquire(sk, ctx, nonblock);
+       if (err)
+               release_sock(sk);
         return err;
 }

-static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+static void tls_rx_reader_release(struct sock *sk, struct tls_sw_context_rx *ctx)
 {
         if (unlikely(ctx->reader_contended)) {
                 if (wq_has_sleeper(&ctx->wq))
@@ -1896,6 +1897,11 @@ static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
         }

         WRITE_ONCE(ctx->reader_present, 0);
+}
+
+static void tls_rx_reader_unlock(struct sock *sk, struct tls_sw_context_rx *ctx)
+{
+       tls_rx_reader_release(sk, ctx);
         release_sock(sk);
 }
--
Then read_sock can just acquire/release.
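A rough model of the split (illustrative only: single-threaded, no wait queue, no socket lock) shows why a caller that already holds the socket lock, like ->read_sock(), can use acquire/release directly:

```c
#include <errno.h>
#include <stdbool.h>

/* Toy model of the reader_present handshake from the diff above.
 * toy_reader_acquire() corresponds to the nonblocking case of
 * tls_rx_reader_acquire(); a real implementation would sleep on
 * ctx->wq instead of failing immediately. */
struct toy_rx_ctx {
	bool reader_present;
};

static int toy_reader_acquire(struct toy_rx_ctx *ctx)
{
	if (ctx->reader_present)
		return -EAGAIN;		/* contended: would wait here */
	ctx->reader_present = true;	/* we are now the only reader */
	return 0;
}

static void toy_reader_release(struct toy_rx_ctx *ctx)
{
	ctx->reader_present = false;	/* let the next reader in */
}
```

recvmsg()/splice_read() keep the lock_sock() + acquire pairing via tls_rx_reader_lock(), while read_sock(), entered with the socket already locked, pairs acquire with release alone.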
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-21  8:39         ` Sagi Grimberg
@ 2023-06-21  9:08           ` Hannes Reinecke
  2023-06-21  9:49             ` Sagi Grimberg
  0 siblings, 1 reply; 18+ messages in thread
From: Hannes Reinecke @ 2023-06-21  9:08 UTC (permalink / raw)
  To: Sagi Grimberg, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny
On 6/21/23 10:39, Sagi Grimberg wrote:
> 
>>> [...]
>> Leaving me ever so confused.
>>
>> read_sock() is a generic interface; we cannot require a protocol 
>> specific lock before calling it.
>>
>> What to do now?
>> Drop the tls_rx_reader_lock from read_sock() again?
> 
> Probably just need to synchronize the readers by splitting that from
> tls_rx_reader_lock:
> [tls_rx_reader_acquire/release diff snipped]
> 
> Then read_sock can just acquire/release.
Good suggestion.
Will be including it in the next round.
Cheers,
Hannes
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-21  9:08           ` Hannes Reinecke
@ 2023-06-21  9:49             ` Sagi Grimberg
  2023-06-21 19:31               ` Jakub Kicinski
  0 siblings, 1 reply; 18+ messages in thread
From: Sagi Grimberg @ 2023-06-21  9:49 UTC (permalink / raw)
  To: Hannes Reinecke, Jakub Kicinski
  Cc: Christoph Hellwig, Keith Busch, linux-nvme, Eric Dumazet,
	Paolo Abeni, netdev, Boris Pismenny
On 6/21/23 12:08, Hannes Reinecke wrote:
> On 6/21/23 10:39, Sagi Grimberg wrote:
>>>> [...]
>>> Leaving me ever so confused.
>>>
>>> read_sock() is a generic interface; we cannot require a protocol 
>>> specific lock before calling it.
>>>
>>> What to do now?
>>> Drop the tls_rx_reader_lock from read_sock() again?
>>
>> Probably just need to synchronize the readers by splitting that from
>> tls_rx_reader_lock:
>> [tls_rx_reader_acquire/release diff snipped]
>>
>> Then read_sock can just acquire/release.
> 
> Good suggestion.
> Will be including it in the next round.
Maybe more appropriate helper names would be
tls_rx_reader_enter / tls_rx_reader_exit.
Whatever Jakub prefers...
* Re: [PATCH 4/4] net/tls: implement ->read_sock()
  2023-06-21  9:49             ` Sagi Grimberg
@ 2023-06-21 19:31               ` Jakub Kicinski
  0 siblings, 0 replies; 18+ messages in thread
From: Jakub Kicinski @ 2023-06-21 19:31 UTC (permalink / raw)
  To: Sagi Grimberg
  Cc: Hannes Reinecke, Christoph Hellwig, Keith Busch, linux-nvme,
	Eric Dumazet, Paolo Abeni, netdev, Boris Pismenny
On Wed, 21 Jun 2023 12:49:21 +0300 Sagi Grimberg wrote:
> > Good suggestion.
> > Will be including it in the next round.  
> 
> Maybe more appropriate helper names would be
> tls_rx_reader_enter / tls_rx_reader_exit.
> 
> Whatever Jakub prefers...
I was thinking along the same lines but with __ in front of the names
of the factored out code. Your naming as suggested in the diff is
better.
end of thread, other threads:[~2023-06-21 19:31 UTC | newest]
Thread overview: 18+ messages
2023-06-20 10:28 [PATCHv5 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
2023-06-20 10:28 ` [PATCH 1/4] net/tls: handle MSG_EOR for tls_sw TX flow Hannes Reinecke
2023-06-20 10:28 ` [PATCH 2/4] net/tls: handle MSG_EOR for tls_device " Hannes Reinecke
2023-06-20 17:12   ` Sabrina Dubroca
2023-06-21  6:09     ` Hannes Reinecke
2023-06-20 10:28 ` [PATCH 3/4] selftests/net/tls: add test for MSG_EOR Hannes Reinecke
2023-06-20 10:28 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
2023-06-20 13:21   ` Sagi Grimberg
2023-06-20 17:08     ` Jakub Kicinski
2023-06-21  6:44       ` Hannes Reinecke
2023-06-21  8:39         ` Sagi Grimberg
2023-06-21  9:08           ` Hannes Reinecke
2023-06-21  9:49             ` Sagi Grimberg
2023-06-21 19:31               ` Jakub Kicinski
  -- strict thread matches above, loose matches on Subject: below --
2023-06-14  6:22 [PATCHv4 0/4] net/tls: fixes for NVMe-over-TLS Hannes Reinecke
2023-06-14  6:22 ` [PATCH 4/4] net/tls: implement ->read_sock() Hannes Reinecke
2023-06-17  6:28   ` Jakub Kicinski
2023-06-17 14:08   ` Simon Horman
2023-06-19  8:16     ` Dan Carpenter