* [PATCH 0/4] implement vectored registered buffers for sendzc
@ 2024-10-23 2:38 Pavel Begunkov
2024-10-23 2:38 ` [PATCH 1/4] io_uring/net: introduce io_kmsg_set_iovec Pavel Begunkov
` (4 more replies)
0 siblings, 5 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-23 2:38 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Allow registered buffers to be used with zerocopy sendmsg, where the
passed iovec becomes a scatter list into the registered buffer
specified by sqe->buf_index. See patches 3 and 4 for more details.
To get performance out of it, it'll need a bit more work on top for
optimising allocations and cleaning up send setups. We can also
implement it for non zerocopy variants and reads/writes in the future.
Tested by enabling it in test/send-zerocopy.c, which checks payloads,
and exercises lots of corner cases, especially around send sizes,
offsets and non aligned registered buffers.
Pavel Begunkov (4):
io_uring/net: introduce io_kmsg_set_iovec
io_uring/net: allow mixed bvec/iovec caching
io_uring: vectored registered buffer import
io_uring/net: sendzc with vectored fixed buffers
io_uring/net.c | 132 +++++++++++++++++++++++++++++++++---------------
io_uring/net.h | 4 +-
io_uring/rsrc.c | 60 ++++++++++++++++++++++
io_uring/rsrc.h | 3 ++
4 files changed, 156 insertions(+), 43 deletions(-)
--
2.46.0
^ permalink raw reply [flat|nested] 17+ messages in thread
* [PATCH 1/4] io_uring/net: introduce io_kmsg_set_iovec
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
@ 2024-10-23 2:38 ` Pavel Begunkov
2024-10-23 2:38 ` [PATCH 2/4] io_uring/net: allow mixed bvec/iovec caching Pavel Begunkov
` (3 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-23 2:38 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
A prep patch: add a helper function that takes an allocated iovec and
assigns it to the kmsg cache. It'll be expanded upon in the following
patch.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/net.c | 22 ++++++++++++----------
1 file changed, 12 insertions(+), 10 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index 0a0b148a153c..bd24290fa646 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -125,12 +125,18 @@ static bool io_net_retry(struct socket *sock, int flags)
return sock->type == SOCK_STREAM || sock->type == SOCK_SEQPACKET;
}
+static inline void io_kmsg_set_iovec(struct io_async_msghdr *kmsg,
+ struct iovec *iov, int nr)
+{
+ kmsg->free_iov_nr = nr;
+ kmsg->free_iov = iov;
+}
+
static void io_netmsg_iovec_free(struct io_async_msghdr *kmsg)
{
if (kmsg->free_iov) {
kfree(kmsg->free_iov);
- kmsg->free_iov_nr = 0;
- kmsg->free_iov = NULL;
+ io_kmsg_set_iovec(kmsg, NULL, 0);
}
}
@@ -174,8 +180,7 @@ static struct io_async_msghdr *io_msg_alloc_async(struct io_kiocb *req)
if (!io_alloc_async_data(req)) {
hdr = req->async_data;
- hdr->free_iov_nr = 0;
- hdr->free_iov = NULL;
+ io_kmsg_set_iovec(hdr, NULL, 0);
return hdr;
}
return NULL;
@@ -187,10 +192,9 @@ static int io_net_vec_assign(struct io_kiocb *req, struct io_async_msghdr *kmsg,
{
if (iov) {
req->flags |= REQ_F_NEED_CLEANUP;
- kmsg->free_iov_nr = kmsg->msg.msg_iter.nr_segs;
if (kmsg->free_iov)
kfree(kmsg->free_iov);
- kmsg->free_iov = iov;
+ io_kmsg_set_iovec(kmsg, iov, kmsg->msg.msg_iter.nr_segs);
}
return 0;
}
@@ -623,8 +627,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
return ret;
if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
- kmsg->free_iov_nr = ret;
- kmsg->free_iov = arg.iovs;
+ io_kmsg_set_iovec(kmsg, arg.iovs, ret);
req->flags |= REQ_F_NEED_CLEANUP;
}
sr->len = arg.out_len;
@@ -1107,8 +1110,7 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg
iov_iter_init(&kmsg->msg.msg_iter, ITER_DEST, arg.iovs, ret,
arg.out_len);
if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
- kmsg->free_iov_nr = ret;
- kmsg->free_iov = arg.iovs;
+ io_kmsg_set_iovec(kmsg, arg.iovs, ret);
req->flags |= REQ_F_NEED_CLEANUP;
}
} else {
--
2.46.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 2/4] io_uring/net: allow mixed bvec/iovec caching
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
2024-10-23 2:38 ` [PATCH 1/4] io_uring/net: introduce io_kmsg_set_iovec Pavel Begunkov
@ 2024-10-23 2:38 ` Pavel Begunkov
2024-10-23 2:38 ` [PATCH 3/4] io_uring: vectored registered buffer import Pavel Begunkov
` (2 subsequent siblings)
4 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-23 2:38 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
We're going to have allocated bvecs shortly, and we need a place to
store them and infrastructure for releasing them. Reuse the struct
io_async_msghdr iovec caching for that. Get rid of the typing and switch
to keeping the size in bytes instead of the number of iov elements the
cached array can store. Performance-wise it should be just fine, as the
divisions will be compiled into binary shifts.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/net.c | 67 +++++++++++++++++++++++++++-----------------------
io_uring/net.h | 4 +--
2 files changed, 38 insertions(+), 33 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index bd24290fa646..bc062b5a7a55 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -128,14 +128,19 @@ static bool io_net_retry(struct socket *sock, int flags)
static inline void io_kmsg_set_iovec(struct io_async_msghdr *kmsg,
struct iovec *iov, int nr)
{
- kmsg->free_iov_nr = nr;
- kmsg->free_iov = iov;
+ kmsg->free_vec_bytes = nr * sizeof(*iov);
+ kmsg->free_vec = iov;
+}
+
+static int io_kmsg_nr_free_iov(struct io_async_msghdr *kmsg)
+{
+ return kmsg->free_vec_bytes / sizeof(struct iovec);
}
static void io_netmsg_iovec_free(struct io_async_msghdr *kmsg)
{
- if (kmsg->free_iov) {
- kfree(kmsg->free_iov);
+ if (kmsg->free_vec) {
+ kfree(kmsg->free_vec);
io_kmsg_set_iovec(kmsg, NULL, 0);
}
}
@@ -143,7 +148,7 @@ static void io_netmsg_iovec_free(struct io_async_msghdr *kmsg)
static void io_netmsg_recycle(struct io_kiocb *req, unsigned int issue_flags)
{
struct io_async_msghdr *hdr = req->async_data;
- struct iovec *iov;
+ void *vec;
/* can't recycle, ensure we free the iovec if we have one */
if (unlikely(issue_flags & IO_URING_F_UNLOCKED)) {
@@ -152,10 +157,10 @@ static void io_netmsg_recycle(struct io_kiocb *req, unsigned int issue_flags)
}
/* Let normal cleanup path reap it if we fail adding to the cache */
- iov = hdr->free_iov;
+ vec = hdr->free_vec;
if (io_alloc_cache_put(&req->ctx->netmsg_cache, hdr)) {
- if (iov)
- kasan_mempool_poison_object(iov);
+ if (vec)
+ kasan_mempool_poison_object(vec);
req->async_data = NULL;
req->flags &= ~REQ_F_ASYNC_DATA;
}
@@ -168,9 +173,9 @@ static struct io_async_msghdr *io_msg_alloc_async(struct io_kiocb *req)
hdr = io_alloc_cache_get(&ctx->netmsg_cache);
if (hdr) {
- if (hdr->free_iov) {
- kasan_mempool_unpoison_object(hdr->free_iov,
- hdr->free_iov_nr * sizeof(struct iovec));
+ if (hdr->free_vec) {
+ kasan_mempool_unpoison_object(hdr->free_vec,
+ hdr->free_vec_bytes);
req->flags |= REQ_F_NEED_CLEANUP;
}
req->flags |= REQ_F_ASYNC_DATA;
@@ -192,8 +197,8 @@ static int io_net_vec_assign(struct io_kiocb *req, struct io_async_msghdr *kmsg,
{
if (iov) {
req->flags |= REQ_F_NEED_CLEANUP;
- if (kmsg->free_iov)
- kfree(kmsg->free_iov);
+ if (kmsg->free_vec)
+ kfree(kmsg->free_vec);
io_kmsg_set_iovec(kmsg, iov, kmsg->msg.msg_iter.nr_segs);
}
return 0;
@@ -220,9 +225,9 @@ static int io_compat_msg_copy_hdr(struct io_kiocb *req,
struct iovec *iov;
int ret, nr_segs;
- if (iomsg->free_iov) {
- nr_segs = iomsg->free_iov_nr;
- iov = iomsg->free_iov;
+ if (iomsg->free_vec) {
+ nr_segs = io_kmsg_nr_free_iov(iomsg);
+ iov = iomsg->free_vec;
} else {
iov = &iomsg->fast_iov;
nr_segs = 1;
@@ -270,9 +275,9 @@ static int io_msg_copy_hdr(struct io_kiocb *req, struct io_async_msghdr *iomsg,
struct iovec *iov;
int ret, nr_segs;
- if (iomsg->free_iov) {
- nr_segs = iomsg->free_iov_nr;
- iov = iomsg->free_iov;
+ if (iomsg->free_vec) {
+ nr_segs = io_kmsg_nr_free_iov(iomsg);
+ iov = iomsg->free_vec;
} else {
iov = &iomsg->fast_iov;
nr_segs = 1;
@@ -478,7 +483,7 @@ static int io_bundle_nbufs(struct io_async_msghdr *kmsg, int ret)
if (iter_is_ubuf(&kmsg->msg.msg_iter))
return 1;
- iov = kmsg->free_iov;
+ iov = kmsg->free_vec;
if (!iov)
iov = &kmsg->fast_iov;
@@ -611,9 +616,9 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
.nr_iovs = 1,
};
- if (kmsg->free_iov) {
- arg.nr_iovs = kmsg->free_iov_nr;
- arg.iovs = kmsg->free_iov;
+ if (kmsg->free_vec) {
+ arg.nr_iovs = io_kmsg_nr_free_iov(kmsg);
+ arg.iovs = kmsg->free_vec;
arg.mode = KBUF_MODE_FREE;
}
@@ -626,7 +631,7 @@ int io_send(struct io_kiocb *req, unsigned int issue_flags)
if (unlikely(ret < 0))
return ret;
- if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
+ if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_vec) {
io_kmsg_set_iovec(kmsg, arg.iovs, ret);
req->flags |= REQ_F_NEED_CLEANUP;
}
@@ -1088,9 +1093,9 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg
.mode = KBUF_MODE_EXPAND,
};
- if (kmsg->free_iov) {
- arg.nr_iovs = kmsg->free_iov_nr;
- arg.iovs = kmsg->free_iov;
+ if (kmsg->free_vec) {
+ arg.nr_iovs = io_kmsg_nr_free_iov(kmsg);
+ arg.iovs = kmsg->free_vec;
arg.mode |= KBUF_MODE_FREE;
}
@@ -1109,7 +1114,7 @@ static int io_recv_buf_select(struct io_kiocb *req, struct io_async_msghdr *kmsg
}
iov_iter_init(&kmsg->msg.msg_iter, ITER_DEST, arg.iovs, ret,
arg.out_len);
- if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_iov) {
+ if (arg.iovs != &kmsg->fast_iov && arg.iovs != kmsg->free_vec) {
io_kmsg_set_iovec(kmsg, arg.iovs, ret);
req->flags |= REQ_F_NEED_CLEANUP;
}
@@ -1807,9 +1812,9 @@ void io_netmsg_cache_free(const void *entry)
{
struct io_async_msghdr *kmsg = (struct io_async_msghdr *) entry;
- if (kmsg->free_iov) {
- kasan_mempool_unpoison_object(kmsg->free_iov,
- kmsg->free_iov_nr * sizeof(struct iovec));
+ if (kmsg->free_vec) {
+ kasan_mempool_unpoison_object(kmsg->free_vec,
+ kmsg->free_vec_bytes);
io_netmsg_iovec_free(kmsg);
}
kfree(kmsg);
diff --git a/io_uring/net.h b/io_uring/net.h
index 52bfee05f06a..65d497985572 100644
--- a/io_uring/net.h
+++ b/io_uring/net.h
@@ -7,8 +7,8 @@ struct io_async_msghdr {
#if defined(CONFIG_NET)
struct iovec fast_iov;
/* points to an allocated iov, if NULL we use fast_iov instead */
- struct iovec *free_iov;
- int free_iov_nr;
+ void *free_vec;
+ int free_vec_bytes;
int namelen;
__kernel_size_t controllen;
__kernel_size_t payloadlen;
--
2.46.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 3/4] io_uring: vectored registered buffer import
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
2024-10-23 2:38 ` [PATCH 1/4] io_uring/net: introduce io_kmsg_set_iovec Pavel Begunkov
2024-10-23 2:38 ` [PATCH 2/4] io_uring/net: allow mixed bvec/iovec caching Pavel Begunkov
@ 2024-10-23 2:38 ` Pavel Begunkov
2024-10-23 2:38 ` [PATCH 4/4] io_uring/net: sendzc with vectored fixed buffers Pavel Begunkov
2024-10-23 13:52 ` [PATCH 0/4] implement vectored registered buffers for sendzc Jens Axboe
4 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-23 2:38 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Add a helper that takes a registered buffer and an iovec with addresses
pointing into that registered buffer, and returns a new bvec array
corresponding to the given iovec. Essentially, each iov entry is
resolved into a bvec array, which gives us an array of arrays of struct
bio_vec that the function flattens into a single long bvec array.
Note, max_segs is overestimated; that can be improved later. The
allocation can also be optimised by doing it inline into the same array.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/rsrc.c | 60 +++++++++++++++++++++++++++++++++++++++++++++++++
io_uring/rsrc.h | 3 +++
2 files changed, 63 insertions(+)
diff --git a/io_uring/rsrc.c b/io_uring/rsrc.c
index fa5f27496aef..6f9f3cb4a2ef 100644
--- a/io_uring/rsrc.c
+++ b/io_uring/rsrc.c
@@ -1085,6 +1085,66 @@ int io_sqe_buffers_register(struct io_ring_ctx *ctx, void __user *arg,
return ret;
}
+struct bio_vec *io_import_fixed_vec(int ddir, struct iov_iter *iter,
+ struct io_mapped_ubuf *imu,
+ struct iovec *iovec, int nr_iovs)
+{
+ unsigned long folio_size = (1 << imu->folio_shift);
+ unsigned long folio_mask = folio_size - 1;
+ struct bio_vec *res_bvec;
+ size_t total_len = 0;
+ int max_segs = 0;
+ int bvec_idx = 0;
+ int iov_idx;
+
+ if (WARN_ON_ONCE(!imu))
+ return ERR_PTR(-EFAULT);
+
+ for (iov_idx = 0; iov_idx < nr_iovs; iov_idx++) {
+ size_t iov_len = iovec[iov_idx].iov_len;
+ u64 buf_addr = (u64)iovec[iov_idx].iov_base;
+ u64 buf_end;
+
+ if (unlikely(check_add_overflow(buf_addr, (u64)iov_len, &buf_end)))
+ return ERR_PTR(-EFAULT);
+ /* not inside the mapped region */
+ if (unlikely(buf_addr < imu->ubuf || buf_end > (imu->ubuf + imu->len)))
+ return ERR_PTR(-EFAULT);
+ max_segs += (iov_len >> imu->folio_shift) + 2;
+ }
+
+ res_bvec = kmalloc_array(max_segs, sizeof(*res_bvec), GFP_KERNEL);
+ if (!res_bvec)
+ return ERR_PTR(-ENOMEM);
+
+ for (iov_idx = 0; iov_idx < nr_iovs; iov_idx++) {
+ size_t iov_len = iovec[iov_idx].iov_len;
+ u64 buf_addr = (u64)iovec[iov_idx].iov_base;
+ u64 folio_addr = imu->ubuf & ~folio_mask;
+ struct bio_vec *src_bvec;
+ size_t offset;
+
+ total_len += iov_len;
+ /* by using folio address it also accounts for bvec offset */
+ offset = buf_addr - folio_addr;
+ src_bvec = imu->bvec + (offset >> imu->folio_shift);
+ offset &= folio_mask;
+
+ for (; iov_len; offset = 0, bvec_idx++, src_bvec++) {
+ size_t seg_size = min_t(size_t, iov_len,
+ folio_size - offset);
+
+ res_bvec[bvec_idx].bv_page = src_bvec->bv_page;
+ res_bvec[bvec_idx].bv_offset = offset;
+ res_bvec[bvec_idx].bv_len = seg_size;
+ iov_len -= seg_size;
+ }
+ }
+
+ iov_iter_bvec(iter, ddir, res_bvec, bvec_idx, total_len);
+ return res_bvec;
+}
+
int io_import_fixed(int ddir, struct iov_iter *iter,
struct io_mapped_ubuf *imu,
u64 buf_addr, size_t len)
diff --git a/io_uring/rsrc.h b/io_uring/rsrc.h
index 8ed588036210..675161cf8b92 100644
--- a/io_uring/rsrc.h
+++ b/io_uring/rsrc.h
@@ -66,6 +66,9 @@ int io_queue_rsrc_removal(struct io_rsrc_data *data, unsigned idx, void *rsrc);
int io_import_fixed(int ddir, struct iov_iter *iter,
struct io_mapped_ubuf *imu,
u64 buf_addr, size_t len);
+struct bio_vec *io_import_fixed_vec(int ddir, struct iov_iter *iter,
+ struct io_mapped_ubuf *imu,
+ struct iovec *iovec, int nr_iovs);
int io_register_clone_buffers(struct io_ring_ctx *ctx, void __user *arg);
void __io_sqe_buffers_unregister(struct io_ring_ctx *ctx);
--
2.46.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* [PATCH 4/4] io_uring/net: sendzc with vectored fixed buffers
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
` (2 preceding siblings ...)
2024-10-23 2:38 ` [PATCH 3/4] io_uring: vectored registered buffer import Pavel Begunkov
@ 2024-10-23 2:38 ` Pavel Begunkov
2024-10-23 13:52 ` [PATCH 0/4] implement vectored registered buffers for sendzc Jens Axboe
4 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-23 2:38 UTC (permalink / raw)
To: io-uring; +Cc: Jens Axboe, asml.silence
Currently, we can use registered buffers with send zerocopy but not
sendmsg. However, users want to use it with zc sendmsg as well, and
pass a scatter list into a registered buffer.
Implement vectored registered buffer support for sendmsg zerocopy.
The ABI should be intuitive. The user should set sqe->buf_index to the
desired registered buffer and also pass the IORING_RECVSEND_FIXED_BUF
flag. msghdr should still point to an iovec with user addresses, which
are used to calculate offsets, as has always been the case with
registered buffers.
In other words, in most cases the user passes the same iovec it'd pass
to the non-registered buffer version.
It's the first step and requires some more work cleaning up the
infrastructure. It'll also need some improvement on the bvec caching
side.
Note, we can easily enable it for the non-zc version, and even extend
the feature to read/write requests.
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
io_uring/net.c | 47 +++++++++++++++++++++++++++++++++++++++++++++--
1 file changed, 45 insertions(+), 2 deletions(-)
diff --git a/io_uring/net.c b/io_uring/net.c
index bc062b5a7a55..6a19b6a7dc06 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -204,6 +204,18 @@ static int io_net_vec_assign(struct io_kiocb *req, struct io_async_msghdr *kmsg,
return 0;
}
+static void io_net_bvec_assign(struct io_kiocb *req, struct io_async_msghdr *kmsg,
+ struct bio_vec *bvec, int max_segs)
+{
+ if (bvec) {
+ req->flags |= REQ_F_NEED_CLEANUP;
+ if (kmsg->free_vec)
+ kfree(kmsg->free_vec);
+ kmsg->free_vec_bytes = max_segs * sizeof(*bvec);
+ kmsg->free_vec = bvec;
+ }
+}
+
static inline void io_mshot_prep_retry(struct io_kiocb *req,
struct io_async_msghdr *kmsg)
{
@@ -267,6 +279,31 @@ static int io_compat_msg_copy_hdr(struct io_kiocb *req,
}
#endif
+static int io_send_setup_sg_fixed(struct io_kiocb *req, struct iovec *iovec,
+ int nr_iovs, int ddir)
+{
+ struct io_sr_msg *sr = io_kiocb_to_cmd(req, struct io_sr_msg);
+ struct io_async_msghdr *iomsg = req->async_data;
+ struct iov_iter *iter = &iomsg->msg.msg_iter;
+ struct io_ring_ctx *ctx = req->ctx;
+ struct io_mapped_ubuf *imu;
+ struct bio_vec *bvec;
+ int idx;
+
+ if (unlikely(sr->buf_index >= ctx->nr_user_bufs))
+ return -EFAULT;
+ idx = array_index_nospec(sr->buf_index, ctx->nr_user_bufs);
+ imu = READ_ONCE(ctx->user_bufs[idx]);
+ io_req_set_rsrc_node(sr->notif, ctx, 0);
+
+ bvec = io_import_fixed_vec(ddir, iter, imu, iovec, nr_iovs);
+ if (unlikely(IS_ERR(bvec)))
+ return PTR_ERR(bvec);
+
+ io_net_bvec_assign(req, iomsg, bvec, iter->nr_segs);
+ return 0;
+}
+
static int io_msg_copy_hdr(struct io_kiocb *req, struct io_async_msghdr *iomsg,
struct user_msghdr *msg, int ddir)
{
@@ -413,6 +450,14 @@ static int io_sendmsg_setup(struct io_kiocb *req, const struct io_uring_sqe *sqe
ret = io_sendmsg_copy_hdr(req, kmsg);
if (!ret)
req->flags |= REQ_F_NEED_CLEANUP;
+
+ if (sr->flags & IORING_RECVSEND_FIXED_BUF) {
+ struct iovec *iov;
+
+ iov = kmsg->free_vec ? kmsg->free_vec : &kmsg->fast_iov;
+ return io_send_setup_sg_fixed(req, iov,
+ kmsg->msg.msg_iter.nr_segs, ITER_SOURCE);
+ }
return ret;
}
@@ -1270,8 +1315,6 @@ int io_send_zc_prep(struct io_kiocb *req, const struct io_uring_sqe *sqe)
if (req->opcode != IORING_OP_SEND_ZC) {
if (unlikely(sqe->addr2 || sqe->file_index))
return -EINVAL;
- if (unlikely(zc->flags & IORING_RECVSEND_FIXED_BUF))
- return -EINVAL;
}
zc->len = READ_ONCE(sqe->len);
--
2.46.0
^ permalink raw reply related [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
` (3 preceding siblings ...)
2024-10-23 2:38 ` [PATCH 4/4] io_uring/net: sendzc with vectored fixed buffers Pavel Begunkov
@ 2024-10-23 13:52 ` Jens Axboe
2024-10-24 15:29 ` Pavel Begunkov
4 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-23 13:52 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/22/24 8:38 PM, Pavel Begunkov wrote:
> Allow registered buffers to be used with zerocopy sendmsg, where the
> passed iovec becomes a scatter list into the registered buffer
> specified by sqe->buf_index. See patches 3 and 4 for more details.
>
> To get performance out of it, it'll need a bit more work on top for
> optimising allocations and cleaning up send setups. We can also
> implement it for non zerocopy variants and reads/writes in the future.
>
> Tested by enabling it in test/send-zerocopy.c, which checks payloads,
> and exercises lots of corner cases, especially around send sizes,
> offsets and non aligned registered buffers.
Just for the edification of the list readers, Pavel and I discussed this
a bit last night. There's a decent amount of overlap with the send zc
provided + registered buffers work that I did last week, but haven't
posted yet. It's here:
https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided
in terms of needing and using both bvec and iovec in the array, and
having the suitable caching for the arrays rather than needing a full
alloc + free every time.
The send zc part can map into bvecs upfront and hence doesn't need the
iovec array storage at the same time, which this one does, as the
sendmsg zc opcode needs to import an iovec. But perhaps there's a way to still
unify the storage and retain the caching, without needing to come up
with something new.
Just a brief summary of the out-of-band discussion.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-23 13:52 ` [PATCH 0/4] implement vectored registered buffers for sendzc Jens Axboe
@ 2024-10-24 15:29 ` Pavel Begunkov
2024-10-24 15:45 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 15:29 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/23/24 14:52, Jens Axboe wrote:
> On 10/22/24 8:38 PM, Pavel Begunkov wrote:
>> Allow registered buffers to be used with zerocopy sendmsg, where the
>> passed iovec becomes a scatter list into the registered buffer
>> specified by sqe->buf_index. See patches 3 and 4 for more details.
>>
>> To get performance out of it, it'll need a bit more work on top for
>> optimising allocations and cleaning up send setups. We can also
>> implement it for non zerocopy variants and reads/writes in the future.
>>
>> Tested by enabling it in test/send-zerocopy.c, which checks payloads,
>> and exercises lots of corner cases, especially around send sizes,
>> offsets and non aligned registered buffers.
>
> Just for the edification of the list readers, Pavel and I discussed this
> a bit last night. There's a decent amount of overlap with the send zc
> provided + registered buffers work that I did last week, but haven't
> posted yet. It's here;
>
> https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided
>
> in terms of needing and using both bvec and iovec in the array, and
> having the suitable caching for the arrays rather than needing a full
> alloc + free every time.
And as I mentioned, that can be well done in-place (in the same
array) as one option.
> The send zc part can map into bvecs upfront and hence don't need the
> iovec array storage at the same time, which this one does as the sendmsg
> zc opcode needs to import an iovec. But perhaps there's a way to still
> unify the storage and retain the caching, without needing to come up
> with something new.
I looked through it. The first problem is that your approach consumes
entries from the ring; with iovecs it'd need to read them from the
user one by one. Considering the allocations in your helpers, that
would turn it into a bunch of copy_from_user calls, and it might just
be easier and cleaner to copy the entire iovec.
And you just made a step towards delaying the imu resolution, which
is conceptually the right thing to do because of the mess with
links, just like it is with fixed files. That's why it needs to
copy the iovec at the prep stage and resolve it at issue time.
> Just a brief summary of the out-of-band discussion.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 15:29 ` Pavel Begunkov
@ 2024-10-24 15:45 ` Jens Axboe
2024-10-24 16:06 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-24 15:45 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/24/24 9:29 AM, Pavel Begunkov wrote:
> On 10/23/24 14:52, Jens Axboe wrote:
>> On 10/22/24 8:38 PM, Pavel Begunkov wrote:
>>> Allow registered buffers to be used with zerocopy sendmsg, where the
>>> passed iovec becomes a scatter list into the registered buffer
>>> specified by sqe->buf_index. See patches 3 and 4 for more details.
>>>
>>> To get performance out of it, it'll need a bit more work on top for
>>> optimising allocations and cleaning up send setups. We can also
>>> implement it for non zerocopy variants and reads/writes in the future.
>>>
>>> Tested by enabling it in test/send-zerocopy.c, which checks payloads,
>>> and exercises lots of corner cases, especially around send sizes,
>>> offsets and non aligned registered buffers.
>>
>> Just for the edification of the list readers, Pavel and I discussed this
>> a bit last night. There's a decent amount of overlap with the send zc
>> provided + registered buffers work that I did last week, but haven't
>> posted yet. It's here;
>>
>> https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided
>>
>> in terms of needing and using both bvec and iovec in the array, and
>> having the suitable caching for the arrays rather than needing a full
>> alloc + free every time.
>
> And as I mentioned, that can be well done in-place (in the same
> array) as one option.
And that would be the way to do it, like I mentioned as well, that is
how my first iteration of the above did it too. But since this one just
needs to end up with an array of bvec, it was pointless for my series to
do the iovec import and only then turn it into bvecs.
So somewhat orthogonal, as the use cases aren't exactly the same. One
deals with iovecs out of necessity; the other one only previously did as
a matter of convenience, to reuse existing iovec-based helpers.
>> The send zc part can map into bvecs upfront and hence don't need the
>> iovec array storage at the same time, which this one does as the sendmsg
>> zc opcode needs to import an iovec. But perhaps there's a way to still
>> unify the storage and retain the caching, without needing to come up
>> with something new.
>
> I looked through. The first problem is that your thing consuming
> entries from the ring, with iovecs it'd need to be reading it
> from the user one by one. Considering allocations in your helpers,
> that would turn it into a bunch of copy_from_user, and it might
> just easier and cleaner to copy the entire iovec.
I think for your case, incremental import would be the way to go. E.g.
something a la:
if (!user_access_begin(user_iovec, nr_vecs * sizeof(*user_iovec)))
return -EFAULT;
bv = kmsg->bvec;
for_each_iov {
struct iovec iov;
unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo);
unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo);
import_to_bvec(bv, &iov);
user_iovec++;
bv++;
}
if it can be done at prep time, because then there's no need to store
the iovec at all, it's already stable, just in bvecs. And this avoids
overlapping iovec/bvec memory, and it avoids doing two iterations for
import. Purely a poc, just tossing out ideas.
But I haven't looked too closely at your series yet. In any case,
whatever ends up working for you will most likely be fine for the
bundled zerocopy send (non-vectored) as well, and I can just put it on
top of that.
> And you just made a step towards delaying the imu resolution, which
> is conceptually the right thing to do because of the mess with
> links, just like it is with fixed files. That's why it needs to
> copy the iovec at the prep stage and resolve it at issue time.
Indeed, prep time is certainly the place to do it. And the above
incremental import should work fine then, as we won't care about
user_iovec being stable beyond prep.
--
Jens Axboe
^ permalink raw reply [flat|nested] 17+ messages in thread
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 15:45 ` Jens Axboe
@ 2024-10-24 16:06 ` Pavel Begunkov
2024-10-24 17:19 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 16:06 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/24/24 16:45, Jens Axboe wrote:
> On 10/24/24 9:29 AM, Pavel Begunkov wrote:
>> On 10/23/24 14:52, Jens Axboe wrote:
>>> On 10/22/24 8:38 PM, Pavel Begunkov wrote:
>>>> Allow registered buffers to be used with zerocopy sendmsg, where the
>>>> passed iovec becomes a scatter list into the registered buffer
>>>> specified by sqe->buf_index. See patches 3 and 4 for more details.
>>>>
>>>> To get performance out of it, it'll need a bit more work on top for
>>>> optimising allocations and cleaning up send setups. We can also
>>>> implement it for non zerocopy variants and reads/writes in the future.
>>>>
>>>> Tested by enabling it in test/send-zerocopy.c, which checks payloads,
>>>> and exercises lots of corner cases, especially around send sizes,
>>>> offsets and non aligned registered buffers.
>>>
>>> Just for the edification of the list readers, Pavel and I discussed this
>>> a bit last night. There's a decent amount of overlap with the send zc
>>> provided + registered buffers work that I did last week, but haven't
>>> posted yet. It's here;
>>>
>>> https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided
>>>
>>> in terms of needing and using both bvec and iovec in the array, and
>>> having the suitable caching for the arrays rather than needing a full
>>> alloc + free every time.
>>
>> And as I mentioned, that can be well done in-place (in the same
>> array) as one option.
>
> And that would be the way to do it, like I mentioned as well, that is
> how my first iteration of the above did it too. But since this one just
> needs to end up with an array of bvec, it was pointless for my series to
> do the iovec import and only then turn it into bvecs.
>
> So somewhat orthogonal, as the use cases aren't exactly the same. One
> deals with iovecs out of necessity, the other one only previously did as
> a matter of convenience to reuse existing iovec based helpers.
>
>>> The send zc part can map into bvecs upfront and hence doesn't need
>>> the iovec array storage at the same time, which this one does, as the
>>> sendmsg zc opcode needs to import an iovec.
>>> unify the storage and retain the caching, without needing to come up
>>> with something new.
>>
>> I looked through it. The first problem is that your approach consumes
>> entries from the ring; with iovecs it'd need to read them from the
>> user one by one. Considering the allocations in your helpers, that
>> would turn it into a bunch of copy_from_user calls, and it might just
>> be easier and cleaner to copy the entire iovec.
>
> I think for you case, incremental import would be the way to go. Eg
> something ala:
>
> if (!user_access_begin(user_iovec, nr_vecs * sizeof(*user_iovec)))
> return -EFAULT;
Is it even legal? I don't know how it's implemented specifically
but I assume there can be any kind of magic, e.g. intercepting
page faults. I'd definitely prefer to avoid anything but the simplest
handling like read from/write to memory, e.g. no allocations.
>
> bv = kmsg->bvec;
> for_each_iov {
> struct iovec iov;
>
> unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo);
> unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo);
>
> import_to_bvec(bv, &iov);
>
> user_iovec++;
> bv++;
> }
>
> if it can be done at prep time, because then there's no need to store
> the iovec at all, it's already stable, just in bvecs. And this avoids
> overlapping iovec/bvec memory, and it avoids doing two iterations for
> import. Purely a poc, just tossing out ideas.
>
> But I haven't looked too closely at your series yet. In any case,
> whatever ends up working for you, will most likely be find for the
> bundled zerocopy send (non-vectored) as well, and I can just put it on
> top of that.
>
>> And you just made a step towards delaying the imu resolution, which
>> is conceptually the right thing to do because of the mess with
>> links, just like it is with fixed files. That's why it needs to
>> copy the iovec at the prep stage and resolve it at issue time.
>
> Indeed, prep time is certainly the place to do it. And the above
> incremental import should work fine then, as we won't care about
> user_iovec being stable beyond prep.
Seems like you're agreeing but then stating the opposite, there
is some confusion. I'm saying that IMHO the right API wise way
is resolving an imu at issue time, just like it's done for fixed
files, and what your recent series did for send zc.
--
Pavel Begunkov
^ permalink raw reply [flat|nested] 17+ messages in thread
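The incremental import sketched in the PoC above can be modelled in plain user-space C. The sketch below is illustrative only: `memcpy` stands in for `unsafe_get_user`, and `fake_bvec` / `import_iovec_to_bvec` are made-up names for this example, not kernel API. It walks the user iovec one entry at a time and splits each entry at buffer-relative page boundaries, roughly what an `import_to_bvec` helper would do:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096

/* Stand-in for a kernel bio_vec: offset+len into the registered
 * buffer instead of a page pointer (illustrative only). */
struct fake_bvec {
	size_t offset;
	size_t len;
};

/*
 * Model of the incremental import: read the user iovec one entry at a
 * time (memcpy stands in for unsafe_get_user) and emit one fake_bvec
 * per page-sized chunk. Returns the number of bvec entries produced,
 * or -1 if an iov falls outside the buffer or the output is too small.
 */
static int import_iovec_to_bvec(const struct iovec *user_iovec, int nr_vecs,
				const char *buf_base, size_t buf_len,
				struct fake_bvec *bv, int bv_cap)
{
	int produced = 0;

	for (int i = 0; i < nr_vecs; i++) {
		struct iovec iov;

		/* the "incremental" copy, one entry at a time */
		memcpy(&iov, &user_iovec[i], sizeof(iov));

		if ((const char *)iov.iov_base < buf_base)
			return -1;
		size_t off = (const char *)iov.iov_base - buf_base;
		if (off > buf_len || iov.iov_len > buf_len - off)
			return -1;

		size_t left = iov.iov_len;
		while (left) {
			/* split at page boundaries, as a bvec import would */
			size_t chunk = PAGE_SIZE - (off & (PAGE_SIZE - 1));
			if (chunk > left)
				chunk = left;
			if (produced == bv_cap)
				return -1;
			bv[produced].offset = off;
			bv[produced].len = chunk;
			produced++;
			off += chunk;
			left -= chunk;
		}
	}
	return produced;
}
```

For instance, an iov of 5000 bytes starting at offset 100 into the buffer comes out as two chunks, {100, 3996} and {4096, 1004}. Whether a user_access_begin/unsafe_get_user section is even permissible around the real import is exactly the open question raised above.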
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 16:06 ` Pavel Begunkov
@ 2024-10-24 17:19 ` Jens Axboe
2024-10-24 17:56 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-24 17:19 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/24/24 10:06 AM, Pavel Begunkov wrote:
> On 10/24/24 16:45, Jens Axboe wrote:
>> On 10/24/24 9:29 AM, Pavel Begunkov wrote:
>>> On 10/23/24 14:52, Jens Axboe wrote:
>>>> On 10/22/24 8:38 PM, Pavel Begunkov wrote:
>>>>> Allow registered buffers to be used with zerocopy sendmsg, where the
>>>>> passed iovec becomes a scatter list into the registered buffer
>>>>> specified by sqe->buf_index. See patches 3 and 4 for more details.
>>>>>
>>>>> To get performance out of it, it'll need a bit more work on top for
>>>>> optimising allocations and cleaning up send setups. We can also
>>>>> implement it for non zerocopy variants and reads/writes in the future.
>>>>>
>>>>> Tested by enabling it in test/send-zerocopy.c, which checks payloads,
>>>>> and exercises lots of corner cases, especially around send sizes,
>>>>> offsets and non aligned registered buffers.
>>>>
>>>> Just for the edification of the list readers, Pavel and I discussed this
>>>> a bit last night. There's a decent amount of overlap with the send zc
>>>> provided + registered buffers work that I did last week, but haven't
>>>> posted yet. It's here;
>>>>
>>>> https://git.kernel.dk/cgit/linux/log/?h=io_uring-sendzc-provided
>>>>
>>>> in terms of needing and using both bvec and iovec in the array, and
>>>> having the suitable caching for the arrays rather than needing a full
>>>> alloc + free every time.
>>>
>>> And as I mentioned, that can be well done in-place (in the same
>>> array) as one option.
>>
>> And that would be the way to do it, like I mentioned as well, that is
>> how my first iteration of the above did it too. But since this one just
>> needs to end up with an array of bvec, it was pointless for my series to
>> do the iovec import and only then turn it into bvecs.
>>
>> So somewhat orthogonal, as the use cases aren't exactly the same. One
>> deals with iovecs out of necessity, the other one only previously did as
>> a matter of convenience to reuse existing iovec based helpers.
>>
>>>> The send zc part can map into bvecs upfront and hence don't need the
>>>> iovec array storage at the same time, which this one does as the sendmsg
>>>> zc opcode needs to import an iovec. But perhaps there's a way to still
>>>> unify the storage and retain the caching, without needing to come up
>>>> with something new.
>>>
>>> I looked through. The first problem is that your thing consumes
>>> entries from the ring; with iovecs it'd need to be reading them
>>> from the user one by one. Considering the allocations in your helpers,
>>> that would turn it into a bunch of copy_from_user calls, and it might
>>> just be easier and cleaner to copy the entire iovec.
>>
>> I think for your case, incremental import would be the way to go. E.g.
>> something a la:
>>
>> if (!user_access_begin(user_iovec, nr_vecs * sizeof(*user_iovec)))
>> return -EFAULT;
>
> Is it even legal? I don't know how it's implemented specifically
> but I assume there can be any kind of magic, e.g. intercepting
> page faults. I'd definitely prefer to avoid anything but the simplest
> handling like read from/write to memory, e.g. no allocations.
>
>>
>> bv = kmsg->bvec;
>> for_each_iov {
>> struct iovec iov;
>>
>> unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo);
>> unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo);
>>
>> import_to_bvec(bv, &iov);
>>
>> user_iovec++;
>> bv++;
>> }
>>
>> if it can be done at prep time, because then there's no need to store
>> the iovec at all, it's already stable, just in bvecs. And this avoids
>> overlapping iovec/bvec memory, and it avoids doing two iterations for
>> import. Purely a poc, just tossing out ideas.
>>
>> But I haven't looked too closely at your series yet. In any case,
>> whatever ends up working for you will most likely be fine for the
>> bundled zerocopy send (non-vectored) as well, and I can just put it on
>> top of that.
>>
>>> And you just made one towards delaying the imu resolution, which
>>> is conceptually the right thing to do because of the mess with
>>> links, just like it is with fixed files. That's why it needs to
>>> copy the iovec at the prep stage and resolve at the issue time.
>>
>> Indeed, prep time is certainly the place to do it. And the above
>> incremental import should work fine then, as we won't care about
>> user_iovec being stable beyond prep.
>
> Seems like you're agreeing but then stating the opposite, there
> is some confusion. I'm saying that IMHO the right API wise way
> is resolving an imu at issue time, just like it's done for fixed
> files, and what your recent series did for send zc.
Yeah early morning confusion I guess. And I do agree in principle,
though for registered buffers, those have to be registered upfront
anyway, so no confusion possible with prep vs issue there. For provided
buffers, it only matters for the legacy ones, which generally should not
be used. Doesn't change the fact that you're technically correct, the
right time to resolve them would be at issue time.
--
Jens Axboe
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 17:19 ` Jens Axboe
@ 2024-10-24 17:56 ` Pavel Begunkov
2024-10-24 18:00 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 17:56 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/24/24 18:19, Jens Axboe wrote:
> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>> On 10/24/24 16:45, Jens Axboe wrote:
...
>>> bv = kmsg->bvec;
>>> for_each_iov {
>>> struct iovec iov;
>>>
>>> unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo);
>>> unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo);
>>>
>>> import_to_bvec(bv, &iov);
>>>
>>> user_iovec++;
>>> bv++;
>>> }
>>>
>>> if it can be done at prep time, because then there's no need to store
>>> the iovec at all, it's already stable, just in bvecs. And this avoids
>>> overlapping iovec/bvec memory, and it avoids doing two iterations for
>>> import. Purely a poc, just tossing out ideas.
>>>
>>> But I haven't looked too closely at your series yet. In any case,
>>> whatever ends up working for you will most likely be fine for the
>>> bundled zerocopy send (non-vectored) as well, and I can just put it on
>>> top of that.
>>>
>>>> And you just made one towards delaying the imu resolution, which
>>>> is conceptually the right thing to do because of the mess with
>>>> links, just like it is with fixed files. That's why it needs to
>>>> copy the iovec at the prep stage and resolve at the issue time.
>>>
>>> Indeed, prep time is certainly the place to do it. And the above
>>> incremental import should work fine then, as we won't care about
>>> user_iovec being stable beyond prep.
>>
>> Seems like you're agreeing but then stating the opposite, there
>> is some confusion. I'm saying that IMHO the right API wise way
>> is resolving an imu at issue time, just like it's done for fixed
>> files, and what your recent series did for send zc.
>
> Yeah early morning confusion I guess. And I do agree in principle,
> though for registered buffers, those have to be registered upfront
> anyway, so no confusion possible with prep vs issue there. For provided
> buffers, it only matters for the legacy ones, which generally should not
> be used. Doesn't change the fact that you're technically correct, the
> right time to resolve them would be at issue time.
I'm talking about sendmsg with iovec. Registered buffers should
be registered upfront, that's right, but iovec should be copied
at prep, and finally resolved into bvecs incl the imu/buffer lookup
at the issue time. And those are two different points in time,
maybe because of links, draining or anything else. And if they
should be at different moments, there is no way to do it while
copying iovec.
--
Pavel Begunkov
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 17:56 ` Pavel Begunkov
@ 2024-10-24 18:00 ` Jens Axboe
2024-10-24 18:13 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-24 18:00 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/24/24 11:56 AM, Pavel Begunkov wrote:
> On 10/24/24 18:19, Jens Axboe wrote:
>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>> On 10/24/24 16:45, Jens Axboe wrote:
> ...
>>>> bv = kmsg->bvec;
>>>> for_each_iov {
>>>> struct iovec iov;
>>>>
>>>> unsafe_get_user(iov.iov_base, &user_iovec->iov_base, foo);
>>>> unsafe_get_user(iov.iov_len, &user_iovec->iov_len, foo);
>>>>
>>>> import_to_bvec(bv, &iov);
>>>>
>>>> user_iovec++;
>>>> bv++;
>>>> }
>>>>
>>>> if it can be done at prep time, because then there's no need to store
>>>> the iovec at all, it's already stable, just in bvecs. And this avoids
>>>> overlapping iovec/bvec memory, and it avoids doing two iterations for
>>>> import. Purely a poc, just tossing out ideas.
>>>>
>>>> But I haven't looked too closely at your series yet. In any case,
>>>> whatever ends up working for you will most likely be fine for the
>>>> bundled zerocopy send (non-vectored) as well, and I can just put it on
>>>> top of that.
>>>>
>>>>> And you just made one towards delaying the imu resolution, which
>>>>> is conceptually the right thing to do because of the mess with
>>>>> links, just like it is with fixed files. That's why it needs to
>>>>> copy the iovec at the prep stage and resolve at the issue time.
>>>>
>>>> Indeed, prep time is certainly the place to do it. And the above
>>>> incremental import should work fine then, as we won't care about
>>>> user_iovec being stable beyond prep.
>>>
>>> Seems like you're agreeing but then stating the opposite, there
>>> is some confusion. I'm saying that IMHO the right API wise way
>>> is resolving an imu at issue time, just like it's done for fixed
>>> files, and what your recent series did for send zc.
>>
>> Yeah early morning confusion I guess. And I do agree in principle,
>> though for registered buffers, those have to be registered upfront
>> anyway, so no confusion possible with prep vs issue there. For provided
>> buffers, it only matters for the legacy ones, which generally should not
>> be used. Doesn't change the fact that you're technically correct, the
>> right time to resolve them would be at issue time.
>
> I'm talking about sendmsg with iovec. Registered buffers should
> be registered upfront, that's right, but iovec should be copied
> at prep, and finally resolved into bvecs incl the imu/buffer lookup
> at the issue time. And those are two different points in time,
> maybe because of links, draining or anything else. And if they
> should be at different moments, there is no way to do it while
> copying iovec.
Oh I totally follow, the incremental approach would only work if it can
be done at prep time. If at issue time, then it has to turn an existing
iovec array into the appropriate bvec array. And that's where you'd have
to do some clever bits to avoid holding both a full bvec and iovec array
in memory, which would be pretty wasteful/inefficient. If done at issue
time, then there's no way around a second iteration :/
--
Jens Axboe
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 18:00 ` Jens Axboe
@ 2024-10-24 18:13 ` Pavel Begunkov
2024-10-24 19:56 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 18:13 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/24/24 19:00, Jens Axboe wrote:
> On 10/24/24 11:56 AM, Pavel Begunkov wrote:
>> On 10/24/24 18:19, Jens Axboe wrote:
>>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>>> On 10/24/24 16:45, Jens Axboe wrote:
...>>>> Seems like you're agreeing but then stating the opposite, there
>>>> is some confusion. I'm saying that IMHO the right API wise way
>>>> is resolving an imu at issue time, just like it's done for fixed
>>>> files, and what your recent series did for send zc.
>>>
>>> Yeah early morning confusion I guess. And I do agree in principle,
>>> though for registered buffers, those have to be registered upfront
>>> anyway, so no confusion possible with prep vs issue there. For provided
>>> buffers, it only matters for the legacy ones, which generally should not
>>> be used. Doesn't change the fact that you're technically correct, the
>>> right time to resolve them would be at issue time.
>>
>> I'm talking about sendmsg with iovec. Registered buffers should
>> be registered upfront, that's right, but iovec should be copied
>> at prep, and finally resolved into bvecs incl the imu/buffer lookup
>> at the issue time. And those are two different points in time,
>> maybe because of links, draining or anything else. And if they
>> should be at different moments, there is no way to do it while
>> copying iovec.
>
> Oh I totally follow, the incremental approach would only work if it can
> be done at prep time. If at issue time, then it has to turn an existing
> iovec array into the appropriate bvec array. And that's where you'd have
> to do some clever bits to avoid holding both a full bvec and iovec array
> in memory, which would be pretty wasteful/inefficient. If done at issue
Why would it be wasteful and inefficient? No more than jumping
through that incremental infra for each chunk, doubling the size
of the array / reallocating / memcpy'ing it, instead of a tight
loop doing the entire conversion.
> time, then there's no way around a second iteration :/
>
--
Pavel Begunkov
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 18:13 ` Pavel Begunkov
@ 2024-10-24 19:56 ` Jens Axboe
2024-10-24 22:14 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-24 19:56 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/24/24 12:13 PM, Pavel Begunkov wrote:
> On 10/24/24 19:00, Jens Axboe wrote:
>> On 10/24/24 11:56 AM, Pavel Begunkov wrote:
>>> On 10/24/24 18:19, Jens Axboe wrote:
>>>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>>>> On 10/24/24 16:45, Jens Axboe wrote:
> ...>>>> Seems like you're agreeing but then stating the opposite, there
>>>>> is some confusion. I'm saying that IMHO the right API wise way
>>>>> is resolving an imu at issue time, just like it's done for fixed
>>>>> files, and what your recent series did for send zc.
>>>>
>>>> Yeah early morning confusion I guess. And I do agree in principle,
>>>> though for registered buffers, those have to be registered upfront
>>>> anyway, so no confusion possible with prep vs issue there. For provided
>>>> buffers, it only matters for the legacy ones, which generally should not
>>>> be used. Doesn't change the fact that you're technically correct, the
>>>> right time to resolve them would be at issue time.
>>>
>>> I'm talking about sendmsg with iovec. Registered buffers should
>>> be registered upfront, that's right, but iovec should be copied
>>> at prep, and finally resolved into bvecs incl the imu/buffer lookup
>>> at the issue time. And those are two different points in time,
>>> maybe because of links, draining or anything else. And if they
>>> should be at different moments, there is no way to do it while
>>> copying iovec.
>>
>> Oh I totally follow, the incremental approach would only work if it can
>> be done at prep time. If at issue time, then it has to turn an existing
>> iovec array into the appropriate bvec array. And that's where you'd have
>> to do some clever bits to avoid holding both a full bvec and iovec array
>> in memory, which would be pretty wasteful/inefficient. If done at issue
>
> Why would it be wasteful and inefficient? No more than jumping
> through that incremental infra for each chunk, doubling the size
> of the array / reallocating / memcpy'ing it, instead of a tight
> loop doing the entire conversion.
Because it would prevent doing an iovec-at-a-time import, then turning
it into the desired bvec. That's one loop instead of two. You would have
the space upfront, there should be no need to realloc+memcpy. And then
there's the space concern, where the initial import is an iovec, and
then you need a bvec. For 64-bit that's fine as they take up the same
amount of space, but for 32-bit it'd make incremental importing from a
stable iovec to a bvec array a bit more tricky (and would need realloc,
unless you over-alloc'ed for the iovec array upfront).
--
Jens Axboe
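The space argument above can be made concrete with rough models of the two element types. These layouts are assumptions for illustration (the real struct bio_vec layout depends on kernel config), but they show why 64-bit can reuse the same storage slot-for-slot while 32-bit cannot:

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layouts, for illustration only. */
struct iovec64 { uint64_t base; uint64_t len; };         /* 64-bit struct iovec   */
struct bvec64  { uint64_t page; uint32_t len, offset; }; /* 64-bit struct bio_vec */

struct iovec32 { uint32_t base; uint32_t len; };         /* 32-bit struct iovec   */
struct bvec32  { uint32_t page; uint32_t len, offset; }; /* 32-bit struct bio_vec */

/* same slot size on 64-bit: 16 bytes each */
_Static_assert(sizeof(struct iovec64) == sizeof(struct bvec64),
	       "64-bit slots match");
/* on 32-bit the bvec (12 bytes) outgrows the iovec (8 bytes) */
_Static_assert(sizeof(struct iovec32) < sizeof(struct bvec32),
	       "32-bit bvec is bigger");
```

So on 32-bit, a slot sized for an iovec cannot hold a bvec, which is where over-allocating the array upfront (or sizing slots to the max of the two) would come in.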
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 19:56 ` Jens Axboe
@ 2024-10-24 22:14 ` Pavel Begunkov
2024-10-24 22:22 ` Jens Axboe
0 siblings, 1 reply; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 22:14 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/24/24 20:56, Jens Axboe wrote:
> On 10/24/24 12:13 PM, Pavel Begunkov wrote:
>> On 10/24/24 19:00, Jens Axboe wrote:
>>> On 10/24/24 11:56 AM, Pavel Begunkov wrote:
>>>> On 10/24/24 18:19, Jens Axboe wrote:
>>>>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>>>>> On 10/24/24 16:45, Jens Axboe wrote:
>> ...>>>> Seems like you're agreeing but then stating the opposite, there
>>>>>> is some confusion. I'm saying that IMHO the right API wise way
>>>>>> is resolving an imu at issue time, just like it's done for fixed
>>>>>> files, and what your recent series did for send zc.
>>>>>
>>>>> Yeah early morning confusion I guess. And I do agree in principle,
>>>>> though for registered buffers, those have to be registered upfront
>>>>> anyway, so no confusion possible with prep vs issue there. For provided
>>>>> buffers, it only matters for the legacy ones, which generally should not
>>>>> be used. Doesn't change the fact that you're technically correct, the
>>>>> right time to resolve them would be at issue time.
>>>>
>>>> I'm talking about sendmsg with iovec. Registered buffers should
>>>> be registered upfront, that's right, but iovec should be copied
>>>> at prep, and finally resolved into bvecs incl the imu/buffer lookup
>>>> at the issue time. And those are two different points in time,
>>>> maybe because of links, draining or anything else. And if they
>>>> should be at different moments, there is no way to do it while
>>>> copying iovec.
>>>
>>> Oh I totally follow, the incremental approach would only work if it can
>>> be done at prep time. If at issue time, then it has to turn an existing
>>> iovec array into the appropriate bvec array. And that's where you'd have
>>> to do some clever bits to avoid holding both a full bvec and iovec array
>>> in memory, which would be pretty wasteful/inefficient. If done at issue
>>
>> Why would it be wasteful and inefficient? No more than jumping
>> through that incremental infra for each chunk, doubling the size
>> of the array / reallocating / memcpy'ing it, instead of a tight
>> loop doing the entire conversion.
>
> Because it would prevent doing an iovec-at-a-time import, then turning
> it into the desired bvec. That's one loop instead of two. You would have
> the space upfront, there should be no need to realloc+memcpy. And then
> there's the space concern, where the initial import is an iovec, and
> then you need a bvec. For 64-bit that's fine as they take up the same
> amount of space,
That's not true, each iov can produce multiple bvec entries so
iovs might get overwritten if you do it the simplest way.
> but for 32-bit it'd make incremental importing from a
> stable iovec to a bvec array a bit more tricky (and would need realloc,
> unless you over-alloc'ed for the iovec array upfront).
And that's not true, you can still perfectly well do it in place
if the iovec is placed right in memory; as I explicitly
noted, there are simple enough ways to do it in place
without extra reallocs.
--
Pavel Begunkov
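The in-place conversion being argued over can be sketched in user-space C. This is a sketch under assumptions (`fake_bvec` and `io_slot` are illustrative, not kernel types): size each slot to the max of the two element types, count the bvecs first, then fill the array from the tail going backwards so a not-yet-read iovec is never clobbered:

```c
#include <assert.h>
#include <stddef.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096

struct fake_bvec { size_t offset; size_t len; };

/* One array for both representations: each slot holds
 * max(sizeof(struct iovec), sizeof(struct fake_bvec)). */
union io_slot {
	struct iovec iov;
	struct fake_bvec bv;
};

/* How many page-split chunks one iov produces (0-length iovs skipped). */
static size_t bvecs_for(const struct iovec *iov, const char *base)
{
	size_t off = (const char *)iov->iov_base - base;
	size_t first = PAGE_SIZE - (off & (PAGE_SIZE - 1));

	if (iov->iov_len == 0)
		return 0;
	if (iov->iov_len <= first)
		return 1;
	return 1 + (iov->iov_len - first + PAGE_SIZE - 1) / PAGE_SIZE;
}

/*
 * In-place conversion: count first, then walk iovs right to left and
 * fill bvecs from the tail backwards. The write cursor never drops
 * below the read cursor, so unread iovecs survive. Returns the total
 * bvec count, or -1 if cap is too small.
 */
static int iovec_to_bvec_inplace(union io_slot *slots, int nr_vecs,
				 const char *base, int cap)
{
	size_t total = 0;

	for (int i = 0; i < nr_vecs; i++)
		total += bvecs_for(&slots[i].iov, base);
	if ((size_t)cap < total)
		return -1;

	size_t w = total;
	for (int i = nr_vecs - 1; i >= 0; i--) {
		struct iovec iov = slots[i].iov; /* read before overwriting */
		size_t off = (const char *)iov.iov_base - base;
		size_t end = off + iov.iov_len;

		/* emit this iov's chunks right to left */
		while (end > off) {
			size_t start = end - 1;
			start -= start & (PAGE_SIZE - 1);
			if (start < off)
				start = off;
			slots[--w].bv = (struct fake_bvec){ start, end - start };
			end = start;
		}
	}
	return (int)total;
}
```

The counting pass is the price of in-place safety: it tells you upfront whether the tail free space suffices, so no realloc mid-conversion.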
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 22:14 ` Pavel Begunkov
@ 2024-10-24 22:22 ` Jens Axboe
2024-10-24 22:51 ` Pavel Begunkov
0 siblings, 1 reply; 17+ messages in thread
From: Jens Axboe @ 2024-10-24 22:22 UTC (permalink / raw)
To: Pavel Begunkov, io-uring
On 10/24/24 4:14 PM, Pavel Begunkov wrote:
> On 10/24/24 20:56, Jens Axboe wrote:
>> On 10/24/24 12:13 PM, Pavel Begunkov wrote:
>>> On 10/24/24 19:00, Jens Axboe wrote:
>>>> On 10/24/24 11:56 AM, Pavel Begunkov wrote:
>>>>> On 10/24/24 18:19, Jens Axboe wrote:
>>>>>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>>>>>> On 10/24/24 16:45, Jens Axboe wrote:
>>> ...>>>> Seems like you're agreeing but then stating the opposite, there
>>>>>>> is some confusion. I'm saying that IMHO the right API wise way
>>>>>>> is resolving an imu at issue time, just like it's done for fixed
>>>>>>> files, and what your recent series did for send zc.
>>>>>>
>>>>>> Yeah early morning confusion I guess. And I do agree in principle,
>>>>>> though for registered buffers, those have to be registered upfront
>>>>>> anyway, so no confusion possible with prep vs issue there. For provided
>>>>>> buffers, it only matters for the legacy ones, which generally should not
>>>>>> be used. Doesn't change the fact that you're technically correct, the
>>>>>> right time to resolve them would be at issue time.
>>>>>
>>>>> I'm talking about sendmsg with iovec. Registered buffers should
>>>>> be registered upfront, that's right, but iovec should be copied
>>>>> at prep, and finally resolved into bvecs incl the imu/buffer lookup
>>>>> at the issue time. And those are two different points in time,
>>>>> maybe because of links, draining or anything else. And if they
>>>>> should be at different moments, there is no way to do it while
>>>>> copying iovec.
>>>>
>>>> Oh I totally follow, the incremental approach would only work if it can
>>>> be done at prep time. If at issue time, then it has to turn an existing
>>>> iovec array into the appropriate bvec array. And that's where you'd have
>>>> to do some clever bits to avoid holding both a full bvec and iovec array
>>>> in memory, which would be pretty wasteful/inefficient. If done at issue
>>>
>>> Why would it be wasteful and inefficient? No more than jumping
>>> through that incremental infra for each chunk, doubling the size
>>> of the array / reallocating / memcpy'ing it, instead of a tight
>>> loop doing the entire conversion.
>>
>> Because it would prevent doing an iovec-at-a-time import, then turning
>> it into the desired bvec. That's one loop instead of two. You would have
>> the space upfront, there should be no need to realloc+memcpy. And then
>> there's the space concern, where the initial import is an iovec, and
>> then you need a bvec. For 64-bit that's fine as they take up the same
>> amount of space,
>
> That's not true, each iov can produce multiple bvec entries so
> iovs might get overwritten if you do it the simplest way.
What part isn't true? Yeah one iovec can turn into multiple bvec
segments, the provided send zc stuff I sent does deal with that. So yeah
it's not necessarily a 1:1 mapping, and even if they have the same size,
you may need more elements on the bvec side.
Doesn't change the fact that you can loop once and do it. If you need to
expand the bvec size, that would be a realloc+copy. But that part is
true even if you first import all iovecs, and then iterate them to map
the bvecs. Unless you do some upfront tracking to know how many elements
you need, but that would seem overly convoluted. With caching, the
expansion should be a rare occurrence outside of the initial import into
a new region.
>> but for 32-bit it'd make incremental importing from a
>> stable iovec to a bvec array a bit more tricky (and would need realloc,
>> unless you over-alloc'ed for the iovec array upfront).
>
> And that's not true, you can still well do it in place if
> iovec is placed right in the memory, which I explicitly
> noted there are simple enough ways to do it in place
> without extra reallocs.
I don't think anything stated there is untrue, just saying it's a bit
more tricky. Which is certainly true, if it's the same memory region and
there's overlaps. But let's just see the code for it, much easier to
discuss over those parts rather than pontificate hypotheticals :-)
--
Jens Axboe
* Re: [PATCH 0/4] implement vectored registered buffers for sendzc
2024-10-24 22:22 ` Jens Axboe
@ 2024-10-24 22:51 ` Pavel Begunkov
0 siblings, 0 replies; 17+ messages in thread
From: Pavel Begunkov @ 2024-10-24 22:51 UTC (permalink / raw)
To: Jens Axboe, io-uring
On 10/24/24 23:22, Jens Axboe wrote:
> On 10/24/24 4:14 PM, Pavel Begunkov wrote:
>> On 10/24/24 20:56, Jens Axboe wrote:
>>> On 10/24/24 12:13 PM, Pavel Begunkov wrote:
>>>> On 10/24/24 19:00, Jens Axboe wrote:
>>>>> On 10/24/24 11:56 AM, Pavel Begunkov wrote:
>>>>>> On 10/24/24 18:19, Jens Axboe wrote:
>>>>>>> On 10/24/24 10:06 AM, Pavel Begunkov wrote:
>>>>>>>> On 10/24/24 16:45, Jens Axboe wrote:
>>>> ...>>>> Seems like you're agreeing but then stating the opposite, there
>>>>>>>> is some confusion. I'm saying that IMHO the right API wise way
>>>>>>>> is resolving an imu at issue time, just like it's done for fixed
>>>>>>>> files, and what your recent series did for send zc.
>>>>>>>
>>>>>>> Yeah early morning confusion I guess. And I do agree in principle,
>>>>>>> though for registered buffers, those have to be registered upfront
>>>>>>> anyway, so no confusion possible with prep vs issue there. For provided
>>>>>>> buffers, it only matters for the legacy ones, which generally should not
>>>>>>> be used. Doesn't change the fact that you're technically correct, the
>>>>>>> right time to resolve them would be at issue time.
>>>>>>
>>>>>> I'm talking about sendmsg with iovec. Registered buffers should
>>>>>> be registered upfront, that's right, but iovec should be copied
>>>>>> at prep, and finally resolved into bvecs incl the imu/buffer lookup
>>>>>> at the issue time. And those are two different points in time,
>>>>>> maybe because of links, draining or anything else. And if they
>>>>>> should be at different moments, there is no way to do it while
>>>>>> copying iovec.
>>>>>
>>>>> Oh I totally follow, the incremental approach would only work if it can
>>>>> be done at prep time. If at issue time, then it has to turn an existing
>>>>> iovec array into the appropriate bvec array. And that's where you'd have
>>>>> to do some clever bits to avoid holding both a full bvec and iovec array
>>>>> in memory, which would be pretty wasteful/inefficient. If done at issue
>>>>
>>>> Why would it be wasteful and inefficient? No more than jumping
>>>> through that incremental infra for each chunk, doubling the size
>>>> of the array / reallocating / memcpy'ing it, instead of a tight
>>>> loop doing the entire conversion.
>>>
>>> Because it would prevent doing an iovec-at-a-time import, then turning
>>> it into the desired bvec. That's one loop instead of two. You would have
>>> the space upfront, there should be no need to realloc+memcpy. And then
>>> there's the space concern, where the initial import is an iovec, and
>>> then you need a bvec. For 64-bit that's fine as they take up the same
>>> amount of space,
>>
>> That's not true, each iov can produce multiple bvec entries so
>> iovs might get overwritten if you do it the simplest way.
>
> What part isn't true? Yeah one iovec can turn into multiple bvec
> segments, the provided send zc stuff I sent does deal with that. So yeah
> it's not necessarily a 1:1 mapping, and even if they have the same size,
> you may need more elements on the bvec side.
Ok, you didn't state why 64-bit is fine. What I'm saying is that,
regardless of the element size, if you have an iovec array with
just enough free space at the end that the resulting bvec fits
after overwriting the iovecs, a simple in-place algorithm going
from left to right will still fail.
> Doesn't change the fact that you can loop once and do it. If you need to
> expand the bvec size, that would be a realloc+copy. But that part is
> true even if you first import all iovecs, and then iterate them to map
> the bvecs. Unless you do some upfront tracking to know how many elements
> you need, but that would seem overly convoluted. With caching, the
> expansion should be a rare occurrence outside of the initial import into
> a new region.
>
>>> but for 32-bit it'd make incremental importing from a
>>> stable iovec to a bvec array a bit more tricky (and would need realloc,
>>> unless you over-alloc'ed for the iovec array upfront).
>>
> And that's not true, you can still perfectly well do it in place
> if the iovec is placed right in memory; as I explicitly
> noted, there are simple enough ways to do it in place
> without extra reallocs.
>
> I don't think anything stated there is untrue, just saying it's a bit
"and would need realloc, unless you over-alloc'ed for the iovec array
upfront" - that one only holds if in the second part you're talking
about array_size = bvec_size + iovec_size (in bytes). All you need is
the max of the two to do it in place.
> more tricky. Which is certainly true, if it's the same memory region and
> there's overlaps. But let's just see the code for it, much easier to
> discuss over those parts rather than pontificate hypotheticals :-)
>
--
Pavel Begunkov
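The "left to right will still fail" point can be demonstrated concretely. Again a user-space sketch with made-up types (`fake_bvec`, `io_slot`, `iovec_to_bvec_ltr` are illustrative): the array below has exactly enough free space at the end (two iovecs plus one spare slot for the extra bvec), yet a naive left-to-right in-place pass clobbers the second iovec before it is read:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/uio.h>

#define PAGE_SIZE 4096

struct fake_bvec { size_t offset; size_t len; };
union io_slot { struct iovec iov; struct fake_bvec bv; };

/*
 * Naive in-place conversion, walking left to right. Broken by design:
 * when an iov expands into multiple bvecs, the write cursor overtakes
 * the read cursor and overwrites iovecs that haven't been read yet.
 */
static int iovec_to_bvec_ltr(union io_slot *slots, int nr_vecs,
			     const char *base, int cap)
{
	int w = 0;

	for (int i = 0; i < nr_vecs; i++) {
		struct iovec iov = slots[i].iov; /* may already be garbage */
		size_t off = (uintptr_t)iov.iov_base - (uintptr_t)base;
		size_t left = iov.iov_len;

		while (left) {
			size_t chunk = PAGE_SIZE - (off & (PAGE_SIZE - 1));
			if (chunk > left)
				chunk = left;
			if (w == cap)
				return -1;
			/* this write can clobber slots[i+1].iov! */
			slots[w].bv.offset = off;
			slots[w].bv.len = chunk;
			w++;
			off += chunk;
			left -= chunk;
		}
	}
	return w;
}
```

With slot 0 spanning two pages, its second bvec lands on top of slot 1 before slot 1's iovec has been read, so the output is garbage (or the array overflows). Going right to left from a pre-computed count, or over-sizing and converting in a separate pass, avoids exactly this.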
Thread overview: 17+ messages
2024-10-23 2:38 [PATCH 0/4] implement vectored registered buffers for sendzc Pavel Begunkov
2024-10-23 2:38 ` [PATCH 1/4] io_uring/net: introduce io_kmsg_set_iovec Pavel Begunkov
2024-10-23 2:38 ` [PATCH 2/4] io_uring/net: allow mixed bvec/iovec caching Pavel Begunkov
2024-10-23 2:38 ` [PATCH 3/4] io_uring: vectored registered buffer import Pavel Begunkov
2024-10-23 2:38 ` [PATCH 4/4] io_uring/net: sendzc with vectored fixed buffers Pavel Begunkov
2024-10-23 13:52 ` [PATCH 0/4] implement vectored registered buffers for sendzc Jens Axboe
2024-10-24 15:29 ` Pavel Begunkov
2024-10-24 15:45 ` Jens Axboe
2024-10-24 16:06 ` Pavel Begunkov
2024-10-24 17:19 ` Jens Axboe
2024-10-24 17:56 ` Pavel Begunkov
2024-10-24 18:00 ` Jens Axboe
2024-10-24 18:13 ` Pavel Begunkov
2024-10-24 19:56 ` Jens Axboe
2024-10-24 22:14 ` Pavel Begunkov
2024-10-24 22:22 ` Jens Axboe
2024-10-24 22:51 ` Pavel Begunkov