All of lore.kernel.org
 help / color / mirror / Atom feed
* [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting
@ 2022-08-10 15:49 Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 01/11] net: introduce struct ubuf_info_msgzc Pavel Begunkov
                   ` (10 more replies)
  0 siblings, 11 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

There is a couple of tricks we can do with io_uring to improve ubuf_info
refcounting. First, we ammortise reference grabbing and then give them
away to the network layer, which is implemented in 8 and 11. Also, we
don't need need additional pinning for TCP, which is removed by 7.

1-4 are needed because otherwise we're out of space in io_notif_data and
using ->desc or some other field of ubuf_info would be ugly. It'll also
facilitate further ideas like adding a simpler notification model for UDP.

liburing/examples/io_uring-sendzc benchmark using a branch containing the
patchset and some more [1] showed ~1.6% qps improvement for UDP (dummy dev),
and ~1% for TCP (localhost + hacks enabling zc).

I didn't specifically test xen and vhost and not sure how, would love
some help with that.

[1] https://github.com/isilence/linux/tree/net/zc-ref-optimisation

Pavel Begunkov (11):
  net: introduce struct ubuf_info_msgzc
  xen/netback: use struct ubuf_info_msgzc
  vhost/net: use struct ubuf_info_msgzc
  net: shrink struct ubuf_info
  net: rename ubuf_info's flags
  net: add flags for controlling ubuf_info
  net/tcp: optimise tcp ubuf refcounting
  net: let callers provide ->msg_ubuf refs
  io_uring/notif: add helper for flushing refs
  io_uring/notif: mark notifs with UARGFL_CALLER_PINNED
  io_uring/notif: add ubuf_info ref caching

 drivers/net/xen-netback/common.h    |  2 +-
 drivers/net/xen-netback/interface.c |  4 +--
 drivers/net/xen-netback/netback.c   |  7 +++---
 drivers/vhost/net.c                 | 17 +++++++------
 include/linux/skbuff.h              | 35 +++++++++++++++++++++++---
 io_uring/net.c                      |  8 +++++-
 io_uring/notif.c                    | 21 ++++++++++------
 io_uring/notif.h                    | 22 +++++++++++++++-
 net/core/skbuff.c                   | 39 ++++++++++++++++-------------
 net/ipv4/ip_output.c                |  3 ++-
 net/ipv4/tcp.c                      | 11 +++++---
 net/ipv6/ip6_output.c               |  3 ++-
 12 files changed, 123 insertions(+), 49 deletions(-)

-- 
2.37.0


^ permalink raw reply	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 01/11] net: introduce struct ubuf_info_msgzc
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 02/11] xen/netback: use " Pavel Begunkov
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

We're going to split struct ubuf_info and leave there only
mandatory fields. Users are free to extend it. Add struct
ubuf_info_msgzc, which will be an extended version for MSG_ZEROCOPY and
some other users. It duplicates of struct ubuf_info for now and will be
removed in a couple of patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 21 +++++++++++++++++++++
 1 file changed, 21 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ca8afa382bf2..f8ac3678dab8 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -554,7 +554,28 @@ struct ubuf_info {
 	} mmp;
 };
 
+struct ubuf_info_msgzc {
+	struct ubuf_info ubuf;
+
+	union {
+		struct {
+			unsigned long desc;
+			void *ctx;
+		};
+		struct {
+			u32 id;
+			u16 len;
+			u16 zerocopy:1;
+			u32 bytelen;
+		};
+	};
+
+	struct mmpin mmp;
+};
+
 #define skb_uarg(SKB)	((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg))
+#define uarg_to_msgzc(ubuf_ptr)	container_of((ubuf_ptr), struct ubuf_info_msgzc, \
+					     ubuf)
 
 int mm_account_pinned_pages(struct mmpin *mmp, size_t size);
 void mm_unaccount_pinned_pages(struct mmpin *mmp);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 02/11] xen/netback: use struct ubuf_info_msgzc
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 01/11] net: introduce struct ubuf_info_msgzc Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 03/11] vhost/net: " Pavel Begunkov
                   ` (8 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

struct ubuf_info will be changed, use ubuf_info_msgzc instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/net/xen-netback/common.h    | 2 +-
 drivers/net/xen-netback/interface.c | 4 ++--
 drivers/net/xen-netback/netback.c   | 7 ++++---
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/xen-netback/common.h b/drivers/net/xen-netback/common.h
index 8174d7b2966c..1545cbee77a4 100644
--- a/drivers/net/xen-netback/common.h
+++ b/drivers/net/xen-netback/common.h
@@ -62,7 +62,7 @@ struct pending_tx_info {
 	 * ubuf_to_vif is a helper which finds the struct xenvif from a pointer
 	 * to this field.
 	 */
-	struct ubuf_info callback_struct;
+	struct ubuf_info_msgzc callback_struct;
 };
 
 #define XEN_NETIF_TX_RING_SIZE __CONST_RING_SIZE(xen_netif_tx, XEN_PAGE_SIZE)
diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c
index fb32ae82d9b0..e579ecd40b74 100644
--- a/drivers/net/xen-netback/interface.c
+++ b/drivers/net/xen-netback/interface.c
@@ -591,8 +591,8 @@ int xenvif_init_queue(struct xenvif_queue *queue)
 	}
 
 	for (i = 0; i < MAX_PENDING_REQS; i++) {
-		queue->pending_tx_info[i].callback_struct = (struct ubuf_info)
-			{ .callback = xenvif_zerocopy_callback,
+		queue->pending_tx_info[i].callback_struct = (struct ubuf_info_msgzc)
+			{ { .callback = xenvif_zerocopy_callback },
 			  { { .ctx = NULL,
 			      .desc = i } } };
 		queue->grant_tx_handle[i] = NETBACK_INVALID_HANDLE;
diff --git a/drivers/net/xen-netback/netback.c b/drivers/net/xen-netback/netback.c
index a256695fc89e..3d2081bbbc86 100644
--- a/drivers/net/xen-netback/netback.c
+++ b/drivers/net/xen-netback/netback.c
@@ -133,7 +133,7 @@ static inline unsigned long idx_to_kaddr(struct xenvif_queue *queue,
 
 /* Find the containing VIF's structure from a pointer in pending_tx_info array
  */
-static inline struct xenvif_queue *ubuf_to_queue(const struct ubuf_info *ubuf)
+static inline struct xenvif_queue *ubuf_to_queue(const struct ubuf_info_msgzc *ubuf)
 {
 	u16 pending_idx = ubuf->desc;
 	struct pending_tx_info *temp =
@@ -1228,11 +1228,12 @@ static int xenvif_tx_submit(struct xenvif_queue *queue)
 	return work_done;
 }
 
-void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
+void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf_base,
 			      bool zerocopy_success)
 {
 	unsigned long flags;
 	pending_ring_idx_t index;
+	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
 	struct xenvif_queue *queue = ubuf_to_queue(ubuf);
 
 	/* This is the only place where we grab this lock, to protect callbacks
@@ -1241,7 +1242,7 @@ void xenvif_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *ubuf,
 	spin_lock_irqsave(&queue->callback_lock, flags);
 	do {
 		u16 pending_idx = ubuf->desc;
-		ubuf = (struct ubuf_info *) ubuf->ctx;
+		ubuf = (struct ubuf_info_msgzc *) ubuf->ctx;
 		BUG_ON(queue->dealloc_prod - queue->dealloc_cons >=
 			MAX_PENDING_REQS);
 		index = pending_index(queue->dealloc_prod);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 03/11] vhost/net: use struct ubuf_info_msgzc
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 01/11] net: introduce struct ubuf_info_msgzc Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 02/11] xen/netback: use " Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 04/11] net: shrink struct ubuf_info Pavel Begunkov
                   ` (7 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

struct ubuf_info will be changed, use ubuf_info_msgzc instead.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 drivers/vhost/net.c | 17 +++++++++--------
 1 file changed, 9 insertions(+), 8 deletions(-)

diff --git a/drivers/vhost/net.c b/drivers/vhost/net.c
index 68e4ecd1cc0e..9b616536dd9e 100644
--- a/drivers/vhost/net.c
+++ b/drivers/vhost/net.c
@@ -118,7 +118,7 @@ struct vhost_net_virtqueue {
 	/* Number of XDP frames batched */
 	int batched_xdp;
 	/* an array of userspace buffers info */
-	struct ubuf_info *ubuf_info;
+	struct ubuf_info_msgzc *ubuf_info;
 	/* Reference counting for outstanding ubufs.
 	 * Protected by vq mutex. Writers must also take device mutex. */
 	struct vhost_net_ubuf_ref *ubufs;
@@ -288,7 +288,7 @@ static int vhost_net_set_ubuf_info(struct vhost_net *n)
 		n->vqs[i].ubuf_info =
 			kmalloc_array(UIO_MAXIOV,
 				      sizeof(*n->vqs[i].ubuf_info),
-				      GFP_KERNEL);
+				      GFP_KERNEL | __GFP_ZERO);
 		if  (!n->vqs[i].ubuf_info)
 			goto err;
 	}
@@ -382,8 +382,9 @@ static void vhost_zerocopy_signal_used(struct vhost_net *net,
 }
 
 static void vhost_zerocopy_callback(struct sk_buff *skb,
-				    struct ubuf_info *ubuf, bool success)
+				    struct ubuf_info *ubuf_base, bool success)
 {
+	struct ubuf_info_msgzc *ubuf = uarg_to_msgzc(ubuf_base);
 	struct vhost_net_ubuf_ref *ubufs = ubuf->ctx;
 	struct vhost_virtqueue *vq = ubufs->vq;
 	int cnt;
@@ -871,7 +872,7 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 	size_t len, total_len = 0;
 	int err;
 	struct vhost_net_ubuf_ref *ubufs;
-	struct ubuf_info *ubuf;
+	struct ubuf_info_msgzc *ubuf;
 	bool zcopy_used;
 	int sent_pkts = 0;
 
@@ -907,14 +908,14 @@ static void handle_tx_zerocopy(struct vhost_net *net, struct socket *sock)
 			ubuf = nvq->ubuf_info + nvq->upend_idx;
 			vq->heads[nvq->upend_idx].id = cpu_to_vhost32(vq, head);
 			vq->heads[nvq->upend_idx].len = VHOST_DMA_IN_PROGRESS;
-			ubuf->callback = vhost_zerocopy_callback;
 			ubuf->ctx = nvq->ubufs;
 			ubuf->desc = nvq->upend_idx;
-			ubuf->flags = SKBFL_ZEROCOPY_FRAG;
-			refcount_set(&ubuf->refcnt, 1);
+			ubuf->ubuf.callback = vhost_zerocopy_callback;
+			ubuf->ubuf.flags = SKBFL_ZEROCOPY_FRAG;
+			refcount_set(&ubuf->ubuf.refcnt, 1);
 			msg.msg_control = &ctl;
 			ctl.type = TUN_MSG_UBUF;
-			ctl.ptr = ubuf;
+			ctl.ptr = &ubuf->ubuf;
 			msg.msg_controllen = sizeof(ctl);
 			ubufs = nvq->ubufs;
 			atomic_inc(&ubufs->refcount);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 04/11] net: shrink struct ubuf_info
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (2 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 03/11] vhost/net: " Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 05/11] net: rename ubuf_info's flags Pavel Begunkov
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

We can benefit from a smaller struct ubuf_info, so leave only mandatory
fields and let users to decide how they want to extend it. Convert
MSG_ZEROCOPY to struct ubuf_info_msgzc and remove duplicated fields.
This reduces the size from 48 bytes to just 16.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 22 ++++------------------
 net/core/skbuff.c      | 38 +++++++++++++++++++++-----------------
 net/ipv4/ip_output.c   |  2 +-
 net/ipv4/tcp.c         |  2 +-
 net/ipv6/ip6_output.c  |  2 +-
 5 files changed, 28 insertions(+), 38 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index f8ac3678dab8..afd7400d7f62 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -533,25 +533,8 @@ enum {
 struct ubuf_info {
 	void (*callback)(struct sk_buff *, struct ubuf_info *,
 			 bool zerocopy_success);
-	union {
-		struct {
-			unsigned long desc;
-			void *ctx;
-		};
-		struct {
-			u32 id;
-			u16 len;
-			u16 zerocopy:1;
-			u32 bytelen;
-		};
-	};
 	refcount_t refcnt;
 	u8 flags;
-
-	struct mmpin {
-		struct user_struct *user;
-		unsigned int num_pg;
-	} mmp;
 };
 
 struct ubuf_info_msgzc {
@@ -570,7 +553,10 @@ struct ubuf_info_msgzc {
 		};
 	};
 
-	struct mmpin mmp;
+	struct mmpin {
+		struct user_struct *user;
+		unsigned int num_pg;
+	} mmp;
 };
 
 #define skb_uarg(SKB)	((struct ubuf_info *)(skb_shinfo(SKB)->destructor_arg))
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 974bbbbe7138..b047a773acd7 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1183,7 +1183,7 @@ EXPORT_SYMBOL_GPL(mm_unaccount_pinned_pages);
 
 static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 {
-	struct ubuf_info *uarg;
+	struct ubuf_info_msgzc *uarg;
 	struct sk_buff *skb;
 
 	WARN_ON_ONCE(!in_task());
@@ -1201,19 +1201,19 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 		return NULL;
 	}
 
-	uarg->callback = msg_zerocopy_callback;
+	uarg->ubuf.callback = msg_zerocopy_callback;
 	uarg->id = ((u32)atomic_inc_return(&sk->sk_zckey)) - 1;
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
-	uarg->flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
-	refcount_set(&uarg->refcnt, 1);
+	uarg->ubuf.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+	refcount_set(&uarg->ubuf.refcnt, 1);
 	sock_hold(sk);
 
-	return uarg;
+	return &uarg->ubuf;
 }
 
-static inline struct sk_buff *skb_from_uarg(struct ubuf_info *uarg)
+static inline struct sk_buff *skb_from_uarg(struct ubuf_info_msgzc *uarg)
 {
 	return container_of((void *)uarg, struct sk_buff, cb);
 }
@@ -1222,6 +1222,7 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 				       struct ubuf_info *uarg)
 {
 	if (uarg) {
+		struct ubuf_info_msgzc *uarg_zc;
 		const u32 byte_limit = 1 << 19;		/* limit to a few TSO */
 		u32 bytelen, next;
 
@@ -1237,8 +1238,9 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 			return NULL;
 		}
 
-		bytelen = uarg->bytelen + size;
-		if (uarg->len == USHRT_MAX - 1 || bytelen > byte_limit) {
+		uarg_zc = uarg_to_msgzc(uarg);
+		bytelen = uarg_zc->bytelen + size;
+		if (uarg_zc->len == USHRT_MAX - 1 || bytelen > byte_limit) {
 			/* TCP can create new skb to attach new uarg */
 			if (sk->sk_type == SOCK_STREAM)
 				goto new_alloc;
@@ -1246,11 +1248,11 @@ struct ubuf_info *msg_zerocopy_realloc(struct sock *sk, size_t size,
 		}
 
 		next = (u32)atomic_read(&sk->sk_zckey);
-		if ((u32)(uarg->id + uarg->len) == next) {
-			if (mm_account_pinned_pages(&uarg->mmp, size))
+		if ((u32)(uarg_zc->id + uarg_zc->len) == next) {
+			if (mm_account_pinned_pages(&uarg_zc->mmp, size))
 				return NULL;
-			uarg->len++;
-			uarg->bytelen = bytelen;
+			uarg_zc->len++;
+			uarg_zc->bytelen = bytelen;
 			atomic_set(&sk->sk_zckey, ++next);
 
 			/* no extra ref when appending to datagram (MSG_MORE) */
@@ -1286,7 +1288,7 @@ static bool skb_zerocopy_notify_extend(struct sk_buff *skb, u32 lo, u16 len)
 	return true;
 }
 
-static void __msg_zerocopy_callback(struct ubuf_info *uarg)
+static void __msg_zerocopy_callback(struct ubuf_info_msgzc *uarg)
 {
 	struct sk_buff *tail, *skb = skb_from_uarg(uarg);
 	struct sock_exterr_skb *serr;
@@ -1339,19 +1341,21 @@ static void __msg_zerocopy_callback(struct ubuf_info *uarg)
 void msg_zerocopy_callback(struct sk_buff *skb, struct ubuf_info *uarg,
 			   bool success)
 {
-	uarg->zerocopy = uarg->zerocopy & success;
+	struct ubuf_info_msgzc *uarg_zc = uarg_to_msgzc(uarg);
+
+	uarg_zc->zerocopy = uarg_zc->zerocopy & success;
 
 	if (refcount_dec_and_test(&uarg->refcnt))
-		__msg_zerocopy_callback(uarg);
+		__msg_zerocopy_callback(uarg_zc);
 }
 EXPORT_SYMBOL_GPL(msg_zerocopy_callback);
 
 void msg_zerocopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
-	struct sock *sk = skb_from_uarg(uarg)->sk;
+	struct sock *sk = skb_from_uarg(uarg_to_msgzc(uarg))->sk;
 
 	atomic_dec(&sk->sk_zckey);
-	uarg->len--;
+	uarg_to_msgzc(uarg)->len--;
 
 	if (have_uref)
 		msg_zerocopy_callback(NULL, uarg, true);
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index d7bd1daf022b..546897a4b4fa 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1043,7 +1043,7 @@ static int __ip_append_data(struct sock *sk,
 				paged = true;
 				zc = true;
 			} else {
-				uarg->zerocopy = 0;
+				uarg_to_msgzc(uarg)->zerocopy = 0;
 				skb_zcopy_set(skb, uarg, &extra_uref);
 			}
 		}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 970e9a2cca4a..3152da8f4763 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1239,7 +1239,7 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			}
 			zc = sk->sk_route_caps & NETIF_F_SG;
 			if (!zc)
-				uarg->zerocopy = 0;
+				uarg_to_msgzc(uarg)->zerocopy = 0;
 		}
 	}
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 897ca4f9b791..6d4f01a0cf6e 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1568,7 +1568,7 @@ static int __ip6_append_data(struct sock *sk,
 				paged = true;
 				zc = true;
 			} else {
-				uarg->zerocopy = 0;
+				uarg_to_msgzc(uarg)->zerocopy = 0;
 				skb_zcopy_set(skb, uarg, &extra_uref);
 			}
 		}
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 05/11] net: rename ubuf_info's flags
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (3 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 04/11] net: shrink struct ubuf_info Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 06/11] net: add flags for controlling ubuf_info Pavel Begunkov
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

ubuf_info::flags contains SKBFL_* flags that we copy into skbs, change
the field name to stress that it keeps skb flags.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 4 ++--
 io_uring/notif.c       | 2 +-
 net/core/skbuff.c      | 2 +-
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index afd7400d7f62..e749b5d3868d 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -534,7 +534,7 @@ struct ubuf_info {
 	void (*callback)(struct sk_buff *, struct ubuf_info *,
 			 bool zerocopy_success);
 	refcount_t refcnt;
-	u8 flags;
+	u8 skb_flags;
 };
 
 struct ubuf_info_msgzc {
@@ -1664,7 +1664,7 @@ static inline void net_zcopy_get(struct ubuf_info *uarg)
 static inline void skb_zcopy_init(struct sk_buff *skb, struct ubuf_info *uarg)
 {
 	skb_shinfo(skb)->destructor_arg = uarg;
-	skb_shinfo(skb)->flags |= uarg->flags;
+	skb_shinfo(skb)->flags |= uarg->skb_flags;
 }
 
 static inline void skb_zcopy_set(struct sk_buff *skb, struct ubuf_info *uarg,
diff --git a/io_uring/notif.c b/io_uring/notif.c
index b5f989dff9de..97cb4a7e8849 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -65,7 +65,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
 
 	nd = io_notif_to_data(notif);
 	nd->account_pages = 0;
-	nd->uarg.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+	nd->uarg.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	nd->uarg.callback = io_uring_tx_zerocopy_callback;
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&nd->uarg.refcnt, 1);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index b047a773acd7..40bb84986800 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1206,7 +1206,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->len = 1;
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
-	uarg->ubuf.flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+	uarg->ubuf.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	refcount_set(&uarg->ubuf.refcnt, 1);
 	sock_hold(sk);
 
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 06/11] net: add flags for controlling ubuf_info
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (4 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 05/11] net: rename ubuf_info's flags Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 07/11] net/tcp: optimise tcp ubuf refcounting Pavel Begunkov
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

There are already skb_flags in ubuf_info, which enhancing skbs. Also add
flags controlling ubuf_info, mainly to hint about various referencing
aspects of it, which will be introduced in later patches.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 1 +
 io_uring/notif.c       | 1 +
 net/core/skbuff.c      | 1 +
 3 files changed, 3 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index e749b5d3868d..2b2e0020030b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -535,6 +535,7 @@ struct ubuf_info {
 			 bool zerocopy_success);
 	refcount_t refcnt;
 	u8 skb_flags;
+	u8 flags;
 };
 
 struct ubuf_info_msgzc {
diff --git a/io_uring/notif.c b/io_uring/notif.c
index 97cb4a7e8849..a2ba1e35a59f 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -66,6 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
 	nd = io_notif_to_data(notif);
 	nd->account_pages = 0;
 	nd->uarg.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+	nd->uarg.flags = 0;
 	nd->uarg.callback = io_uring_tx_zerocopy_callback;
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&nd->uarg.refcnt, 1);
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 40bb84986800..7e102373482c 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1207,6 +1207,7 @@ static struct ubuf_info *msg_zerocopy_alloc(struct sock *sk, size_t size)
 	uarg->bytelen = size;
 	uarg->zerocopy = 1;
 	uarg->ubuf.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
+	uarg->ubuf.flags = 0;
 	refcount_set(&uarg->ubuf.refcnt, 1);
 	sock_hold(sk);
 
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 07/11] net/tcp: optimise tcp ubuf refcounting
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (5 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 06/11] net: add flags for controlling ubuf_info Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 08/11] net: let callers provide ->msg_ubuf refs Pavel Begunkov
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

Add UARGFL_CALLER_PINNED letting protocols know that the caller holds a
reference to the ubuf_info and so it doesn't need additional refcounting
for purposes of keeping it alive. With that TCP can save a refcount
put/get pair per send when used with ->msg_ubuf.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 7 +++++++
 net/ipv4/tcp.c         | 9 ++++++---
 2 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2b2e0020030b..45fe7f0648d0 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -522,6 +522,13 @@ enum {
 #define SKBFL_ALL_ZEROCOPY	(SKBFL_ZEROCOPY_FRAG | SKBFL_PURE_ZEROCOPY | \
 				 SKBFL_DONT_ORPHAN | SKBFL_MANAGED_FRAG_REFS)
 
+enum {
+	/* The caller holds a reference during the submission so the ubuf won't
+	 * be freed until we return.
+	 */
+	UARGFL_CALLER_PINNED = BIT(0),
+};
+
 /*
  * The callback notifies userspace to release buffers when skb DMA is done in
  * lower device, the skb last reference should be 0 when calling this.
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3152da8f4763..4925107de57d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1229,7 +1229,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 
 		if (msg->msg_ubuf) {
 			uarg = msg->msg_ubuf;
-			net_zcopy_get(uarg);
+			if (!(uarg->flags & UARGFL_CALLER_PINNED))
+				net_zcopy_get(uarg);
 			zc = sk->sk_route_caps & NETIF_F_SG;
 		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
 			uarg = msg_zerocopy_realloc(sk, size, skb_zcopy(skb));
@@ -1455,7 +1456,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		tcp_push(sk, flags, mss_now, tp->nonagle, size_goal);
 	}
 out_nopush:
-	net_zcopy_put(uarg);
+	if (uarg && !(uarg->flags & UARGFL_CALLER_PINNED))
+		net_zcopy_put(uarg);
 	return copied + copied_syn;
 
 do_error:
@@ -1464,7 +1466,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 	if (copied + copied_syn)
 		goto out;
 out_err:
-	net_zcopy_put_abort(uarg, true);
+	if (uarg && !(uarg->flags & UARGFL_CALLER_PINNED))
+		net_zcopy_put_abort(uarg, true);
 	err = sk_stream_error(sk, flags, err);
 	/* make sure we wake any epoll edge trigger waiter */
 	if (unlikely(tcp_rtx_and_write_queues_empty(sk) && err == -EAGAIN)) {
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 08/11] net: let callers provide ->msg_ubuf refs
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (6 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 07/11] net/tcp: optimise tcp ubuf refcounting Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 09/11] io_uring/notif: add helper for flushing refs Pavel Begunkov
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

Some msg_ubuf providers like io_uring can keep elaborated ubuf_info
reference batching and caching, so it will be of benefit to let the
network layer to optionally steal some of the cached refs.

Add UARGFL_GIFT_REF, if set the caller has at least one extra reference
that it can gift away. If the network decides to take the ref it should
clear the flag.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 include/linux/skbuff.h | 14 ++++++++++++++
 net/ipv4/ip_output.c   |  1 +
 net/ipv6/ip6_output.c  |  1 +
 3 files changed, 16 insertions(+)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 45fe7f0648d0..972ec676e222 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -527,6 +527,11 @@ enum {
 	 * be freed until we return.
 	 */
 	UARGFL_CALLER_PINNED = BIT(0),
+
+	/* The caller can gift one ubuf reference. The flag should be cleared
+	 * when the reference is taken.
+	 */
+	UARGFL_GIFT_REF = BIT(1),
 };
 
 /*
@@ -1709,6 +1714,15 @@ static inline void net_zcopy_put(struct ubuf_info *uarg)
 		uarg->callback(NULL, uarg, true);
 }
 
+static inline bool net_zcopy_get_gift_ref(struct ubuf_info *uarg)
+{
+	bool has_ref;
+
+	has_ref = uarg->flags & UARGFL_GIFT_REF;
+	uarg->flags &= ~UARGFL_GIFT_REF;
+	return has_ref;
+}
+
 static inline void net_zcopy_put_abort(struct ubuf_info *uarg, bool have_uref)
 {
 	if (uarg) {
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 546897a4b4fa..9d42b6dd6b78 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1032,6 +1032,7 @@ static int __ip_append_data(struct sock *sk,
 				paged = true;
 				zc = true;
 				uarg = msg->msg_ubuf;
+				extra_uref = net_zcopy_get_gift_ref(uarg);
 			}
 		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
 			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index 6d4f01a0cf6e..8d8a8bbdb8df 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1557,6 +1557,7 @@ static int __ip6_append_data(struct sock *sk,
 				paged = true;
 				zc = true;
 				uarg = msg->msg_ubuf;
+				extra_uref = net_zcopy_get_gift_ref(uarg);
 			}
 		} else if (sock_flag(sk, SOCK_ZEROCOPY)) {
 			uarg = msg_zerocopy_realloc(sk, length, skb_zcopy(skb));
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 09/11] io_uring/notif: add helper for flushing refs
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (7 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 08/11] net: let callers provide ->msg_ubuf refs Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 10/11] io_uring/notif: mark notifs with UARGFL_CALLER_PINNED Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 11/11] io_uring/notif: add ubuf_info ref caching Pavel Begunkov
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

Add a helper for dropping notification references during flush. It's a
preparation patch, currently it's only one master ref, but we're going
to add ref caching.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 14 +++++++++-----
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index a2ba1e35a59f..5661681b3b44 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -73,6 +73,13 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
 	return notif;
 }
 
+static inline bool io_notif_drop_refs(struct io_notif_data *nd)
+{
+	int refs = 1;
+
+	return refcount_sub_and_test(refs, &nd->uarg.refcnt);
+}
+
 void io_notif_slot_flush(struct io_notif_slot *slot)
 	__must_hold(&ctx->uring_lock)
 {
@@ -81,8 +88,7 @@ void io_notif_slot_flush(struct io_notif_slot *slot)
 
 	slot->notif = NULL;
 
-	/* drop slot's master ref */
-	if (refcount_dec_and_test(&nd->uarg.refcnt))
+	if (io_notif_drop_refs(nd))
 		io_notif_complete(notif);
 }
 
@@ -97,13 +103,11 @@ __cold int io_notif_unregister(struct io_ring_ctx *ctx)
 	for (i = 0; i < ctx->nr_notif_slots; i++) {
 		struct io_notif_slot *slot = &ctx->notif_slots[i];
 		struct io_kiocb *notif = slot->notif;
-		struct io_notif_data *nd;
 
 		if (!notif)
 			continue;
-		nd = io_kiocb_to_cmd(notif);
 		slot->notif = NULL;
-		if (!refcount_dec_and_test(&nd->uarg.refcnt))
+		if (!io_notif_drop_refs(io_kiocb_to_cmd(notif)))
 			continue;
 		notif->io_task_work.func = __io_notif_complete_tw;
 		io_req_task_work_add(notif);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 10/11] io_uring/notif: mark notifs with UARGFL_CALLER_PINNED
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (8 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 09/11] io_uring/notif: add helper for flushing refs Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  2022-08-10 15:49 ` [RFC net-next io_uring 11/11] io_uring/notif: add ubuf_info ref caching Pavel Begunkov
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

We always keep references to active notifications and drop them only
when we flush, so they're always pinned during sock_sendmsg() and we can
add UARGFL_CALLER_PINNED.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/notif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/io_uring/notif.c b/io_uring/notif.c
index 5661681b3b44..dd346ea67580 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -66,7 +66,7 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
 	nd = io_notif_to_data(notif);
 	nd->account_pages = 0;
 	nd->uarg.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
-	nd->uarg.flags = 0;
+	nd->uarg.flags = UARGFL_CALLER_PINNED;
 	nd->uarg.callback = io_uring_tx_zerocopy_callback;
 	/* master ref owned by io_notif_slot, will be dropped on flush */
 	refcount_set(&nd->uarg.refcnt, 1);
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [RFC net-next io_uring 11/11] io_uring/notif: add ubuf_info ref caching
  2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
                   ` (9 preceding siblings ...)
  2022-08-10 15:49 ` [RFC net-next io_uring 10/11] io_uring/notif: mark notifs with UARGFL_CALLER_PINNED Pavel Begunkov
@ 2022-08-10 15:49 ` Pavel Begunkov
  10 siblings, 0 replies; 12+ messages in thread
From: Pavel Begunkov @ 2022-08-10 15:49 UTC (permalink / raw)
  To: io-uring, netdev
  Cc: Jens Axboe, David S . Miller, Jakub Kicinski, kernel-team,
	linux-kernel, xen-devel, Wei Liu, Paul Durrant, kvm,
	virtualization, Michael S . Tsirkin, Jason Wang, Pavel Begunkov

Cache some active notifier references at the io_uring side and get them
in batches, so the ammortised cost is low. Then these references can be
given away to the network layer using UARGFL_GIFT_REF.

Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 io_uring/net.c   |  8 +++++++-
 io_uring/notif.c |  6 ++++--
 io_uring/notif.h | 22 +++++++++++++++++++++-
 3 files changed, 32 insertions(+), 4 deletions(-)

diff --git a/io_uring/net.c b/io_uring/net.c
index e6fc9748fbd2..bdaf9b10bd1b 100644
--- a/io_uring/net.c
+++ b/io_uring/net.c
@@ -949,6 +949,7 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 	struct io_sendzc *zc = io_kiocb_to_cmd(req);
 	struct io_notif_slot *notif_slot;
 	struct io_kiocb *notif;
+	struct ubuf_info *ubuf;
 	struct msghdr msg;
 	struct iovec iov;
 	struct socket *sock;
@@ -1007,10 +1008,15 @@ int io_sendzc(struct io_kiocb *req, unsigned int issue_flags)
 		min_ret = iov_iter_count(&msg.msg_iter);
 
 	msg.msg_flags = msg_flags;
-	msg.msg_ubuf = &io_notif_to_data(notif)->uarg;
 	msg.sg_from_iter = io_sg_from_iter;
+	msg.msg_ubuf = ubuf = &io_notif_to_data(notif)->uarg;
+	ubuf->flags |= UARGFL_GIFT_REF;
 	ret = sock_sendmsg(sock, &msg);
 
+	/* check if the send consumed an additional ref */
+	if (likely(!(ubuf->flags & UARGFL_GIFT_REF)))
+		io_notif_consume_ref(notif);
+
 	if (unlikely(ret < min_ret)) {
 		if (ret == -EAGAIN && (issue_flags & IO_URING_F_NONBLOCK))
 			return -EAGAIN;
diff --git a/io_uring/notif.c b/io_uring/notif.c
index dd346ea67580..73bbda5de07d 100644
--- a/io_uring/notif.c
+++ b/io_uring/notif.c
@@ -68,15 +68,17 @@ struct io_kiocb *io_alloc_notif(struct io_ring_ctx *ctx,
 	nd->uarg.skb_flags = SKBFL_ZEROCOPY_FRAG | SKBFL_DONT_ORPHAN;
 	nd->uarg.flags = UARGFL_CALLER_PINNED;
 	nd->uarg.callback = io_uring_tx_zerocopy_callback;
+	nd->cached_refs = IO_NOTIF_REF_CACHE_NR;
 	/* master ref owned by io_notif_slot, will be dropped on flush */
-	refcount_set(&nd->uarg.refcnt, 1);
+	refcount_set(&nd->uarg.refcnt, IO_NOTIF_REF_CACHE_NR + 1);
 	return notif;
 }
 
 static inline bool io_notif_drop_refs(struct io_notif_data *nd)
 {
-	int refs = 1;
+	int refs = nd->cached_refs + 1;
 
+	nd->cached_refs = 0;
 	return refcount_sub_and_test(refs, &nd->uarg.refcnt);
 }
 
diff --git a/io_uring/notif.h b/io_uring/notif.h
index 0819304d7e00..2a263055a53b 100644
--- a/io_uring/notif.h
+++ b/io_uring/notif.h
@@ -9,11 +9,14 @@
 
 #define IO_NOTIF_SPLICE_BATCH	32
 #define IORING_MAX_NOTIF_SLOTS (1U << 10)
+#define IO_NOTIF_REF_CACHE_NR	64
 
 struct io_notif_data {
 	struct file		*file;
-	struct ubuf_info	uarg;
 	unsigned long		account_pages;
+	/* extra uarg->refcnt refs */
+	int			cached_refs;
+	struct ubuf_info	uarg;
 };
 
 struct io_notif_slot {
@@ -88,3 +91,20 @@ static inline int io_notif_account_mem(struct io_kiocb *notif, unsigned len)
 	}
 	return 0;
 }
+
+static inline void io_notif_consume_ref(struct io_kiocb *notif)
+	__must_hold(&ctx->uring_lock)
+{
+	struct io_notif_data *nd = io_notif_to_data(notif);
+
+	nd->cached_refs--;
+
+	/*
+	* Issue sends without looking at notif->cached_refs first, so we
+	* always have to have at least one ref cached
+	*/
+	if (unlikely(!nd->cached_refs)) {
+		refcount_add(IO_NOTIF_REF_CACHE_NR, &nd->uarg.refcnt);
+		nd->cached_refs += IO_NOTIF_REF_CACHE_NR;
+	}
+}
-- 
2.37.0


^ permalink raw reply related	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2022-08-10 15:53 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-08-10 15:49 [RFC net-next io_uring 00/11] improve io_uring's ubuf_info refcounting Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 01/11] net: introduce struct ubuf_info_msgzc Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 02/11] xen/netback: use " Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 03/11] vhost/net: " Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 04/11] net: shrink struct ubuf_info Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 05/11] net: rename ubuf_info's flags Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 06/11] net: add flags for controlling ubuf_info Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 07/11] net/tcp: optimise tcp ubuf refcounting Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 08/11] net: let callers provide ->msg_ubuf refs Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 09/11] io_uring/notif: add helper for flushing refs Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 10/11] io_uring/notif: mark notifs with UARGFL_CALLER_PINNED Pavel Begunkov
2022-08-10 15:49 ` [RFC net-next io_uring 11/11] io_uring/notif: add ubuf_info ref caching Pavel Begunkov

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.