From: Jonathan Lemon <jonathan.lemon@gmail.com>
To: <netdev@vger.kernel.org>
Cc: <kernel-team@fb.com>, <axboe@kernel.dk>
Subject: [RFC PATCH 20/21] core/skbuff: use skb_zdata for testing whether skb is zerocopy
Date: Thu, 18 Jun 2020 09:09:40 -0700
Message-ID: <20200618160941.879717-21-jonathan.lemon@gmail.com>
In-Reply-To: <20200618160941.879717-1-jonathan.lemon@gmail.com>

skb_zcopy() indicates whether the skb has a zerocopy ubuf attached.
netgpu does not use ubufs, so add skb_zdata() to indicate whether
the skb is carrying zero-copy data and should be left alone, while
skb_zcopy() continues to indicate whether there is an attached ubuf.
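
To illustrate the distinction, here is a minimal user-space sketch of
the two tests.  It is not kernel code: struct toy_skb and the toy_*
helpers are hypothetical stand-ins, and the SKBTX_DEV_ZEROCOPY and
SKBTX_SHARED_FRAG bit values are assumed rather than taken from this
patch (only SKBTX_DEV_NETDMA is).

```c
#include <stdbool.h>
#include <stddef.h>

/* SKBTX_DEV_NETDMA is the bit added by this patch; the other two
 * values are assumptions made for this illustration. */
enum {
	SKBTX_DEV_ZEROCOPY = 1 << 3,
	SKBTX_SHARED_FRAG  = 1 << 5,
	SKBTX_DEV_NETDMA   = 1 << 7,
};

#define SKBTX_ZERODATA_FRAG	(SKBTX_DEV_ZEROCOPY | SKBTX_DEV_NETDMA)

/* Hypothetical stand-in for sk_buff plus its shared info. */
struct toy_skb {
	unsigned int tx_flags;
	void *destructor_arg;	/* would point at the ubuf_info */
};

/* True only for skbs whose fragments are reached via DMA. */
static bool toy_skb_netdma(const struct toy_skb *skb)
{
	return skb && (skb->tx_flags & SKBTX_DEV_NETDMA);
}

/* True when the skb carries zero-copy data of either kind. */
static bool toy_skb_zdata(const struct toy_skb *skb)
{
	return skb && (skb->tx_flags & SKBTX_ZERODATA_FRAG);
}

/* Returns the attached ubuf, or NULL when there is none. */
static void *toy_skb_zcopy(const struct toy_skb *skb)
{
	bool is_zcopy = skb && (skb->tx_flags & SKBTX_DEV_ZEROCOPY);

	return is_zcopy ? skb->destructor_arg : NULL;
}

/* Marks the skb as netdma; such an skb has no ubuf attached. */
static void toy_skb_netdma_set(struct toy_skb *skb, bool netdma)
{
	if (skb && netdma) {
		skb->tx_flags |= SKBTX_DEV_NETDMA;
		skb->destructor_arg = NULL;
	}
}
```

In this sketch, an skb marked only through toy_skb_netdma_set() makes
toy_skb_zdata() return true while toy_skb_zcopy() returns NULL, which
is why the coalesce and shift checks are switched to the zdata test.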

Also, when a write() on a zero-copy socket returns EWOULDBLOCK,
this is not synchronized with select(), which only looks at the
send buffer and reports the socket writable if there is TCP space.

This appears to be caused by some ubuf logic, and leads to iperf
spending 70% of its time in select() for ZC transmits.  With this
change, the time spent drops to 20%.
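
For reference, the user-space pattern affected by this looks roughly
like the sketch below.  It is an assumption about how a sender such as
iperf might be structured, not its actual code; send_all() and struct
send_stats are hypothetical names.  If select() reports the socket
writable while send() keeps failing with EWOULDBLOCK, the loop
alternates between the two calls without making progress.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>

/* Hypothetical counters, used only to make the spinning visible. */
struct send_stats {
	unsigned long sends;		/* send() calls issued */
	unsigned long wouldblock;	/* send() calls that would block */
	unsigned long selects;		/* select() calls issued */
};

/*
 * Send len bytes on a non-blocking socket, waiting in select()
 * whenever send() would block.  The caller supplies the send() flags
 * (a zero-copy sender would pass MSG_ZEROCOPY here).  Returns the
 * number of bytes sent, or -1 on error.
 */
static ssize_t send_all(int fd, const char *buf, size_t len, int flags,
			struct send_stats *st)
{
	size_t done = 0;

	while (done < len) {
		fd_set wfds;
		ssize_t n = send(fd, buf + done, len - done, flags);

		st->sends++;
		if (n >= 0) {
			done += (size_t)n;
			continue;
		}
		if (errno == EINTR)
			continue;
		if (errno != EAGAIN && errno != EWOULDBLOCK)
			return -1;

		/* Would block: wait until select() says writable. */
		st->wouldblock++;
		FD_ZERO(&wfds);
		FD_SET(fd, &wfds);
		st->selects++;
		if (select(fd + 1, NULL, &wfds, NULL, NULL) < 0 &&
		    errno != EINTR)
			return -1;
	}
	return (ssize_t)done;
}
```

Comparing the wouldblock and selects counters against sends gives a
rough view of how much of the loop is spent waiting rather than
transmitting.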

Signed-off-by: Jonathan Lemon <jonathan.lemon@gmail.com>
---
 include/linux/skbuff.h | 24 +++++++++++++++++++++++-
 net/core/skbuff.c      | 16 ++++++++++++----
 2 files changed, 35 insertions(+), 5 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index ba41d1a383f8..3c2efd45655b 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -443,8 +443,12 @@ enum {
 
 	/* generate software time stamp when entering packet scheduling */
 	SKBTX_SCHED_TSTAMP = 1 << 6,
+
+	/* fragments are accessed only via DMA */
+	SKBTX_DEV_NETDMA = 1 << 7,
 };
 
+#define SKBTX_ZERODATA_FRAG	(SKBTX_DEV_ZEROCOPY | SKBTX_DEV_NETDMA)
 #define SKBTX_ZEROCOPY_FRAG	(SKBTX_DEV_ZEROCOPY | SKBTX_SHARED_FRAG)
 #define SKBTX_ANY_SW_TSTAMP	(SKBTX_SW_TSTAMP    | \
 				 SKBTX_SCHED_TSTAMP)
@@ -1416,6 +1420,24 @@ static inline struct skb_shared_hwtstamps *skb_hwtstamps(struct sk_buff *skb)
 	return &skb_shinfo(skb)->hwtstamps;
 }
 
+static inline bool skb_netdma(struct sk_buff *skb)
+{
+	return skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_NETDMA;
+}
+
+static inline bool skb_zdata(struct sk_buff *skb)
+{
+	return skb && skb_shinfo(skb)->tx_flags & SKBTX_ZERODATA_FRAG;
+}
+
+static inline void skb_netdma_set(struct sk_buff *skb, bool netdma)
+{
+	if (skb && netdma) {
+		skb_shinfo(skb)->tx_flags |= SKBTX_DEV_NETDMA;
+		skb_shinfo(skb)->destructor_arg = NULL;
+	}
+}
+
 static inline struct ubuf_info *skb_zcopy(struct sk_buff *skb)
 {
 	bool is_zcopy = skb && skb_shinfo(skb)->tx_flags & SKBTX_DEV_ZEROCOPY;
@@ -3260,7 +3282,7 @@ static inline int skb_add_data(struct sk_buff *skb,
 static inline bool skb_can_coalesce(struct sk_buff *skb, int i,
 				    const struct page *page, int off)
 {
-	if (skb_zcopy(skb))
+	if (skb_zdata(skb))
 		return false;
 	if (i) {
 		const skb_frag_t *frag = &skb_shinfo(skb)->frags[i - 1];
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 2b4176cab578..67a421257a27 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -1323,6 +1323,8 @@ int skb_zerocopy_iter_stream(struct sock *sk, struct sk_buff *skb,
 	}
 
 	skb_zcopy_set(skb, uarg, NULL);
+	skb_netdma_set(skb, sk->sk_user_data);
+
 	return skb->len - orig_len;
 }
 EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
@@ -1330,8 +1332,8 @@ EXPORT_SYMBOL_GPL(skb_zerocopy_iter_stream);
 static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
 			      gfp_t gfp_mask)
 {
-	if (skb_zcopy(orig)) {
-		if (skb_zcopy(nskb)) {
+	if (skb_zdata(orig)) {
+		if (skb_zdata(nskb)) {
 			/* !gfp_mask callers are verified to !skb_zcopy(nskb) */
 			if (!gfp_mask) {
 				WARN_ON_ONCE(1);
@@ -1343,6 +1345,7 @@ static int skb_zerocopy_clone(struct sk_buff *nskb, struct sk_buff *orig,
 				return -EIO;
 		}
 		skb_zcopy_set(nskb, skb_uarg(orig), NULL);
+		skb_netdma_set(nskb, skb_netdma(orig));
 	}
 	return 0;
 }
@@ -1372,6 +1375,9 @@ int skb_copy_ubufs(struct sk_buff *skb, gfp_t gfp_mask)
 	if (skb_shared(skb) || skb_unclone(skb, gfp_mask))
 		return -EINVAL;
 
+	if (!skb_has_shared_frag(skb))
+		return -EINVAL;
+
 	if (!num_frags)
 		goto release;
 
@@ -2078,6 +2084,8 @@ void *__pskb_pull_tail(struct sk_buff *skb, int delta)
 	 */
 	int i, k, eat = (skb->tail + delta) - skb->end;
 
+	BUG_ON(skb_netdma(skb));
+
 	if (eat > 0 || skb_cloned(skb)) {
 		if (pskb_expand_head(skb, 0, eat > 0 ? eat + 128 : 0,
 				     GFP_ATOMIC))
@@ -3328,7 +3336,7 @@ int skb_shift(struct sk_buff *tgt, struct sk_buff *skb, int shiftlen)
 
 	if (skb_headlen(skb))
 		return 0;
-	if (skb_zcopy(tgt) || skb_zcopy(skb))
+	if (skb_zdata(tgt) || skb_zdata(skb))
 		return 0;
 
 	todo = shiftlen;
@@ -5171,7 +5179,7 @@ bool skb_try_coalesce(struct sk_buff *to, struct sk_buff *from,
 	from_shinfo = skb_shinfo(from);
 	if (to_shinfo->frag_list || from_shinfo->frag_list)
 		return false;
-	if (skb_zcopy(to) || skb_zcopy(from))
+	if (skb_zdata(to) || skb_zdata(from))
 		return false;
 
 	if (skb_headlen(from) != 0) {
-- 
2.24.1

