* [PATCH net-next RFC v1 0/4] veth: reduce reallocations of SKBs when XDP bpf-prog is loaded
From: Jesper Dangaard Brouer @ 2023-08-22 17:59 UTC (permalink / raw)
  To: netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

Loading an XDP bpf-prog on a veth device results in a significant
performance degradation (for normal, unrelated traffic), because
veth_convert_skb_to_xdp_buff() in most cases fully reallocates the SKB and
copies the data over, even when the XDP prog does nothing (e.g. returns
XDP_PASS).

This patchset reduces the cases that cause reallocation.
After this patchset, UDP and AF_XDP sends avoid reallocations.

Future work will investigate TCP.
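
For context, a condensed sketch of the test in veth_convert_skb_to_xdp_buff()
that forces the reallocation today (simplified from drivers/net/veth.c; see
patch 3, which relaxes this condition):

	/* Any of these forces a full SKB realloc + data copy */
	if (skb_shared(skb) || skb_head_is_locked(skb) ||
	    skb_shinfo(skb)->nr_frags ||
	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
		/* ... allocate new pages and copy the packet over ... */
	}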

---

Jesper Dangaard Brouer (4):
      veth: use same bpf_xdp_adjust_head check as generic-XDP
      veth: use generic-XDP functions when dealing with SKBs
      veth: lift skb_head_is_locked restriction for SKB based XDP
      veth: when XDP is loaded increase needed_headroom


 drivers/net/veth.c | 86 +++++++++++++++++++---------------------------
 net/core/dev.c     |  1 +
 net/core/filter.c  |  1 +
 3 files changed, 38 insertions(+), 50 deletions(-)

--
Jesper



* [PATCH net-next RFC v1 1/4] veth: use same bpf_xdp_adjust_head check as generic-XDP
From: Jesper Dangaard Brouer @ 2023-08-22 17:59 UTC (permalink / raw)
  To: netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

This is a consistency patch, no functional change.

Both veth_xdp_rcv_skb() and bpf_prog_run_generic_xdp() check whether the XDP
bpf_prog adjusted the packet head via the BPF-helper bpf_xdp_adjust_head().
The order of subtracting orig_data and xdp->data is opposite between the two
functions, which is confusing when reviewing and reading the code.

This patch chooses to follow generic-XDP and adjusts veth_xdp_rcv_skb().

In veth_xdp_rcv_skb() the mac header length has already been __skb_push()'ed.
Thus, we skip the skb->mac_header adjustments that bpf_prog_run_generic_xdp()
does and instead do a skb_reset_mac_header(), as skb->data points to the
Ethernet header.
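
For reference, the generic-XDP side of this check in
bpf_prog_run_generic_xdp() looks roughly as follows (paraphrased from
net/core/dev.c; the skb->mac_header adjustment is the part veth replaces
with skb_reset_mac_header()):

	/* Sketch: how generic-XDP detects bpf_xdp_adjust_head() usage */
	off = xdp->data - orig_data;
	if (off) {
		if (off > 0)
			__skb_pull(skb, off);
		else if (off < 0)
			__skb_push(skb, -off);

		skb->mac_header += off;
		skb_reset_mac_len(skb);
	}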

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 drivers/net/veth.c |   13 +++++++------
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 953f6d8f8db0..be7b62f57087 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -897,12 +897,13 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	rcu_read_unlock();
 
 	/* check if bpf_xdp_adjust_head was used */
-	off = orig_data - xdp->data;
-	if (off > 0)
-		__skb_push(skb, off);
-	else if (off < 0)
-		__skb_pull(skb, -off);
-
+	off = xdp->data - orig_data;
+	if (off) {
+		if (off > 0)
+			__skb_pull(skb, off);
+		else if (off < 0)
+			__skb_push(skb, -off);
+	}
 	skb_reset_mac_header(skb);
 
 	/* check if bpf_xdp_adjust_tail was used */




* [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions when dealing with SKBs
From: Jesper Dangaard Brouer @ 2023-08-22 17:59 UTC (permalink / raw)
  To: netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

The root cause of the realloc issue is that the veth_xdp_rcv_skb() code path
(which handles SKBs like generic-XDP) calls the native-XDP function
xdp_do_redirect(), instead of simply using xdp_do_generic_redirect(), which
can handle SKBs.

The existing code tries to steal the packet-data from the SKB (and frees the
SKB itself). This causes issues, as SKBs can have different memory models
that are incompatible with the native-XDP call xdp_do_redirect(). For this
reason the checks in veth_convert_skb_to_xdp_buff() become stricter, which in
turn makes this a bad approach. Simply leverage the generic-XDP helpers,
e.g. generic_xdp_tx() and xdp_do_generic_redirect(), as this resolves the
issue, given the netstack can handle these different SKB memory models.
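
For testing, a minimal XDP program that exercises this SKB code path; this
is a sketch using standard libbpf conventions (file and function names are
made up, not part of this patchset):

	/* xdp_pass.bpf.c - returning XDP_PASS keeps packets on the
	 * veth_xdp_rcv_skb() path that this patch modifies.
	 */
	#include <linux/bpf.h>
	#include <bpf/bpf_helpers.h>

	SEC("xdp")
	int xdp_pass_prog(struct xdp_md *ctx)
	{
		return XDP_PASS;
	}

	char _license[] SEC("license") = "GPL";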

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 drivers/net/veth.c |   69 ++++++++++++++++++++--------------------------------
 net/core/dev.c     |    1 +
 net/core/filter.c  |    1 +
 3 files changed, 28 insertions(+), 43 deletions(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index be7b62f57087..192547035194 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -713,19 +713,6 @@ static void veth_xdp_rcv_bulk_skb(struct veth_rq *rq, void **frames,
 	}
 }
 
-static void veth_xdp_get(struct xdp_buff *xdp)
-{
-	struct skb_shared_info *sinfo = xdp_get_shared_info_from_buff(xdp);
-	int i;
-
-	get_page(virt_to_page(xdp->data));
-	if (likely(!xdp_buff_has_frags(xdp)))
-		return;
-
-	for (i = 0; i < sinfo->nr_frags; i++)
-		__skb_frag_ref(&sinfo->frags[i]);
-}
-
 static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 					struct xdp_buff *xdp,
 					struct sk_buff **pskb)
@@ -837,7 +824,7 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	struct veth_xdp_buff vxbuf;
 	struct xdp_buff *xdp = &vxbuf.xdp;
 	u32 act, metalen;
-	int off;
+	int off, err;
 
 	skb_prepare_for_gro(skb);
 
@@ -860,30 +847,10 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 
 	switch (act) {
 	case XDP_PASS:
-		break;
 	case XDP_TX:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
-		if (unlikely(veth_xdp_tx(rq, xdp, bq) < 0)) {
-			trace_xdp_exception(rq->dev, xdp_prog, act);
-			stats->rx_drops++;
-			goto err_xdp;
-		}
-		stats->xdp_tx++;
-		rcu_read_unlock();
-		goto xdp_xmit;
 	case XDP_REDIRECT:
-		veth_xdp_get(xdp);
-		consume_skb(skb);
-		xdp->rxq->mem = rq->xdp_mem;
-		if (xdp_do_redirect(rq->dev, xdp, xdp_prog)) {
-			stats->rx_drops++;
-			goto err_xdp;
-		}
-		stats->xdp_redirect++;
-		rcu_read_unlock();
-		goto xdp_xmit;
+		/* Postpone actions to after potential SKB geometry update */
+		break;
 	default:
 		bpf_warn_invalid_xdp_action(rq->dev, xdp_prog, act);
 		fallthrough;
@@ -894,7 +861,6 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 		stats->xdp_drops++;
 		goto xdp_drop;
 	}
-	rcu_read_unlock();
 
 	/* check if bpf_xdp_adjust_head was used */
 	off = xdp->data - orig_data;
@@ -919,11 +885,32 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 	else
 		skb->data_len = 0;
 
-	skb->protocol = eth_type_trans(skb, rq->dev);
-
 	metalen = xdp->data - xdp->data_meta;
 	if (metalen)
 		skb_metadata_set(skb, metalen);
+
+	switch (act) {
+	case XDP_PASS:
+		/* This skb_pull's off mac_len, __skb_push'ed above */
+		skb->protocol = eth_type_trans(skb, rq->dev);
+		break;
+	case XDP_REDIRECT:
+		err = xdp_do_generic_redirect(rq->dev, skb, xdp, xdp_prog);
+		if (unlikely(err)) {
+			trace_xdp_exception(rq->dev, xdp_prog, act);
+			goto xdp_drop;
+		}
+		stats->xdp_redirect++;
+		rcu_read_unlock();
+		goto xdp_xmit;
+	case XDP_TX:
+		/* TODO: this can be optimized to be veth specific */
+		generic_xdp_tx(skb, xdp_prog);
+		stats->xdp_tx++;
+		rcu_read_unlock();
+		goto xdp_xmit;
+	}
+	rcu_read_unlock();
 out:
 	return skb;
 drop:
@@ -931,10 +918,6 @@ static struct sk_buff *veth_xdp_rcv_skb(struct veth_rq *rq,
 xdp_drop:
 	rcu_read_unlock();
 	kfree_skb(skb);
-	return NULL;
-err_xdp:
-	rcu_read_unlock();
-	xdp_return_buff(xdp);
 xdp_xmit:
 	return NULL;
 }
diff --git a/net/core/dev.c b/net/core/dev.c
index 17e6281e408c..1187bfced9ec 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -4987,6 +4987,7 @@ void generic_xdp_tx(struct sk_buff *skb, struct bpf_prog *xdp_prog)
 		kfree_skb(skb);
 	}
 }
+EXPORT_SYMBOL_GPL(generic_xdp_tx);
 
 static DEFINE_STATIC_KEY_FALSE(generic_xdp_needed_key);
 
diff --git a/net/core/filter.c b/net/core/filter.c
index a094694899c9..a6fd7ba901ba 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -4443,6 +4443,7 @@ int xdp_do_generic_redirect(struct net_device *dev, struct sk_buff *skb,
 	_trace_xdp_redirect_err(dev, xdp_prog, ri->tgt_index, err);
 	return err;
 }
+EXPORT_SYMBOL_GPL(xdp_do_generic_redirect);
 
 BPF_CALL_2(bpf_xdp_redirect, u32, ifindex, u64, flags)
 {




* [PATCH net-next RFC v1 3/4] veth: lift skb_head_is_locked restriction for SKB based XDP
From: Jesper Dangaard Brouer @ 2023-08-22 17:59 UTC (permalink / raw)
  To: netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

As veth_xdp_rcv_skb() no longer steals the SKB data and no longer uses
the native-XDP xdp_do_redirect(), it is possible to handle more types
of SKBs with other memory models.

Thus, replace the skb_head_is_locked() check with skb_cloned().
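
For reference, skb_head_is_locked() (include/linux/skbuff.h, quoted from
memory) is the cloned check plus a head_frag check:

	static inline bool skb_head_is_locked(const struct sk_buff *skb)
	{
		/* True for cloned SKBs *and* for kmalloc'ed (non page-frag)
		 * heads; dropping the !skb->head_frag part is what lets
		 * e.g. kmalloc'ed SKB heads through without reallocation.
		 */
		return !skb->head_frag || skb_cloned(skb);
	}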

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 drivers/net/veth.c |    2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 192547035194..8e117cc44fda 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -720,7 +720,7 @@ static int veth_convert_skb_to_xdp_buff(struct veth_rq *rq,
 	struct sk_buff *skb = *pskb;
 	u32 frame_sz;
 
-	if (skb_shared(skb) || skb_head_is_locked(skb) ||
+	if (skb_shared(skb) || skb_cloned(skb) ||
 	    skb_shinfo(skb)->nr_frags ||
 	    skb_headroom(skb) < XDP_PACKET_HEADROOM) {
 		u32 size, len, max_head_size, off;




* [PATCH net-next RFC v1 4/4] veth: when XDP is loaded increase needed_headroom
From: Jesper Dangaard Brouer @ 2023-08-22 17:59 UTC (permalink / raw)
  To: netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

When sending (sendmsg) SKBs out a veth device, the SKB headroom is too small
to satisfy XDP on the receiving veth peer device.

For AF_XDP (normal non-zero-copy) it is worth noticing that xsk_build_skb()
adjusts the headroom according to dev->needed_headroom. Other parts of the
kernel also take this into account (see the macro LL_RESERVED_SPACE).

This satisfies the XDP_PACKET_HEADROOM check in veth_convert_skb_to_xdp_buff().
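
As an illustration, the headroom computation in xsk_build_skb() for AF_XDP
copy-mode looks roughly like this (a paraphrased sketch of net/xdp/xsk.c,
not an exact quote):

	/* AF_XDP copy-mode SKB allocation respects needed_headroom, so
	 * raising it to XDP_PACKET_HEADROOM on the veth device leaves
	 * enough room for the peer's XDP prog.
	 */
	hr = max(NET_SKB_PAD, L1_CACHE_ALIGN(dev->needed_headroom));
	tr = dev->needed_tailroom;
	skb = sock_alloc_send_skb(sk, hr + len + tr, 1, &err);
	if (skb)
		skb_reserve(skb, hr);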

Signed-off-by: Jesper Dangaard Brouer <hawk@kernel.org>
---
 drivers/net/veth.c |    2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/net/veth.c b/drivers/net/veth.c
index 8e117cc44fda..3630e9124071 100644
--- a/drivers/net/veth.c
+++ b/drivers/net/veth.c
@@ -1649,6 +1649,7 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 		if (!old_prog) {
 			peer->hw_features &= ~NETIF_F_GSO_SOFTWARE;
 			peer->max_mtu = max_mtu;
+			veth_set_rx_headroom(dev, XDP_PACKET_HEADROOM);
 		}
 
 		xdp_features_set_redirect_target(peer, true);
@@ -1666,6 +1667,7 @@ static int veth_xdp_set(struct net_device *dev, struct bpf_prog *prog,
 				peer->hw_features |= NETIF_F_GSO_SOFTWARE;
 				peer->max_mtu = ETH_MAX_MTU;
 			}
+			veth_set_rx_headroom(dev, -1);
 		}
 		bpf_prog_put(old_prog);
 	}




* Re: [PATCH net-next RFC v1 0/4] veth: reduce reallocations of SKBs when XDP bpf-prog is loaded
From: Liang Chen @ 2023-08-24  4:27 UTC (permalink / raw)
  To: Jesper Dangaard Brouer
  Cc: netdev, edumazet, pabeni, kuba, davem, lorenzo, Ilias Apalodimas,
	mtahhan, huangjie.albert, Yunsheng Lin

On Wed, Aug 23, 2023 at 1:59 AM Jesper Dangaard Brouer <hawk@kernel.org> wrote:
>
> Loading an XDP bpf-prog on a veth device results in a significant
> performance degradation (for normal, unrelated traffic), because
> veth_convert_skb_to_xdp_buff() in most cases fully reallocates the SKB and
> copies the data over, even when the XDP prog does nothing (e.g. returns
> XDP_PASS).
>
> This patchset reduces the cases that cause reallocation.
> After this patchset, UDP and AF_XDP sends avoid reallocations.
>

This approach is a lot more elegant than registering two XDP memory
models and fiddling with the skb and XDP buffer. The tests conducted
in our environment show similar figures in terms of performance
improvements. For example, using pktgen (skb data buffer allocated by
kmalloc) with the following setup: pktgen -> veth1 -> veth0 (XDP_TX)
-> veth1 (XDP_PASS) gives an improvement of around 23%. Thanks!


Thanks,
Liang

> Future work will investigate TCP.
>
> ---
>
> Jesper Dangaard Brouer (4):
>       veth: use same bpf_xdp_adjust_head check as generic-XDP
>       veth: use generic-XDP functions when dealing with SKBs
>       veth: lift skb_head_is_locked restriction for SKB based XDP
>       veth: when XDP is loaded increase needed_headroom
>
>
>  drivers/net/veth.c | 86 +++++++++++++++++++---------------------------
>  net/core/dev.c     |  1 +
>  net/core/filter.c  |  1 +
>  3 files changed, 38 insertions(+), 50 deletions(-)
>
> --
> Jesper
>


* Re: [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions when dealing with SKBs
From: Toke Høiland-Jørgensen @ 2023-08-24 10:30 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev, edumazet
  Cc: Jesper Dangaard Brouer, pabeni, kuba, davem, lorenzo,
	Ilias Apalodimas, mtahhan, huangjie.albert, Yunsheng Lin,
	Liang Chen

Jesper Dangaard Brouer <hawk@kernel.org> writes:

> The root cause of the realloc issue is that the veth_xdp_rcv_skb() code path
> (which handles SKBs like generic-XDP) calls the native-XDP function
> xdp_do_redirect(), instead of simply using xdp_do_generic_redirect(), which
> can handle SKBs.
>
> The existing code tries to steal the packet-data from the SKB (and frees the
> SKB itself). This causes issues, as SKBs can have different memory models
> that are incompatible with the native-XDP call xdp_do_redirect(). For this
> reason the checks in veth_convert_skb_to_xdp_buff() become stricter, which in
> turn makes this a bad approach. Simply leverage the generic-XDP helpers,
> e.g. generic_xdp_tx() and xdp_do_generic_redirect(), as this resolves the
> issue, given the netstack can handle these different SKB memory models.

While this does solve the memory issue, it's also a subtle change of
semantics. For one thing, generic_xdp_tx() has this comment above it:

/* When doing generic XDP we have to bypass the qdisc layer and the
 * network taps in order to match in-driver-XDP behavior. This also means
 * that XDP packets are able to starve other packets going through a qdisc,
 * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
 * queues, so they do not have this starvation issue.
 */

Also, more generally, this means that if you have a setup with
XDP_REDIRECT-based forwarding on a host with a mix of physical and
veth devices, all the traffic originating from the veth devices will go
on different TXQs than that originating from a physical NIC. Or if a
veth device has a mix of xdp_frame-backed packets and skb-backed
packets, those will also go on different queues, potentially leading to
reordering.

I'm not sure exactly how much of an issue this is in practice, but at
least from a conceptual PoV it's a change in behaviour that I don't
think we should be making lightly. WDYT?

-Toke



* Re: [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions when dealing with SKBs
From: Jesper Dangaard Brouer @ 2023-08-29 14:37 UTC (permalink / raw)
  To: Toke Høiland-Jørgensen, netdev
  Cc: hawk, pabeni, kuba, davem, lorenzo, Ilias Apalodimas, mtahhan,
	huangjie.albert, Yunsheng Lin, edumazet, Liang Chen



On 24/08/2023 12.30, Toke Høiland-Jørgensen wrote:
> Jesper Dangaard Brouer <hawk@kernel.org> writes:
> 
>> The root cause of the realloc issue is that the veth_xdp_rcv_skb() code path
>> (which handles SKBs like generic-XDP) calls the native-XDP function
>> xdp_do_redirect(), instead of simply using xdp_do_generic_redirect(), which
>> can handle SKBs.
>>
>> The existing code tries to steal the packet-data from the SKB (and frees the
>> SKB itself). This causes issues, as SKBs can have different memory models
>> that are incompatible with the native-XDP call xdp_do_redirect(). For this
>> reason the checks in veth_convert_skb_to_xdp_buff() become stricter, which in
>> turn makes this a bad approach. Simply leverage the generic-XDP helpers,
>> e.g. generic_xdp_tx() and xdp_do_generic_redirect(), as this resolves the
>> issue, given the netstack can handle these different SKB memory models.
> 
> While this does solve the memory issue, it's also a subtle change of
> semantics. For one thing, generic_xdp_tx() has this comment above it:
> 
> /* When doing generic XDP we have to bypass the qdisc layer and the
>   * network taps in order to match in-driver-XDP behavior. This also means
>   * that XDP packets are able to starve other packets going through a qdisc,
>   * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
>   * queues, so they do not have this starvation issue.
>   */
> 
> Also, more generally, this means that if you have a setup with
> XDP_REDIRECT-based forwarding on a host with a mix of physical and
> veth devices, all the traffic originating from the veth devices will go
> on different TXQs than that originating from a physical NIC. Or if a
> veth device has a mix of xdp_frame-backed packets and skb-backed
> packets, those will also go on different queues, potentially leading to
> reordering.
> 

Mixing xdp_frame-backed packets and skb-backed packets (towards veth)
will naturally come from two different data paths, and the BPF-developer
that redirected the xdp_frame (into veth) will have made this choice,
including accepting the chance of reordering (given the two data/code paths).

I will claim that (for SKBs) the current code causes reordering on TXQs (as
you explain), and my code changes actually fix this problem.

Consider a userspace app (inside a namespace) sending packets out (to the
veth peer).  Routing (or bridging) will make the netstack send out device A
(maybe a physical device).  On the veth peer we have an XDP-prog running
that will XDP-redirect every 2nd packet to device A.  With the current code,
TXQ reordering will occur, as calling the "native" xdp_do_redirect() will
select the TXQ based on the current-running CPU, while normal SKBs will use
netdev_core_pick_tx().  After my change, using xdp_do_generic_redirect(),
the code ends up in generic_xdp_tx(), which (looking at the code) also uses
netdev_core_pick_tx() to select the TXQ.  Thus, I will claim it is more
correct (even though XDP in general doesn't give this guarantee).
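
Roughly, the two TXQ selection policies being contrasted (a simplified
sketch, not the exact kernel code):

	/* native-XDP xmit path: the queue follows the current CPU, e.g.
	 * veth picks its queue based on smp_processor_id().
	 */
	qidx = smp_processor_id() % dev->real_num_tx_queues;

	/* generic_xdp_tx()/netstack path: flow-based selection, the same
	 * one used for the SKBs the app sends directly.
	 */
	txq = netdev_core_pick_tx(dev, skb, NULL);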

> I'm not sure exactly how much of an issue this is in practice, but at
> least from a conceptual PoV it's a change in behaviour that I don't
> think we should be making lightly. WDYT?

As described above, I think this patchset is an improvement.  It might even
fix/address the concern that was raised.


[Outside the scope of this patchset]

The single XDP BPF-prog that gets attached to the RX-side of a veth device
actually needs to handle *both* xdp_frame-backed packets and SKB-backed
packets, and it cannot tell them apart. (Easy fix: implement a kfunc
RX-metadata hint to expose this?)

For the use-case[1] of implementing NFV (Network Function Virtualization)
chaining via veth devices, each veth pair's XDP BPF-prog implements a
network "function" and redirects/chains to the next veth/container NFV.
For this use-case, I would like the ability to either skip SKB-backed
packets or turn off the BPF-prog seeing any SKB-backed packets. There is a
huge performance advantage when XDP-redirecting an xdp_frame into veth
devices this way, approx 6 Mpps for traversing 4 veth devices, as
benchmarked in [1]. (p.s. I was going to improve this performance further,
but I got distracted by other work.)

  [1] https://github.com/xdp-project/xdp-project/blob/master/areas/core/xdp_frame03_overhead.org

Veth-NFV-like use-cases are hampered by the SKB-based XDP code path causing
a significant slowdown for normal netstack packets.  Plus, the BPF-prog
needs to parse-and-filter those SKB-based packets too.  This patchset "just"
significantly reduces the overhead of the SKB-based XDP code path, which
IMHO is a good first step.  Then we can discuss afterwards whether we should
have a switch to turn off the SKB-based XDP code path in veth.

--Jesper



* Re: [PATCH net-next RFC v1 2/4] veth: use generic-XDP functions when dealing with SKBs
From: Toke Høiland-Jørgensen @ 2023-09-01 13:32 UTC (permalink / raw)
  To: Jesper Dangaard Brouer, netdev
  Cc: hawk, pabeni, kuba, davem, lorenzo, Ilias Apalodimas, mtahhan,
	huangjie.albert, Yunsheng Lin, edumazet, Liang Chen

Jesper Dangaard Brouer <hawk@kernel.org> writes:

> On 24/08/2023 12.30, Toke Høiland-Jørgensen wrote:
>> Jesper Dangaard Brouer <hawk@kernel.org> writes:
>> 
>>> The root cause of the realloc issue is that the veth_xdp_rcv_skb() code path
>>> (which handles SKBs like generic-XDP) calls the native-XDP function
>>> xdp_do_redirect(), instead of simply using xdp_do_generic_redirect(), which
>>> can handle SKBs.
>>>
>>> The existing code tries to steal the packet-data from the SKB (and frees the
>>> SKB itself). This causes issues, as SKBs can have different memory models
>>> that are incompatible with the native-XDP call xdp_do_redirect(). For this
>>> reason the checks in veth_convert_skb_to_xdp_buff() become stricter, which in
>>> turn makes this a bad approach. Simply leverage the generic-XDP helpers,
>>> e.g. generic_xdp_tx() and xdp_do_generic_redirect(), as this resolves the
>>> issue, given the netstack can handle these different SKB memory models.
>> 
>> While this does solve the memory issue, it's also a subtle change of
>> semantics. For one thing, generic_xdp_tx() has this comment above it:
>> 
>> /* When doing generic XDP we have to bypass the qdisc layer and the
>>   * network taps in order to match in-driver-XDP behavior. This also means
>>   * that XDP packets are able to starve other packets going through a qdisc,
>>   * and DDOS attacks will be more effective. In-driver-XDP use dedicated TX
>>   * queues, so they do not have this starvation issue.
>>   */
>> 
>> Also, more generally, this means that if you have a setup with
>> XDP_REDIRECT-based forwarding on a host with a mix of physical and
>> veth devices, all the traffic originating from the veth devices will go
>> on different TXQs than that originating from a physical NIC. Or if a
>> veth device has a mix of xdp_frame-backed packets and skb-backed
>> packets, those will also go on different queues, potentially leading to
>> reordering.
>> 
>
> Mixing xdp_frame-backed packets and skb-backed packets (towards veth)
> will naturally come from two different data paths, and the BPF-developer
> that redirected the xdp_frame (into veth) will have made this choice,
> including accepting the chance of reordering (given the two data/code paths).

I'm not sure we can quite conclude that this is a choice any XDP
developers will be actively aware of. At best it's a very implicit
choice :)

> I will claim that (for SKBs) the current code causes reordering on TXQs (as
> you explain), and my code changes actually fix this problem.
>
> Consider a userspace app (inside a namespace) sending packets out (to the
> veth peer).  Routing (or bridging) will make the netstack send out device A
> (maybe a physical device).  On the veth peer we have an XDP-prog running
> that will XDP-redirect every 2nd packet to device A.  With the current code,
> TXQ reordering will occur, as calling the "native" xdp_do_redirect() will
> select the TXQ based on the current-running CPU, while normal SKBs will use
> netdev_core_pick_tx().  After my change, using xdp_do_generic_redirect(),
> the code ends up in generic_xdp_tx(), which (looking at the code) also uses
> netdev_core_pick_tx() to select the TXQ.  Thus, I will claim it is more
> correct (even though XDP in general doesn't give this guarantee).
>
>> I'm not sure exactly how much of an issue this is in practice, but at
>> least from a conceptual PoV it's a change in behaviour that I don't
>> think we should be making lightly. WDYT?
>
> As described above, I think this patchset is an improvement.  It might even
> fix/address the concern that was raised.

Well, you can obviously construct examples in both directions (i.e.,
where the old behaviour leads to reordering but the new one doesn't, and
vice versa). I believe you could also reasonably argue that either
behaviour is more "correct", so if we were just picking between
behaviours I wouldn't be objecting, I think.

However, we're not just picking between two equally good behaviours,
we're changing one long-standing behaviour to a different one, and I
worry this will introduce regressions because there are applications
that (explicitly or implicitly) rely on the old behaviour.

Also, there's the starvation issue mentioned in the comment I quoted
above: with this patch it is possible for traffic redirected from a veth
to effectively starve the host TXQ, where before it wouldn't.

I don't really have a good answer for how we can make sure of this
either way, but I believe it's cause for concern, which is really my
main reservation with this change :)

-Toke


