Re: [PATCH net] veth: Avoid drop packets when xdp_redirect performs

netdev.vger.kernel.org archive mirror

From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Heng Qi <hengqi@linux.alibaba.com>, netdev@vger.kernel.org
Cc: "David S. Miller" <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Jakub Kicinski <kuba@kernel.org>, Paolo Abeni <pabeni@redhat.com>,
	Xuan Zhuo <xuanzhuo@linux.alibaba.com>
Subject: Re: [PATCH net] veth: Avoid drop packets when xdp_redirect performs
Date: Wed, 28 Sep 2022 16:58:25 +0200
Message-ID: <87v8p7r1f2.fsf@toke.dk>
In-Reply-To: <f760701a-fb9d-11e5-f555-ebcf773922c3@linux.alibaba.com>

Heng Qi <hengqi@linux.alibaba.com> writes:

> On 2022/9/27 8:20 PM, Toke Høiland-Jørgensen wrote:
>> Heng Qi <hengqi@linux.alibaba.com> writes:
>>
>>> In the current processing logic, when an xdp_redirect occurs, the xdp
>>> frame is transmitted via NAPI.
>>>
>>> If NAPI on the peer veth is not ready, veth drops the packets.
>>> This doesn't meet our expectations.
>> Erm, why don't you just enable NAPI? Loading an XDP program is not
>> needed these days; you can just enable GRO on both peers...
>
> In general, we don't expect veth to drop packets, whether or not it
> has an XDP program attached.

Well, did you consider that maybe your expectation is wrong? ;)

>>> In this context, if NAPI is not ready, we convert the xdp frame to an skb
>>> and then use veth_xmit() to deliver it to the peer veth.
>>>
>>> For example, in the following setup, even if veth1's NAPI cannot be
>>> used, packets redirected from the NIC will still be delivered to veth1
>>> successfully:
>>>
>>> NIC   ->   veth0----veth1
>>>   |                   |
>>> (XDP)             (no XDP)
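The conversion step described above has one subtlety that the quoted diff handles: according to the comment in that diff, the Ethernet header is pulled in eth_type_trans() while the skb is built, so the fallback has to push it back before calling veth_xmit(). The user-space sketch below models only that pointer arithmetic. `struct toy_skb` and all `toy_*` helpers are hypothetical simplifications for illustration, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_ETH_HLEN 14 /* 6-byte dst MAC + 6-byte src MAC + 2-byte EtherType */

/* Hypothetical, heavily simplified stand-in for struct sk_buff. */
struct toy_skb {
	unsigned char *head;       /* start of the underlying buffer */
	unsigned char *data;       /* current start of packet data */
	unsigned int len;          /* bytes from data to end of packet */
	unsigned char *mac_header; /* where the Ethernet header starts */
};

/* Models the buffer side effects attributed to eth_type_trans():
 * remember the MAC header position, pull TOY_ETH_HLEN bytes, and
 * return the EtherType read from the header. */
static uint16_t toy_eth_type_trans(struct toy_skb *skb)
{
	assert(skb->len >= TOY_ETH_HLEN);
	skb->mac_header = skb->data;
	skb->data += TOY_ETH_HLEN;
	skb->len -= TOY_ETH_HLEN;
	return (uint16_t)((skb->mac_header[12] << 8) | skb->mac_header[13]);
}

/* Models skb_push(): move data back towards head by n bytes. */
static void toy_skb_push(struct toy_skb *skb, unsigned int n)
{
	assert((unsigned int)(skb->data - skb->head) >= n); /* no underflow */
	skb->data -= n;
	skb->len += n;
}

/* The restore step from the quoted diff: push back exactly the bytes
 * between the recorded MAC header and the current data pointer. */
static unsigned int toy_restore_eth_header(struct toy_skb *skb)
{
	unsigned int mac_len = (unsigned int)(skb->data - skb->mac_header);

	toy_skb_push(skb, mac_len);
	return mac_len;
}
```

After the pull, `data` sits 14 bytes past `mac_header`, so `mac_len` comes out as 14 and the push moves `data` back to the start of the frame with the original length restored.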
>>>
>>> Signed-off-by: Heng Qi <hengqi@linux.alibaba.com>
>>> Signed-off-by: Xuan Zhuo <xuanzhuo@linux.alibaba.com>
>>> ---
>>>   drivers/net/veth.c | 36 +++++++++++++++++++++++++++++++++++-
>>>   1 file changed, 35 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
>>> index 466da01..e1f5561 100644
>>> --- a/drivers/net/veth.c
>>> +++ b/drivers/net/veth.c
>>> @@ -469,8 +469,42 @@ static int veth_xdp_xmit(struct net_device *dev, int n,
>>>   	/* The napi pointer is set if NAPI is enabled, which ensures that
>>>   	 * xdp_ring is initialized on receive side and the peer device is up.
>>>   	 */
>>> -	if (!rcu_access_pointer(rq->napi))
>>> +	if (!rcu_access_pointer(rq->napi)) {
>>> +		for (i = 0; i < n; i++) {
>>> +			struct xdp_frame *xdpf = frames[i];
>>> +			struct netdev_queue *txq = NULL;
>>> +			struct sk_buff *skb;
>>> +			int queue_mapping;
>>> +			u16 mac_len;
>>> +
>>> +			skb = xdp_build_skb_from_frame(xdpf, dev);
>>> +			if (unlikely(!skb)) {
>>> +				ret = nxmit;
>>> +				goto out;
>>> +			}
>>> +
>>> +			/* We need to restore ETH header, because it is pulled
>>> +			 * in eth_type_trans.
>>> +			 */
>>> +			mac_len = skb->data - skb_mac_header(skb);
>>> +			skb_push(skb, mac_len);
>>> +
>>> +			nxmit++;
>>> +
>>> +			queue_mapping = skb_get_queue_mapping(skb);
>>> +			txq = netdev_get_tx_queue(dev, netdev_cap_txqueue(dev, queue_mapping));
>>> +			__netif_tx_lock(txq, smp_processor_id());
>>> +			if (unlikely(veth_xmit(skb, dev) != NETDEV_TX_OK)) {
>>> +				__netif_tx_unlock(txq);
>>> +				ret = nxmit;
>>> +				goto out;
>>> +			}
>>> +			__netif_tx_unlock(txq);
>> Locking and unlocking the txq repeatedly for each packet? Yikes! Did you
>> measure the performance overhead of this?
>
> Yes, there are indeed some optimizations that can be done here,
> like moving the lock outside the loop.
> But __dev_queue_xmit() also takes a lock for each packet it sends.

...which is another reason why this is a bad idea: it's going to perform
terribly, which means we'll just end up with users wondering why their
XDP performance is terrible and we're going to have to tell them to turn
on GRO anyway. So why not do this from the beginning?

If you want to change the default, flipping GRO to be on by default is a
better solution IMO. I don't actually recall why we didn't do that when
the support was added, but maybe Paolo remembers?

-Toke


Thread overview: 14+ messages
2022-09-27  8:30 [PATCH net] veth: Avoid drop packets when xdp_redirect performs Heng Qi
2022-09-27 12:20 ` Toke Høiland-Jørgensen
2022-09-28 11:57   ` Heng Qi
2022-09-28 14:58     ` Toke Høiland-Jørgensen [this message]
2022-09-29  2:50       ` Heng Qi
2022-09-29  6:57         ` Paolo Abeni
2022-09-29  7:33           ` Heng Qi
2022-09-29 12:08             ` Toke Høiland-Jørgensen
2022-10-20  2:23               ` Heng Qi
2022-10-20 16:34                 ` Toke Høiland-Jørgensen
2022-10-21  6:31                   ` Heng Qi
2022-10-24 11:20                     ` Heng Qi
2022-10-24 13:34                       ` Toke Høiland-Jørgensen
2022-10-24 13:39                         ` Heng Qi
