From: John Fastabend <john.fastabend@gmail.com>
To: Daniel Borkmann <daniel@iogearbox.net>,
davem@davemloft.net, kuba@kernel.org
Cc: netdev@vger.kernel.org, laurent.bernaille@datadoghq.com,
maciej.fijalkowski@intel.com, toshiaki.makita1@gmail.com,
eric.dumazet@gmail.com, pabeni@redhat.com,
john.fastabend@gmail.com, willemb@google.com,
Daniel Borkmann <daniel@iogearbox.net>
Subject: RE: [PATCH net-next] veth: Do not record rx queue hint in veth_xmit
Date: Wed, 05 Jan 2022 19:22:49 -0800 [thread overview]
Message-ID: <61d660891a4ed_6ee92085e@john.notmuch> (raw)
In-Reply-To: <ef4cf3168907944502c81d8bf45e24eea1061e47.1641427152.git.daniel@iogearbox.net>
Daniel Borkmann wrote:
> Laurent reported that they have seen a significant amount of TCP retransmissions
> at high throughput from applications residing in network namespaces talking to
> the outside world via veths. The drops were seen on the qdisc layer (fq_codel,
> as per systemd default) of the phys device such as ena or virtio_net due to all
> traffic hitting a _single_ TX queue _despite_ multi-queue device. (Note that the
> setup was _not_ using XDP on veths as the issue is generic.)
>
> More specifically, after edbea9220251 ("veth: Store queue_mapping independently
> of XDP prog presence") which made it all the way back to v4.19.184+,
> skb_record_rx_queue() would set skb->queue_mapping to 1 (given 1 RX and 1 TX
> queue by default for veths) instead of leaving at 0.
>
> This is eventually retained and callbacks like ena_select_queue() will also pick
> single queue via netdev_core_pick_tx()'s ndo_select_queue() once all the traffic
> is forwarded to that device via upper stack or other means. Similarly, for others
> not implementing ndo_select_queue() if XPS is disabled, netdev_pick_tx() might
> call into the skb_tx_hash() and check for prior skb_rx_queue_recorded() as well.
>
> In general, it is a _bad_ idea for virtual devices like veth to mess around with
> queue selection [by default]. Given dev->real_num_tx_queues is by default 1,
> the skb->queue_mapping was left untouched, and so prior to edbea9220251 the
> netdev_core_pick_tx() could do its job upon __dev_queue_xmit() on the phys device.
>
> Unbreak this and restore prior behavior by removing the skb_record_rx_queue()
> from veth_xmit() altogether.
>
> If the veth peer has an XDP program attached, then it would return the first RX
> queue index in xdp_md->rx_queue_index (unless configured in non-default manner).
> However, this is still better than breaking the generic case.
Agree on all the above. Fix LGTM thanks!
Acked-by: John Fastabend <john.fastabend@gmail.com>
>
> Fixes: edbea9220251 ("veth: Store queue_mapping independently of XDP prog presence")
> Fixes: 638264dc9022 ("veth: Support per queue XDP ring")
> Reported-by: Laurent Bernaille <laurent.bernaille@datadoghq.com>
> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
> Cc: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> Cc: Toshiaki Makita <toshiaki.makita1@gmail.com>
> Cc: Eric Dumazet <eric.dumazet@gmail.com>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: John Fastabend <john.fastabend@gmail.com>
> Cc: Willem de Bruijn <willemb@google.com>
> ---
> drivers/net/veth.c | 1 -
> 1 file changed, 1 deletion(-)
>
> diff --git a/drivers/net/veth.c b/drivers/net/veth.c
> index d21dd25f429e..354a963075c5 100644
> --- a/drivers/net/veth.c
> +++ b/drivers/net/veth.c
> @@ -335,7 +335,6 @@ static netdev_tx_t veth_xmit(struct sk_buff *skb, struct net_device *dev)
> */
> use_napi = rcu_access_pointer(rq->napi) &&
> veth_skb_is_eligible_for_gro(dev, rcv, skb);
> - skb_record_rx_queue(skb, rxq);
> }
>
> skb_tx_timestamp(skb);
> --
> 2.21.0
>
next prev parent reply other threads:[~2022-01-06 3:23 UTC|newest]
Thread overview: 7+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-06 0:46 [PATCH net-next] veth: Do not record rx queue hint in veth_xmit Daniel Borkmann
2022-01-06 3:22 ` John Fastabend [this message]
2022-01-06 9:00 ` Eric Dumazet
2022-01-06 12:57 ` Toshiaki Makita
2022-01-07 13:56 ` Daniel Borkmann
2022-01-10 6:52 ` Toshiaki Makita
2022-01-06 14:00 ` patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=61d660891a4ed_6ee92085e@john.notmuch \
--to=john.fastabend@gmail.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=eric.dumazet@gmail.com \
--cc=kuba@kernel.org \
--cc=laurent.bernaille@datadoghq.com \
--cc=maciej.fijalkowski@intel.com \
--cc=netdev@vger.kernel.org \
--cc=pabeni@redhat.com \
--cc=toshiaki.makita1@gmail.com \
--cc=willemb@google.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.