Date: Wed, 13 Jul 2022 12:16:44 +0200
From: Maciej Fijalkowski
To: Toke Høiland-Jørgensen
Cc: Adam Smith, xdp-newbies@vger.kernel.org
Subject: Re: XDP redirect throughput with multi-CPU i40e
References: <87o7xuowq8.fsf@toke.dk>
In-Reply-To: <87o7xuowq8.fsf@toke.dk>
X-Mailing-List: xdp-newbies@vger.kernel.org

On Tue, Jul 12, 2022 at 11:19:11PM +0200, Toke Høiland-Jørgensen wrote:
> Adam Smith writes:
>
> > Hello,
> >
> > I have a question regarding bpf_redirect/bpf_redirect_map and latency
> > that we are seeing in a test. The environment is as follows:
> >
> > - Debian Bullseye, running 5.18.0-0.bpo.1-amd64 kernel from
> >   Bullseye-backports (Also tested on 5.16)
> > - Intel Xeon X3430 @ 2.40GHz. 4 cores, no HT
> > - Intel X710-DA2 using i40e driver included with the kernel.
> > - Both interfaces (enp1s0f0 and enp1s0f1) in a simple netfilter bridge.
> > - Ring parameters for rx/tx are both set to the max of 4096, with no
> >   other nic-specific parameters changed.
> >
> > Each interface has 4 combined IRQs, pinned per set_irq_affinity.
> > `irqbalance` is not installed.
> >
> > Traffic is generated by another directly attached machine via iperf3
> > 3.9 (`iperf3 -c -t 0 192.168.1.3 --bidir`) to a directly attached
> > server on the other side.
> >
> > The server in question does nothing more than forward packets as a
> > transparent bridge.
> >
> > An XDP program is installed on f0 to redirect to f1, and f1 to
> > redirect to f0. I have tried programs that simply call
> > `bpf_redirect()`, as well as programs that share a device map and call
> > `bpf_redirect_map()`, with identical results.
> >
> > When channel parameters for each interface are reduced to a single IRQ
> > via `ethtool -L enp1s0f0 combined 1`, and both interface IRQs are
> > bound to the same CPU core via smp_affinity, XDP produces improved
> > bitrate with reduced CPU utilization over non-XDP tests:
> > - Stock netfilter bridge: 9.11 Gbps in both directions at 98%
> >   utilization of pinned core.
> > - XDP: Approximately 9.18 Gbps in both directions at 50% utilization
> >   of pinned core.
> >
> > However, when multiple cores are engaged (combined 4, with
> > set_irq_affinity), XDP processes markedly fewer packets per second
> > (950,000 vs approximately 1.6 million). iperf3 also shows a large
> > number of retransmissions in its output regardless of CPU engagement
> > (approximately 6,500 with XDP over 2 minutes vs 850 with single core
> > tests).
> >
> > This is a sample taken from linux/samples xdp_monitor showing
> > redirection and transmission of packets with XDP engaged:
> >
> > Summary                    944,508 redir/s   0 err,drop/s   944,506 xmit/s
> >   kthread                        0 pkt/s     0 drop/s             0 sched
> >   redirect total           944,508 redir/s
> >     cpu:0                  470,148 redir/s
> >     cpu:2                   15,078 redir/s
> >     cpu:3                  459,282 redir/s
> >   redirect_err                   0 error/s
> >   xdp_exception                  0 hit/s
> >   devmap_xmit total        944,506 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:0                  470,148 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:2                   15,078 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:3                  459,280 xmit/s    0 drop/s   0 drv_err/s
> >   xmit enp1s0f0->enp1s0f1  485,249 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:0                  470,172 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:2                   15,078 xmit/s    0 drop/s   0 drv_err/s
> >   xmit enp1s0f1->enp1s0f0  459,263 xmit/s    0 drop/s   0 drv_err/s
> >     cpu:3                  459,263 xmit/s    0 drop/s   0 drv_err/s
> >
> > Our current hypothesis is that this is a CPU affinity issue. We
> > believe a different core is being used for transmission.
> > In an effort to prove this, how can we successfully measure if
> > bpf_redirect() is causing packets to be transmitted by a different
> > core than they were received by? We are still trying to understand
> > how bpf_redirect() selects which core/IRQ to transmit on and would
> > appreciate any insight or followup material to research.
>
> There is no mechanism in bpf_redirect() to switch CPUs (outside of
> cpumap). When you call XDP_REDIRECT, the frame will be added to a
> per-device per-CPU flush list, which will be flushed (on that same CPU).
> The i40e allocates separate rings for XDP, though, and I'm not sure how
> it does that, so maybe those are what's missing. You should be able to
> see drops in the output if that's what's going on; and the packets
> should still be processed by XDP.
>
> So it sounds more like the hardware configuration is causing packet loss
> before it even hits XDP. Do you see anything in the ethtool stats that
> might explain where packets are being dropped?

I don't know exactly how the IRQs are bound to CPUs, but most probably
this is a driver issue, as Toke is saying. i40e_xdp_xmit() uses
smp_processor_id() as an index into the XDP rings array, so if you limit
the queue count to 4 and bind an IRQ to, say, CPU 10, you'll return with
-ENXIO because queue_index will be >= vsi->num_queue_pairs.

I believe such issues were addressed in the ice driver. There, the XDP
rings array is sized to num_possible_cpus() regardless of the user's
queue count setting, so smp_processor_id() can be used safely.

Adam, could you skip the `ethtool -L $IFACE combined 4` and work with
your 4 flows to see if there is any difference?

Maciej

>
> -Toke
>
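
To make the ring-index failure mode described above concrete, here is a
minimal user-space sketch in C. It is not the driver source: the struct
`vsi_model` and the function `xdp_ring_index()` are hypothetical names
invented for illustration, and the real i40e_xdp_xmit() performs other
checks that are left out. It only models the one condition from this
mail, namely that the id of the transmitting CPU is used as the XDP ring
index and the call fails with -ENXIO when that id is not below the
number of queue pairs.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the driver's VSI state; only the field
 * that matters for this check is modelled. */
struct vsi_model {
	unsigned int num_queue_pairs; /* number of XDP Tx rings available */
};

/*
 * Models the check described in the mail: the CPU id (what
 * smp_processor_id() would return in the kernel) is used directly as
 * the index into the XDP rings array.
 *
 * Returns the ring index on success, or -ENXIO when the CPU id has no
 * matching ring.
 */
static int xdp_ring_index(const struct vsi_model *vsi, unsigned int cpu)
{
	assert(vsi != NULL);

	if (cpu >= vsi->num_queue_pairs)
		return -ENXIO; /* no XDP ring for this CPU */

	return (int)cpu;
}
```

With `num_queue_pairs = 4`, CPUs 0 to 3 map to rings 0 to 3, while CPU 10
gets -ENXIO. If the array is instead sized to the number of possible CPUs
(assumed to be 16 here purely as an example), CPU 10 maps to ring 10,
which is the behaviour attributed to the ice driver above.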