From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jesper Dangaard Brouer
Subject: Re: Kernel 4.19 network performance - forwarding/routing normal users traffic
Date: Fri, 9 Nov 2018 08:52:49 +0100
Message-ID: <20181109085249.462d8ce7@redhat.com>
References: <61697e49-e839-befc-8330-fc00187c48ee@itcare.pl>
 <3a88bb53-9d17-3e85-638e-a605f5bfe0fb@gmail.com>
 <20181101115522.10b0dd0a@redhat.com>
 <63198d68-6752-3695-f406-d86fb395c12b@itcare.pl>
 <7141e1e0-93e4-ab20-bce6-17f1e14682f1@gmail.com>
 <394a0bf2-fa97-1085-2eda-98ddf476895c@itcare.pl>
 <6ed1666d-47bc-24e7-d432-a0c0027452ed@gmail.com>
 <8dde3b32-59ce-38f3-5913-2ce08264e9dc@itcare.pl>
 <6165513d-1e27-31dc-8f94-9de029a73f93@gmail.com>
 <11199f9f-da21-527b-f5db-0bbf1e448a8b@itcare.pl>
 <87a2a15c-f9bf-743b-b4c5-7d37da0bd887@itcare.pl>
 <68cc8279-5e3f-85c2-673c-aa3d4a47b353@gmail.com>
 <8cb2630e-e7fe-cd44-7798-070f2e6d348a@itcare.pl>
 <754d9d5d-efd2-52e0-cb2b-13caf15f0737@gmail.com>
 <13d8e510ec3287ac0680dfaa311b10d79353c5e7.camel@mellanox.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8BIT
Cc: "dsahern@gmail.com", "pstaszewski@itcare.pl", "netdev@vger.kernel.org",
 "yoel@kviknet.dk", brouer@redhat.com, John Fastabend, Tariq Toukan,
 Toke Høiland-Jørgensen
To: Saeed Mahameed
Return-path:
Received: from mx1.redhat.com ([209.132.183.28]:55434 "EHLO mx1.redhat.com"
 rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1727962AbeKIRcT
 (ORCPT); Fri, 9 Nov 2018 12:32:19 -0500
In-Reply-To: <13d8e510ec3287ac0680dfaa311b10d79353c5e7.camel@mellanox.com>
Sender: netdev-owner@vger.kernel.org
List-ID:

On Fri, 9 Nov 2018 04:52:01 +0000 Saeed Mahameed wrote:

> On Thu, 2018-11-08 at 17:42 -0700, David Ahern wrote:
> > On 11/8/18 5:40 PM, Paweł Staszewski wrote:
> > >
> > > W dniu 08.11.2018 o 17:32, David Ahern pisze:
> > > > On 11/8/18 9:27 AM, Paweł Staszewski wrote:
> > > > >
> > > > > > What hardware is this?
> > > > > mellanox connectx 4
> > > > >
> > > > > ethtool -i enp175s0f0
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_2001000001033)
> > > > > expansion-rom-version:
> > > > > bus-info: 0000:af:00.0
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > >
> > > > > ethtool -i enp175s0f1
> > > > > driver: mlx5_core
> > > > > version: 5.0-0
> > > > > firmware-version: 12.21.1000 (SM_2001000001033)
> > > > > expansion-rom-version:
> > > > > bus-info: 0000:af:00.1
> > > > > supports-statistics: yes
> > > > > supports-test: yes
> > > > > supports-eeprom-access: no
> > > > > supports-register-dump: no
> > > > > supports-priv-flags: yes
> > > > >
> > > > > > Start with:
> > > > > >
> > > > > > echo 1 > /sys/kernel/debug/tracing/events/xdp/enable
> > > > > > cat /sys/kernel/debug/tracing/trace_pipe
> > > > >
> > > > > cat /sys/kernel/debug/tracing/trace_pipe
> > > > > -0 [045] ..s. 68469.467752: xdp_devmap_xmit:
> > > > > ndo_xdp_xmit map_id=32 map_index=5 action=REDIRECT sent=0 drops=1
> > > > > from_ifindex=4 to_ifindex=5 err=-6
> > > >
> > > > FIB lookup is good, the redirect is happening, but the mlx5
> > > > driver does not like it.
> > > >
> > > > I think the -6 is coming from the mlx5 driver and the packet is
> > > > getting dropped. Perhaps this check in mlx5e_xdp_xmit:
> > > >
> > > >     if (unlikely(sq_num >= priv->channels.num))
> > > >         return -ENXIO;
> > >
> > > I removed that part and recompiled - but after running xdp_fwd now I
> > > have a kernel panic :)
> >
> > hh, no please don't do such thing :)
>
> It must be because the tx netdev has fewer tx queues than the rx netdev,
> or the rx netdev rings are bound to high cpu indexes.
>
> anyway, best practice is to open #cores RX/TX queues on both sides:
>
> ethtool -L enp175s0f0 combined $(nproc)
> ethtool -L enp175s0f1 combined $(nproc)
>
> > Jesper or one of the Mellanox folks needs to respond about the config
> > needed to run XDP with this NIC. I don't have a 40G or 100G card to
> > play with.

Saeed already answered with a solution: you need to increase the number
of RX/TX queues to be equal to the number of CPUs.

IMHO this again shows that the resource allocations around ndo_xdp_xmit
need a better API. The implicit requirement is that once ndo_xdp_xmit is
enabled, the driver MUST allocate a dedicated TX queue for XDP on each
CPU. It seems that for mlx5 this is a manual process, and as Paweł
discovered, it is hard to troubleshoot, and only via tracepoints. I
think we need to do better in this area, both regarding usability and
more graceful handling when the HW doesn't have the resources.

The original requirement of one XDP-TX queue per CPU was necessary
because ndo_xdp_xmit could only send one packet at a time. After my
recent changes, ndo_xdp_xmit can send packets in bulks. Thus,
performance-wise it is now feasible to use an (array of) locks, e.g. if
the HW cannot allocate more TX-HW queues, or to allow the sysadmin to
set the mode of operation (if the system as a whole has issues
allocating TX completion IRQs for all these queues).

-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer