Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device

From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Pavel Vazharov <pavel@x3me.net>
Cc: "Magnus Karlsson" <magnus.karlsson@gmail.com>,
	"Toke Høiland-Jørgensen" <toke@kernel.org>,
	"Jakub Kicinski" <kuba@kernel.org>,
	netdev@vger.kernel.org
Subject: Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device
Date: Fri, 16 Feb 2024 18:24:07 +0100
Message-ID: <Zc+aN4rYKZKu3vKx@boxer>
In-Reply-To: <CAJEV1ijnUrJXOuGW5xnuCvMTtaC1VKhOXQ0_4iJnqR5Vco4yLg@mail.gmail.com>

> > > > >
> > > > > Back to the issue.
> > > > > I just want to say again that we are not binding the XDP sockets to
> > > > > the bonding device. We are binding the sockets to the queues of the
> > > > > physical interfaces "below" the bonding device.
> > > > > My further observation this time is that when the issue happens and
> > > > > the remote device reports the LACP error, there is no incoming LACP
> > > > > traffic on the corresponding local port, as seen by xdpdump.
> > > > > tcpdump at the same time sees only outgoing LACP packets and
> > > > > nothing incoming.
> > > > > For example:
> > > > > Remote device                                                Local server
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2
> > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4
> > > > > And when the remote device reports "received an abnormal LACPDU"
> > > > > for PortName=XGigabitEthernet0/0/14, I can see via xdpdump that there
> > > > > is no incoming LACP traffic
> > > >
> > > > Hey Pavel,
> > > >
> > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows
> > > > there?
> > > I reproduced the problem, but this time the interface in the weird
> > > state was eth0.
> > > It's different every time, and sometimes even two of the interfaces are
> > > in such a state.
> > > Here is the requested info while in this state:
> > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt
> > > 6c6
> > > <      rx_pkts_nic: 81426
> > > ---
> > > >      rx_pkts_nic: 81436
> > > 8c8
> > > <      rx_bytes_nic: 10286521
> > > ---
> > > >      rx_bytes_nic: 10287801
> > > 17c17
> > > <      multicast: 72216
> > > ---
> > > >      multicast: 72226
> > > 48c48
> > > <      rx_no_dma_resources: 1109
> > > ---
> > > >      rx_no_dma_resources: 1119
> > >
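Diffing two `ethtool -S` snapshots by eye works for a handful of counters but gets noisy once per-queue stats are involved. Below is a minimal sketch for doing the same comparison offline. It assumes the snapshots were saved to text files as in the command above. The function names are made up for illustration and are not part of ethtool. The sketch reports every counter that changed, together with its delta and its per-second rate over the sampling interval.

```python
import sys


def parse_ethtool_stats(text: str) -> dict[str, int]:
    """Parse `ethtool -S <iface>` output into {counter_name: value}."""
    stats: dict[str, int] = {}
    for line in text.splitlines():
        name, sep, value = line.rpartition(":")
        value = value.strip()
        # Skip lines without a numeric value, e.g. the "NIC statistics:" header.
        if sep and value.isdigit():
            stats[name.strip()] = int(value)
    return stats


def counter_deltas(before: dict[str, int], after: dict[str, int],
                   interval_s: float) -> dict[str, tuple[int, float]]:
    """Return {name: (delta, delta_per_second)} for counters that changed."""
    changed: dict[str, tuple[int, float]] = {}
    for name, new in after.items():
        old = before.get(name)
        if old is not None and new != old:
            delta = new - old
            changed[name] = (delta, delta / interval_s)
    return changed


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python3 ethtool_delta.py /tmp/stats0.txt /tmp/stats1.txt 10
    with open(sys.argv[1]) as f0, open(sys.argv[2]) as f1:
        deltas = counter_deltas(parse_ethtool_stats(f0.read()),
                                parse_ethtool_stats(f1.read()),
                                float(sys.argv[3]))
    for name, (delta, rate) in sorted(deltas.items()):
        print(f"{name:30} +{delta:<10} {rate:.1f}/s")
```

On the diff above (10 s apart), rx_pkts_nic and rx_no_dma_resources both grow by exactly 10. In that window, every frame the NIC counted was also counted by rx_no_dma_resources.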
> > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt
> > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 108199 108201 63 64 1865 108199 61
> > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 117967 117969 68 69 1870 117967 66
> > >
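The same approach works for `/proc/interrupts`. Sum each line's per-CPU columns and key the result by the IRQ name in the last column. That makes it easy to see which queue vectors are actually firing. This is a sketch that assumes the usual `/proc/interrupts` layout: IRQ number, then one count per CPU, then the chip, type and name columns. The helper names are hypothetical.

```python
import sys


def parse_proc_interrupts(text: str) -> dict[str, int]:
    """Map each IRQ's name (last column) to its count summed over all CPUs."""
    totals: dict[str, int] = {}
    for line in text.splitlines():
        fields = line.split()
        # Data lines start with "<irq>:"; the "CPU0 CPU1 ..." header does not.
        if len(fields) < 2 or not fields[0].endswith(":"):
            continue
        count = 0
        for field in fields[1:]:
            if not field.isdigit():
                break  # per-CPU columns end where the chip/type columns begin
            count += int(field)
        totals[fields[-1]] = count
    return totals


def irq_deltas(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Per-IRQ increase between two snapshots, for IRQs present in both."""
    return {name: after[name] - before[name] for name in after if name in before}


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 irq_delta.py /tmp/interrupts0.txt /tmp/interrupts1.txt
    with open(sys.argv[1]) as f0, open(sys.argv[2]) as f1:
        deltas = irq_deltas(parse_proc_interrupts(f0.read()),
                            parse_proc_interrupts(f1.read()))
    for name, delta in sorted(deltas.items()):
        print(f"{name:20} +{delta}")
```

Suppose each of the 16 numbers quoted above is one vector's count, in order. Then the deltas over the 10 s window fall into two groups: nine vectors advanced by 9768 and seven advanced by only 5.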
> > > So it seems that packets are arriving at the interface but they don't
> > > reach the XDP layer or anything above it.
> > > Could the rx_no_dma_resources counter be a clue to a possible issue?
> > >
> > > >
> > > > > on eth4, but there is incoming LACP traffic on eth0 and eth2.
> > > > > At the same time, according to dmesg, the kernel sees all of the
> > > > > interfaces as "link status definitely up, 10000 Mbps full duplex".
> > > > > The issue goes away if I stop the application, even without removing
> > > > > the XDP programs from the interfaces: the running xdpdump starts
> > > > > showing the incoming LACP traffic immediately.
> > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4".
> > > >
> > > > and what is the setup when doing the link flap? Are XDP progs loaded
> > > > on each of the 3 interfaces of the bond?
> > > Yes, the same XDP program is loaded on application startup on each one
> > > of the interfaces which are part of bond0 (eth0, eth2, eth4):
> > > # xdp-loader status
> > > CURRENT XDP PROGRAM STATUS:
> > >
> > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > --------------------------------------------------------------------------------------
> > > lo                     <No XDP program loaded!>
> > > eth0                   xdp_dispatcher    native   1320 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1329 3b185187f1855c4c  XDP_PASS
> > > eth1                   <No XDP program loaded!>
> > > eth2                   xdp_dispatcher    native   1334 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1337 3b185187f1855c4c  XDP_PASS
> > > eth3                   <No XDP program loaded!>
> > > eth4                   xdp_dispatcher    native   1342 90f686eb86991928
> > >  =>              50     x3sp_splitter_func          1345 3b185187f1855c4c  XDP_PASS
> > > eth5                   <No XDP program loaded!>
> > > eth6                   <No XDP program loaded!>
> > > eth7                   <No XDP program loaded!>
> > > bond0                  <No XDP program loaded!>
> > > Each of these interfaces is set up with 16 queues, i.e. the application,
> > > through the DPDK machinery, opens 3x16 XSK sockets, each bound to the
> > > corresponding queue of the corresponding interface.
> > > ~# ethtool -l eth0 # It's the same for the other 2 devices
> > > Channel parameters for eth0:
> > > Pre-set maximums:
> > > RX:             n/a
> > > TX:             n/a
> > > Other:          1
> > > Combined:       48
> > > Current hardware settings:
> > > RX:             n/a
> > > TX:             n/a
> > > Other:          1
> > > Combined:       16
> > >
> > > >
> > > > > However, I'm not sure what happens with the bound XDP sockets in this case
> > > > > because I haven't tested further.
> > > >
> > > > can you also try to bind xsk sockets before attaching XDP progs?
> > > I looked into the DPDK code again.
> > > The DPDK framework provides callback hooks like eth_rx_queue_setup,
> > > and each "driver" implements them as needed. Each Rx/Tx queue of the
> > > device is set up separately. The af_xdp driver currently does the
> > > following for each Rx queue:
> > > 1. configures the umem for the queue
> > > 2. loads the XDP program on the corresponding interface, if not already loaded
> > >    (i.e. this happens only once per interface, when its first queue is set up)
> > > 3. calls xsk_socket__create(), which, as far as I can tell, also binds the
> > >    socket to the given queue internally
> > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem()
> > >
> > > So it seems to me that the required change will be a bit more involved.
> > > I'm not sure whether it will be possible to hardcode, just for the test,
> > > the program loading and the placement of all XSK sockets in the map so
> > > that they happen after the setup of the last "queue" of the given
> > > interface is done. I need to think a bit more about this.
> > I changed the code of the DPDK af_xdp "driver" to first create all of the
> > XSK sockets and bind them to the queues of the corresponding interface.
> > Only after the last XSK socket is initialized does it attach the XDP
> > program to the interface and populate the XSK map with the created sockets.
> > The issue was still there, but it was harder to reproduce: it happened
> > once in 5 starts of the application.
> >
> > >
> > > >
> > > > >
> > > > > It seems to me that something racy happens when the interfaces go down
> > > > > and back up (visible in dmesg) while the XDP sockets are bound to
> > > > > their queues. I'm not sure why the interfaces go down and up, but
> > > > > setting only the XDP programs on the interfaces doesn't cause this
> > > > > behavior. So I assume it's caused by the binding of the XDP sockets.
> > > >
> > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even
> > > > without xsk sockets being bound?
> > > Perhaps I phrased something poorly.
> > > The issue is not observed if I load the XDP program on all interfaces
> > > (eth0, eth2, eth4) with xdp-loader:
> > > xdp-loader load --mode native <iface> <path-to-the-xdp-program>
> > > It's probably not observed because there are no interface down/up actions.
> > > I also modified the DPDK "driver" not to remove the XDP program on exit,
> > > so when the application stops, only the XSK sockets are closed and the
> > > program remains loaded on the interfaces. When I stop this version of
> > > the application while xdpdump is running, I see the traffic appear in
> > > xdpdump immediately.
> > > Also note that I trimmed the XDP program down to just the XSK map
> > > (BPF_MAP_TYPE_XSKMAP) and a function that simply does "return XDP_PASS;".
> > > I wanted to exclude every possibility of the XDP program doing something wrong.
> > > So, from the above, it seems to me that the issue is somehow triggered by
> > > the use of XSK sockets.
> > >
> > > >
> > > > > It could be that the issue is not related to the XDP sockets but only
> > > > > to the down/up actions of the interfaces.
> > > > > On the other hand, I'm not sure why the issue is easily reproducible
> > > > > when zero-copy mode is enabled (4 out of 5 tests reproduced the issue),
> > > > > while with zero copy disabled it doesn't happen (I tried 10 times in
> > > > > a row and it didn't happen once).
> > > >
> > > > any chance that you could take the bond out of the picture for this issue?
> > > I'll need to talk to the network support guys because they manage the network
> > > devices and they'll need to change the LACP/Trunk setup of the above
> > > "remote device".
> > > I can't promise that they'll agree though.
> We changed the setup and I did the tests with a single port, no bonding
> involved. The port was configured with 16 queues (and 16 XSK sockets
> bound to them). I tested with about 100 Mbps of traffic so as not to
> disrupt many users.
> During the tests I watched the real-time traffic graph of the remote
> device port connected to the server where the application was running
> in L3 forward mode:
> - with zero copy enabled, the traffic to the server was about 100 Mbps,
>   but the traffic coming out of the server was only about 50 Mbps (i.e. half).
> - with zero copy disabled, the traffic in both directions was the same:
>   the two graphs matched perfectly.
> Nothing else was changed between the two tests, only the ZC option.
> Are there any stats or anything else I can check in this test scenario
> that could reveal more about the issue?

FWIW I don't see this on my side. My guess would be that some of the
queues stalled in ZC mode due to the buggy ring pair enable/disable
routines that I am fixing (fingers crossed :)), or trying to fix, in the
previous email. You could try something as simple as:

$ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes"

and check each of the queues that is supposed to receive traffic. Do the
same for tx.
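The same per-queue check can be scripted so that a stalled queue stands out, rather than having to be spotted in `watch` output. This sketch assumes ixgbe-style per-queue counter names (`rx_queue_<N>_bytes`, `tx_queue_<N>_bytes`). Other drivers name these counters differently, and the function names are hypothetical. A queue whose counter did not move is only a hint of a stall, because RSS may simply have hashed no flows to it during the sampling window.

```python
import re
import sys

# Assumed ixgbe-style per-queue byte counters, e.g. "     rx_queue_3_bytes: 123456".
QUEUE_BYTES = re.compile(r"^\s*(rx|tx)_queue_(\d+)_bytes:\s*(\d+)\s*$")


def queue_bytes(text: str) -> dict[tuple[str, int], int]:
    """Return {(direction, queue_index): byte_count} from `ethtool -S` output."""
    counters: dict[tuple[str, int], int] = {}
    for line in text.splitlines():
        m = QUEUE_BYTES.match(line)
        if m:
            counters[(m.group(1), int(m.group(2)))] = int(m.group(3))
    return counters


def stalled_queues(before: str, after: str, direction: str,
                   num_queues: int) -> list[int]:
    """Queues 0..num_queues-1 whose byte counter did not change between snapshots.

    A queue missing from both snapshots is treated as 0 and therefore reported too.
    """
    b, a = queue_bytes(before), queue_bytes(after)
    return [q for q in range(num_queues)
            if a.get((direction, q), 0) == b.get((direction, q), 0)]


if __name__ == "__main__" and len(sys.argv) == 4:
    # Hypothetical usage: python3 stalled_queues.py /tmp/s0.txt /tmp/s1.txt 16
    with open(sys.argv[1]) as f0, open(sys.argv[2]) as f1:
        s0, s1 = f0.read(), f1.read()
    for direction in ("rx", "tx"):
        print(direction, "queues with no byte progress:",
              stalled_queues(s0, s1, direction, int(sys.argv[3])))
```

Take the two snapshots a few seconds apart while traffic is flowing, for example with `ethtool -S eth_ixgbe > /tmp/s0.txt; sleep 5; ethtool -S eth_ixgbe > /tmp/s1.txt`. Passing 16 as the queue count limits the check to the queues configured in the test above.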


Thread overview: 22+ messages
2024-01-26 15:54 Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device Pavel Vazharov
2024-01-26 19:28 ` Toke Høiland-Jørgensen
2024-01-27  3:58   ` Pavel Vazharov
2024-01-27  4:39     ` Jakub Kicinski
2024-01-27  5:08       ` Pavel Vazharov
     [not found]         ` <CAJEV1ij=K5Xi5LtpH7SHXLxve+JqMWhimdF50Ddy99G0E9dj_Q@mail.gmail.com>
2024-01-30 13:54           ` Pavel Vazharov
2024-01-30 14:32             ` Toke Høiland-Jørgensen
2024-01-30 14:40               ` Pavel Vazharov
2024-01-30 14:54                 ` Toke Høiland-Jørgensen
2024-02-05  7:07                   ` Magnus Karlsson
2024-02-07 15:49                     ` Pavel Vazharov
2024-02-07 16:07                       ` Pavel Vazharov
2024-02-07 19:00                       ` Maciej Fijalkowski
2024-02-08 10:59                         ` Pavel Vazharov
2024-02-08 15:47                           ` Pavel Vazharov
2024-02-09  9:03                             ` Pavel Vazharov
2024-02-09 18:37                               ` Maciej Fijalkowski
2024-02-16 15:18                                 ` Maciej Fijalkowski
2024-02-16 17:24                               ` Maciej Fijalkowski [this message]
2024-02-19 13:45                                 ` Pavel Vazharov
2024-02-19 14:56                                   ` Maciej Fijalkowski
2024-03-08 10:05                                     ` Pavel Vazharov