* Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device @ 2024-01-26 15:54 Pavel Vazharov 2024-01-26 19:28 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-01-26 15:54 UTC (permalink / raw) To: netdev Hi there, We've a DPDK application which runs on top of XDP sockets, using the DPDK AF_XDP driver. It was a pure DPDK application but lately it was migrated to run on top of XDP sockets because we need to split the traffic entering the machine between the DPDK application and other "standard-Linux" applications running on the same machine. The application seems to work OK when working on a single port. However, running on the interfaces behind a bonding device causes the remote device (switch) to start reporting: "The member of the LACP mode Eth-Trunk interface received an abnormal LACPDU, which may be caused by optical fiber misconnection" and the bonding stops working. Note that the application needs to work with multiple queues and thus the XDP sockets are not bound to the bonding device but to the physical interfaces behind the bonding device. As far as I checked the bonding device supports binding only a single XDP socket and makes it unusable for our purposes. In the concrete example, there are 3 physical ports in bonding and each port is set up to have 16 Rx/Tx (combined) queues. The application (the DPDK layer) opens an XDP socket for each queue of the physical ports (Basically the DPDK layer creates 3 virtual af_xdp devices and each one of them has Rx/Tx 16 queues where each queue is actually an XDP socket). I've run the application in different threading scenarios but each one of them exhibit the above problem: - single thread - where all of the Rx/Tx on the queues is handled by a single thread - two threads - where the first thread handles Rx/Tx on (dev:0 queues:0-15) and (dev:1 queues:0-7) and the second thread handles Rx/Tx on (dev:1 queues:8-15) and (dev:2 queues:0-15). - three threads - where the first thread handles Rx/Tx on (dev:0 queues:0-15), the second thread handles Rx/Tx on (dev:1 queues:0-15), the third thread handles Rx/Tx on (dev:2 queues:0-15). I've tried with and without busy polling in the above threading schemes and the problem was still there. Related to the above, I've the following questions: 1. Is it possible to use multiple XDP sockets with a bonding device? I mean, if we use the above example, will it be possible to open 16 XDP sockets on top of the bonding device which has 3 ports and each have 16 Rx/Tx queues. 2. If point 1 is not possible then is the above scheme supposed to work in general or is it not right to bind the XDP sockets to the queues of the underlying physical ports? 3. If the above scheme is supposed to work then is the bonding logic (LACP management traffic) affected by the access pattern of the XDP sockets? I mean, the order of Rx/Tx operations on the XDP sockets or something like that. Any other advice on what I should check again or change or research is greatly appreciated. Regards, Pavel. ^ permalink raw reply [flat|nested] 22+ messages in thread
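[Note] The layout described above — one XDP socket per hardware queue of each physical port below the bond, rather than sockets on the bond itself — corresponds roughly to the sketch below. This is a hedged illustration assuming libxdp's xsk API (xdp-tools); the interface names eth0/eth2/eth4, the 16-queue count and the umem sizing are taken from or invented for this thread, and this is not the poster's DPDK af_xdp code.

/* Hedged sketch: one AF_XDP socket per (physical port, queue) pair, with a
 * private umem per socket. Illustrative only, minimal error handling. */
#include <stdio.h>
#include <sys/mman.h>
#include <linux/if_link.h>   /* XDP_FLAGS_DRV_MODE */
#include <linux/if_xdp.h>    /* bind flags */
#include <xdp/xsk.h>         /* libxdp xsk API */

#define NUM_FRAMES 4096
#define FRAME_SIZE XSK_UMEM__DEFAULT_FRAME_SIZE
#define NUM_QUEUES 16

struct port_queue {
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
	struct xsk_ring_prod fq;  /* fill ring */
	struct xsk_ring_cons cq;  /* completion ring */
	struct xsk_ring_cons rx;
	struct xsk_ring_prod tx;
	void *bufs;
};

static int open_xsk(struct port_queue *pq, const char *ifname, __u32 qid)
{
	struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		.xdp_flags = XDP_FLAGS_DRV_MODE, /* native mode, as in the report */
		.bind_flags = 0,                 /* let the kernel pick copy/zero-copy */
	};
	size_t len = (size_t)NUM_FRAMES * FRAME_SIZE;

	pq->bufs = mmap(NULL, len, PROT_READ | PROT_WRITE,
			MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (pq->bufs == MAP_FAILED)
		return -1;
	if (xsk_umem__create(&pq->umem, pq->bufs, len, &pq->fq, &pq->cq, NULL))
		return -1;
	/* Bind to a queue of the physical port (eth0/eth2/eth4), not to bond0. */
	return xsk_socket__create(&pq->xsk, ifname, qid, pq->umem,
				  &pq->rx, &pq->tx, &cfg);
}

int main(void)
{
	const char *ports[] = { "eth0", "eth2", "eth4" };
	static struct port_queue socks[3][NUM_QUEUES];

	for (int p = 0; p < 3; p++)
		for (__u32 q = 0; q < NUM_QUEUES; q++)
			if (open_xsk(&socks[p][q], ports[p], q))
				fprintf(stderr, "failed: %s queue %u\n", ports[p], q);
	/* Rx/Tx processing over socks[][] would then run in the 1-3 threads
	 * described in the mail above. */
	return 0;
}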
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-26 15:54 Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device Pavel Vazharov @ 2024-01-26 19:28 ` Toke Høiland-Jørgensen 2024-01-27 3:58 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Toke Høiland-Jørgensen @ 2024-01-26 19:28 UTC (permalink / raw) To: Pavel Vazharov, netdev Pavel Vazharov <pavel@x3me.net> writes: > 3. If the above scheme is supposed to work then is the bonding logic > (LACP management traffic) affected by the access pattern of the XDP > sockets? I mean, the order of Rx/Tx operations on the XDP sockets or > something like that. Well, it will be up to your application to ensure that it is not. The XDP program will run before the stack sees the LACP management traffic, so you will have to take some measure to ensure that any such management traffic gets routed to the stack instead of to the DPDK application. My immediate guess would be that this is the cause of those warnings? -Toke ^ permalink raw reply [flat|nested] 22+ messages in thread
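[Note] A minimal sketch of the kind of program Toke describes: anything not explicitly claimed for the AF_XDP sockets (ARP, LLDP, LACP/slow-protocol frames with EtherType 0x8809, and so on) falls through to XDP_PASS and therefore still reaches the bonding driver. The program name, map size and the "redirect all IPv4 TCP/UDP" rule are placeholders; the poster's actual program selects only particular pools of traffic.

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <linux/in.h>
#include <linux/ip.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_splitter(struct xdp_md *ctx)
{
	void *data = (void *)(long)ctx->data;
	void *data_end = (void *)(long)ctx->data_end;
	struct ethhdr *eth = data;
	struct iphdr *iph;

	if ((void *)(eth + 1) > data_end)
		return XDP_PASS;
	/* Non-IPv4 frames (including LACP, EtherType 0x8809) go to the stack,
	 * so the bonding driver keeps seeing its management traffic. */
	if (eth->h_proto != bpf_htons(ETH_P_IP))
		return XDP_PASS;
	iph = (void *)(eth + 1);
	if ((void *)(iph + 1) > data_end)
		return XDP_PASS;
	if (iph->protocol != IPPROTO_TCP && iph->protocol != IPPROTO_UDP)
		return XDP_PASS;
	/* Selected traffic goes to the AF_XDP socket bound to this Rx queue;
	 * XDP_PASS is the fallback action if no socket is attached to it. */
	return bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
}

char _license[] SEC("license") = "GPL";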
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-26 19:28 ` Toke Høiland-Jørgensen @ 2024-01-27 3:58 ` Pavel Vazharov 2024-01-27 4:39 ` Jakub Kicinski 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-01-27 3:58 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: netdev On Fri, Jan 26, 2024 at 9:28 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Pavel Vazharov <pavel@x3me.net> writes: > > > 3. If the above scheme is supposed to work then is the bonding logic > > (LACP management traffic) affected by the access pattern of the XDP > > sockets? I mean, the order of Rx/Tx operations on the XDP sockets or > > something like that. > > Well, it will be up to your application to ensure that it is not. The > XDP program will run before the stack sees the LACP management traffic, > so you will have to take some measure to ensure that any such management > traffic gets routed to the stack instead of to the DPDK application. My > immediate guess would be that this is the cause of those warnings? > > -Toke Thank you for the response. I already checked the XDP program. It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. Everything else is passed to the Linux kernel. However, I'll check it again. Just to be sure. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-27 3:58 ` Pavel Vazharov @ 2024-01-27 4:39 ` Jakub Kicinski 2024-01-27 5:08 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Jakub Kicinski @ 2024-01-27 4:39 UTC (permalink / raw) To: Pavel Vazharov; +Cc: Toke Høiland-Jørgensen, netdev On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > Well, it will be up to your application to ensure that it is not. The > > XDP program will run before the stack sees the LACP management traffic, > > so you will have to take some measure to ensure that any such management > > traffic gets routed to the stack instead of to the DPDK application. My > > immediate guess would be that this is the cause of those warnings? > > Thank you for the response. > I already checked the XDP program. > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > Everything else is passed to the Linux kernel. > However, I'll check it again. Just to be sure. What device driver are you using, if you don't mind sharing? The pass thru code path may be much less well tested in AF_XDP drivers. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-27 4:39 ` Jakub Kicinski @ 2024-01-27 5:08 ` Pavel Vazharov [not found] ` <CAJEV1ij=K5Xi5LtpH7SHXLxve+JqMWhimdF50Ddy99G0E9dj_Q@mail.gmail.com> 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-01-27 5:08 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Toke Høiland-Jørgensen, netdev On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > Well, it will be up to your application to ensure that it is not. The > > > XDP program will run before the stack sees the LACP management traffic, > > > so you will have to take some measure to ensure that any such management > > > traffic gets routed to the stack instead of to the DPDK application. My > > > immediate guess would be that this is the cause of those warnings? > > > > Thank you for the response. > > I already checked the XDP program. > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > Everything else is passed to the Linux kernel. > > However, I'll check it again. Just to be sure. > > What device driver are you using, if you don't mind sharing? > The pass thru code path may be much less well tested in AF_XDP > drivers. These are the kernel version and the drivers for the 3 ports in the above bonding. ~# uname -a Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) ... Kernel driver in use: ixgbe -- 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) ... Kernel driver in use: ixgbe -- 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01) ... Kernel driver in use: ixgbe I think they should be well supported, right? So far, it seems that the present usage scenario should work and the problem is somewhere in my code. I'll double check it again and try to simplify everything in order to pinpoint the problem. ^ permalink raw reply [flat|nested] 22+ messages in thread
[parent not found: <CAJEV1ij=K5Xi5LtpH7SHXLxve+JqMWhimdF50Ddy99G0E9dj_Q@mail.gmail.com>]
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device [not found] ` <CAJEV1ij=K5Xi5LtpH7SHXLxve+JqMWhimdF50Ddy99G0E9dj_Q@mail.gmail.com> @ 2024-01-30 13:54 ` Pavel Vazharov 2024-01-30 14:32 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-01-30 13:54 UTC (permalink / raw) To: Jakub Kicinski; +Cc: Toke Høiland-Jørgensen, netdev > On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: >> >> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: >> > >> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: >> > > > Well, it will be up to your application to ensure that it is not. The >> > > > XDP program will run before the stack sees the LACP management traffic, >> > > > so you will have to take some measure to ensure that any such management >> > > > traffic gets routed to the stack instead of to the DPDK application. My >> > > > immediate guess would be that this is the cause of those warnings? >> > > >> > > Thank you for the response. >> > > I already checked the XDP program. >> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. >> > > Everything else is passed to the Linux kernel. >> > > However, I'll check it again. Just to be sure. >> > >> > What device driver are you using, if you don't mind sharing? >> > The pass thru code path may be much less well tested in AF_XDP >> > drivers. >> These are the kernel version and the drivers for the 3 ports in the >> above bonding. >> ~# uname -a >> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux >> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 >> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> SFI/SFP+ Network Connection (rev 01) >> ... >> Kernel driver in use: ixgbe >> -- >> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> SFI/SFP+ Network Connection (rev 01) >> ... >> Kernel driver in use: ixgbe >> -- >> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> SFI/SFP+ Network Connection (rev 01) >> ... >> Kernel driver in use: ixgbe >> >> I think they should be well supported, right? >> So far, it seems that the present usage scenario should work and the >> problem is somewhere in my code. >> I'll double check it again and try to simplify everything in order to >> pinpoint the problem. I've managed to pinpoint that forcing the copying of the packets between the kernel and the user space (XDP_COPY) fixes the issue with the malformed LACPDUs and the not working bonding. ^ permalink raw reply [flat|nested] 22+ messages in thread
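[Note] For context on the XDP_COPY observation above: with libxdp/libbpf the copy vs. zero-copy choice is made through the bind flags at socket-creation time (the DPDK af_xdp driver exposes the same choice in its own configuration). A minimal sketch assuming libxdp's xsk API; the helper name is invented for illustration.

#include <string.h>
#include <linux/if_link.h>
#include <linux/if_xdp.h>
#include <xdp/xsk.h>

/* XDP_COPY forces the kernel to copy each frame into the umem (the mode that
 * works in this report); XDP_ZEROCOPY requests driver zero-copy DMA (the mode
 * that reproduces the broken LACP). Leaving both unset lets the kernel pick. */
static void fill_xsk_config(struct xsk_socket_config *cfg, int force_copy)
{
	memset(cfg, 0, sizeof(*cfg));
	cfg->rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS;
	cfg->tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS;
	cfg->xdp_flags = XDP_FLAGS_DRV_MODE;
	cfg->bind_flags = force_copy ? XDP_COPY : XDP_ZEROCOPY;
}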
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-30 13:54 ` Pavel Vazharov @ 2024-01-30 14:32 ` Toke Høiland-Jørgensen 2024-01-30 14:40 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Toke Høiland-Jørgensen @ 2024-01-30 14:32 UTC (permalink / raw) To: Pavel Vazharov, Jakub Kicinski; +Cc: netdev, Magnus Karlsson Pavel Vazharov <pavel@x3me.net> writes: >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: >>> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: >>> > >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: >>> > > > Well, it will be up to your application to ensure that it is not. The >>> > > > XDP program will run before the stack sees the LACP management traffic, >>> > > > so you will have to take some measure to ensure that any such management >>> > > > traffic gets routed to the stack instead of to the DPDK application. My >>> > > > immediate guess would be that this is the cause of those warnings? >>> > > >>> > > Thank you for the response. >>> > > I already checked the XDP program. >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. >>> > > Everything else is passed to the Linux kernel. >>> > > However, I'll check it again. Just to be sure. >>> > >>> > What device driver are you using, if you don't mind sharing? >>> > The pass thru code path may be much less well tested in AF_XDP >>> > drivers. >>> These are the kernel version and the drivers for the 3 ports in the >>> above bonding. >>> ~# uname -a >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >>> SFI/SFP+ Network Connection (rev 01) >>> ... >>> Kernel driver in use: ixgbe >>> -- >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >>> SFI/SFP+ Network Connection (rev 01) >>> ... >>> Kernel driver in use: ixgbe >>> -- >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >>> SFI/SFP+ Network Connection (rev 01) >>> ... >>> Kernel driver in use: ixgbe >>> >>> I think they should be well supported, right? >>> So far, it seems that the present usage scenario should work and the >>> problem is somewhere in my code. >>> I'll double check it again and try to simplify everything in order to >>> pinpoint the problem. > I've managed to pinpoint that forcing the copying of the packets > between the kernel and the user space > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > working bonding. (+Magnus) Right, okay, that seems to suggest a bug in the internal kernel copying that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; any chance you could test with a different driver and see if the same issue appears there? -Toke ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-30 14:32 ` Toke Høiland-Jørgensen @ 2024-01-30 14:40 ` Pavel Vazharov 2024-01-30 14:54 ` Toke Høiland-Jørgensen 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-01-30 14:40 UTC (permalink / raw) To: Toke Høiland-Jørgensen; +Cc: Jakub Kicinski, netdev, Magnus Karlsson On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Pavel Vazharov <pavel@x3me.net> writes: > > >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > >>> > >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > >>> > > >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > >>> > > > Well, it will be up to your application to ensure that it is not. The > >>> > > > XDP program will run before the stack sees the LACP management traffic, > >>> > > > so you will have to take some measure to ensure that any such management > >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > >>> > > > immediate guess would be that this is the cause of those warnings? > >>> > > > >>> > > Thank you for the response. > >>> > > I already checked the XDP program. > >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > >>> > > Everything else is passed to the Linux kernel. > >>> > > However, I'll check it again. Just to be sure. > >>> > > >>> > What device driver are you using, if you don't mind sharing? > >>> > The pass thru code path may be much less well tested in AF_XDP > >>> > drivers. > >>> These are the kernel version and the drivers for the 3 ports in the > >>> above bonding. > >>> ~# uname -a > >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >>> SFI/SFP+ Network Connection (rev 01) > >>> ... > >>> Kernel driver in use: ixgbe > >>> -- > >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >>> SFI/SFP+ Network Connection (rev 01) > >>> ... > >>> Kernel driver in use: ixgbe > >>> -- > >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >>> SFI/SFP+ Network Connection (rev 01) > >>> ... > >>> Kernel driver in use: ixgbe > >>> > >>> I think they should be well supported, right? > >>> So far, it seems that the present usage scenario should work and the > >>> problem is somewhere in my code. > >>> I'll double check it again and try to simplify everything in order to > >>> pinpoint the problem. > > I've managed to pinpoint that forcing the copying of the packets > > between the kernel and the user space > > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > working bonding. > > (+Magnus) > > Right, okay, that seems to suggest a bug in the internal kernel copying > that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > any chance you could test with a different driver and see if the same > issue appears there? > > -Toke No, sorry. We have only servers with Intel 82599ES with ixgbe drivers. And one lab machine with Intel 82540EM with igb driver but we can't set up bonding there and the problem is not reproducible there. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-30 14:40 ` Pavel Vazharov @ 2024-01-30 14:54 ` Toke Høiland-Jørgensen 2024-02-05 7:07 ` Magnus Karlsson 0 siblings, 1 reply; 22+ messages in thread From: Toke Høiland-Jørgensen @ 2024-01-30 14:54 UTC (permalink / raw) To: Pavel Vazharov; +Cc: Jakub Kicinski, netdev, Magnus Karlsson Pavel Vazharov <pavel@x3me.net> writes: > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: >> >> Pavel Vazharov <pavel@x3me.net> writes: >> >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: >> >>> >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: >> >>> > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: >> >>> > > > Well, it will be up to your application to ensure that it is not. The >> >>> > > > XDP program will run before the stack sees the LACP management traffic, >> >>> > > > so you will have to take some measure to ensure that any such management >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My >> >>> > > > immediate guess would be that this is the cause of those warnings? >> >>> > > >> >>> > > Thank you for the response. >> >>> > > I already checked the XDP program. >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. >> >>> > > Everything else is passed to the Linux kernel. >> >>> > > However, I'll check it again. Just to be sure. >> >>> > >> >>> > What device driver are you using, if you don't mind sharing? >> >>> > The pass thru code path may be much less well tested in AF_XDP >> >>> > drivers. >> >>> These are the kernel version and the drivers for the 3 ports in the >> >>> above bonding. >> >>> ~# uname -a >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> >>> SFI/SFP+ Network Connection (rev 01) >> >>> ... >> >>> Kernel driver in use: ixgbe >> >>> -- >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> >>> SFI/SFP+ Network Connection (rev 01) >> >>> ... >> >>> Kernel driver in use: ixgbe >> >>> -- >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit >> >>> SFI/SFP+ Network Connection (rev 01) >> >>> ... >> >>> Kernel driver in use: ixgbe >> >>> >> >>> I think they should be well supported, right? >> >>> So far, it seems that the present usage scenario should work and the >> >>> problem is somewhere in my code. >> >>> I'll double check it again and try to simplify everything in order to >> >>> pinpoint the problem. >> > I've managed to pinpoint that forcing the copying of the packets >> > between the kernel and the user space >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not >> > working bonding. >> >> (+Magnus) >> >> Right, okay, that seems to suggest a bug in the internal kernel copying >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; >> any chance you could test with a different driver and see if the same >> issue appears there? >> >> -Toke > No, sorry. > We have only servers with Intel 82599ES with ixgbe drivers. > And one lab machine with Intel 82540EM with igb driver but we can't > set up bonding there > and the problem is not reproducible there. Right, okay. Another thing that may be of some use is to try to capture the packets on the physical devices using tcpdump. 
That should (I think) show you the LACPDU packets as they come in, before they hit the bonding device, but after they are copied from the XDP frame. If it's a packet corruption issue, that should be visible in the captured packet; you can compare with an xdpdump capture to see if there are any differences... -Toke ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-01-30 14:54 ` Toke Høiland-Jørgensen @ 2024-02-05 7:07 ` Magnus Karlsson 2024-02-07 15:49 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Magnus Karlsson @ 2024-02-05 7:07 UTC (permalink / raw) To: Toke Høiland-Jørgensen, Pavel Vazharov Cc: Jakub Kicinski, netdev, Fijalkowski, Maciej On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > Pavel Vazharov <pavel@x3me.net> writes: > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > >> > >> Pavel Vazharov <pavel@x3me.net> writes: > >> > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > >> >>> > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > >> >>> > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > >> >>> > > > so you will have to take some measure to ensure that any such management > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > >> >>> > > > immediate guess would be that this is the cause of those warnings? > >> >>> > > > >> >>> > > Thank you for the response. > >> >>> > > I already checked the XDP program. > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > >> >>> > > Everything else is passed to the Linux kernel. > >> >>> > > However, I'll check it again. Just to be sure. > >> >>> > > >> >>> > What device driver are you using, if you don't mind sharing? > >> >>> > The pass thru code path may be much less well tested in AF_XDP > >> >>> > drivers. > >> >>> These are the kernel version and the drivers for the 3 ports in the > >> >>> above bonding. > >> >>> ~# uname -a > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >> >>> SFI/SFP+ Network Connection (rev 01) > >> >>> ... > >> >>> Kernel driver in use: ixgbe > >> >>> -- > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >> >>> SFI/SFP+ Network Connection (rev 01) > >> >>> ... > >> >>> Kernel driver in use: ixgbe > >> >>> -- > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > >> >>> SFI/SFP+ Network Connection (rev 01) > >> >>> ... > >> >>> Kernel driver in use: ixgbe > >> >>> > >> >>> I think they should be well supported, right? > >> >>> So far, it seems that the present usage scenario should work and the > >> >>> problem is somewhere in my code. > >> >>> I'll double check it again and try to simplify everything in order to > >> >>> pinpoint the problem. > >> > I've managed to pinpoint that forcing the copying of the packets > >> > between the kernel and the user space > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > >> > working bonding. > >> > >> (+Magnus) > >> > >> Right, okay, that seems to suggest a bug in the internal kernel copying > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > >> any chance you could test with a different driver and see if the same > >> issue appears there? > >> > >> -Toke > > No, sorry. > > We have only servers with Intel 82599ES with ixgbe drivers. 
> > And one lab machine with Intel 82540EM with igb driver but we can't > > set up bonding there > > and the problem is not reproducible there. > > Right, okay. Another thing that may be of some use is to try to capture > the packets on the physical devices using tcpdump. That should (I think) > show you the LACDPU packets as they come in, before they hit the bonding > device, but after they are copied from the XDP frame. If it's a packet > corruption issue, that should be visible in the captured packet; you can > compare with an xdpdump capture to see if there are any differences... Pavel, Sounds like an issue with the driver in zero-copy mode as it works fine in copy mode. Maciej and I will take a look at it. > -Toke > ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-05 7:07 ` Magnus Karlsson @ 2024-02-07 15:49 ` Pavel Vazharov 2024-02-07 16:07 ` Pavel Vazharov 2024-02-07 19:00 ` Maciej Fijalkowski 0 siblings, 2 replies; 22+ messages in thread From: Pavel Vazharov @ 2024-02-07 15:49 UTC (permalink / raw) To: Magnus Karlsson Cc: Toke Høiland-Jørgensen, Jakub Kicinski, netdev, Fijalkowski, Maciej On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson <magnus.karlsson@gmail.com> wrote: > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > >> > > >> Pavel Vazharov <pavel@x3me.net> writes: > > >> > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > >> >>> > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > >> >>> > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > >> >>> > > > so you will have to take some measure to ensure that any such management > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > >> >>> > > > > >> >>> > > Thank you for the response. > > >> >>> > > I already checked the XDP program. > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > >> >>> > > Everything else is passed to the Linux kernel. > > >> >>> > > However, I'll check it again. Just to be sure. > > >> >>> > > > >> >>> > What device driver are you using, if you don't mind sharing? > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > >> >>> > drivers. > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > >> >>> above bonding. > > >> >>> ~# uname -a > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > >> >>> SFI/SFP+ Network Connection (rev 01) > > >> >>> ... > > >> >>> Kernel driver in use: ixgbe > > >> >>> -- > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > >> >>> SFI/SFP+ Network Connection (rev 01) > > >> >>> ... > > >> >>> Kernel driver in use: ixgbe > > >> >>> -- > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > >> >>> SFI/SFP+ Network Connection (rev 01) > > >> >>> ... > > >> >>> Kernel driver in use: ixgbe > > >> >>> > > >> >>> I think they should be well supported, right? > > >> >>> So far, it seems that the present usage scenario should work and the > > >> >>> problem is somewhere in my code. > > >> >>> I'll double check it again and try to simplify everything in order to > > >> >>> pinpoint the problem. > > >> > I've managed to pinpoint that forcing the copying of the packets > > >> > between the kernel and the user space > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > >> > working bonding. > > >> > > >> (+Magnus) > > >> > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > >> that happens on XDP_PASS in zero-copy mode. 
Which would be a driver bug; > > >> any chance you could test with a different driver and see if the same > > >> issue appears there? > > >> > > >> -Toke > > > No, sorry. > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > set up bonding there > > > and the problem is not reproducible there. > > > > Right, okay. Another thing that may be of some use is to try to capture > > the packets on the physical devices using tcpdump. That should (I think) > > show you the LACDPU packets as they come in, before they hit the bonding > > device, but after they are copied from the XDP frame. If it's a packet > > corruption issue, that should be visible in the captured packet; you can > > compare with an xdpdump capture to see if there are any differences... > > Pavel, > > Sounds like an issue with the driver in zero-copy mode as it works > fine in copy mode. Maciej and I will take a look at it. > > > -Toke > > First I want to apologize for not responding for such a long time. I had different tasks the previous week and this week went back to this issue. I had to modify the code of the af_xdp driver inside the DPDK so that it loads the XDP program in a way which is compatible with the xdp-dispatcher. Finally, I was able to run our application with the XDP sockets and the xdpdump at the same time. Back to the issue. I just want to say again that we are not binding the XDP sockets to the bonding device. We are binding the sockets to the queues of the physical interfaces "below" the bonding device. My further observation this time is that when the issue happens and the remote device reports the LACP error there is no incoming LACP traffic on the corresponding local port, as seen by the xdump. The tcpdump at the same time sees only outgoing LACP packets and nothing incoming. For example: Remote device Local Server TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 And when the remote device reports "received an abnormal LACPDU" for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there is no incoming LACP traffic on eth4 but there is incoming LACP traffic on eth0 and eth2. At the same time, according to the dmesg the kernel sees all of the interfaces as "link status definitely up, 10000 Mbps full duplex". The issue goes aways if I stop the application even without removing the XDP programs from the interfaces - the running xdpdump starts showing the incoming LACP traffic immediately. The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". However, I'm not sure what happens with the bound XDP sockets in this case because I haven't tested further. It seems to me that something racy happens when the interfaces go down and back up (visible in the dmesg) when the XDP sockets are bound to their queues. I mean, I'm not sure why the interfaces go down and up but setting only the XDP programs on the interfaces doesn't cause this behavior. So, I assume it's caused by the binding of the XDP sockets. It could be that the issue is not related to the XDP sockets but just to the down/up actions of the interfaces. On the other hand, I'm not sure why the issue is easily reproducible when the zero copy mode is enabled (4 out of 5 tests reproduced the issue). 
However, when the zero copy is disabled this issue doesn't happen (I tried 10 times in a row and it doesn't happen). Pavel. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-07 15:49 ` Pavel Vazharov @ 2024-02-07 16:07 ` Pavel Vazharov 2024-02-07 19:00 ` Maciej Fijalkowski 1 sibling, 0 replies; 22+ messages in thread From: Pavel Vazharov @ 2024-02-07 16:07 UTC (permalink / raw) To: Magnus Karlsson Cc: Toke Høiland-Jørgensen, Jakub Kicinski, netdev, Fijalkowski, Maciej On Wed, Feb 7, 2024 at 5:49 PM Pavel Vazharov <pavel@x3me.net> wrote: > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > <magnus.karlsson@gmail.com> wrote: > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > >> > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > >> > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > >> >>> > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > >> >>> > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > >> >>> > > > > > >> >>> > > Thank you for the response. > > > >> >>> > > I already checked the XDP program. > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > >> >>> > > Everything else is passed to the Linux kernel. > > > >> >>> > > However, I'll check it again. Just to be sure. > > > >> >>> > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > >> >>> > drivers. > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > >> >>> above bonding. > > > >> >>> ~# uname -a > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> -- > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> -- > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> > > > >> >>> I think they should be well supported, right? > > > >> >>> So far, it seems that the present usage scenario should work and the > > > >> >>> problem is somewhere in my code. > > > >> >>> I'll double check it again and try to simplify everything in order to > > > >> >>> pinpoint the problem. > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > >> > between the kernel and the user space > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > >> > working bonding. 
> > > >> > > > >> (+Magnus) > > > >> > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > >> any chance you could test with a different driver and see if the same > > > >> issue appears there? > > > >> > > > >> -Toke > > > > No, sorry. > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > set up bonding there > > > > and the problem is not reproducible there. > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > the packets on the physical devices using tcpdump. That should (I think) > > > show you the LACDPU packets as they come in, before they hit the bonding > > > device, but after they are copied from the XDP frame. If it's a packet > > > corruption issue, that should be visible in the captured packet; you can > > > compare with an xdpdump capture to see if there are any differences... > > > > Pavel, > > > > Sounds like an issue with the driver in zero-copy mode as it works > > fine in copy mode. Maciej and I will take a look at it. > > > > > -Toke > > > > > First I want to apologize for not responding for such a long time. > I had different tasks the previous week and this week went back to this issue. > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > the XDP program in a way which is compatible with the xdp-dispatcher. > Finally, I was able to run our application with the XDP sockets and the xdpdump > at the same time. > > Back to the issue. > I just want to say again that we are not binding the XDP sockets to > the bonding device. > We are binding the sockets to the queues of the physical interfaces > "below" the bonding device. > My further observation this time is that when the issue happens and > the remote device reports > the LACP error there is no incoming LACP traffic on the corresponding > local port, > as seen by the xdump. > The tcpdump at the same time sees only outgoing LACP packets and > nothing incoming. > For example: > Remote device > Local Server > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > And when the remote device reports "received an abnormal LACPDU" > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > is no incoming LACP traffic > on eth4 but there is incoming LACP traffic on eth0 and eth2. > At the same time, according to the dmesg the kernel sees all of the > interfaces as > "link status definitely up, 10000 Mbps full duplex". > The issue goes aways if I stop the application even without removing > the XDP programs > from the interfaces - the running xdpdump starts showing the incoming > LACP traffic immediately. > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > However, I'm not sure what happens with the bound XDP sockets in this case > because I haven't tested further. > > It seems to me that something racy happens when the interfaces go down > and back up > (visible in the dmesg) when the XDP sockets are bound to their queues. > I mean, I'm not sure why the interfaces go down and up but setting > only the XDP programs > on the interfaces doesn't cause this behavior. So, I assume it's > caused by the binding of the XDP sockets. 
> It could be that the issue is not related to the XDP sockets but just > to the down/up actions of the interfaces. > On the other hand, I'm not sure why the issue is easily reproducible > when the zero copy mode is enabled > (4 out of 5 tests reproduced the issue). > However, when the zero copy is disabled this issue doesn't happen > (I tried 10 times in a row and it doesn't happen). > > Pavel. The thoughts at the end of my previous mail are not correct: I forgot that we had tested with traffic too. Even when the bonding/LACP looked OK right after the application started, it began breaking once traffic was started, in the zero-copy case. However, it worked fine when zero-copy was disabled. Pavel. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-07 15:49 ` Pavel Vazharov 2024-02-07 16:07 ` Pavel Vazharov @ 2024-02-07 19:00 ` Maciej Fijalkowski 2024-02-08 10:59 ` Pavel Vazharov 1 sibling, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2024-02-07 19:00 UTC (permalink / raw) To: Pavel Vazharov Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > <magnus.karlsson@gmail.com> wrote: > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > >> > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > >> > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > >> >>> > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > >> >>> > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > >> >>> > > > > > >> >>> > > Thank you for the response. > > > >> >>> > > I already checked the XDP program. > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > >> >>> > > Everything else is passed to the Linux kernel. > > > >> >>> > > However, I'll check it again. Just to be sure. > > > >> >>> > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > >> >>> > drivers. > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > >> >>> above bonding. > > > >> >>> ~# uname -a > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> -- > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> -- > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > >> >>> ... > > > >> >>> Kernel driver in use: ixgbe > > > >> >>> > > > >> >>> I think they should be well supported, right? > > > >> >>> So far, it seems that the present usage scenario should work and the > > > >> >>> problem is somewhere in my code. > > > >> >>> I'll double check it again and try to simplify everything in order to > > > >> >>> pinpoint the problem. > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > >> > between the kernel and the user space > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > >> > working bonding. 
> > > >> > > > >> (+Magnus) > > > >> > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > >> any chance you could test with a different driver and see if the same > > > >> issue appears there? > > > >> > > > >> -Toke > > > > No, sorry. > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > set up bonding there > > > > and the problem is not reproducible there. > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > the packets on the physical devices using tcpdump. That should (I think) > > > show you the LACDPU packets as they come in, before they hit the bonding > > > device, but after they are copied from the XDP frame. If it's a packet > > > corruption issue, that should be visible in the captured packet; you can > > > compare with an xdpdump capture to see if there are any differences... > > > > Pavel, > > > > Sounds like an issue with the driver in zero-copy mode as it works > > fine in copy mode. Maciej and I will take a look at it. > > > > > -Toke > > > > > First I want to apologize for not responding for such a long time. > I had different tasks the previous week and this week went back to this issue. > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > the XDP program in a way which is compatible with the xdp-dispatcher. > Finally, I was able to run our application with the XDP sockets and the xdpdump > at the same time. > > Back to the issue. > I just want to say again that we are not binding the XDP sockets to > the bonding device. > We are binding the sockets to the queues of the physical interfaces > "below" the bonding device. > My further observation this time is that when the issue happens and > the remote device reports > the LACP error there is no incoming LACP traffic on the corresponding > local port, > as seen by the xdump. > The tcpdump at the same time sees only outgoing LACP packets and > nothing incoming. > For example: > Remote device > Local Server > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > And when the remote device reports "received an abnormal LACPDU" > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > is no incoming LACP traffic Hey Pavel, can you also look at /proc/interrupts at eth4 and what ethtool -S shows there? > on eth4 but there is incoming LACP traffic on eth0 and eth2. > At the same time, according to the dmesg the kernel sees all of the > interfaces as > "link status definitely up, 10000 Mbps full duplex". > The issue goes aways if I stop the application even without removing > the XDP programs > from the interfaces - the running xdpdump starts showing the incoming > LACP traffic immediately. > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". and the setup is what when doing the link flap? XDP progs are loaded to each of the 3 interfaces of bond? > However, I'm not sure what happens with the bound XDP sockets in this case > because I haven't tested further. can you also try to bind xsk sockets before attaching XDP progs? 
> > It seems to me that something racy happens when the interfaces go down > and back up > (visible in the dmesg) when the XDP sockets are bound to their queues. > I mean, I'm not sure why the interfaces go down and up but setting > only the XDP programs > on the interfaces doesn't cause this behavior. So, I assume it's > caused by the binding of the XDP sockets. hmm i'm lost here, above you said you got no incoming traffic on eth4 even without xsk sockets being bound? > It could be that the issue is not related to the XDP sockets but just > to the down/up actions of the interfaces. > On the other hand, I'm not sure why the issue is easily reproducible > when the zero copy mode is enabled > (4 out of 5 tests reproduced the issue). > However, when the zero copy is disabled this issue doesn't happen > (I tried 10 times in a row and it doesn't happen). any chances that you could rule out the bond of the picture of this issue? on my side i'll try to play with multiple xsk sockets within same netdev served by ixgbe and see if i observe something broken. I recently fixed i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up actions as you state as well. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-07 19:00 ` Maciej Fijalkowski @ 2024-02-08 10:59 ` Pavel Vazharov 2024-02-08 15:47 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-02-08 10:59 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Wed, Feb 7, 2024 at 9:00 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > > <magnus.karlsson@gmail.com> wrote: > > > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > >> > > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > > >> > > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > > >> >>> > > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > >> >>> > > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > > >> >>> > > > > > > >> >>> > > Thank you for the response. > > > > >> >>> > > I already checked the XDP program. > > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > > >> >>> > > Everything else is passed to the Linux kernel. > > > > >> >>> > > However, I'll check it again. Just to be sure. > > > > >> >>> > > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > > >> >>> > drivers. > > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > > >> >>> above bonding. > > > > >> >>> ~# uname -a > > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > >> >>> ... > > > > >> >>> Kernel driver in use: ixgbe > > > > >> >>> -- > > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > >> >>> ... > > > > >> >>> Kernel driver in use: ixgbe > > > > >> >>> -- > > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > >> >>> ... > > > > >> >>> Kernel driver in use: ixgbe > > > > >> >>> > > > > >> >>> I think they should be well supported, right? > > > > >> >>> So far, it seems that the present usage scenario should work and the > > > > >> >>> problem is somewhere in my code. > > > > >> >>> I'll double check it again and try to simplify everything in order to > > > > >> >>> pinpoint the problem. 
> > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > > >> > between the kernel and the user space > > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > > >> > working bonding. > > > > >> > > > > >> (+Magnus) > > > > >> > > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > > >> any chance you could test with a different driver and see if the same > > > > >> issue appears there? > > > > >> > > > > >> -Toke > > > > > No, sorry. > > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > > set up bonding there > > > > > and the problem is not reproducible there. > > > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > > the packets on the physical devices using tcpdump. That should (I think) > > > > show you the LACDPU packets as they come in, before they hit the bonding > > > > device, but after they are copied from the XDP frame. If it's a packet > > > > corruption issue, that should be visible in the captured packet; you can > > > > compare with an xdpdump capture to see if there are any differences... > > > > > > Pavel, > > > > > > Sounds like an issue with the driver in zero-copy mode as it works > > > fine in copy mode. Maciej and I will take a look at it. > > > > > > > -Toke > > > > > > > > First I want to apologize for not responding for such a long time. > > I had different tasks the previous week and this week went back to this issue. > > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > > the XDP program in a way which is compatible with the xdp-dispatcher. > > Finally, I was able to run our application with the XDP sockets and the xdpdump > > at the same time. > > > > Back to the issue. > > I just want to say again that we are not binding the XDP sockets to > > the bonding device. > > We are binding the sockets to the queues of the physical interfaces > > "below" the bonding device. > > My further observation this time is that when the issue happens and > > the remote device reports > > the LACP error there is no incoming LACP traffic on the corresponding > > local port, > > as seen by the xdump. > > The tcpdump at the same time sees only outgoing LACP packets and > > nothing incoming. > > For example: > > Remote device > > Local Server > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > And when the remote device reports "received an abnormal LACPDU" > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > is no incoming LACP traffic > > Hey Pavel, > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > there? I reproduced the problem but this time the interface with the weird state was eth0. It's different every time and sometimes even two of the interfaces are in such a state. 
Here is the requested info while in this state:
~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt
6c6
< rx_pkts_nic: 81426
---
> rx_pkts_nic: 81436
8c8
< rx_bytes_nic: 10286521
---
> rx_bytes_nic: 10287801
17c17
< multicast: 72216
---
> multicast: 72226
48c48
< rx_no_dma_resources: 1109
---
> rx_no_dma_resources: 1119

~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt
interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 108199 108201 63 64 1865 108199 61
interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 117967 117969 68 69 1870 117967 66

So, it seems that packets are coming in on the interface but they don't reach
the XDP layer and beyond.
rx_no_dma_resources - this counter seems to give a clue about a possible issue?

>
> > on eth4 but there is incoming LACP traffic on eth0 and eth2.
> > At the same time, according to the dmesg the kernel sees all of the
> > interfaces as
> > "link status definitely up, 10000 Mbps full duplex".
> > The issue goes aways if I stop the application even without removing
> > the XDP programs
> > from the interfaces - the running xdpdump starts showing the incoming
> > LACP traffic immediately.
> > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4".
>
> and the setup is what when doing the link flap? XDP progs are loaded to
> each of the 3 interfaces of bond?

Yes, the same XDP program is loaded on application startup on each one
of the interfaces which are part of bond0 (eth0, eth2, eth4):
# xdp-loader status
CURRENT XDP PROGRAM STATUS:

Interface        Prio  Program name          Mode      ID    Tag               Chain actions
--------------------------------------------------------------------------------------
lo                     <No XDP program loaded!>
eth0                   xdp_dispatcher        native    1320  90f686eb86991928
 =>              50    x3sp_splitter_func              1329  3b185187f1855c4c  XDP_PASS
eth1                   <No XDP program loaded!>
eth2                   xdp_dispatcher        native    1334  90f686eb86991928
 =>              50    x3sp_splitter_func              1337  3b185187f1855c4c  XDP_PASS
eth3                   <No XDP program loaded!>
eth4                   xdp_dispatcher        native    1342  90f686eb86991928
 =>              50    x3sp_splitter_func              1345  3b185187f1855c4c  XDP_PASS
eth5                   <No XDP program loaded!>
eth6                   <No XDP program loaded!>
eth7                   <No XDP program loaded!>
bond0                  <No XDP program loaded!>

Each of these interfaces is set up to have 16 queues, i.e. the application,
through the DPDK machinery, opens 3x16 XSK sockets, each bound to the
corresponding queue of the corresponding interface.
~# ethtool -l eth0    # It's the same for the other 2 devices
Channel parameters for eth0:
Pre-set maximums:
RX:             n/a
TX:             n/a
Other:          1
Combined:       48
Current hardware settings:
RX:             n/a
TX:             n/a
Other:          1
Combined:       16

>
> > However, I'm not sure what happens with the bound XDP sockets in this case
> > because I haven't tested further.
>
> can you also try to bind xsk sockets before attaching XDP progs?

I looked into the DPDK code again.
The DPDK framework provides callback hooks like eth_rx_queue_setup and each
"driver" implements them as needed. Each Rx/Tx queue of the device is set up
separately. The af_xdp driver currently does this for each Rx queue separately:
1. configures the umem for the queue
2. loads the XDP program on the corresponding interface, if not already loaded
   (i.e. this happens only once per interface when its first queue is set up).
3. calls xsk_socket__create which, as far as I can see, also internally binds
   the socket to the given queue
4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem

So, it seems to me that the needed change will be a bit more involved.
I'm not sure if it'll be possible to hardcode, just for the test, the program
loading and the placement of all XSK sockets in the map to happen when the
setup of the last "queue" for the given interface is done. I need to think a
bit more about this.

>
> >
> > It seems to me that something racy happens when the interfaces go down
> > and back up
> > (visible in the dmesg) when the XDP sockets are bound to their queues.
> > I mean, I'm not sure why the interfaces go down and up but setting
> > only the XDP programs
> > on the interfaces doesn't cause this behavior. So, I assume it's
> > caused by the binding of the XDP sockets.
>
> hmm i'm lost here, above you said you got no incoming traffic on eth4 even
> without xsk sockets being bound?

I've probably phrased something in the wrong way.
The issue is not observed if I load the XDP program on all interfaces
(eth0, eth2, eth4) with the xdp-loader:
xdp-loader load --mode native <iface> <path-to-the-xdp-program>
It's probably not observed because there are no interface down/up actions.
I also modified the DPDK "driver" to not remove the XDP program on exit, and
thus when the application stops only the XSK sockets are closed while the
program remains loaded on the interfaces. When I stop this version of the
application while xdpdump is running at the same time, I see the traffic
immediately appear in xdpdump.
Also, note that I basically trimmed the XDP program to simply contain the XSK
map (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;".
I wanted to exclude any possibility of the XDP program doing something wrong.
So, from the above it seems to me that the issue is somehow triggered by the
usage of the XSK sockets.

>
> > It could be that the issue is not related to the XDP sockets but just
> > to the down/up actions of the interfaces.
> > On the other hand, I'm not sure why the issue is easily reproducible
> > when the zero copy mode is enabled
> > (4 out of 5 tests reproduced the issue).
> > However, when the zero copy is disabled this issue doesn't happen
> > (I tried 10 times in a row and it doesn't happen).
>
> any chances that you could rule out the bond of the picture of this issue?

I'll need to talk to the network support guys because they manage the network
devices and they'll need to change the LACP/Trunk setup of the above
"remote device". I can't promise that they'll agree, though.

> on my side i'll try to play with multiple xsk sockets within same netdev
> served by ixgbe and see if i observe something broken. I recently fixed
> i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up
> actions as you state as well.

^ permalink raw reply	[flat|nested] 22+ messages in thread
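For reference, a minimal sketch of what such a trimmed test program could look
like (an illustration only, not the real x3sp_splitter_func source; the map
name and size are assumptions):

// SPDX-License-Identifier: GPL-2.0
/* Trimmed splitter program: it only declares the XSKMAP so that user space
 * can insert the XSK sockets into it, and it passes every frame on to the
 * kernel stack. */
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
        __uint(type, BPF_MAP_TYPE_XSKMAP);
        __uint(max_entries, 64);           /* >= number of RX queues */
        __uint(key_size, sizeof(__u32));   /* queue id */
        __uint(value_size, sizeof(__u32)); /* XSK socket fd */
} xsks_map SEC(".maps");

SEC("xdp")
int x3sp_splitter_func(struct xdp_md *ctx)
{
        return XDP_PASS;                   /* everything goes to the stack */
}

char _license[] SEC("license") = "GPL";

User space then only has to insert each XSK socket fd into xsks_map, keyed by
queue id (e.g. via bpf_map_update_elem()); during this test the program itself
never redirects anything to the sockets.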
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-08 10:59 ` Pavel Vazharov @ 2024-02-08 15:47 ` Pavel Vazharov 2024-02-09 9:03 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-02-08 15:47 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Thu, Feb 8, 2024 at 12:59 PM Pavel Vazharov <pavel@x3me.net> wrote: > > On Wed, Feb 7, 2024 at 9:00 PM Maciej Fijalkowski > <maciej.fijalkowski@intel.com> wrote: > > > > On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > > > <magnus.karlsson@gmail.com> wrote: > > > > > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > >> > > > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > > > >> > > > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > >> >>> > > > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > >> >>> > > > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > > > >> >>> > > > > > > > >> >>> > > Thank you for the response. > > > > > >> >>> > > I already checked the XDP program. > > > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > > > >> >>> > > Everything else is passed to the Linux kernel. > > > > > >> >>> > > However, I'll check it again. Just to be sure. > > > > > >> >>> > > > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > > > >> >>> > drivers. > > > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > > > >> >>> above bonding. > > > > > >> >>> ~# uname -a > > > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > >> >>> ... > > > > > >> >>> Kernel driver in use: ixgbe > > > > > >> >>> -- > > > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > >> >>> ... > > > > > >> >>> Kernel driver in use: ixgbe > > > > > >> >>> -- > > > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > >> >>> ... > > > > > >> >>> Kernel driver in use: ixgbe > > > > > >> >>> > > > > > >> >>> I think they should be well supported, right? > > > > > >> >>> So far, it seems that the present usage scenario should work and the > > > > > >> >>> problem is somewhere in my code. 
> > > > > >> >>> I'll double check it again and try to simplify everything in order to > > > > > >> >>> pinpoint the problem. > > > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > > > >> > between the kernel and the user space > > > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > > > >> > working bonding. > > > > > >> > > > > > >> (+Magnus) > > > > > >> > > > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > > > >> any chance you could test with a different driver and see if the same > > > > > >> issue appears there? > > > > > >> > > > > > >> -Toke > > > > > > No, sorry. > > > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > > > set up bonding there > > > > > > and the problem is not reproducible there. > > > > > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > > > the packets on the physical devices using tcpdump. That should (I think) > > > > > show you the LACDPU packets as they come in, before they hit the bonding > > > > > device, but after they are copied from the XDP frame. If it's a packet > > > > > corruption issue, that should be visible in the captured packet; you can > > > > > compare with an xdpdump capture to see if there are any differences... > > > > > > > > Pavel, > > > > > > > > Sounds like an issue with the driver in zero-copy mode as it works > > > > fine in copy mode. Maciej and I will take a look at it. > > > > > > > > > -Toke > > > > > > > > > > > First I want to apologize for not responding for such a long time. > > > I had different tasks the previous week and this week went back to this issue. > > > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > > > the XDP program in a way which is compatible with the xdp-dispatcher. > > > Finally, I was able to run our application with the XDP sockets and the xdpdump > > > at the same time. > > > > > > Back to the issue. > > > I just want to say again that we are not binding the XDP sockets to > > > the bonding device. > > > We are binding the sockets to the queues of the physical interfaces > > > "below" the bonding device. > > > My further observation this time is that when the issue happens and > > > the remote device reports > > > the LACP error there is no incoming LACP traffic on the corresponding > > > local port, > > > as seen by the xdump. > > > The tcpdump at the same time sees only outgoing LACP packets and > > > nothing incoming. > > > For example: > > > Remote device > > > Local Server > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > And when the remote device reports "received an abnormal LACPDU" > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > is no incoming LACP traffic > > > > Hey Pavel, > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > there? > I reproduced the problem but this time the interface with the weird > state was eth0. > It's different every time and sometimes even two of the interfaces are > in such a state. 
> Here are the requested info while being in this state: > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > 6c6 > < rx_pkts_nic: 81426 > --- > > rx_pkts_nic: 81436 > 8c8 > < rx_bytes_nic: 10286521 > --- > > rx_bytes_nic: 10287801 > 17c17 > < multicast: 72216 > --- > > multicast: 72226 > 48c48 > < rx_no_dma_resources: 1109 > --- > > rx_no_dma_resources: 1119 > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > 108199 108201 63 64 1865 108199 61 > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > 117967 117969 68 69 1870 117967 66 > > So, it seems that packets are coming on the interface but they don't > reach to the XDP layer and deeper. > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > At the same time, according to the dmesg the kernel sees all of the > > > interfaces as > > > "link status definitely up, 10000 Mbps full duplex". > > > The issue goes aways if I stop the application even without removing > > > the XDP programs > > > from the interfaces - the running xdpdump starts showing the incoming > > > LACP traffic immediately. > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > each of the 3 interfaces of bond? > Yes, the same XDP program is loaded on application startup on each one > of the interfaces which are part of bond0 (eth0, eth2, eth4): > # xdp-loader status > CURRENT XDP PROGRAM STATUS: > > Interface Prio Program name Mode ID Tag > Chain actions > -------------------------------------------------------------------------------------- > lo <No XDP program loaded!> > eth0 xdp_dispatcher native 1320 90f686eb86991928 > => 50 x3sp_splitter_func 1329 > 3b185187f1855c4c XDP_PASS > eth1 <No XDP program loaded!> > eth2 xdp_dispatcher native 1334 90f686eb86991928 > => 50 x3sp_splitter_func 1337 > 3b185187f1855c4c XDP_PASS > eth3 <No XDP program loaded!> > eth4 xdp_dispatcher native 1342 90f686eb86991928 > => 50 x3sp_splitter_func 1345 > 3b185187f1855c4c XDP_PASS > eth5 <No XDP program loaded!> > eth6 <No XDP program loaded!> > eth7 <No XDP program loaded!> > bond0 <No XDP program loaded!> > Each of these interfaces is setup to have 16 queues i.e. the application, > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > corresponding queue of the corresponding interface. > ~# ethtool -l eth0 # It's same for the other 2 devices > Channel parameters for eth0: > Pre-set maximums: > RX: n/a > TX: n/a > Other: 1 > Combined: 48 > Current hardware settings: > RX: n/a > TX: n/a > Other: 1 > Combined: 16 > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > because I haven't tested further. > > > > can you also try to bind xsk sockets before attaching XDP progs? > I looked into the DPDK code again. > The DPDK framework provides callback hooks like eth_rx_queue_setup > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > set up separately. The af_xdp driver currently does this for each Rx > queue separately: > 1. configures the umem for the queue > 2. loads the XDP program on the corresponding interface, if not already loaded > (i.e. 
this happens only once per interface when its first queue is set up). > 3. does xsk_socket__create which as far as I looked also internally binds the > socket to the given queue > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > So, it seems to me that the change needed will be a bit more involved. > I'm not sure if it'll be possible to hardcode, just for the test, the > program loading and > the placing of all XSK sockets in the map to happen when the setup of the last > "queue" for the given interface is done. I need to think a bit more about this. Changed the code of the DPDK af_xdp "driver" to create and bind all of the XSK sockets to the queues of the corresponding interface and after that, after the initialization of the last XSK socket, I added the logic for the attachment of the XDP program to the interface and the population of the XSK map with the created sockets. The issue was still there but it was kind of harder to reproduce - it happened once for 5 starts of the application. > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > and back up > > > (visible in the dmesg) when the XDP sockets are bound to their queues. > > > I mean, I'm not sure why the interfaces go down and up but setting > > > only the XDP programs > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > caused by the binding of the XDP sockets. > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > without xsk sockets being bound? > Probably I've phrased something in a wrong way. > The issue is not observed if I load the XDP program on all interfaces > (eth0, eth2, eth4) > with the xdp-loader: > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > It's not observed probably because there are no interface down/up actions. > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > when the application stops only the XSK sockets are closed but the > program remains > loaded at the interfaces. When I stop this version of the application > while running the > xdpdump at the same time I see that the traffic immediately appears in > the xdpdump. > Also, note that I basically trimmed the XDP program to simply contain > the XSK map > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". > I wanted to exclude every possibility for the XDP program to do something wrong. > So, from the above it seems to me that the issue is triggered somehow by the XSK > sockets usage. > > > > > > It could be that the issue is not related to the XDP sockets but just > > > to the down/up actions of the interfaces. > > > On the other hand, I'm not sure why the issue is easily reproducible > > > when the zero copy mode is enabled > > > (4 out of 5 tests reproduced the issue). > > > However, when the zero copy is disabled this issue doesn't happen > > > (I tried 10 times in a row and it doesn't happen). > > > > any chances that you could rule out the bond of the picture of this issue? > I'll need to talk to the network support guys because they manage the network > devices and they'll need to change the LACP/Trunk setup of the above > "remote device". > I can't promise that they'll agree though. > > > on my side i'll try to play with multiple xsk sockets within same netdev > > served by ixgbe and see if i observe something broken. I recently fixed > > i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up > > actions as you state as well. 
^ permalink raw reply [flat|nested] 22+ messages in thread
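A rough sketch of the reordered per-interface setup described in the message
above (an illustration under assumptions rather than the actual DPDK patch: it
assumes libxdp's xsk.h/libxdp.h, an object file called x3sp_splitter.o and a
map named xsks_map, and it leaves out umem creation and most error handling):

#include <bpf/bpf.h>
#include <bpf/libbpf.h>
#include <linux/if_xdp.h>
#include <xdp/libxdp.h>
#include <xdp/xsk.h>

#define NUM_QUEUES 16

/* Hypothetical helper: create and bind one XSK socket per HW queue first,
 * and only then attach the XDP program and fill its XSKMAP. */
static int setup_iface(const char *ifname, int ifindex,
                       struct xsk_umem *umems[NUM_QUEUES],
                       struct xsk_ring_cons rx[NUM_QUEUES],
                       struct xsk_ring_prod tx[NUM_QUEUES])
{
        struct xsk_socket *xsks[NUM_QUEUES];
        struct xsk_socket_config cfg = {
                .rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
                .tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
                .libxdp_flags = XSK_LIBXDP_FLAGS__INHIBIT_PROG_LOAD,
                .bind_flags = XDP_ZEROCOPY, /* or XDP_COPY to force copy mode */
        };
        struct xdp_program *prog;
        int xsks_map_fd, err, q;

        /* 1) create and bind the sockets; no XDP program is attached yet */
        for (q = 0; q < NUM_QUEUES; q++) {
                err = xsk_socket__create(&xsks[q], ifname, q, umems[q],
                                         &rx[q], &tx[q], &cfg);
                if (err)
                        return err;
        }

        /* 2) attach the (trimmed) program through the xdp-dispatcher */
        prog = xdp_program__open_file("x3sp_splitter.o", NULL, NULL);
        err = xdp_program__attach(prog, ifindex, XDP_MODE_NATIVE, 0);
        if (err)
                return err;

        /* 3) populate the XSKMAP: queue id -> XSK socket fd */
        xsks_map_fd = bpf_object__find_map_fd_by_name(
                                xdp_program__bpf_obj(prog), "xsks_map");
        for (q = 0; q < NUM_QUEUES; q++) {
                int fd = xsk_socket__fd(xsks[q]);

                err = bpf_map_update_elem(xsks_map_fd, &q, &fd, 0);
                if (err)
                        return err;
        }
        return 0;
}

With this ordering the sockets are already bound by the time the program
appears on the interface, which is the variant reported above as still
reproducing the issue, only less often.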
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-08 15:47 ` Pavel Vazharov @ 2024-02-09 9:03 ` Pavel Vazharov 2024-02-09 18:37 ` Maciej Fijalkowski 2024-02-16 17:24 ` Maciej Fijalkowski 0 siblings, 2 replies; 22+ messages in thread From: Pavel Vazharov @ 2024-02-09 9:03 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Thu, Feb 8, 2024 at 5:47 PM Pavel Vazharov <pavel@x3me.net> wrote: > > On Thu, Feb 8, 2024 at 12:59 PM Pavel Vazharov <pavel@x3me.net> wrote: > > > > On Wed, Feb 7, 2024 at 9:00 PM Maciej Fijalkowski > > <maciej.fijalkowski@intel.com> wrote: > > > > > > On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > > > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > > > > <magnus.karlsson@gmail.com> wrote: > > > > > > > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > >> > > > > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > > > > >> > > > > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > >> >>> > > > > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > >> >>> > > > > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > > > > >> >>> > > > > > > > > >> >>> > > Thank you for the response. > > > > > > >> >>> > > I already checked the XDP program. > > > > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > > > > >> >>> > > Everything else is passed to the Linux kernel. > > > > > > >> >>> > > However, I'll check it again. Just to be sure. > > > > > > >> >>> > > > > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > > > > >> >>> > drivers. > > > > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > > > > >> >>> above bonding. > > > > > > >> >>> ~# uname -a > > > > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > >> >>> ... > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > >> >>> -- > > > > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > >> >>> ... > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > >> >>> -- > > > > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > >> >>> ... 
> > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > >> >>> > > > > > > >> >>> I think they should be well supported, right? > > > > > > >> >>> So far, it seems that the present usage scenario should work and the > > > > > > >> >>> problem is somewhere in my code. > > > > > > >> >>> I'll double check it again and try to simplify everything in order to > > > > > > >> >>> pinpoint the problem. > > > > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > > > > >> > between the kernel and the user space > > > > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > > > > >> > working bonding. > > > > > > >> > > > > > > >> (+Magnus) > > > > > > >> > > > > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > > > > >> any chance you could test with a different driver and see if the same > > > > > > >> issue appears there? > > > > > > >> > > > > > > >> -Toke > > > > > > > No, sorry. > > > > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > > > > set up bonding there > > > > > > > and the problem is not reproducible there. > > > > > > > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > > > > the packets on the physical devices using tcpdump. That should (I think) > > > > > > show you the LACDPU packets as they come in, before they hit the bonding > > > > > > device, but after they are copied from the XDP frame. If it's a packet > > > > > > corruption issue, that should be visible in the captured packet; you can > > > > > > compare with an xdpdump capture to see if there are any differences... > > > > > > > > > > Pavel, > > > > > > > > > > Sounds like an issue with the driver in zero-copy mode as it works > > > > > fine in copy mode. Maciej and I will take a look at it. > > > > > > > > > > > -Toke > > > > > > > > > > > > > > First I want to apologize for not responding for such a long time. > > > > I had different tasks the previous week and this week went back to this issue. > > > > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > > > > the XDP program in a way which is compatible with the xdp-dispatcher. > > > > Finally, I was able to run our application with the XDP sockets and the xdpdump > > > > at the same time. > > > > > > > > Back to the issue. > > > > I just want to say again that we are not binding the XDP sockets to > > > > the bonding device. > > > > We are binding the sockets to the queues of the physical interfaces > > > > "below" the bonding device. > > > > My further observation this time is that when the issue happens and > > > > the remote device reports > > > > the LACP error there is no incoming LACP traffic on the corresponding > > > > local port, > > > > as seen by the xdump. > > > > The tcpdump at the same time sees only outgoing LACP packets and > > > > nothing incoming. 
> > > > For example: > > > > Remote device > > > > Local Server > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > > And when the remote device reports "received an abnormal LACPDU" > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > > is no incoming LACP traffic > > > > > > Hey Pavel, > > > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > > there? > > I reproduced the problem but this time the interface with the weird > > state was eth0. > > It's different every time and sometimes even two of the interfaces are > > in such a state. > > Here are the requested info while being in this state: > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > > 6c6 > > < rx_pkts_nic: 81426 > > --- > > > rx_pkts_nic: 81436 > > 8c8 > > < rx_bytes_nic: 10286521 > > --- > > > rx_bytes_nic: 10287801 > > 17c17 > > < multicast: 72216 > > --- > > > multicast: 72226 > > 48c48 > > < rx_no_dma_resources: 1109 > > --- > > > rx_no_dma_resources: 1119 > > > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > > 108199 108201 63 64 1865 108199 61 > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > > 117967 117969 68 69 1870 117967 66 > > > > So, it seems that packets are coming on the interface but they don't > > reach to the XDP layer and deeper. > > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > > At the same time, according to the dmesg the kernel sees all of the > > > > interfaces as > > > > "link status definitely up, 10000 Mbps full duplex". > > > > The issue goes aways if I stop the application even without removing > > > > the XDP programs > > > > from the interfaces - the running xdpdump starts showing the incoming > > > > LACP traffic immediately. > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > > > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > > each of the 3 interfaces of bond? > > Yes, the same XDP program is loaded on application startup on each one > > of the interfaces which are part of bond0 (eth0, eth2, eth4): > > # xdp-loader status > > CURRENT XDP PROGRAM STATUS: > > > > Interface Prio Program name Mode ID Tag > > Chain actions > > -------------------------------------------------------------------------------------- > > lo <No XDP program loaded!> > > eth0 xdp_dispatcher native 1320 90f686eb86991928 > > => 50 x3sp_splitter_func 1329 > > 3b185187f1855c4c XDP_PASS > > eth1 <No XDP program loaded!> > > eth2 xdp_dispatcher native 1334 90f686eb86991928 > > => 50 x3sp_splitter_func 1337 > > 3b185187f1855c4c XDP_PASS > > eth3 <No XDP program loaded!> > > eth4 xdp_dispatcher native 1342 90f686eb86991928 > > => 50 x3sp_splitter_func 1345 > > 3b185187f1855c4c XDP_PASS > > eth5 <No XDP program loaded!> > > eth6 <No XDP program loaded!> > > eth7 <No XDP program loaded!> > > bond0 <No XDP program loaded!> > > Each of these interfaces is setup to have 16 queues i.e. 
the application, > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > > corresponding queue of the corresponding interface. > > ~# ethtool -l eth0 # It's same for the other 2 devices > > Channel parameters for eth0: > > Pre-set maximums: > > RX: n/a > > TX: n/a > > Other: 1 > > Combined: 48 > > Current hardware settings: > > RX: n/a > > TX: n/a > > Other: 1 > > Combined: 16 > > > > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > > because I haven't tested further. > > > > > > can you also try to bind xsk sockets before attaching XDP progs? > > I looked into the DPDK code again. > > The DPDK framework provides callback hooks like eth_rx_queue_setup > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > > set up separately. The af_xdp driver currently does this for each Rx > > queue separately: > > 1. configures the umem for the queue > > 2. loads the XDP program on the corresponding interface, if not already loaded > > (i.e. this happens only once per interface when its first queue is set up). > > 3. does xsk_socket__create which as far as I looked also internally binds the > > socket to the given queue > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > > > So, it seems to me that the change needed will be a bit more involved. > > I'm not sure if it'll be possible to hardcode, just for the test, the > > program loading and > > the placing of all XSK sockets in the map to happen when the setup of the last > > "queue" for the given interface is done. I need to think a bit more about this. > Changed the code of the DPDK af_xdp "driver" to create and bind all of > the XSK sockets > to the queues of the corresponding interface and after that, after the > initialization of the > last XSK socket, I added the logic for the attachment of the XDP > program to the interface > and the population of the XSK map with the created sockets. > The issue was still there but it was kind of harder to reproduce - it > happened once for 5 > starts of the application. > > > > > > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > > and back up > > > > (visible in the dmesg) when the XDP sockets are bound to their queues. > > > > I mean, I'm not sure why the interfaces go down and up but setting > > > > only the XDP programs > > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > > caused by the binding of the XDP sockets. > > > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > > without xsk sockets being bound? > > Probably I've phrased something in a wrong way. > > The issue is not observed if I load the XDP program on all interfaces > > (eth0, eth2, eth4) > > with the xdp-loader: > > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > > It's not observed probably because there are no interface down/up actions. > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > > when the application stops only the XSK sockets are closed but the > > program remains > > loaded at the interfaces. When I stop this version of the application > > while running the > > xdpdump at the same time I see that the traffic immediately appears in > > the xdpdump. > > Also, note that I basically trimmed the XDP program to simply contain > > the XSK map > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". 
> > I wanted to exclude every possibility for the XDP program to do something wrong. > > So, from the above it seems to me that the issue is triggered somehow by the XSK > > sockets usage. > > > > > > > > > It could be that the issue is not related to the XDP sockets but just > > > > to the down/up actions of the interfaces. > > > > On the other hand, I'm not sure why the issue is easily reproducible > > > > when the zero copy mode is enabled > > > > (4 out of 5 tests reproduced the issue). > > > > However, when the zero copy is disabled this issue doesn't happen > > > > (I tried 10 times in a row and it doesn't happen). > > > > > > any chances that you could rule out the bond of the picture of this issue? > > I'll need to talk to the network support guys because they manage the network > > devices and they'll need to change the LACP/Trunk setup of the above > > "remote device". > > I can't promise that they'll agree though. We changed the setup and I did the tests with a single port, no bonding involved. The port was configured with 16 queues (and 16 XSK sockets bound to them). I tested with about 100 Mbps of traffic to not break lots of users. During the tests I observed the traffic on the real time graph on the remote device port connected to the server machine where the application was running in L3 forward mode: - with zero copy enabled the traffic to the server was about 100 Mbps but the traffic coming out of the server was about 50 Mbps (i.e. half of it). - with no zero copy the traffic in both directions was the same - the two graphs matched perfectly Nothing else was changed during the both tests, only the ZC option. Can I check some stats or something else for this testing scenario which could be used to reveal more info about the issue? > > > > > on my side i'll try to play with multiple xsk sockets within same netdev > > > served by ixgbe and see if i observe something broken. I recently fixed > > > i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up > > > actions as you state as well. ^ permalink raw reply [flat|nested] 22+ messages in thread
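On the question of what else can be checked: besides ethtool -S on the
interface, every XSK socket exposes kernel-side counters through the
XDP_STATISTICS socket option from linux/if_xdp.h. A small sketch (the
dump_xsk_stats() helper is made up for illustration; the counter names come
from the UAPI header, the last three of them being available since kernel 5.9):

#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>

/* Print the kernel-side counters of one AF_XDP socket; pass the socket fd,
 * e.g. the value returned by xsk_socket__fd() when using libxdp. */
static void dump_xsk_stats(int xsk_fd)
{
        struct xdp_statistics stats;
        socklen_t len = sizeof(stats);

        memset(&stats, 0, sizeof(stats));
        if (getsockopt(xsk_fd, SOL_XDP, XDP_STATISTICS, &stats, &len))
                return;

        printf("rx_dropped:               %llu\n",
               (unsigned long long)stats.rx_dropped);
        printf("rx_invalid_descs:         %llu\n",
               (unsigned long long)stats.rx_invalid_descs);
        printf("tx_invalid_descs:         %llu\n",
               (unsigned long long)stats.tx_invalid_descs);
        printf("rx_ring_full:             %llu\n",
               (unsigned long long)stats.rx_ring_full);
        printf("rx_fill_ring_empty_descs: %llu\n",
               (unsigned long long)stats.rx_fill_ring_empty_descs);
        printf("tx_ring_empty_descs:      %llu\n",
               (unsigned long long)stats.tx_ring_empty_descs);
}

Counters that grow only in the zero-copy run would at least show whether the
missing traffic is being dropped on the socket side rather than somewhere in
the driver.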
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-09 9:03 ` Pavel Vazharov @ 2024-02-09 18:37 ` Maciej Fijalkowski 2024-02-16 15:18 ` Maciej Fijalkowski 2024-02-16 17:24 ` Maciej Fijalkowski 1 sibling, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2024-02-09 18:37 UTC (permalink / raw) To: Pavel Vazharov Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Fri, Feb 09, 2024 at 11:03:51AM +0200, Pavel Vazharov wrote: > On Thu, Feb 8, 2024 at 5:47 PM Pavel Vazharov <pavel@x3me.net> wrote: > > > > On Thu, Feb 8, 2024 at 12:59 PM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > On Wed, Feb 7, 2024 at 9:00 PM Maciej Fijalkowski > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > > > > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > > > > > <magnus.karlsson@gmail.com> wrote: > > > > > > > > > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > >> > > > > > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > >> > > > > > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > > >> >>> > > > > > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > >> >>> > > > > > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > > > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > > > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > > > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > > > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > > > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > > > > > >> >>> > > > > > > > > > >> >>> > > Thank you for the response. > > > > > > > >> >>> > > I already checked the XDP program. > > > > > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > > > > > >> >>> > > Everything else is passed to the Linux kernel. > > > > > > > >> >>> > > However, I'll check it again. Just to be sure. > > > > > > > >> >>> > > > > > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > > > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > > > > > >> >>> > drivers. > > > > > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > > > > > >> >>> above bonding. > > > > > > > >> >>> ~# uname -a > > > > > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > > > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > > > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > >> >>> ... > > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > >> >>> -- > > > > > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > >> >>> ... 
> > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > >> >>> -- > > > > > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > >> >>> ... > > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > >> >>> > > > > > > > >> >>> I think they should be well supported, right? > > > > > > > >> >>> So far, it seems that the present usage scenario should work and the > > > > > > > >> >>> problem is somewhere in my code. > > > > > > > >> >>> I'll double check it again and try to simplify everything in order to > > > > > > > >> >>> pinpoint the problem. > > > > > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > > > > > >> > between the kernel and the user space > > > > > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > > > > > >> > working bonding. > > > > > > > >> > > > > > > > >> (+Magnus) > > > > > > > >> > > > > > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > > > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > > > > > >> any chance you could test with a different driver and see if the same > > > > > > > >> issue appears there? > > > > > > > >> > > > > > > > >> -Toke > > > > > > > > No, sorry. > > > > > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > > > > > set up bonding there > > > > > > > > and the problem is not reproducible there. > > > > > > > > > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > > > > > the packets on the physical devices using tcpdump. That should (I think) > > > > > > > show you the LACDPU packets as they come in, before they hit the bonding > > > > > > > device, but after they are copied from the XDP frame. If it's a packet > > > > > > > corruption issue, that should be visible in the captured packet; you can > > > > > > > compare with an xdpdump capture to see if there are any differences... > > > > > > > > > > > > Pavel, > > > > > > > > > > > > Sounds like an issue with the driver in zero-copy mode as it works > > > > > > fine in copy mode. Maciej and I will take a look at it. > > > > > > > > > > > > > -Toke > > > > > > > > > > > > > > > > > First I want to apologize for not responding for such a long time. > > > > > I had different tasks the previous week and this week went back to this issue. > > > > > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > > > > > the XDP program in a way which is compatible with the xdp-dispatcher. > > > > > Finally, I was able to run our application with the XDP sockets and the xdpdump > > > > > at the same time. > > > > > > > > > > Back to the issue. > > > > > I just want to say again that we are not binding the XDP sockets to > > > > > the bonding device. > > > > > We are binding the sockets to the queues of the physical interfaces > > > > > "below" the bonding device. > > > > > My further observation this time is that when the issue happens and > > > > > the remote device reports > > > > > the LACP error there is no incoming LACP traffic on the corresponding > > > > > local port, > > > > > as seen by the xdump. > > > > > The tcpdump at the same time sees only outgoing LACP packets and > > > > > nothing incoming. 
> > > > > For example: > > > > > Remote device > > > > > Local Server > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > > > And when the remote device reports "received an abnormal LACPDU" > > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > > > is no incoming LACP traffic > > > > > > > > Hey Pavel, > > > > > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > > > there? > > > I reproduced the problem but this time the interface with the weird > > > state was eth0. > > > It's different every time and sometimes even two of the interfaces are > > > in such a state. > > > Here are the requested info while being in this state: > > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > > > 6c6 > > > < rx_pkts_nic: 81426 > > > --- > > > > rx_pkts_nic: 81436 > > > 8c8 > > > < rx_bytes_nic: 10286521 > > > --- > > > > rx_bytes_nic: 10287801 > > > 17c17 > > > < multicast: 72216 > > > --- > > > > multicast: 72226 > > > 48c48 > > > < rx_no_dma_resources: 1109 > > > --- > > > > rx_no_dma_resources: 1119 > > > > > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > > > 108199 108201 63 64 1865 108199 61 > > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > > > 117967 117969 68 69 1870 117967 66 > > > > > > So, it seems that packets are coming on the interface but they don't > > > reach to the XDP layer and deeper. > > > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > > > At the same time, according to the dmesg the kernel sees all of the > > > > > interfaces as > > > > > "link status definitely up, 10000 Mbps full duplex". > > > > > The issue goes aways if I stop the application even without removing > > > > > the XDP programs > > > > > from the interfaces - the running xdpdump starts showing the incoming > > > > > LACP traffic immediately. > > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > > > > > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > > > each of the 3 interfaces of bond? 
> > > Yes, the same XDP program is loaded on application startup on each one > > > of the interfaces which are part of bond0 (eth0, eth2, eth4): > > > # xdp-loader status > > > CURRENT XDP PROGRAM STATUS: > > > > > > Interface Prio Program name Mode ID Tag > > > Chain actions > > > -------------------------------------------------------------------------------------- > > > lo <No XDP program loaded!> > > > eth0 xdp_dispatcher native 1320 90f686eb86991928 > > > => 50 x3sp_splitter_func 1329 > > > 3b185187f1855c4c XDP_PASS > > > eth1 <No XDP program loaded!> > > > eth2 xdp_dispatcher native 1334 90f686eb86991928 > > > => 50 x3sp_splitter_func 1337 > > > 3b185187f1855c4c XDP_PASS > > > eth3 <No XDP program loaded!> > > > eth4 xdp_dispatcher native 1342 90f686eb86991928 > > > => 50 x3sp_splitter_func 1345 > > > 3b185187f1855c4c XDP_PASS > > > eth5 <No XDP program loaded!> > > > eth6 <No XDP program loaded!> > > > eth7 <No XDP program loaded!> > > > bond0 <No XDP program loaded!> > > > Each of these interfaces is setup to have 16 queues i.e. the application, > > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > > > corresponding queue of the corresponding interface. > > > ~# ethtool -l eth0 # It's same for the other 2 devices > > > Channel parameters for eth0: > > > Pre-set maximums: > > > RX: n/a > > > TX: n/a > > > Other: 1 > > > Combined: 48 > > > Current hardware settings: > > > RX: n/a > > > TX: n/a > > > Other: 1 > > > Combined: 16 > > > > > > > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > > > because I haven't tested further. > > > > > > > > can you also try to bind xsk sockets before attaching XDP progs? > > > I looked into the DPDK code again. > > > The DPDK framework provides callback hooks like eth_rx_queue_setup > > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > > > set up separately. The af_xdp driver currently does this for each Rx > > > queue separately: > > > 1. configures the umem for the queue > > > 2. loads the XDP program on the corresponding interface, if not already loaded > > > (i.e. this happens only once per interface when its first queue is set up). > > > 3. does xsk_socket__create which as far as I looked also internally binds the > > > socket to the given queue > > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > > > > > So, it seems to me that the change needed will be a bit more involved. > > > I'm not sure if it'll be possible to hardcode, just for the test, the > > > program loading and > > > the placing of all XSK sockets in the map to happen when the setup of the last > > > "queue" for the given interface is done. I need to think a bit more about this. > > Changed the code of the DPDK af_xdp "driver" to create and bind all of > > the XSK sockets > > to the queues of the corresponding interface and after that, after the > > initialization of the > > last XSK socket, I added the logic for the attachment of the XDP > > program to the interface > > and the population of the XSK map with the created sockets. > > The issue was still there but it was kind of harder to reproduce - it > > happened once for 5 > > starts of the application. > > > > > > > > > > > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > > > and back up > > > > > (visible in the dmesg) when the XDP sockets are bound to their queues. 
> > > > > I mean, I'm not sure why the interfaces go down and up but setting > > > > > only the XDP programs > > > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > > > caused by the binding of the XDP sockets. > > > > > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > > > without xsk sockets being bound? > > > Probably I've phrased something in a wrong way. > > > The issue is not observed if I load the XDP program on all interfaces > > > (eth0, eth2, eth4) > > > with the xdp-loader: > > > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > > > It's not observed probably because there are no interface down/up actions. > > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > > > when the application stops only the XSK sockets are closed but the > > > program remains > > > loaded at the interfaces. When I stop this version of the application > > > while running the > > > xdpdump at the same time I see that the traffic immediately appears in > > > the xdpdump. > > > Also, note that I basically trimmed the XDP program to simply contain > > > the XSK map > > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". > > > I wanted to exclude every possibility for the XDP program to do something wrong. > > > So, from the above it seems to me that the issue is triggered somehow by the XSK > > > sockets usage. > > > > > > > > > > > > It could be that the issue is not related to the XDP sockets but just > > > > > to the down/up actions of the interfaces. > > > > > On the other hand, I'm not sure why the issue is easily reproducible > > > > > when the zero copy mode is enabled > > > > > (4 out of 5 tests reproduced the issue). > > > > > However, when the zero copy is disabled this issue doesn't happen > > > > > (I tried 10 times in a row and it doesn't happen). > > > > > > > > any chances that you could rule out the bond of the picture of this issue? > > > I'll need to talk to the network support guys because they manage the network > > > devices and they'll need to change the LACP/Trunk setup of the above > > > "remote device". > > > I can't promise that they'll agree though. > We changed the setup and I did the tests with a single port, no > bonding involved. > The port was configured with 16 queues (and 16 XSK sockets bound to them). > I tested with about 100 Mbps of traffic to not break lots of users. > During the tests I observed the traffic on the real time graph on the > remote device port > connected to the server machine where the application was running in > L3 forward mode: > - with zero copy enabled the traffic to the server was about 100 Mbps > but the traffic > coming out of the server was about 50 Mbps (i.e. half of it). > - with no zero copy the traffic in both directions was the same - the > two graphs matched perfectly > Nothing else was changed during the both tests, only the ZC option. > Can I check some stats or something else for this testing scenario > which could be > used to reveal more info about the issue? Hard to say, that might be yet another issue. Ixgbe needs some care in ZC support, I even spotted some other issue where device got into endless reset loop when I was working on 3 XSK sockets and I issued link flap. I'll be looking into those problems next week and I'll keep you informed. > > > > > > > > on my side i'll try to play with multiple xsk sockets within same netdev > > > > served by ixgbe and see if i observe something broken. 
I recently fixed > > > > i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up > > > > actions as you state as well. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-09 18:37 ` Maciej Fijalkowski @ 2024-02-16 15:18 ` Maciej Fijalkowski 0 siblings, 0 replies; 22+ messages in thread From: Maciej Fijalkowski @ 2024-02-16 15:18 UTC (permalink / raw) To: Pavel Vazharov Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Fri, Feb 09, 2024 at 07:37:56PM +0100, Maciej Fijalkowski wrote: > On Fri, Feb 09, 2024 at 11:03:51AM +0200, Pavel Vazharov wrote: > > On Thu, Feb 8, 2024 at 5:47 PM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > On Thu, Feb 8, 2024 at 12:59 PM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > > > On Wed, Feb 7, 2024 at 9:00 PM Maciej Fijalkowski > > > > <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > On Wed, Feb 07, 2024 at 05:49:47PM +0200, Pavel Vazharov wrote: > > > > > > On Mon, Feb 5, 2024 at 9:07 AM Magnus Karlsson > > > > > > <magnus.karlsson@gmail.com> wrote: > > > > > > > > > > > > > > On Tue, 30 Jan 2024 at 15:54, Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > > > > > > > > > Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > > > > > > > > > > On Tue, Jan 30, 2024 at 4:32 PM Toke Høiland-Jørgensen <toke@kernel.org> wrote: > > > > > > > > >> > > > > > > > > >> Pavel Vazharov <pavel@x3me.net> writes: > > > > > > > > >> > > > > > > > > >> >> On Sat, Jan 27, 2024 at 7:08 AM Pavel Vazharov <pavel@x3me.net> wrote: > > > > > > > > >> >>> > > > > > > > > >> >>> On Sat, Jan 27, 2024 at 6:39 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > > > > > > >> >>> > > > > > > > > > >> >>> > On Sat, 27 Jan 2024 05:58:55 +0200 Pavel Vazharov wrote: > > > > > > > > >> >>> > > > Well, it will be up to your application to ensure that it is not. The > > > > > > > > >> >>> > > > XDP program will run before the stack sees the LACP management traffic, > > > > > > > > >> >>> > > > so you will have to take some measure to ensure that any such management > > > > > > > > >> >>> > > > traffic gets routed to the stack instead of to the DPDK application. My > > > > > > > > >> >>> > > > immediate guess would be that this is the cause of those warnings? > > > > > > > > >> >>> > > > > > > > > > > >> >>> > > Thank you for the response. > > > > > > > > >> >>> > > I already checked the XDP program. > > > > > > > > >> >>> > > It redirects particular pools of IPv4 (TCP or UDP) traffic to the application. > > > > > > > > >> >>> > > Everything else is passed to the Linux kernel. > > > > > > > > >> >>> > > However, I'll check it again. Just to be sure. > > > > > > > > >> >>> > > > > > > > > > >> >>> > What device driver are you using, if you don't mind sharing? > > > > > > > > >> >>> > The pass thru code path may be much less well tested in AF_XDP > > > > > > > > >> >>> > drivers. > > > > > > > > >> >>> These are the kernel version and the drivers for the 3 ports in the > > > > > > > > >> >>> above bonding. > > > > > > > > >> >>> ~# uname -a > > > > > > > > >> >>> Linux 6.3.2 #1 SMP Wed May 17 08:17:50 UTC 2023 x86_64 GNU/Linux > > > > > > > > >> >>> ~# lspci -v | grep -A 16 -e 1b:00.0 -e 3b:00.0 -e 5e:00.0 > > > > > > > > >> >>> 1b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > > >> >>> ... 
> > > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > > >> >>> -- > > > > > > > > >> >>> 3b:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > > >> >>> ... > > > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > > >> >>> -- > > > > > > > > >> >>> 5e:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit > > > > > > > > >> >>> SFI/SFP+ Network Connection (rev 01) > > > > > > > > >> >>> ... > > > > > > > > >> >>> Kernel driver in use: ixgbe > > > > > > > > >> >>> > > > > > > > > >> >>> I think they should be well supported, right? > > > > > > > > >> >>> So far, it seems that the present usage scenario should work and the > > > > > > > > >> >>> problem is somewhere in my code. > > > > > > > > >> >>> I'll double check it again and try to simplify everything in order to > > > > > > > > >> >>> pinpoint the problem. > > > > > > > > >> > I've managed to pinpoint that forcing the copying of the packets > > > > > > > > >> > between the kernel and the user space > > > > > > > > >> > (XDP_COPY) fixes the issue with the malformed LACPDUs and the not > > > > > > > > >> > working bonding. > > > > > > > > >> > > > > > > > > >> (+Magnus) > > > > > > > > >> > > > > > > > > >> Right, okay, that seems to suggest a bug in the internal kernel copying > > > > > > > > >> that happens on XDP_PASS in zero-copy mode. Which would be a driver bug; > > > > > > > > >> any chance you could test with a different driver and see if the same > > > > > > > > >> issue appears there? > > > > > > > > >> > > > > > > > > >> -Toke > > > > > > > > > No, sorry. > > > > > > > > > We have only servers with Intel 82599ES with ixgbe drivers. > > > > > > > > > And one lab machine with Intel 82540EM with igb driver but we can't > > > > > > > > > set up bonding there > > > > > > > > > and the problem is not reproducible there. > > > > > > > > > > > > > > > > Right, okay. Another thing that may be of some use is to try to capture > > > > > > > > the packets on the physical devices using tcpdump. That should (I think) > > > > > > > > show you the LACDPU packets as they come in, before they hit the bonding > > > > > > > > device, but after they are copied from the XDP frame. If it's a packet > > > > > > > > corruption issue, that should be visible in the captured packet; you can > > > > > > > > compare with an xdpdump capture to see if there are any differences... > > > > > > > > > > > > > > Pavel, > > > > > > > > > > > > > > Sounds like an issue with the driver in zero-copy mode as it works > > > > > > > fine in copy mode. Maciej and I will take a look at it. > > > > > > > > > > > > > > > -Toke > > > > > > > > > > > > > > > > > > > > First I want to apologize for not responding for such a long time. > > > > > > I had different tasks the previous week and this week went back to this issue. > > > > > > I had to modify the code of the af_xdp driver inside the DPDK so that it loads > > > > > > the XDP program in a way which is compatible with the xdp-dispatcher. > > > > > > Finally, I was able to run our application with the XDP sockets and the xdpdump > > > > > > at the same time. > > > > > > > > > > > > Back to the issue. > > > > > > I just want to say again that we are not binding the XDP sockets to > > > > > > the bonding device. > > > > > > We are binding the sockets to the queues of the physical interfaces > > > > > > "below" the bonding device. 
> > > > > > My further observation this time is that when the issue happens and > > > > > > the remote device reports > > > > > > the LACP error there is no incoming LACP traffic on the corresponding > > > > > > local port, > > > > > > as seen by the xdump. > > > > > > The tcpdump at the same time sees only outgoing LACP packets and > > > > > > nothing incoming. > > > > > > For example: > > > > > > Remote device > > > > > > Local Server > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > > > > And when the remote device reports "received an abnormal LACPDU" > > > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > > > > is no incoming LACP traffic > > > > > > > > > > Hey Pavel, > > > > > > > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > > > > there? > > > > I reproduced the problem but this time the interface with the weird > > > > state was eth0. > > > > It's different every time and sometimes even two of the interfaces are > > > > in such a state. > > > > Here are the requested info while being in this state: > > > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > > > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > > > > 6c6 > > > > < rx_pkts_nic: 81426 > > > > --- > > > > > rx_pkts_nic: 81436 > > > > 8c8 > > > > < rx_bytes_nic: 10286521 > > > > --- > > > > > rx_bytes_nic: 10287801 > > > > 17c17 > > > > < multicast: 72216 > > > > --- > > > > > multicast: 72226 > > > > 48c48 > > > > < rx_no_dma_resources: 1109 > > > > --- > > > > > rx_no_dma_resources: 1119 > > > > > > > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > > > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > > > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > > > > 108199 108201 63 64 1865 108199 61 > > > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > > > > 117967 117969 68 69 1870 117967 66 > > > > > > > > So, it seems that packets are coming on the interface but they don't > > > > reach to the XDP layer and deeper. > > > > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > > > > > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > > > > At the same time, according to the dmesg the kernel sees all of the > > > > > > interfaces as > > > > > > "link status definitely up, 10000 Mbps full duplex". > > > > > > The issue goes aways if I stop the application even without removing > > > > > > the XDP programs > > > > > > from the interfaces - the running xdpdump starts showing the incoming > > > > > > LACP traffic immediately. > > > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > > > > > > > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > > > > each of the 3 interfaces of bond? 
> > > > Yes, the same XDP program is loaded on application startup on each one > > > > of the interfaces which are part of bond0 (eth0, eth2, eth4): > > > > # xdp-loader status > > > > CURRENT XDP PROGRAM STATUS: > > > > > > > > Interface Prio Program name Mode ID Tag > > > > Chain actions > > > > -------------------------------------------------------------------------------------- > > > > lo <No XDP program loaded!> > > > > eth0 xdp_dispatcher native 1320 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1329 > > > > 3b185187f1855c4c XDP_PASS > > > > eth1 <No XDP program loaded!> > > > > eth2 xdp_dispatcher native 1334 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1337 > > > > 3b185187f1855c4c XDP_PASS > > > > eth3 <No XDP program loaded!> > > > > eth4 xdp_dispatcher native 1342 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1345 > > > > 3b185187f1855c4c XDP_PASS > > > > eth5 <No XDP program loaded!> > > > > eth6 <No XDP program loaded!> > > > > eth7 <No XDP program loaded!> > > > > bond0 <No XDP program loaded!> > > > > Each of these interfaces is setup to have 16 queues i.e. the application, > > > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > > > > corresponding queue of the corresponding interface. > > > > ~# ethtool -l eth0 # It's same for the other 2 devices > > > > Channel parameters for eth0: > > > > Pre-set maximums: > > > > RX: n/a > > > > TX: n/a > > > > Other: 1 > > > > Combined: 48 > > > > Current hardware settings: > > > > RX: n/a > > > > TX: n/a > > > > Other: 1 > > > > Combined: 16 > > > > > > > > > > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > > > > because I haven't tested further. > > > > > > > > > > can you also try to bind xsk sockets before attaching XDP progs? > > > > I looked into the DPDK code again. > > > > The DPDK framework provides callback hooks like eth_rx_queue_setup > > > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > > > > set up separately. The af_xdp driver currently does this for each Rx > > > > queue separately: > > > > 1. configures the umem for the queue > > > > 2. loads the XDP program on the corresponding interface, if not already loaded > > > > (i.e. this happens only once per interface when its first queue is set up). > > > > 3. does xsk_socket__create which as far as I looked also internally binds the > > > > socket to the given queue > > > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > > > > > > > So, it seems to me that the change needed will be a bit more involved. > > > > I'm not sure if it'll be possible to hardcode, just for the test, the > > > > program loading and > > > > the placing of all XSK sockets in the map to happen when the setup of the last > > > > "queue" for the given interface is done. I need to think a bit more about this. > > > Changed the code of the DPDK af_xdp "driver" to create and bind all of > > > the XSK sockets > > > to the queues of the corresponding interface and after that, after the > > > initialization of the > > > last XSK socket, I added the logic for the attachment of the XDP > > > program to the interface > > > and the population of the XSK map with the created sockets. > > > The issue was still there but it was kind of harder to reproduce - it > > > happened once for 5 > > > starts of the application. 
> > > > > > > > > > > > > > > > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > > > > and back up > > > > > > (visible in the dmesg) when the XDP sockets are bound to their queues. > > > > > > I mean, I'm not sure why the interfaces go down and up but setting > > > > > > only the XDP programs > > > > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > > > > caused by the binding of the XDP sockets. > > > > > > > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > > > > without xsk sockets being bound? > > > > Probably I've phrased something in a wrong way. > > > > The issue is not observed if I load the XDP program on all interfaces > > > > (eth0, eth2, eth4) > > > > with the xdp-loader: > > > > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > > > > It's not observed probably because there are no interface down/up actions. > > > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > > > > when the application stops only the XSK sockets are closed but the > > > > program remains > > > > loaded at the interfaces. When I stop this version of the application > > > > while running the > > > > xdpdump at the same time I see that the traffic immediately appears in > > > > the xdpdump. > > > > Also, note that I basically trimmed the XDP program to simply contain > > > > the XSK map > > > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". > > > > I wanted to exclude every possibility for the XDP program to do something wrong. > > > > So, from the above it seems to me that the issue is triggered somehow by the XSK > > > > sockets usage. > > > > > > > > > > > > > > > It could be that the issue is not related to the XDP sockets but just > > > > > > to the down/up actions of the interfaces. > > > > > > On the other hand, I'm not sure why the issue is easily reproducible > > > > > > when the zero copy mode is enabled > > > > > > (4 out of 5 tests reproduced the issue). > > > > > > However, when the zero copy is disabled this issue doesn't happen > > > > > > (I tried 10 times in a row and it doesn't happen). > > > > > > > > > > any chances that you could rule out the bond of the picture of this issue? > > > > I'll need to talk to the network support guys because they manage the network > > > > devices and they'll need to change the LACP/Trunk setup of the above > > > > "remote device". > > > > I can't promise that they'll agree though. > > We changed the setup and I did the tests with a single port, no > > bonding involved. > > The port was configured with 16 queues (and 16 XSK sockets bound to them). > > I tested with about 100 Mbps of traffic to not break lots of users. > > During the tests I observed the traffic on the real time graph on the > > remote device port > > connected to the server machine where the application was running in > > L3 forward mode: > > - with zero copy enabled the traffic to the server was about 100 Mbps > > but the traffic > > coming out of the server was about 50 Mbps (i.e. half of it). > > - with no zero copy the traffic in both directions was the same - the > > two graphs matched perfectly > > Nothing else was changed during the both tests, only the ZC option. > > Can I check some stats or something else for this testing scenario > > which could be > > used to reveal more info about the issue? > > Hard to say, that might be yet another issue. 
Ixgbe needs some care in ZC support, I even spotted some other issue where device got into endless reset loop when I was working on 3 XSK sockets and I issued link flap. > > I'll be looking into those problems next week and I'll keep you informed.

Can you try patch included below on your side and see if this helps with your dead interface issue? I was experiencing something similar and on my side it was enough to play with several xdpsock apps on very same netdev. I'll be looking now at performance issue that you reported.

From ee409ba38c7e60e25e079acdaf2c00a6694ab4e5 Mon Sep 17 00:00:00 2001
From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
Date: Wed, 14 Feb 2024 13:55:36 +0100
Subject: [PATCH iwl-net 1/2] ixgbe: {dis,en}able irqs in ixgbe_txrx_ring_{dis,en}able

Currently routines that are supposed to toggle state of ring pair do not
take care of associated interrupt with queue vector that these rings
belong to. This causes funky issues such as dead interface due to irq
misconfiguration, as per Pavel's report from Closes: tag.

Add a function responsible for disabling single IRQ in EIMC register and
call this as a very first thing when disabling ring pair during xsk_pool
setup. For enable let's reuse ixgbe_irq_enable_queues(). Besides this,
disable/enable NAPI as first/last thing when dealing with closing or
opening ring pair that xsk_pool is being configured on.

Reported-by: Pavel Vazharov <pavel@x3me.net>
Closes: https://lore.kernel.org/netdev/CAJEV1ijxNyPTwASJER1bcZzS9nMoZJqfR86nu_3jFFVXzZQ4NA@mail.gmail.com/
Fixes: 024aa5800f32 ("ixgbe: added Rx/Tx ring disable/enable functions")
Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
---
 drivers/net/ethernet/intel/ixgbe/ixgbe_main.c | 56 ++++++++++++++++---
 1 file changed, 49 insertions(+), 7 deletions(-)

diff --git a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
index bd541527c8c7..99876b765b08 100644
--- a/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
+++ b/drivers/net/ethernet/intel/ixgbe/ixgbe_main.c
@@ -2939,8 +2939,8 @@ static void ixgbe_check_lsc(struct ixgbe_adapter *adapter)
 static inline void ixgbe_irq_enable_queues(struct ixgbe_adapter *adapter,
 					   u64 qmask)
 {
-	u32 mask;
 	struct ixgbe_hw *hw = &adapter->hw;
+	u32 mask;
 
 	switch (hw->mac.type) {
 	case ixgbe_mac_82598EB:
@@ -10524,6 +10524,44 @@ static void ixgbe_reset_rxr_stats(struct ixgbe_ring *rx_ring)
 	memset(&rx_ring->rx_stats, 0, sizeof(rx_ring->rx_stats));
 }
 
+/**
+ * ixgbe_irq_disable_single - Disable single IRQ vector
+ * @adapter: adapter structure
+ * @ring: ring index
+ **/
+static void ixgbe_irq_disable_single(struct ixgbe_adapter *adapter, u32 ring)
+{
+	struct ixgbe_hw *hw = &adapter->hw;
+	u64 qmask = BIT_ULL(ring);
+	u32 mask;
+
+	switch (adapter->hw.mac.type) {
+	case ixgbe_mac_82598EB:
+		mask = qmask & IXGBE_EIMC_RTX_QUEUE;
+		IXGBE_WRITE_REG(&adapter->hw, IXGBE_EIMC, mask);
+		break;
+	case ixgbe_mac_82599EB:
+	case ixgbe_mac_X540:
+	case ixgbe_mac_X550:
+	case ixgbe_mac_X550EM_x:
+	case ixgbe_mac_x550em_a:
+		mask = (qmask & 0xFFFFFFFF);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(0), mask);
+		mask = (qmask >> 32);
+		if (mask)
+			IXGBE_WRITE_REG(hw, IXGBE_EIMS_EX(1), mask);
+		break;
+	default:
+		break;
+	}
+	IXGBE_WRITE_FLUSH(&adapter->hw);
+	if (adapter->flags & IXGBE_FLAG_MSIX_ENABLED)
+		synchronize_irq(adapter->msix_entries[ring].vector);
+	else
+		synchronize_irq(adapter->pdev->irq);
+}
+
 /**
  * ixgbe_txrx_ring_disable - Disable Rx/Tx/XDP Tx rings
  * @adapter: adapter structure
@@ -10540,6 +10578,11 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
 	tx_ring = adapter->tx_ring[ring];
 	xdp_ring = adapter->xdp_ring[ring];
 
+	ixgbe_irq_disable_single(adapter, ring);
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_disable(&rx_ring->q_vector->napi);
+
 	ixgbe_disable_txr(adapter, tx_ring);
 	if (xdp_ring)
 		ixgbe_disable_txr(adapter, xdp_ring);
@@ -10548,9 +10591,6 @@ void ixgbe_txrx_ring_disable(struct ixgbe_adapter *adapter, int ring)
 	if (xdp_ring)
 		synchronize_rcu();
 
-	/* Rx/Tx/XDP Tx share the same napi context. */
-	napi_disable(&rx_ring->q_vector->napi);
-
 	ixgbe_clean_tx_ring(tx_ring);
 	if (xdp_ring)
 		ixgbe_clean_tx_ring(xdp_ring);
@@ -10578,9 +10618,6 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
 	tx_ring = adapter->tx_ring[ring];
 	xdp_ring = adapter->xdp_ring[ring];
 
-	/* Rx/Tx/XDP Tx share the same napi context. */
-	napi_enable(&rx_ring->q_vector->napi);
-
 	ixgbe_configure_tx_ring(adapter, tx_ring);
 	if (xdp_ring)
 		ixgbe_configure_tx_ring(adapter, xdp_ring);
@@ -10589,6 +10626,11 @@ void ixgbe_txrx_ring_enable(struct ixgbe_adapter *adapter, int ring)
 	clear_bit(__IXGBE_TX_DISABLED, &tx_ring->state);
 	if (xdp_ring)
 		clear_bit(__IXGBE_TX_DISABLED, &xdp_ring->state);
+
+	/* Rx/Tx/XDP Tx share the same napi context. */
+	napi_enable(&rx_ring->q_vector->napi);
+	ixgbe_irq_enable_queues(adapter, BIT_ULL(ring));
+	IXGBE_WRITE_FLUSH(&adapter->hw);
 }
 
 /**
--
2.34.1

> > > > > on my side i'll try to play with multiple xsk sockets within same netdev > > > > > served by ixgbe and see if i observe something broken. I recently fixed > > > > > i40e Tx disable timeout issue, so maybe ixgbe has something off in down/up > > > > > actions as you state as well.

^ permalink raw reply related [flat|nested] 22+ messages in thread
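A note on the "trimmed" XDP program mentioned above: its source is not included in the thread, but a minimal program of that shape (an XSKMAP plus an unconditional XDP_PASS) can be sketched as below. The map size, section names and identifiers here are illustrative assumptions, not the real x3sp_splitter_func code.

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

struct {
	__uint(type, BPF_MAP_TYPE_XSKMAP);
	__uint(max_entries, 64);	/* >= number of queues per port; assumption */
	__type(key, __u32);
	__type(value, __u32);
} xsks_map SEC(".maps");

SEC("xdp")
int xdp_pass_all(struct xdp_md *ctx)
{
	/* The real splitter would redirect selected traffic with
	 * bpf_redirect_map(&xsks_map, ctx->rx_queue_index, XDP_PASS);
	 * the trimmed test program lets everything go to the stack. */
	return XDP_PASS;
}

char _license[] SEC("license") = "GPL";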
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-09 9:03 ` Pavel Vazharov 2024-02-09 18:37 ` Maciej Fijalkowski @ 2024-02-16 17:24 ` Maciej Fijalkowski 2024-02-19 13:45 ` Pavel Vazharov 1 sibling, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2024-02-16 17:24 UTC (permalink / raw) To: Pavel Vazharov Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev > > > > > > > > > > Back to the issue. > > > > > I just want to say again that we are not binding the XDP sockets to > > > > > the bonding device. > > > > > We are binding the sockets to the queues of the physical interfaces > > > > > "below" the bonding device. > > > > > My further observation this time is that when the issue happens and > > > > > the remote device reports > > > > > the LACP error there is no incoming LACP traffic on the corresponding > > > > > local port, > > > > > as seen by the xdump. > > > > > The tcpdump at the same time sees only outgoing LACP packets and > > > > > nothing incoming. > > > > > For example: > > > > > Remote device > > > > > Local Server > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > > > And when the remote device reports "received an abnormal LACPDU" > > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > > > is no incoming LACP traffic > > > > > > > > Hey Pavel, > > > > > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > > > there? > > > I reproduced the problem but this time the interface with the weird > > > state was eth0. > > > It's different every time and sometimes even two of the interfaces are > > > in such a state. > > > Here are the requested info while being in this state: > > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > > > 6c6 > > > < rx_pkts_nic: 81426 > > > --- > > > > rx_pkts_nic: 81436 > > > 8c8 > > > < rx_bytes_nic: 10286521 > > > --- > > > > rx_bytes_nic: 10287801 > > > 17c17 > > > < multicast: 72216 > > > --- > > > > multicast: 72226 > > > 48c48 > > > < rx_no_dma_resources: 1109 > > > --- > > > > rx_no_dma_resources: 1119 > > > > > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > > > 108199 108201 63 64 1865 108199 61 > > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > > > 117967 117969 68 69 1870 117967 66 > > > > > > So, it seems that packets are coming on the interface but they don't > > > reach to the XDP layer and deeper. > > > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > > > At the same time, according to the dmesg the kernel sees all of the > > > > > interfaces as > > > > > "link status definitely up, 10000 Mbps full duplex". > > > > > The issue goes aways if I stop the application even without removing > > > > > the XDP programs > > > > > from the interfaces - the running xdpdump starts showing the incoming > > > > > LACP traffic immediately. > > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". 
> > > > > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > > > each of the 3 interfaces of bond? > > > Yes, the same XDP program is loaded on application startup on each one > > > of the interfaces which are part of bond0 (eth0, eth2, eth4): > > > # xdp-loader status > > > CURRENT XDP PROGRAM STATUS: > > > > > > Interface Prio Program name Mode ID Tag > > > Chain actions > > > -------------------------------------------------------------------------------------- > > > lo <No XDP program loaded!> > > > eth0 xdp_dispatcher native 1320 90f686eb86991928 > > > => 50 x3sp_splitter_func 1329 > > > 3b185187f1855c4c XDP_PASS > > > eth1 <No XDP program loaded!> > > > eth2 xdp_dispatcher native 1334 90f686eb86991928 > > > => 50 x3sp_splitter_func 1337 > > > 3b185187f1855c4c XDP_PASS > > > eth3 <No XDP program loaded!> > > > eth4 xdp_dispatcher native 1342 90f686eb86991928 > > > => 50 x3sp_splitter_func 1345 > > > 3b185187f1855c4c XDP_PASS > > > eth5 <No XDP program loaded!> > > > eth6 <No XDP program loaded!> > > > eth7 <No XDP program loaded!> > > > bond0 <No XDP program loaded!> > > > Each of these interfaces is setup to have 16 queues i.e. the application, > > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > > > corresponding queue of the corresponding interface. > > > ~# ethtool -l eth0 # It's same for the other 2 devices > > > Channel parameters for eth0: > > > Pre-set maximums: > > > RX: n/a > > > TX: n/a > > > Other: 1 > > > Combined: 48 > > > Current hardware settings: > > > RX: n/a > > > TX: n/a > > > Other: 1 > > > Combined: 16 > > > > > > > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > > > because I haven't tested further. > > > > > > > > can you also try to bind xsk sockets before attaching XDP progs? > > > I looked into the DPDK code again. > > > The DPDK framework provides callback hooks like eth_rx_queue_setup > > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > > > set up separately. The af_xdp driver currently does this for each Rx > > > queue separately: > > > 1. configures the umem for the queue > > > 2. loads the XDP program on the corresponding interface, if not already loaded > > > (i.e. this happens only once per interface when its first queue is set up). > > > 3. does xsk_socket__create which as far as I looked also internally binds the > > > socket to the given queue > > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > > > > > So, it seems to me that the change needed will be a bit more involved. > > > I'm not sure if it'll be possible to hardcode, just for the test, the > > > program loading and > > > the placing of all XSK sockets in the map to happen when the setup of the last > > > "queue" for the given interface is done. I need to think a bit more about this. > > Changed the code of the DPDK af_xdp "driver" to create and bind all of > > the XSK sockets > > to the queues of the corresponding interface and after that, after the > > initialization of the > > last XSK socket, I added the logic for the attachment of the XDP > > program to the interface > > and the population of the XSK map with the created sockets. > > The issue was still there but it was kind of harder to reproduce - it > > happened once for 5 > > starts of the application. 
> > > > > > > > > > > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > > > and back up > > > > > (visible in the dmesg) when the XDP sockets are bound to their queues. > > > > > I mean, I'm not sure why the interfaces go down and up but setting > > > > > only the XDP programs > > > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > > > caused by the binding of the XDP sockets. > > > > > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > > > without xsk sockets being bound? > > > Probably I've phrased something in a wrong way. > > > The issue is not observed if I load the XDP program on all interfaces > > > (eth0, eth2, eth4) > > > with the xdp-loader: > > > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > > > It's not observed probably because there are no interface down/up actions. > > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > > > when the application stops only the XSK sockets are closed but the > > > program remains > > > loaded at the interfaces. When I stop this version of the application > > > while running the > > > xdpdump at the same time I see that the traffic immediately appears in > > > the xdpdump. > > > Also, note that I basically trimmed the XDP program to simply contain > > > the XSK map > > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". > > > I wanted to exclude every possibility for the XDP program to do something wrong. > > > So, from the above it seems to me that the issue is triggered somehow by the XSK > > > sockets usage. > > > > > > > > > > > > It could be that the issue is not related to the XDP sockets but just > > > > > to the down/up actions of the interfaces. > > > > > On the other hand, I'm not sure why the issue is easily reproducible > > > > > when the zero copy mode is enabled > > > > > (4 out of 5 tests reproduced the issue). > > > > > However, when the zero copy is disabled this issue doesn't happen > > > > > (I tried 10 times in a row and it doesn't happen). > > > > > > > > any chances that you could rule out the bond of the picture of this issue? > > > I'll need to talk to the network support guys because they manage the network > > > devices and they'll need to change the LACP/Trunk setup of the above > > > "remote device". > > > I can't promise that they'll agree though. > We changed the setup and I did the tests with a single port, no > bonding involved. > The port was configured with 16 queues (and 16 XSK sockets bound to them). > I tested with about 100 Mbps of traffic to not break lots of users. > During the tests I observed the traffic on the real time graph on the > remote device port > connected to the server machine where the application was running in > L3 forward mode: > - with zero copy enabled the traffic to the server was about 100 Mbps > but the traffic > coming out of the server was about 50 Mbps (i.e. half of it). > - with no zero copy the traffic in both directions was the same - the > two graphs matched perfectly > Nothing else was changed during the both tests, only the ZC option. > Can I check some stats or something else for this testing scenario > which could be > used to reveal more info about the issue? FWIW I don't see this on my side. My guess would be that some of the queues stalled on ZC due to buggy enable/disable ring pair routines that I am (fingers crossed :)) fixing, or trying to fix in previous email. 
You could try something as simple as: $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes" and verify each of the queues that are supposed to receive traffic. Do the same thing with tx, similarly. > > > > ^ permalink raw reply [flat|nested] 22+ messages in thread
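The four-step per-queue setup quoted above (UMEM, XDP program handling, xsk_socket__create bound to one queue, bpf_map_update_elem into the XSKMAP) corresponds roughly to the sketch below against the libxdp xsk API. Frame counts, flags and error handling are simplified assumptions, not the actual DPDK af_xdp PMD code.

#include <stdlib.h>
#include <unistd.h>
#include <bpf/bpf.h>
#include <xdp/xsk.h>

#define NUM_FRAMES 4096

struct queue_ctx {
	struct xsk_ring_prod fill, tx;
	struct xsk_ring_cons comp, rx;
	struct xsk_umem *umem;
	struct xsk_socket *xsk;
	void *bufs;
};

static int setup_queue(struct queue_ctx *q, const char *ifname,
		       __u32 queue_id, int xsks_map_fd)
{
	const struct xsk_socket_config cfg = {
		.rx_size = XSK_RING_CONS__DEFAULT_NUM_DESCS,
		.tx_size = XSK_RING_PROD__DEFAULT_NUM_DESCS,
		/* the application, not libxdp, manages the XDP program */
		.libxdp_flags = XSK_LIBXDP_FLAGS__INHIBIT_PROG_LOAD,
		.bind_flags = XDP_ZEROCOPY,	/* or 0 to allow copy mode */
	};
	size_t size = NUM_FRAMES * XSK_UMEM__DEFAULT_FRAME_SIZE;
	int fd, err;

	/* 1. UMEM backing this queue's descriptors */
	q->bufs = aligned_alloc(getpagesize(), size);
	if (!q->bufs)
		return -1;
	err = xsk_umem__create(&q->umem, q->bufs, size, &q->fill, &q->comp, NULL);
	if (err)
		return err;

	/* 2./3. Create the socket; this also binds it to (ifname, queue_id) */
	err = xsk_socket__create(&q->xsk, ifname, queue_id, q->umem,
				 &q->rx, &q->tx, &cfg);
	if (err)
		return err;

	/* 4. Slot the socket into the XSKMAP under its queue index */
	fd = xsk_socket__fd(q->xsk);
	return bpf_map_update_elem(xsks_map_fd, &queue_id, &fd, 0);
}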
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-16 17:24 ` Maciej Fijalkowski @ 2024-02-19 13:45 ` Pavel Vazharov 2024-02-19 14:56 ` Maciej Fijalkowski 0 siblings, 1 reply; 22+ messages in thread From: Pavel Vazharov @ 2024-02-19 13:45 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Fri, Feb 16, 2024 at 7:24 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > > > > > > > > > > > > Back to the issue. > > > > > > I just want to say again that we are not binding the XDP sockets to > > > > > > the bonding device. > > > > > > We are binding the sockets to the queues of the physical interfaces > > > > > > "below" the bonding device. > > > > > > My further observation this time is that when the issue happens and > > > > > > the remote device reports > > > > > > the LACP error there is no incoming LACP traffic on the corresponding > > > > > > local port, > > > > > > as seen by the xdump. > > > > > > The tcpdump at the same time sees only outgoing LACP packets and > > > > > > nothing incoming. > > > > > > For example: > > > > > > Remote device > > > > > > Local Server > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/12 <---> eth0 > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/13 <---> eth2 > > > > > > TrunkName=Eth-Trunk20, PortName=XGigabitEthernet0/0/14 <---> eth4 > > > > > > And when the remote device reports "received an abnormal LACPDU" > > > > > > for PortName=XGigabitEthernet0/0/14 I can see via xdpdump that there > > > > > > is no incoming LACP traffic > > > > > > > > > > Hey Pavel, > > > > > > > > > > can you also look at /proc/interrupts at eth4 and what ethtool -S shows > > > > > there? > > > > I reproduced the problem but this time the interface with the weird > > > > state was eth0. > > > > It's different every time and sometimes even two of the interfaces are > > > > in such a state. > > > > Here are the requested info while being in this state: > > > > ~# ethtool -S eth0 > /tmp/stats0.txt ; sleep 10 ; ethtool -S eth0 > > > > > /tmp/stats1.txt ; diff /tmp/stats0.txt /tmp/stats1.txt > > > > 6c6 > > > > < rx_pkts_nic: 81426 > > > > --- > > > > > rx_pkts_nic: 81436 > > > > 8c8 > > > > < rx_bytes_nic: 10286521 > > > > --- > > > > > rx_bytes_nic: 10287801 > > > > 17c17 > > > > < multicast: 72216 > > > > --- > > > > > multicast: 72226 > > > > 48c48 > > > > < rx_no_dma_resources: 1109 > > > > --- > > > > > rx_no_dma_resources: 1119 > > > > > > > > ~# cat /proc/interrupts | grep eth0 > /tmp/interrupts0.txt ; sleep 10 > > > > ; cat /proc/interrupts | grep eth0 > /tmp/interrupts1.txt > > > > interrupts0: 430 3098 64 108199 108199 108199 108199 108199 108199 > > > > 108199 108201 63 64 1865 108199 61 > > > > interrupts1: 435 3103 69 117967 117967 117967 117967 117967 117967 > > > > 117967 117969 68 69 1870 117967 66 > > > > > > > > So, it seems that packets are coming on the interface but they don't > > > > reach to the XDP layer and deeper. > > > > rx_no_dma_resources - this counter seems to give clues about a possible issue? > > > > > > > > > > > > > > > on eth4 but there is incoming LACP traffic on eth0 and eth2. > > > > > > At the same time, according to the dmesg the kernel sees all of the > > > > > > interfaces as > > > > > > "link status definitely up, 10000 Mbps full duplex". 
> > > > > > The issue goes aways if I stop the application even without removing > > > > > > the XDP programs > > > > > > from the interfaces - the running xdpdump starts showing the incoming > > > > > > LACP traffic immediately. > > > > > > The issue also goes away if I do "ip link set down eth4 && ip link set up eth4". > > > > > > > > > > and the setup is what when doing the link flap? XDP progs are loaded to > > > > > each of the 3 interfaces of bond? > > > > Yes, the same XDP program is loaded on application startup on each one > > > > of the interfaces which are part of bond0 (eth0, eth2, eth4): > > > > # xdp-loader status > > > > CURRENT XDP PROGRAM STATUS: > > > > > > > > Interface Prio Program name Mode ID Tag > > > > Chain actions > > > > -------------------------------------------------------------------------------------- > > > > lo <No XDP program loaded!> > > > > eth0 xdp_dispatcher native 1320 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1329 > > > > 3b185187f1855c4c XDP_PASS > > > > eth1 <No XDP program loaded!> > > > > eth2 xdp_dispatcher native 1334 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1337 > > > > 3b185187f1855c4c XDP_PASS > > > > eth3 <No XDP program loaded!> > > > > eth4 xdp_dispatcher native 1342 90f686eb86991928 > > > > => 50 x3sp_splitter_func 1345 > > > > 3b185187f1855c4c XDP_PASS > > > > eth5 <No XDP program loaded!> > > > > eth6 <No XDP program loaded!> > > > > eth7 <No XDP program loaded!> > > > > bond0 <No XDP program loaded!> > > > > Each of these interfaces is setup to have 16 queues i.e. the application, > > > > through the DPDK machinery, opens 3x16 XSK sockets each bound to the > > > > corresponding queue of the corresponding interface. > > > > ~# ethtool -l eth0 # It's same for the other 2 devices > > > > Channel parameters for eth0: > > > > Pre-set maximums: > > > > RX: n/a > > > > TX: n/a > > > > Other: 1 > > > > Combined: 48 > > > > Current hardware settings: > > > > RX: n/a > > > > TX: n/a > > > > Other: 1 > > > > Combined: 16 > > > > > > > > > > > > > > > However, I'm not sure what happens with the bound XDP sockets in this case > > > > > > because I haven't tested further. > > > > > > > > > > can you also try to bind xsk sockets before attaching XDP progs? > > > > I looked into the DPDK code again. > > > > The DPDK framework provides callback hooks like eth_rx_queue_setup > > > > and each "driver" implements it as needed. Each Rx/Tx queue of the device is > > > > set up separately. The af_xdp driver currently does this for each Rx > > > > queue separately: > > > > 1. configures the umem for the queue > > > > 2. loads the XDP program on the corresponding interface, if not already loaded > > > > (i.e. this happens only once per interface when its first queue is set up). > > > > 3. does xsk_socket__create which as far as I looked also internally binds the > > > > socket to the given queue > > > > 4. places the socket in the XSKS map of the XDP program via bpf_map_update_elem > > > > > > > > So, it seems to me that the change needed will be a bit more involved. > > > > I'm not sure if it'll be possible to hardcode, just for the test, the > > > > program loading and > > > > the placing of all XSK sockets in the map to happen when the setup of the last > > > > "queue" for the given interface is done. I need to think a bit more about this. 
> > > Changed the code of the DPDK af_xdp "driver" to create and bind all of > > > the XSK sockets > > > to the queues of the corresponding interface and after that, after the > > > initialization of the > > > last XSK socket, I added the logic for the attachment of the XDP > > > program to the interface > > > and the population of the XSK map with the created sockets. > > > The issue was still there but it was kind of harder to reproduce - it > > > happened once for 5 > > > starts of the application. > > > > > > > > > > > > > > > > > > > > > > > > It seems to me that something racy happens when the interfaces go down > > > > > > and back up > > > > > > (visible in the dmesg) when the XDP sockets are bound to their queues. > > > > > > I mean, I'm not sure why the interfaces go down and up but setting > > > > > > only the XDP programs > > > > > > on the interfaces doesn't cause this behavior. So, I assume it's > > > > > > caused by the binding of the XDP sockets. > > > > > > > > > > hmm i'm lost here, above you said you got no incoming traffic on eth4 even > > > > > without xsk sockets being bound? > > > > Probably I've phrased something in a wrong way. > > > > The issue is not observed if I load the XDP program on all interfaces > > > > (eth0, eth2, eth4) > > > > with the xdp-loader: > > > > xdp-loader load --mode native <iface> <path-to-the-xdp-program> > > > > It's not observed probably because there are no interface down/up actions. > > > > I also modified the DPDK "driver" to not remove the XDP program on exit and thus > > > > when the application stops only the XSK sockets are closed but the > > > > program remains > > > > loaded at the interfaces. When I stop this version of the application > > > > while running the > > > > xdpdump at the same time I see that the traffic immediately appears in > > > > the xdpdump. > > > > Also, note that I basically trimmed the XDP program to simply contain > > > > the XSK map > > > > (BPF_MAP_TYPE_XSKMAP) and the function just does "return XDP_PASS;". > > > > I wanted to exclude every possibility for the XDP program to do something wrong. > > > > So, from the above it seems to me that the issue is triggered somehow by the XSK > > > > sockets usage. > > > > > > > > > > > > > > > It could be that the issue is not related to the XDP sockets but just > > > > > > to the down/up actions of the interfaces. > > > > > > On the other hand, I'm not sure why the issue is easily reproducible > > > > > > when the zero copy mode is enabled > > > > > > (4 out of 5 tests reproduced the issue). > > > > > > However, when the zero copy is disabled this issue doesn't happen > > > > > > (I tried 10 times in a row and it doesn't happen). > > > > > > > > > > any chances that you could rule out the bond of the picture of this issue? > > > > I'll need to talk to the network support guys because they manage the network > > > > devices and they'll need to change the LACP/Trunk setup of the above > > > > "remote device". > > > > I can't promise that they'll agree though. > > We changed the setup and I did the tests with a single port, no > > bonding involved. > > The port was configured with 16 queues (and 16 XSK sockets bound to them). > > I tested with about 100 Mbps of traffic to not break lots of users. 
> > During the tests I observed the traffic on the real time graph on the > > remote device port > > connected to the server machine where the application was running in > > L3 forward mode: > > - with zero copy enabled the traffic to the server was about 100 Mbps > > but the traffic > > coming out of the server was about 50 Mbps (i.e. half of it). > > - with no zero copy the traffic in both directions was the same - the > > two graphs matched perfectly > > Nothing else was changed during the both tests, only the ZC option. > > Can I check some stats or something else for this testing scenario > > which could be > > used to reveal more info about the issue? > > FWIW I don't see this on my side. My guess would be that some of the > queues stalled on ZC due to buggy enable/disable ring pair routines that I > am (fingers crossed :)) fixing, or trying to fix in previous email. You > could try something as simple as: > > $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes" > > and verify each of the queues that are supposed to receive traffic. Do the > same thing with tx, similarly. > > > > > > > Thank you for the help. I tried the given patch on kernel 6.7.5. The bonding issue, that I described in the above e-mails, seems fixed. I can no longer reproduce the issue with the malformed LACP messages. However, I tested again with traffic and the issue remains: - when traffic is redirected to the machine and simply forwarded at L3 by our application only about 1/2 - 2/3 of it exits the machine - disabling only the Zero Copy (and nothing else in the application) fixes the issue - another thing that I noticed is in the device stats - the Rx bytes looks OK and the counters of every queue increase over the time (with and without ZC) ethtool -S eth4 | grep rx | grep bytes rx_bytes: 20061532582 rx_bytes_nic: 27823942900 rx_queue_0_bytes: 690230537 rx_queue_1_bytes: 1051217950 rx_queue_2_bytes: 1494877257 rx_queue_3_bytes: 1989628734 rx_queue_4_bytes: 894557655 rx_queue_5_bytes: 1557310636 rx_queue_6_bytes: 1459428265 rx_queue_7_bytes: 1514067682 rx_queue_8_bytes: 432567753 rx_queue_9_bytes: 1251708768 rx_queue_10_bytes: 1091840145 rx_queue_11_bytes: 904127964 rx_queue_12_bytes: 1241335871 rx_queue_13_bytes: 2039939517 rx_queue_14_bytes: 777819814 rx_queue_15_bytes: 1670874034 - without ZC the Tx bytes also look OK ethtool -S eth4 | grep tx | grep bytes tx_bytes: 24411467399 tx_bytes_nic: 29600497994 tx_queue_0_bytes: 1525672312 tx_queue_1_bytes: 1527162996 tx_queue_2_bytes: 1529701681 tx_queue_3_bytes: 1526220338 tx_queue_4_bytes: 1524403501 tx_queue_5_bytes: 1523242084 tx_queue_6_bytes: 1523543868 tx_queue_7_bytes: 1525376190 tx_queue_8_bytes: 1526844278 tx_queue_9_bytes: 1523938842 tx_queue_10_bytes: 1522663364 tx_queue_11_bytes: 1527292259 tx_queue_12_bytes: 1525206246 tx_queue_13_bytes: 1526670255 tx_queue_14_bytes: 1523266153 tx_queue_15_bytes: 1530263032 - however with ZC enabled the Tx bytes stats don't look OK (some queues are like doing nothing) - again it's exactly the same application The sum bytes increase much more than the sum of the per queue bytes. 
ethtool -S eth4 | grep tx | grep bytes ; sleep 1 ; ethtool -S eth4 | grep tx | grep bytes tx_bytes: 256022649 tx_bytes_nic: 34961074621 tx_queue_0_bytes: 372 tx_queue_1_bytes: 0 tx_queue_2_bytes: 0 tx_queue_3_bytes: 0 tx_queue_4_bytes: 9920 tx_queue_5_bytes: 0 tx_queue_6_bytes: 0 tx_queue_7_bytes: 0 tx_queue_8_bytes: 0 tx_queue_9_bytes: 1364 tx_queue_10_bytes: 0 tx_queue_11_bytes: 0 tx_queue_12_bytes: 1116 tx_queue_13_bytes: 0 tx_queue_14_bytes: 0 tx_queue_15_bytes: 0 tx_bytes: 257830280 tx_bytes_nic: 34962912861 tx_queue_0_bytes: 372 tx_queue_1_bytes: 0 tx_queue_2_bytes: 0 tx_queue_3_bytes: 0 tx_queue_4_bytes: 10044 tx_queue_5_bytes: 0 tx_queue_6_bytes: 0 tx_queue_7_bytes: 0 tx_queue_8_bytes: 0 tx_queue_9_bytes: 1364 tx_queue_10_bytes: 0 tx_queue_11_bytes: 0 tx_queue_12_bytes: 1116 tx_queue_13_bytes: 0 tx_queue_14_bytes: 0 tx_queue_15_bytes: 0 ^ permalink raw reply [flat|nested] 22+ messages in thread
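For reference on the Tx side under discussion, a per-socket AF_XDP Tx path typically looks like the sketch below: reserve Tx descriptors, submit them, kick the kernel when need_wakeup is set, and drain the completion ring. This is an illustrative sketch under those assumptions, not the application's code; a real implementation would also read the completed addresses back via xsk_ring_cons__comp_addr() to recycle frames.

#include <sys/socket.h>
#include <xdp/xsk.h>

static void kick_tx(struct xsk_socket *xsk, struct xsk_ring_prod *tx)
{
	/* In need_wakeup mode the kernel only services the Tx ring after an
	 * explicit kick; a missing kick makes a queue look completely idle. */
	if (xsk_ring_prod__needs_wakeup(tx))
		sendto(xsk_socket__fd(xsk), NULL, 0, MSG_DONTWAIT, NULL, 0);
}

static unsigned int tx_burst(struct xsk_socket *xsk, struct xsk_ring_prod *tx,
			     struct xsk_ring_cons *comp,
			     const __u64 *addrs, const __u32 *lens,
			     unsigned int n)
{
	__u32 idx, done, i;

	if (xsk_ring_prod__reserve(tx, n, &idx) < n) {
		kick_tx(xsk, tx);	/* ring full: push what is already queued */
		return 0;
	}

	for (i = 0; i < n; i++) {
		struct xdp_desc *desc = xsk_ring_prod__tx_desc(tx, idx + i);

		desc->addr = addrs[i];	/* UMEM offset of the frame to send */
		desc->len = lens[i];
	}
	xsk_ring_prod__submit(tx, n);
	kick_tx(xsk, tx);

	/* Drain the completion ring; a real application would record the
	 * completed addresses here so the frames can be reused. */
	done = xsk_ring_cons__peek(comp, n, &idx);
	if (done)
		xsk_ring_cons__release(comp, done);

	return n;
}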
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-19 13:45 ` Pavel Vazharov @ 2024-02-19 14:56 ` Maciej Fijalkowski 2024-03-08 10:05 ` Pavel Vazharov 0 siblings, 1 reply; 22+ messages in thread From: Maciej Fijalkowski @ 2024-02-19 14:56 UTC (permalink / raw) To: Pavel Vazharov Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Mon, Feb 19, 2024 at 03:45:24PM +0200, Pavel Vazharov wrote: [...] > > > We changed the setup and I did the tests with a single port, no > > > bonding involved. > > > The port was configured with 16 queues (and 16 XSK sockets bound to them). > > > I tested with about 100 Mbps of traffic to not break lots of users. > > > During the tests I observed the traffic on the real time graph on the > > > remote device port > > > connected to the server machine where the application was running in > > > L3 forward mode: > > > - with zero copy enabled the traffic to the server was about 100 Mbps > > > but the traffic > > > coming out of the server was about 50 Mbps (i.e. half of it). > > > - with no zero copy the traffic in both directions was the same - the > > > two graphs matched perfectly > > > Nothing else was changed during the both tests, only the ZC option. > > > Can I check some stats or something else for this testing scenario > > > which could be > > > used to reveal more info about the issue? > > > > FWIW I don't see this on my side. My guess would be that some of the > > queues stalled on ZC due to buggy enable/disable ring pair routines that I > > am (fingers crossed :)) fixing, or trying to fix in previous email. You > > could try something as simple as: > > > > $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes" > > > > and verify each of the queues that are supposed to receive traffic. Do the > > same thing with tx, similarly. > > > > > > > > > > > Thank you for the help. > > I tried the given patch on kernel 6.7.5. > The bonding issue, that I described in the above e-mails, seems fixed. > I can no longer reproduce the issue with the malformed LACP messages. Awesome! I'll send a fix to lists then. 
> > However, I tested again with traffic and the issue remains: > - when traffic is redirected to the machine and simply forwarded at L3 > by our application only about 1/2 - 2/3 of it exits the machine > - disabling only the Zero Copy (and nothing else in the application) > fixes the issue > - another thing that I noticed is in the device stats - the Rx bytes > looks OK and the counters of every queue increase over the time (with > and without ZC) > ethtool -S eth4 | grep rx | grep bytes > rx_bytes: 20061532582 > rx_bytes_nic: 27823942900 > rx_queue_0_bytes: 690230537 > rx_queue_1_bytes: 1051217950 > rx_queue_2_bytes: 1494877257 > rx_queue_3_bytes: 1989628734 > rx_queue_4_bytes: 894557655 > rx_queue_5_bytes: 1557310636 > rx_queue_6_bytes: 1459428265 > rx_queue_7_bytes: 1514067682 > rx_queue_8_bytes: 432567753 > rx_queue_9_bytes: 1251708768 > rx_queue_10_bytes: 1091840145 > rx_queue_11_bytes: 904127964 > rx_queue_12_bytes: 1241335871 > rx_queue_13_bytes: 2039939517 > rx_queue_14_bytes: 777819814 > rx_queue_15_bytes: 1670874034 > > - without ZC the Tx bytes also look OK > ethtool -S eth4 | grep tx | grep bytes > tx_bytes: 24411467399 > tx_bytes_nic: 29600497994 > tx_queue_0_bytes: 1525672312 > tx_queue_1_bytes: 1527162996 > tx_queue_2_bytes: 1529701681 > tx_queue_3_bytes: 1526220338 > tx_queue_4_bytes: 1524403501 > tx_queue_5_bytes: 1523242084 > tx_queue_6_bytes: 1523543868 > tx_queue_7_bytes: 1525376190 > tx_queue_8_bytes: 1526844278 > tx_queue_9_bytes: 1523938842 > tx_queue_10_bytes: 1522663364 > tx_queue_11_bytes: 1527292259 > tx_queue_12_bytes: 1525206246 > tx_queue_13_bytes: 1526670255 > tx_queue_14_bytes: 1523266153 > tx_queue_15_bytes: 1530263032 > > - however with ZC enabled the Tx bytes stats don't look OK (some > queues are like doing nothing) - again it's exactly the same > application > The sum bytes increase much more than the sum of the per queue bytes. > ethtool -S eth4 | grep tx | grep bytes ; sleep 1 ; ethtool -S eth4 | > grep tx | grep bytes > tx_bytes: 256022649 > tx_bytes_nic: 34961074621 > tx_queue_0_bytes: 372 > tx_queue_1_bytes: 0 > tx_queue_2_bytes: 0 > tx_queue_3_bytes: 0 > tx_queue_4_bytes: 9920 > tx_queue_5_bytes: 0 > tx_queue_6_bytes: 0 > tx_queue_7_bytes: 0 > tx_queue_8_bytes: 0 > tx_queue_9_bytes: 1364 > tx_queue_10_bytes: 0 > tx_queue_11_bytes: 0 > tx_queue_12_bytes: 1116 > tx_queue_13_bytes: 0 > tx_queue_14_bytes: 0 > tx_queue_15_bytes: 0 Yeah here we are looking at Tx rings, not XDP rings that are used for ZC. XDP rings were acting like rings hidden from user, issue has been brought several times but currently I am not sure if we have some unified approach towards that. FWIW ixgbe currently doesn't expose them, sorry for misleading you. At this point nothing obvious comes to my mind but I can optimize Tx ZC path and then let's see where it will take us. > > tx_bytes: 257830280 > tx_bytes_nic: 34962912861 > tx_queue_0_bytes: 372 > tx_queue_1_bytes: 0 > tx_queue_2_bytes: 0 > tx_queue_3_bytes: 0 > tx_queue_4_bytes: 10044 > tx_queue_5_bytes: 0 > tx_queue_6_bytes: 0 > tx_queue_7_bytes: 0 > tx_queue_8_bytes: 0 > tx_queue_9_bytes: 1364 > tx_queue_10_bytes: 0 > tx_queue_11_bytes: 0 > tx_queue_12_bytes: 1116 > tx_queue_13_bytes: 0 > tx_queue_14_bytes: 0 > tx_queue_15_bytes: 0 ^ permalink raw reply [flat|nested] 22+ messages in thread
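Since the XDP/ZC Tx rings are not exposed through the ethtool per-queue counters, one possible way to watch per-socket activity (a suggestion offered here, not taken from the thread) is the XDP_STATISTICS getsockopt on each AF_XDP socket; the ring-stall fields need roughly kernel 5.9 or newer.

#include <stdio.h>
#include <sys/socket.h>
#include <linux/if_xdp.h>
#include <xdp/xsk.h>

static void dump_xsk_stats(struct xsk_socket *xsk, unsigned int queue_id)
{
	struct xdp_statistics stats;
	socklen_t optlen = sizeof(stats);

	if (getsockopt(xsk_socket__fd(xsk), SOL_XDP, XDP_STATISTICS,
		       &stats, &optlen))
		return;

	/* rx_ring_full / rx_fill_ring_empty_descs / tx_ring_empty_descs point
	 * at stalls that never show up in the ethtool per-queue counters. */
	printf("q%u: rx_dropped=%llu rx_ring_full=%llu fill_empty=%llu tx_empty=%llu\n",
	       queue_id,
	       (unsigned long long)stats.rx_dropped,
	       (unsigned long long)stats.rx_ring_full,
	       (unsigned long long)stats.rx_fill_ring_empty_descs,
	       (unsigned long long)stats.tx_ring_empty_descs);
}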
* Re: Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device 2024-02-19 14:56 ` Maciej Fijalkowski @ 2024-03-08 10:05 ` Pavel Vazharov 0 siblings, 0 replies; 22+ messages in thread From: Pavel Vazharov @ 2024-03-08 10:05 UTC (permalink / raw) To: Maciej Fijalkowski Cc: Magnus Karlsson, Toke Høiland-Jørgensen, Jakub Kicinski, netdev On Mon, Feb 19, 2024 at 4:56 PM Maciej Fijalkowski <maciej.fijalkowski@intel.com> wrote: > > On Mon, Feb 19, 2024 at 03:45:24PM +0200, Pavel Vazharov wrote: > > [...] > > > > > We changed the setup and I did the tests with a single port, no > > > > bonding involved. > > > > The port was configured with 16 queues (and 16 XSK sockets bound to them). > > > > I tested with about 100 Mbps of traffic to not break lots of users. > > > > During the tests I observed the traffic on the real time graph on the > > > > remote device port > > > > connected to the server machine where the application was running in > > > > L3 forward mode: > > > > - with zero copy enabled the traffic to the server was about 100 Mbps > > > > but the traffic > > > > coming out of the server was about 50 Mbps (i.e. half of it). > > > > - with no zero copy the traffic in both directions was the same - the > > > > two graphs matched perfectly > > > > Nothing else was changed during the both tests, only the ZC option. > > > > Can I check some stats or something else for this testing scenario > > > > which could be > > > > used to reveal more info about the issue? > > > > > > FWIW I don't see this on my side. My guess would be that some of the > > > queues stalled on ZC due to buggy enable/disable ring pair routines that I > > > am (fingers crossed :)) fixing, or trying to fix in previous email. You > > > could try something as simple as: > > > > > > $ watch -n 1 "ethtool -S eth_ixgbe | grep rx | grep bytes" > > > > > > and verify each of the queues that are supposed to receive traffic. Do the > > > same thing with tx, similarly. > > > > > > > > > > > > > > > Thank you for the help. > > > > I tried the given patch on kernel 6.7.5. > > The bonding issue, that I described in the above e-mails, seems fixed. > > I can no longer reproduce the issue with the malformed LACP messages. > > Awesome! I'll send a fix to lists then. 
> > > > > However, I tested again with traffic and the issue remains: > > - when traffic is redirected to the machine and simply forwarded at L3 > > by our application only about 1/2 - 2/3 of it exits the machine > > - disabling only the Zero Copy (and nothing else in the application) > > fixes the issue > > - another thing that I noticed is in the device stats - the Rx bytes > > looks OK and the counters of every queue increase over the time (with > > and without ZC) > > ethtool -S eth4 | grep rx | grep bytes > > rx_bytes: 20061532582 > > rx_bytes_nic: 27823942900 > > rx_queue_0_bytes: 690230537 > > rx_queue_1_bytes: 1051217950 > > rx_queue_2_bytes: 1494877257 > > rx_queue_3_bytes: 1989628734 > > rx_queue_4_bytes: 894557655 > > rx_queue_5_bytes: 1557310636 > > rx_queue_6_bytes: 1459428265 > > rx_queue_7_bytes: 1514067682 > > rx_queue_8_bytes: 432567753 > > rx_queue_9_bytes: 1251708768 > > rx_queue_10_bytes: 1091840145 > > rx_queue_11_bytes: 904127964 > > rx_queue_12_bytes: 1241335871 > > rx_queue_13_bytes: 2039939517 > > rx_queue_14_bytes: 777819814 > > rx_queue_15_bytes: 1670874034 > > > > - without ZC the Tx bytes also look OK > > ethtool -S eth4 | grep tx | grep bytes > > tx_bytes: 24411467399 > > tx_bytes_nic: 29600497994 > > tx_queue_0_bytes: 1525672312 > > tx_queue_1_bytes: 1527162996 > > tx_queue_2_bytes: 1529701681 > > tx_queue_3_bytes: 1526220338 > > tx_queue_4_bytes: 1524403501 > > tx_queue_5_bytes: 1523242084 > > tx_queue_6_bytes: 1523543868 > > tx_queue_7_bytes: 1525376190 > > tx_queue_8_bytes: 1526844278 > > tx_queue_9_bytes: 1523938842 > > tx_queue_10_bytes: 1522663364 > > tx_queue_11_bytes: 1527292259 > > tx_queue_12_bytes: 1525206246 > > tx_queue_13_bytes: 1526670255 > > tx_queue_14_bytes: 1523266153 > > tx_queue_15_bytes: 1530263032 > > > > - however with ZC enabled the Tx bytes stats don't look OK (some > > queues are like doing nothing) - again it's exactly the same > > application > > The sum bytes increase much more than the sum of the per queue bytes. > > ethtool -S eth4 | grep tx | grep bytes ; sleep 1 ; ethtool -S eth4 | > > grep tx | grep bytes > > tx_bytes: 256022649 > > tx_bytes_nic: 34961074621 > > tx_queue_0_bytes: 372 > > tx_queue_1_bytes: 0 > > tx_queue_2_bytes: 0 > > tx_queue_3_bytes: 0 > > tx_queue_4_bytes: 9920 > > tx_queue_5_bytes: 0 > > tx_queue_6_bytes: 0 > > tx_queue_7_bytes: 0 > > tx_queue_8_bytes: 0 > > tx_queue_9_bytes: 1364 > > tx_queue_10_bytes: 0 > > tx_queue_11_bytes: 0 > > tx_queue_12_bytes: 1116 > > tx_queue_13_bytes: 0 > > tx_queue_14_bytes: 0 > > tx_queue_15_bytes: 0 > > Yeah here we are looking at Tx rings, not XDP rings that are used for ZC. > XDP rings were acting like rings hidden from user, issue has been brought > several times but currently I am not sure if we have some unified approach > towards that. FWIW ixgbe currently doesn't expose them, sorry for > misleading you. > > At this point nothing obvious comes to my mind but I can optimize Tx ZC > path and then let's see where it will take us. Thank you. I can help with some testing when/if needed. 
> > > > > tx_bytes: 257830280 > > tx_bytes_nic: 34962912861 > > tx_queue_0_bytes: 372 > > tx_queue_1_bytes: 0 > > tx_queue_2_bytes: 0 > > tx_queue_3_bytes: 0 > > tx_queue_4_bytes: 10044 > > tx_queue_5_bytes: 0 > > tx_queue_6_bytes: 0 > > tx_queue_7_bytes: 0 > > tx_queue_8_bytes: 0 > > tx_queue_9_bytes: 1364 > > tx_queue_10_bytes: 0 > > tx_queue_11_bytes: 0 > > tx_queue_12_bytes: 1116 > > tx_queue_13_bytes: 0 > > tx_queue_14_bytes: 0 > > tx_queue_15_bytes: 0 ^ permalink raw reply [flat|nested] 22+ messages in thread
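A closing observation tied to the rx_no_dma_resources counter seen earlier in the thread: in zero-copy mode the NIC can only DMA into frames the application has posted to the fill ring, so a stalled refill loop is one plausible way for that counter to keep climbing. A simplified Rx/refill sketch, assuming aligned UMEM mode and illustrative names:

#include <xdp/xsk.h>

static unsigned int rx_burst(struct xsk_ring_cons *rx,
			     struct xsk_ring_prod *fill,
			     void *umem_area, unsigned int n)
{
	__u32 idx_rx, idx_fill, i;
	__u32 rcvd = xsk_ring_cons__peek(rx, n, &idx_rx);

	if (!rcvd)
		return 0;

	/* Reserve fill-ring slots so every received frame goes straight back
	 * to the driver; if this ever stops, the NIC runs out of Rx buffers. */
	while (xsk_ring_prod__reserve(fill, rcvd, &idx_fill) != rcvd)
		;	/* real code would poll()/recvmsg() here instead of spinning */

	for (i = 0; i < rcvd; i++) {
		const struct xdp_desc *desc =
			xsk_ring_cons__rx_desc(rx, idx_rx + i);
		void *pkt = xsk_umem__get_data(umem_area, desc->addr);

		(void)pkt;	/* packet processing would happen here */
		*xsk_ring_prod__fill_addr(fill, idx_fill + i) = desc->addr;
	}

	xsk_ring_prod__submit(fill, rcvd);
	xsk_ring_cons__release(rx, rcvd);
	return rcvd;
}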
Thread overview: 22+ messages
2024-01-26 15:54 Need of advice for XDP sockets on top of the interfaces behind a Linux bonding device Pavel Vazharov
2024-01-26 19:28 ` Toke Høiland-Jørgensen
2024-01-27 3:58 ` Pavel Vazharov
2024-01-27 4:39 ` Jakub Kicinski
2024-01-27 5:08 ` Pavel Vazharov
[not found] ` <CAJEV1ij=K5Xi5LtpH7SHXLxve+JqMWhimdF50Ddy99G0E9dj_Q@mail.gmail.com>
2024-01-30 13:54 ` Pavel Vazharov
2024-01-30 14:32 ` Toke Høiland-Jørgensen
2024-01-30 14:40 ` Pavel Vazharov
2024-01-30 14:54 ` Toke Høiland-Jørgensen
2024-02-05 7:07 ` Magnus Karlsson
2024-02-07 15:49 ` Pavel Vazharov
2024-02-07 16:07 ` Pavel Vazharov
2024-02-07 19:00 ` Maciej Fijalkowski
2024-02-08 10:59 ` Pavel Vazharov
2024-02-08 15:47 ` Pavel Vazharov
2024-02-09 9:03 ` Pavel Vazharov
2024-02-09 18:37 ` Maciej Fijalkowski
2024-02-16 15:18 ` Maciej Fijalkowski
2024-02-16 17:24 ` Maciej Fijalkowski
2024-02-19 13:45 ` Pavel Vazharov
2024-02-19 14:56 ` Maciej Fijalkowski
2024-03-08 10:05 ` Pavel Vazharov