* Redirect to AF_XDP socket not working with bond interface in native mode
@ 2023-12-19 10:45 Prashant Batra
2023-12-19 10:58 ` Prashant Batra
2023-12-19 13:47 ` Magnus Karlsson
0 siblings, 2 replies; 21+ messages in thread
From: Prashant Batra @ 2023-12-19 10:45 UTC (permalink / raw)
To: xdp-newbies
Hi,
I am new to XDP and am exploring how it works with the different
interface types supported in Linux. One of my use cases is to receive
packets from a bond interface.
I used the xdpsock sample program, specifying the bond interface as
the input interface. However, packets received on the bond interface
are not handed over to the socket by the kernel if the socket is bound
in native mode. Nor are the packets passed on to the kernel stack.
Note that the socket creation does succeed.
In skb mode this works and I am able to receive packets in userspace,
but as expected the performance in skb mode is not great.
Are AF_XDP sockets on a bond not supported in native mode? Or, since
the packet has to be handed over to the bond driver after reception on
the phy port, is an skb allocation and a copy into it indeed a must?
Another thing I notice is that other XDP programs attached to the bond
interface, with targets like DROP or REDIRECT to another interface,
work and perform better than the AF_XDP (skb) based one. Does this mean
that these do not allocate an skb?
Kindly share your thoughts and advice.
Thanks,
Prashant
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-19 10:45 Redirect to AF_XDP socket not working with bond interface in native mode Prashant Batra
@ 2023-12-19 10:58 ` Prashant Batra
2023-12-19 13:47 ` Magnus Karlsson
1 sibling, 0 replies; 21+ messages in thread
From: Prashant Batra @ 2023-12-19 10:58 UTC (permalink / raw)
To: xdp-newbies
Apologies. The kernel I am testing this on is:
5.14.0-5.14.0-162.18.1.el9_1
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-19 10:45 Redirect to AF_XDP socket not working with bond interface in native mode Prashant Batra
2023-12-19 10:58 ` Prashant Batra
@ 2023-12-19 13:47 ` Magnus Karlsson
2023-12-19 20:18 ` Prashant Batra
1 sibling, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2023-12-19 13:47 UTC (permalink / raw)
To: Prashant Batra; +Cc: xdp-newbies
On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> Hi,
>
> I am new to XDP and exploring it's working with different interface
> types supported in linux. One of my use cases is to be able to receive
> packets from the bond interface.
> I used xdpsock sample program specifying the bond interface as the
> input interface. However the packets received on the bond interface
> are not handed over to the socket by the kernel if the socket is bound
> in native mode. The packets are neither being passed to the kernel.
> Note that the socket creation does succeed.
> In skb mode this works and I am able to receive packets in the
> userspace. But in skb mode as expected the performance is not that
> great.
>
> Is AF_XDP sockets on bond not supported in native mode? Or since the
> packet has be to be handed over to the bond driver post reception on
> the phy port, a skb allocation and copy to it is indeed a must?
I have never tried a bonding interface with AF_XDP, so it might not
work. Can you trace the packet to see where it is being dropped in
native mode? There are no modifications needed to an XDP_REDIRECT
enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
you using?
> Another thing I notice is that other XDP programs attached to bond
> interface with targets like DROP, REDIRECT to other interface works
> and perform better than AF_XDP (skb) based. Does this mean that these
> are not allocating skb?
I am not surprised that AF_XDP in copy mode is slower than
XDP_REDIRECT. The packet has to be copied out to user-space and then
copied into the kernel again, something that is not needed in the
XDP_REDIRECT case. If you were using zero-copy, on the other hand,
AF_XDP would be faster. But the bonding interface does not support
zero-copy, so that is not an option.
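To make the three modes discussed here concrete, the following is a minimal sketch of how they map onto XDP attach flags and AF_XDP bind flags, as I understand the xdpsock options (-S, -N -c, -N -z). The numeric values mirror XDP_FLAGS_SKB_MODE / XDP_FLAGS_DRV_MODE from linux/if_link.h and XDP_COPY / XDP_ZEROCOPY from linux/if_xdp.h. The SK_-prefixed names, struct xsk_flags and flags_for_mode() are hypothetical and exist only for this illustration.

```c
#include <assert.h>

/* Local copies of the uapi flag values, redefined here only so that the
 * sketch compiles without kernel headers. */
enum {
	SK_XDP_FLAGS_SKB_MODE = 1u << 1, /* XDP_FLAGS_SKB_MODE (if_link.h) */
	SK_XDP_FLAGS_DRV_MODE = 1u << 2, /* XDP_FLAGS_DRV_MODE (if_link.h) */
	SK_XDP_COPY           = 1u << 1, /* XDP_COPY (if_xdp.h) */
	SK_XDP_ZEROCOPY       = 1u << 2, /* XDP_ZEROCOPY (if_xdp.h) */
};

enum xsk_mode { MODE_SKB, MODE_DRV_COPY, MODE_DRV_ZC };

struct xsk_flags {
	unsigned int attach_flags; /* how the XDP program is attached */
	unsigned int bind_flags;   /* how the socket is bound to the queue */
};

/* Translate a mode from the discussion into the two flag words. */
struct xsk_flags flags_for_mode(enum xsk_mode mode)
{
	struct xsk_flags f = { 0, 0 };

	switch (mode) {
	case MODE_SKB:      /* generic XDP, packet copied from an skb */
		f.attach_flags = SK_XDP_FLAGS_SKB_MODE;
		f.bind_flags = SK_XDP_COPY;
		break;
	case MODE_DRV_COPY: /* native XDP, packet copied into the umem */
		f.attach_flags = SK_XDP_FLAGS_DRV_MODE;
		f.bind_flags = SK_XDP_COPY;
		break;
	case MODE_DRV_ZC:   /* native XDP, NIC DMAs straight into the umem */
		f.attach_flags = SK_XDP_FLAGS_DRV_MODE;
		f.bind_flags = SK_XDP_ZEROCOPY;
		break;
	default:
		assert(0 && "unknown mode");
	}
	return f;
}
```

In a real program the two words would be passed on when attaching the XDP program and when binding the socket; the sketch only shows which combination corresponds to which mode.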
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-19 13:47 ` Magnus Karlsson
@ 2023-12-19 20:18 ` Prashant Batra
2023-12-20 8:24 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2023-12-19 20:18 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: xdp-newbies
Thanks for your response. My comments inline.
On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> I have never tried a bonding interface with AF_XDP, so it might not
> work. Can you trace the packet to see where it is being dropped in
> native mode? There are no modifications needed to an XDP_REDIRECT
> enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> you using?
>
I will trace the packet and get back.
The bond is over two physical ports of the same Intel NIC:
b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit SFI/SFP+ Network Connection (rev 01)
The bonding mode is 802.3ad.
The CPU is an Intel Xeon Gold at 3.40GHz.
NIC driver:
# ethtool -i ens1f0
driver: ixgbe
version: 5.14.0-362.13.1.el9_3
Features
# xdp-loader features ens1f0
NETDEV_XDP_ACT_BASIC: yes
NETDEV_XDP_ACT_REDIRECT: yes
NETDEV_XDP_ACT_NDO_XMIT: no
NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
NETDEV_XDP_ACT_HW_OFFLOAD: no
NETDEV_XDP_ACT_RX_SG: no
NETDEV_XDP_ACT_NDO_XMIT_SG: no
Interestingly, bond0 advertises both native and ZC mode. That is
because its features are derived from the slave devices (the
intersection of their feature flags), which explains why there is no
error when binding the socket in native/zero-copy mode.
void bond_xdp_set_features(struct net_device *bond_dev)
{
	..
	bond_for_each_slave(bond, slave, iter)
		val &= slave->dev->xdp_features;
	xdp_set_features_flag(bond_dev, val);
}
# ../xdp-loader/xdp-loader features bond0
NETDEV_XDP_ACT_BASIC: yes
NETDEV_XDP_ACT_REDIRECT: yes
NETDEV_XDP_ACT_NDO_XMIT: no
NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
NETDEV_XDP_ACT_HW_OFFLOAD: no
NETDEV_XDP_ACT_RX_SG: no
NETDEV_XDP_ACT_NDO_XMIT_SG: no
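The effect of that loop can be reproduced in a small user-space sketch. The bit values below follow enum netdev_xdp_act in include/uapi/linux/netdev.h; the ACT_ names and bond_features() are hypothetical stand-ins, and the elided part of the kernel function is not modelled.

```c
#include <assert.h>
#include <stddef.h>

/* Feature bits, same values as NETDEV_XDP_ACT_* in the uapi header. */
enum {
	ACT_BASIC        = 1,
	ACT_REDIRECT     = 2,
	ACT_NDO_XMIT     = 4,
	ACT_XSK_ZEROCOPY = 8,
	ACT_HW_OFFLOAD   = 16,
	ACT_RX_SG        = 32,
	ACT_NDO_XMIT_SG  = 64,
	ACT_MASK         = 127,
};

/* Start from "all features" and keep only the bits that every slave
 * advertises, like the val &= slave->dev->xdp_features loop. */
unsigned int bond_features(const unsigned int *slave_features, size_t n)
{
	unsigned int val = ACT_MASK;

	assert(slave_features != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		val &= slave_features[i];
	return val;
}
```

With two ixgbe ports that each advertise BASIC, REDIRECT and XSK_ZEROCOPY (1 | 2 | 8 = 11), the result is again 11, which matches the bond0 output above: the zero-copy bit survives purely because both slaves set it, not because the bond implements anything for it.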
> > Another thing I notice is that other XDP programs attached to bond
> > interface with targets like DROP, REDIRECT to other interface works
> > and perform better than AF_XDP (skb) based. Does this mean that these
> > are not allocating skb?
>
> I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> The packet has to be copied out to user-space then copied into the
> kernel again, something that is not needed in the XDP_REDIRECT case.
> If you were using zero-copy, on the other hand, it would be faster
> with AF_XDP. But the bonding interface does not support zero-copy, so
> not an option.
>
Just to put forth the pps numbers for the above-mentioned single port
in different modes, and a comparison with the bond interface.
The test uses pktgen pumping 64-byte packets on a single flow.
Single AF_XDP socket on a single NIC queue:
AF_XDP rxdrop   PPS    CPU-SI*  CPU-xdpsock  Command
══════════════════════════════════════════════════════════════════════════════════════
ZC              14M    65%      35%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
XDP_DRV/COPY    10M    100%     23%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
SKB_MODE        2.2M   100%     62%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
* CPU receiving the packet
In the above tests with ZC and XDP_DRV/COPY, is this SI (softirq) CPU
usage as expected, especially in ZC mode? Is it mainly because the BPF
program is not running in HW-offloaded mode? I don't have a NIC that
can run BPF in offloaded mode, so I cannot compare.
The XDP_DROP target using the xdp-bench tool (from xdp-tools) on the same NIC port:
xdp-bench        PPS    CPU-SI*  Command
══════════════════════════════════════════════════════════════════════════
drop, no-touch   14M    41%      ./xdp-bench drop -p no-touch ens1f0 -e
drop, read-data  14M    55%      ./xdp-bench drop -p read-data ens1f0 -e
drop, parse-ip   14M    58%      ./xdp-bench drop -p parse-ip ens1f0 -e
* CPU receiving the packet
Similar tests on the bond interface (the two ports above, bonded):
AF_XDP rxdrop   PPS    CPU-SI*  CPU-xdpsock  Command
══════════════════════════════════════════════════════════════════════════════════════
ZC              X      X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
XDP_DRV/COPY    X      X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
SKB_MODE        2M     100%     55%          ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
* CPU receiving the packet
xdp-bench        PPS    CPU-SI*  Command
══════════════════════════════════════════════════════════════════════════
drop, no-touch   10.9M  33%      ./xdp-bench drop -p no-touch bond0 -e
drop, read-data  10.9M  44%      ./xdp-bench drop -p read-data bond0 -e
drop, parse-ip   10.9M  47%      ./xdp-bench drop -p parse-ip bond0 -e
* CPU receiving the packet
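One way to compare the rows above is a rough per-packet cycle budget on the receiving CPU. The sketch below assumes the core actually runs at the nominal 3.40GHz and that CPU-SI is the busy share of that single core; both are assumptions, and cycles_per_packet() is a hypothetical helper, not part of any tool used here.

```c
#include <assert.h>

/* Cycles spent per packet on one core:
 * clock rate * busy fraction / packet rate. */
double cycles_per_packet(double cpu_hz, double busy_percent, double pps)
{
	assert(pps > 0.0);
	return cpu_hz * (busy_percent / 100.0) / pps;
}
```

Under those assumptions the single-port numbers work out to roughly 158 cycles per packet for ZC (65% at 14M), 340 for XDP_DRV/COPY (100% at 10M) and about 1545 for SKB_MODE (100% at 2.2M), while xdp-bench drop with no-touch comes to about 100 cycles (41% at 14M).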
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-19 20:18 ` Prashant Batra
@ 2023-12-20 8:24 ` Magnus Karlsson
2023-12-21 12:39 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2023-12-20 8:24 UTC (permalink / raw)
To: Prashant Batra; +Cc: xdp-newbies
On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> Thanks for your response. My comments inline.
>
> I will trace the packet and get back.
> The bond is over 2 physical ports part of the Intel NIC card. Those are-
> b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> SFI/SFP+ Network Connection (rev 01)
> b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> SFI/SFP+ Network Connection (rev 01)
>
> Bonding algo is 802.3ad
>
> CPU is Intel Xeon Gold 3.40GHz
>
> NIC Driver
> # ethtool -i ens1f0
> driver: ixgbe
> version: 5.14.0-362.13.1.el9_3
Could you please try with the latest kernel 6.7? 5.14 is quite old and
a lot of things have happened since then.
> Features
> # xdp-loader features ens1f0
> NETDEV_XDP_ACT_BASIC: yes
> NETDEV_XDP_ACT_REDIRECT: yes
> NETDEV_XDP_ACT_NDO_XMIT: no
> NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> NETDEV_XDP_ACT_HW_OFFLOAD: no
> NETDEV_XDP_ACT_RX_SG: no
> NETDEV_XDP_ACT_NDO_XMIT_SG: no
>
> CPU is
>
> Interesting thing is that the bond0 does advertise both native and ZC
> mode. That's because the features are copied from the slave device.
> Which explains why there is no error while binding the socket in
> native/zero-copy mode.
It is probably the intention that if both the bonded devices support a
feature, then the bonding device will too. However, I just saw that the
bonding device does not implement xsk_wakeup, which is used by
zero-copy. So zero-copy is not really supported and that support
should not be advertised. The code in AF_XDP tests for zero-copy
support this way:
	if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
		err = -EOPNOTSUPP;
		goto err_unreg_pool;
	}
So there are some things needed in the bonding driver to make
zero-copy work. It might not be much, though. But your problem is with
XDP_DRV and copy mode, so let us start there.
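The gap between "advertised" and "usable" can be sketched in user space as follows. This is only an illustration under stated assumptions: struct fake_netdev, zc_feature_check(), zc_usable() and stub_wakeup() are hypothetical names, the wakeup pointer is modelled loosely on the ndo_xsk_wakeup callback, and ACT_ZC is assumed to be the combination of the BASIC, REDIRECT and XSK_ZEROCOPY bits.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

enum { ACT_BASIC = 1, ACT_REDIRECT = 2, ACT_XSK_ZEROCOPY = 8 };
#define ACT_ZC (ACT_BASIC | ACT_REDIRECT | ACT_XSK_ZEROCOPY)

/* Minimal stand-in for a net_device: feature bits plus a wakeup hook. */
struct fake_netdev {
	unsigned int xdp_features;
	int (*xsk_wakeup)(struct fake_netdev *dev, unsigned int queue_id,
			  unsigned int flags); /* NULL if not implemented */
};

/* Same shape as the feature test quoted above: only the bits matter. */
int zc_feature_check(const struct fake_netdev *dev)
{
	assert(dev != NULL);
	if ((dev->xdp_features & ACT_ZC) != ACT_ZC)
		return -EOPNOTSUPP;
	return 0;
}

/* Zero-copy additionally needs a wakeup callback to drive the rings. */
bool zc_usable(const struct fake_netdev *dev)
{
	return zc_feature_check(dev) == 0 && dev->xsk_wakeup != NULL;
}

/* Placeholder callback for a device that does implement wakeup. */
int stub_wakeup(struct fake_netdev *dev, unsigned int queue_id,
		unsigned int flags)
{
	(void)dev;
	(void)queue_id;
	(void)flags;
	return 0;
}
```

A device that inherits the zero-copy bit but has no wakeup hook passes zc_feature_check() and still fails zc_usable(), which is the situation described for the bonding device.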
> void bond_xdp_set_features(struct net_device *bond_dev)
> {
> ..
> bond_for_each_slave(bond, slave, iter)
> val &= slave->dev->xdp_features;
> xdp_set_features_flag(bond_dev, val);
> }
>
> # ../xdp-loader/xdp-loader features bond0
> NETDEV_XDP_ACT_BASIC: yes
> NETDEV_XDP_ACT_REDIRECT: yes
> NETDEV_XDP_ACT_NDO_XMIT: no
> NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> NETDEV_XDP_ACT_HW_OFFLOAD: no
> NETDEV_XDP_ACT_RX_SG: no
> NETDEV_XDP_ACT_NDO_XMIT_SG: no
>
> > > Another thing I notice is that other XDP programs attached to bond
> > > interface with targets like DROP, REDIRECT to other interface works
> > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > are not allocating skb?
> >
> > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > The packet has to be copied out to user-space then copied into the
> > kernel again, something that is not needed in the XDP_REDIRECT case.
> > If you were using zero-copy, on the other hand, it would be faster
> > with AF_XDP. But the bonding interface does not support zero-copy, so
> > not an option.
> >
>
> Just to put forth the pps numbers with the above mentioned single port
> in different modes and a comparison to the bond interface.
> Test is using pktgen pumping 64 byte packets on a single flow.
>
> Single AF_XDP sock on a single NIC queue-
> AF_XDP rxdrop   PPS    CPU-SI*  CPU-xdpsock  Command
> ══════════════════════════════════════════════════════════════════════════════════════
> ZC              14M    65%      35%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> XDP_DRV/COPY    10M    100%     23%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> SKB_MODE        2.2M   100%     62%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> * CPU receiving the packet
> In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> expected? Especially in ZC mode. Is it majorly because of the BPF
> program running in non-HW offloaded mode? Don't have a NIC which can
> run BPF in offloaded mode so I cannot compare it.
I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
100G card and you are maxing out your 10G card at 65% and 14M. So yes,
that sounds reasonable. HW offload cannot be used with AF_XDP. You need
to do the redirect on the CPU for it to work. If you want to know where
the time is spent, use "perf top". The biggest chunk of time is spent
in the XDP_REDIRECT operation, but there are many other time thieves
too.
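The "maxing out" remark follows from Ethernet line-rate arithmetic, sketched below. It assumes the 64 bytes are the full frame size including the FCS, and adds the usual 20 bytes of per-frame wire overhead (7-byte preamble, 1-byte start-of-frame delimiter, 12-byte inter-frame gap). line_rate_pps() is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Bytes on the wire per frame that are not part of the frame itself:
 * 7 B preamble + 1 B start-of-frame delimiter + 12 B inter-frame gap. */
enum { WIRE_OVERHEAD_BYTES = 20 };

/* Maximum frames per second a link can carry for a given frame size. */
uint64_t line_rate_pps(uint64_t link_bps, uint32_t frame_bytes)
{
	assert(frame_bytes > 0);
	return link_bps / ((uint64_t)(frame_bytes + WIRE_OVERHEAD_BYTES) * 8);
}
```

For a 10 Gbit/s link and 64-byte frames this gives 10^10 / (84 * 8) = 14880952 frames per second, so 14M is essentially wire speed and the NIC, not the CPU, is the limit in the ZC row.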
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-20 8:24 ` Magnus Karlsson
@ 2023-12-21 12:39 ` Prashant Batra
2023-12-21 13:45 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2023-12-21 12:39 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: xdp-newbies
On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > NIC Driver
> > # ethtool -i ens1f0
> > driver: ixgbe
> > version: 5.14.0-362.13.1.el9_3
>
> Could you please try with the latest kernel 6.7? 5.14 is quite old and
> a lot of things have happened since then.
>
I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-21 12:39 ` Prashant Batra
@ 2023-12-21 13:45 ` Magnus Karlsson
2023-12-22 11:23 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2023-12-21 13:45 UTC (permalink / raw)
To: Prashant Batra; +Cc: xdp-newbies
On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > >
> > > NIC Driver
> > > # ethtool -i ens1f0
> > > driver: ixgbe
> > > version: 5.14.0-362.13.1.el9_3
> >
> > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > a lot of things have happened since then.
> >
> I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
OK, good to know. Have you managed to trace where the packet is lost?
> > > Features
> > > # xdp-loader features ens1f0
> > > NETDEV_XDP_ACT_BASIC: yes
> > > NETDEV_XDP_ACT_REDIRECT: yes
> > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > NETDEV_XDP_ACT_RX_SG: no
> > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > >
> > > CPU is
> > >
> > > Interesting thing is that the bond0 does advertise both native and ZC
> > > mode. That's because the features are copied from the slave device.
> > > Which explains why there is no error while binding the socket in
> > > native/zero-copy mode.
> >
> > It is probably the intention that if both the bonded devices support a
> > feature, then the bonding device will too. I just saw that the bonding
> > device did not implement xsk_wakeup which is used by zero-copy, so
> > zero-copy is not really supported so that support should not be
> > advertised. The code in AF_XDP tests for zero-copy support this way:
> >
> > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > err = -EOPNOTSUPP;
> > goto err_unreg_pool;
> > }
> >
> > So there are some things needed in the bonding driver to make
> > zero-copy work. Might not be much though. But your problem is with
> > XDP_DRV and copy mode, so let us start there.
> >
> > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > {
> > > ..
> > > bond_for_each_slave(bond, slave, iter)
> > > val &= slave->dev->xdp_features;
> > > xdp_set_features_flag(bond_dev, val);
> > > }
> > >
> > > # ../xdp-loader/xdp-loader features bond0
> > > NETDEV_XDP_ACT_BASIC: yes
> > > NETDEV_XDP_ACT_REDIRECT: yes
> > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > NETDEV_XDP_ACT_RX_SG: no
> > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > >
> > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > are not allocating skb?
> > > >
> > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > The packet has to be copied out to user-space then copied into the
> > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > If you were using zero-copy, on the other hand, it would be faster
> > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > not an option.
> > > >
> > >
> > > Just to put forth the pps numbers with the above mentioned single port
> > > in different modes and a comparison to the bond interface.
> > > Test is using pktgen pumping 64 byte packets on a single flow.
> > >
> > > Single AF_XDP sock on a single NIC queue-
> > > AF_XDP rxdrop   PPS    CPU-SI*  CPU-xdpsock  Command
> > > ══════════════════════════════════════════════════════════════════════════════════════
> > > ZC              14M    65%      35%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > XDP_DRV/COPY    10M    100%     23%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > SKB_MODE        2.2M   100%     62%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > * CPU receiving the packet
> > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > run BPF in offloaded mode so I cannot compare it.
> >
> > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > do the redirect in the CPU for it to work. If you want to know where
> > time is spent use "perf top". The biggest chunk of time is spent in
> > the XDP_REDIRECT operation, but there are many other time thiefs too.
> >
> > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > xdp-bench PPS CPU-SI* Command
> > > ═══════════════════════════════════════════════
> > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > no-touch ens1f0 -e
> > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > read-data ens1f0 -e
> > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > parse-ip ens1f0 -e
> > > * CPU receiving the packet
> > >
> > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > ══════════════════════════════════════════════════════════
> > > ZC X X X
> > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > XDP_DRV/COPY X X X
> > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > SKB_MODE 2M 100% 55% ./xdpsock
> > > -r -i bond0 -q 0 -p -n 1 -S
> > > * CPU receiving the packet
> > >
> > > xdp-bench PPS CPU-SI* Command
> > > ═══════════════════════════════════════════════
> > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > bond0 -e
> > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > read-data bond0 -e
> > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > parse-ip bond0 -e
> > > * CPU receiving the packet
> > >
> > >
> > > > > Kindly share your thoughts and advice.
> > > > >
> > > > > Thanks,
> > > > > Prashant
> > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-21 13:45 ` Magnus Karlsson
@ 2023-12-22 11:23 ` Prashant Batra
2024-01-02 9:57 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2023-12-22 11:23 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: xdp-newbies
Yes, I found the place where the packet is getting dropped. The
device-match check between xs and xdp->rxq is failing in
xsk_rcv_check(). The device in xs is the bond device, whereas the one
in xdp->rxq is the slave device on which the packet is received and
from which the XDP program is invoked.
static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
{
        /* ... */
        if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
                return -EINVAL;
        /* ... */
}
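To make the failure concrete, here is a minimal user-space sketch of the
comparison quoted above. The struct definitions are simplified stand-ins
written for this illustration (they are not the kernel layouts), and
model_xsk_rcv_check() is a hypothetical name; only the if-condition is
taken from the snippet. On Linux EINVAL is 22, which matches the err=-22
reported in the xdp_redirect_err event.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel structures (illustration only). */
struct net_device { const char *name; };
struct xdp_rxq_info { struct net_device *dev; unsigned int queue_index; };
struct xdp_buff { struct xdp_rxq_info *rxq; };
struct xdp_sock { struct net_device *dev; unsigned int queue_id; };

/*
 * Models only the device/queue comparison from xsk_rcv_check().
 * Returns 0 when the socket and the receive queue refer to the same
 * device and queue, -EINVAL otherwise.
 */
int model_xsk_rcv_check(const struct xdp_sock *xs, const struct xdp_buff *xdp)
{
        assert(xs != NULL && xdp != NULL && xdp->rxq != NULL);

        if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
                return -EINVAL;
        return 0;
}
```

With a socket whose dev points at bond0 and a frame whose rxq->dev points
at ens1f0, the function returns -EINVAL even when the queue ids agree,
which is the situation described above.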
Here is the perf backtrace for the xdp_redirect_err event.
ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69 action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
    ffffffff873dcbf4 xdp_do_redirect+0x3b4 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff873dcbf4 xdp_do_redirect+0x3b4 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffffc05d0f0f ixgbe_run_xdp+0x10f (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
    ffffffffc05d297a ixgbe_clean_rx_irq+0x51a (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
    ffffffffc05d2da0 ixgbe_poll+0xf0 (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
    ffffffff873afad7 __napi_poll+0x27 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff873affd3 net_rx_action+0x233 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff8762ae27 __do_softirq+0xc7 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff86b04cfe run_ksoftirqd+0x1e (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff86b33d83 smpboot_thread_fn+0xd3 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff86b2956d kthread+0xdd (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
    ffffffff86a02289 ret_from_fork+0x29 (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
I am curious why the XDP program is invoked from the ixgbe driver
(running for the slave device) when it is actually attached to the
bond device. Is this by design?
# xdp-loader status bond0
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
bond0                  xdp_dispatcher    native   64   90f686eb86991928
 =>              20    xsk_def_prog               73   8f9c40757cb0a6a2  XDP_PASS

# xdp-loader status ens1f0
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
ens1f0                 <No XDP program loaded!>

# xdp-loader status ens1f1
CURRENT XDP PROGRAM STATUS:
Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
ens1f1                 <No XDP program loaded!>
Now, if I skip the device check in xsk_rcv_check(), I can see packets
being received on the AF_XDP socket in driver mode.
# ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
 sock0@bond0:5 rxdrop xdp-drv poll()
       pps           pkts             1.00
rx     10,126,924    1,984,092,501
tx     0             0
I am sure we would not want to skip the device check in general,
especially for non-bonded devices. Please advise on how to take this
further and get the issue fixed in mainline.
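Purely to illustrate the trade-off (this is not a proposed or actual
mainline fix), one hypothetical alternative to dropping the check
entirely is to also accept a frame whose receiving device is enslaved to
the socket's device. In the user-space sketch below, the structs are
simplified stand-ins, the master field and the function name
model_rcv_check_master_aware() are assumptions made for this example,
and the queue comparison is kept as in the original check.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins; `master` is an illustrative field only. */
struct net_device {
        const char *name;
        struct net_device *master; /* bond this device is enslaved to, or NULL */
};
struct xdp_rxq_info { struct net_device *dev; unsigned int queue_index; };
struct xdp_buff { struct xdp_rxq_info *rxq; };
struct xdp_sock { struct net_device *dev; unsigned int queue_id; };

/*
 * Hypothetical relaxed check: the frame is accepted if it arrived on
 * the socket's own device or on a device whose master is the socket's
 * device. The queue id must still match the receive queue index.
 */
int model_rcv_check_master_aware(const struct xdp_sock *xs,
                                 const struct xdp_buff *xdp)
{
        const struct net_device *rx_dev;

        assert(xs != NULL && xs->dev != NULL);
        assert(xdp != NULL && xdp->rxq != NULL && xdp->rxq->dev != NULL);

        rx_dev = xdp->rxq->dev;
        if (xs->dev != rx_dev && xs->dev != rx_dev->master)
                return -EINVAL;
        if (xs->queue_id != xdp->rxq->queue_index)
                return -EINVAL;
        return 0;
}
```

In this model a socket on bond0 accepts frames received on an enslaved
ens1f0 queue with the same index, while a socket on an unrelated device
is still rejected. Whether a bond queue id should map directly onto a
slave queue index is left open here.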
ZC mode doesn't work, most likely because of the problem you pointed
out earlier.
# ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > <magnus.karlsson@gmail.com> wrote:
> > >
> > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > >
> > > > Thanks for your response. My comments inline.
> > > >
> > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > >
> > > > > > Hi,
> > > > > >
> > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > packets from the bond interface.
> > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > input interface. However the packets received on the bond interface
> > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > Note that the socket creation does succeed.
> > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > great.
> > > > > >
> > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > >
> > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > you using?
> > > > >
> > > > I will trace the packet and get back.
> > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > SFI/SFP+ Network Connection (rev 01)
> > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > SFI/SFP+ Network Connection (rev 01)
> > > >
> > > > Bonding algo is 802.3ad
> > > >
> > > > CPU is Intel Xeon Gold 3.40GHz
> > > >
> > > > NIC Driver
> > > > # ethtool -i ens1f0
> > > > driver: ixgbe
> > > > version: 5.14.0-362.13.1.el9_3
> > >
> > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > a lot of things have happened since then.
> > >
> > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
>
> OK, good to know. Have you managed to trace where the packet is lost?
>
> > > > Features
> > > > # xdp-loader features ens1f0
> > > > NETDEV_XDP_ACT_BASIC: yes
> > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > NETDEV_XDP_ACT_RX_SG: no
> > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > >
> > > > CPU is
> > > >
> > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > mode. That's because the features are copied from the slave device.
> > > > Which explains why there is no error while binding the socket in
> > > > native/zero-copy mode.
> > >
> > > It is probably the intention that if both the bonded devices support a
> > > feature, then the bonding device will too. I just saw that the bonding
> > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > zero-copy is not really supported so that support should not be
> > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > >
> > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > err = -EOPNOTSUPP;
> > > goto err_unreg_pool;
> > > }
> > >
> > > So there are some things needed in the bonding driver to make
> > > zero-copy work. Might not be much though. But your problem is with
> > > XDP_DRV and copy mode, so let us start there.
> > >
> > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > {
> > > > ..
> > > > bond_for_each_slave(bond, slave, iter)
> > > > val &= slave->dev->xdp_features;
> > > > xdp_set_features_flag(bond_dev, val);
> > > > }
> > > >
> > > > # ../xdp-loader/xdp-loader features bond0
> > > > NETDEV_XDP_ACT_BASIC: yes
> > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > NETDEV_XDP_ACT_RX_SG: no
> > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > >
> > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > are not allocating skb?
> > > > >
> > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > The packet has to be copied out to user-space then copied into the
> > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > not an option.
> > > > >
> > > >
> > > > Just to put forth the pps numbers with the above mentioned single port
> > > > in different modes and a comparison to the bond interface.
> > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > >
> > > > Single AF_XDP sock on a single NIC queue-
> > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > ══════════════════════════════════════════════════════════
> > > > ZC 14M 65% 35%
> > > > ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > XDP_DRV/COPY 10M 100% 23% ./xdpsock -r
> > > > -i ens1f0 -q 5 -p -n 1 -N -c
> > > > SKB_MODE 2.2M 100% 62% ./xdpsock
> > > > -r -i ens1f0 -q 5 -p -n 1 -S
> > > > * CPU receiving the packet
> > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > run BPF in offloaded mode so I cannot compare it.
> > >
> > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > do the redirect in the CPU for it to work. If you want to know where
> > > time is spent use "perf top". The biggest chunk of time is spent in
> > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > >
> > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > xdp-bench PPS CPU-SI* Command
> > > > ═══════════════════════════════════════════════
> > > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > > no-touch ens1f0 -e
> > > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > > read-data ens1f0 -e
> > > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > > parse-ip ens1f0 -e
> > > > * CPU receiving the packet
> > > >
> > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > ══════════════════════════════════════════════════════════
> > > > ZC X X X
> > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > XDP_DRV/COPY X X X
> > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > SKB_MODE 2M 100% 55% ./xdpsock
> > > > -r -i bond0 -q 0 -p -n 1 -S
> > > > * CPU receiving the packet
> > > >
> > > > xdp-bench PPS CPU-SI* Command
> > > > ═══════════════════════════════════════════════
> > > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > > bond0 -e
> > > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > > read-data bond0 -e
> > > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > > parse-ip bond0 -e
> > > > * CPU receiving the packet
> > > >
> > > >
> > > > > > Kindly share your thoughts and advice.
> > > > > >
> > > > > > Thanks,
> > > > > > Prashant
> > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2023-12-22 11:23 ` Prashant Batra
@ 2024-01-02 9:57 ` Magnus Karlsson
2024-01-11 10:41 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2024-01-02 9:57 UTC (permalink / raw)
To: Prashant Batra, Fijalkowski, Maciej; +Cc: xdp-newbies
On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> Yes, I found the place where the packet is getting dropped. The check
> for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> The device in xs is the bond device whereas the one in xdp->rxq is the
> slave device on which the packet is received and the xdp program is
> being invoked from.
>
> static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> {
> --
> if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> return -EINVAL;
> --
> }
I am now back from the holidays.
Perfect! Thank you for finding the root cause. I will rope in Maciej
and we will get back to you with a solution proposal.
> Here is the perf backtrace for the xdp_redirect_err event.
> ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> ffffffff873dcbf4 xdp_do_redirect+0x3b4
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff873dcbf4 xdp_do_redirect+0x3b4
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> ffffffffc05d2da0 ixgbe_poll+0xf0
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> ffffffff873afad7 __napi_poll+0x27
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff873affd3 net_rx_action+0x233
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff8762ae27 __do_softirq+0xc7
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff86b04cfe run_ksoftirqd+0x1e
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff86b33d83 smpboot_thread_fn+0xd3
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff86b2956d kthread+0xdd
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> ffffffff86a02289 ret_from_fork+0x29
> (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
>
> I am curious why the xdp program is invoked from the ixgbe driver
> (running for slave device) when the xdp program is actually attached
> to the bond device? Is this by design?
> # xdp-loader status bond0
> CURRENT XDP PROGRAM STATUS:
> Interface Prio Program name Mode ID Tag
> Chain actions
> --------------------------------------------------------------------------------------
> bond0 xdp_dispatcher native 64 90f686eb86991928
> => 20 xsk_def_prog 73
> 8f9c40757cb0a6a2 XDP_PASS
>
> # xdp-loader status ens1f0
> CURRENT XDP PROGRAM STATUS:
> Interface Prio Program name Mode ID Tag
> Chain actions
> --------------------------------------------------------------------------------------
> ens1f0 <No XDP program loaded!>
>
> # xdp-loader status ens1f1
> CURRENT XDP PROGRAM STATUS:
> Interface Prio Program name Mode ID Tag
> Chain actions
> --------------------------------------------------------------------------------------
> ens1f1 <No XDP program loaded!>
>
> Now, if I skip the device check in xsk_rcv_check(), I can see the
> packets being received in the AF_XDP socket in the driver mode.
> # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> sock0@bond0:5 rxdrop xdp-drv poll()
> pps pkts 1.00
> rx 10,126,924 1,984,092,501
> tx 0 0
>
> I am sure we would not want to skip the device check generally
> especially for non-bonded devices, etc. Please guide on how to take
> this further and get the issue fixed in the mainline.
>
> The ZC mode doesn't work. Mostly because of the problem you had
> pointed out before.
> # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
>
>
> On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > >
> > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > >
> > > > > Thanks for your response. My comments inline.
> > > > >
> > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > packets from the bond interface.
> > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > input interface. However the packets received on the bond interface
> > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > Note that the socket creation does succeed.
> > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > great.
> > > > > > >
> > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > >
> > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > you using?
> > > > > >
> > > > > I will trace the packet and get back.
> > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > SFI/SFP+ Network Connection (rev 01)
> > > > >
> > > > > Bonding algo is 802.3ad
> > > > >
> > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > >
> > > > > NIC Driver
> > > > > # ethtool -i ens1f0
> > > > > driver: ixgbe
> > > > > version: 5.14.0-362.13.1.el9_3
> > > >
> > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > a lot of things have happened since then.
> > > >
> > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> >
> > OK, good to know. Have you managed to trace where the packet is lost?
> >
> > > > > Features
> > > > > # xdp-loader features ens1f0
> > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > >
> > > > > CPU is
> > > > >
> > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > mode. That's because the features are copied from the slave device.
> > > > > Which explains why there is no error while binding the socket in
> > > > > native/zero-copy mode.
> > > >
> > > > It is probably the intention that if both the bonded devices support a
> > > > feature, then the bonding device will too. I just saw that the bonding
> > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > zero-copy is not really supported so that support should not be
> > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > >
> > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > err = -EOPNOTSUPP;
> > > > goto err_unreg_pool;
> > > > }
> > > >
> > > > So there are some things needed in the bonding driver to make
> > > > zero-copy work. Might not be much though. But your problem is with
> > > > XDP_DRV and copy mode, so let us start there.
> > > >
> > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > {
> > > > > ..
> > > > > bond_for_each_slave(bond, slave, iter)
> > > > > val &= slave->dev->xdp_features;
> > > > > xdp_set_features_flag(bond_dev, val);
> > > > > }
> > > > >
> > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > >
> > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > are not allocating skb?
> > > > > >
> > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > not an option.
> > > > > >
> > > > >
> > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > in different modes and a comparison to the bond interface.
> > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > >
> > > > > Single AF_XDP sock on a single NIC queue-
> > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > ══════════════════════════════════════════════════════════
> > > > > ZC 14M 65% 35%
> > > > > ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > XDP_DRV/COPY 10M 100% 23% ./xdpsock -r
> > > > > -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > SKB_MODE 2.2M 100% 62% ./xdpsock
> > > > > -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > * CPU receiving the packet
> > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > run BPF in offloaded mode so I cannot compare it.
> > > >
> > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > do the redirect in the CPU for it to work. If you want to know where
> > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > >
> > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > xdp-bench PPS CPU-SI* Command
> > > > > ═══════════════════════════════════════════════
> > > > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > > > no-touch ens1f0 -e
> > > > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > > > read-data ens1f0 -e
> > > > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > > > parse-ip ens1f0 -e
> > > > > * CPU receiving the packet
> > > > >
> > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > ══════════════════════════════════════════════════════════
> > > > > ZC X X X
> > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > XDP_DRV/COPY X X X
> > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > SKB_MODE 2M 100% 55% ./xdpsock
> > > > > -r -i bond0 -q 0 -p -n 1 -S
> > > > > * CPU receiving the packet
> > > > >
> > > > > xdp-bench PPS CPU-SI* Command
> > > > > ═══════════════════════════════════════════════
> > > > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > > > bond0 -e
> > > > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > > > read-data bond0 -e
> > > > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > > > parse-ip bond0 -e
> > > > > * CPU receiving the packet
> > > > >
> > > > >
> > > > > > > Kindly share your thoughts and advice.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Prashant
> > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-02 9:57 ` Magnus Karlsson
@ 2024-01-11 10:41 ` Prashant Batra
2024-01-15 9:22 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2024-01-11 10:41 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: Fijalkowski, Maciej, xdp-newbies
On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > Yes, I found the place where the packet is getting dropped. The check
> > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > The device in xs is the bond device whereas the one in xdp->rxq is the
> > slave device on which the packet is received and the xdp program is
> > being invoked from.
> >
> > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > {
> > --
> > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > return -EINVAL;
> > --
> > }
>
> I am now back from the holidays.
>
> Perfect! Thank you for finding the root cause. I will rope in Maciej
> and we will get back to you with a solution proposal.
>
Thanks, will wait for your solution.
> > Here is the perf backtrace for the xdp_redirect_err event.
> > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > ffffffffc05d2da0 ixgbe_poll+0xf0
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > ffffffff873afad7 __napi_poll+0x27
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff873affd3 net_rx_action+0x233
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff8762ae27 __do_softirq+0xc7
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff86b04cfe run_ksoftirqd+0x1e
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff86b2956d kthread+0xdd
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > ffffffff86a02289 ret_from_fork+0x29
> > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> >
> > I am curious why the xdp program is invoked from the ixgbe driver
> > (running for slave device) when the xdp program is actually attached
> > to the bond device? Is this by design?
> > # xdp-loader status bond0
> > CURRENT XDP PROGRAM STATUS:
> > Interface Prio Program name Mode ID Tag
> > Chain actions
> > --------------------------------------------------------------------------------------
> > bond0 xdp_dispatcher native 64 90f686eb86991928
> > => 20 xsk_def_prog 73
> > 8f9c40757cb0a6a2 XDP_PASS
> >
> > # xdp-loader status ens1f0
> > CURRENT XDP PROGRAM STATUS:
> > Interface Prio Program name Mode ID Tag
> > Chain actions
> > --------------------------------------------------------------------------------------
> > ens1f0 <No XDP program loaded!>
> >
> > # xdp-loader status ens1f1
> > CURRENT XDP PROGRAM STATUS:
> > Interface Prio Program name Mode ID Tag
> > Chain actions
> > --------------------------------------------------------------------------------------
> > ens1f1 <No XDP program loaded!>
> >
> > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > packets being received in the AF_XDP socket in the driver mode.
> > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > sock0@bond0:5 rxdrop xdp-drv poll()
> > pps pkts 1.00
> > rx 10,126,924 1,984,092,501
> > tx 0 0
> >
> > I am sure we would not want to skip the device check generally
> > especially for non-bonded devices, etc. Please guide on how to take
> > this further and get the issue fixed in the mainline.
> >
> > The ZC mode doesn't work. Mostly because of the problem you had
> > pointed out before.
> > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> >
> >
> > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > <magnus.karlsson@gmail.com> wrote:
> > >
> > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > >
> > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > >
> > > > > > Thanks for your response. My comments inline.
> > > > > >
> > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > packets from the bond interface.
> > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > Note that the socket creation does succeed.
> > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > great.
> > > > > > > >
> > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > >
> > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > you using?
> > > > > > >
> > > > > > I will trace the packet and get back.
> > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > >
> > > > > > Bonding algo is 802.3ad
> > > > > >
> > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > >
> > > > > > NIC Driver
> > > > > > # ethtool -i ens1f0
> > > > > > driver: ixgbe
> > > > > > version: 5.14.0-362.13.1.el9_3
> > > > >
> > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > a lot of things have happened since then.
> > > > >
> > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > >
> > > OK, good to know. Have you managed to trace where the packet is lost?
> > >
> > > > > > Features
> > > > > > # xdp-loader features ens1f0
> > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > >
> > > > > > CPU is
> > > > > >
> > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > mode. That's because the features are copied from the slave device.
> > > > > > Which explains why there is no error while binding the socket in
> > > > > > native/zero-copy mode.
> > > > >
> > > > > It is probably the intention that if both the bonded devices support a
> > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > zero-copy is not really supported so that support should not be
> > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > >
> > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > err = -EOPNOTSUPP;
> > > > > goto err_unreg_pool;
> > > > > }
> > > > >
> > > > > So there are some things needed in the bonding driver to make
> > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > XDP_DRV and copy mode, so let us start there.
> > > > >
> > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > {
> > > > > > ..
> > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > val &= slave->dev->xdp_features;
> > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > }
> > > > > >
> > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > >
> > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > are not allocating skb?
> > > > > > >
> > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > not an option.
> > > > > > >
> > > > > >
> > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > in different modes and a comparison to the bond interface.
> > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > >
> > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > ══════════════════════════════════════════════════════════
> > > > > > ZC 14M 65% 35%
> > > > > > ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > XDP_DRV/COPY 10M 100% 23% ./xdpsock -r
> > > > > > -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > SKB_MODE 2.2M 100% 62% ./xdpsock
> > > > > > -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > * CPU receiving the packet
> > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > >
> > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > >
> > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > ═══════════════════════════════════════════════
> > > > > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > > > > no-touch ens1f0 -e
> > > > > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > > > > read-data ens1f0 -e
> > > > > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > > > > parse-ip ens1f0 -e
> > > > > > * CPU receiving the packet
> > > > > >
> > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > ══════════════════════════════════════════════════════════
> > > > > > ZC X X X
> > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > XDP_DRV/COPY X X X
> > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > SKB_MODE 2M 100% 55% ./xdpsock
> > > > > > -r -i bond0 -q 0 -p -n 1 -S
> > > > > > * CPU receiving the packet
> > > > > >
> > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > ═══════════════════════════════════════════════
> > > > > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > > > > bond0 -e
> > > > > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > > > > read-data bond0 -e
> > > > > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > > > > parse-ip bond0 -e
> > > > > > * CPU receiving the packet
> > > > > >
> > > > > >
> > > > > > > > Kindly share your thoughts and advice.
> > > > > > > >
> > > > > > > > Thanks,
> > > > > > > > Prashant
> > > > > > > >
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-11 10:41 ` Prashant Batra
@ 2024-01-15 9:22 ` Magnus Karlsson
2024-01-16 12:48 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2024-01-15 9:22 UTC (permalink / raw)
To: Prashant Batra; +Cc: Fijalkowski, Maciej, xdp-newbies
On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > >
> > > Yes, I found the place where the packet is getting dropped. The check
> > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > slave device on which the packet is received and the xdp program is
> > > being invoked from.
> > >
> > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > {
> > > --
> > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > return -EINVAL;
> > > --
> > > }
> >
> > I am now back from the holidays.
> >
> > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > and we will get back to you with a solution proposal.
> >
> Thanks, will wait for your solution.

FYI, I do not have a good solution for this yet. The one I have is too
complicated for my taste. I might have to take this to the list to get
some new ideas on how to tackle it. So this will take longer than
anticipated.

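For anyone following the thread, the failing comparison can be modelled in
user space. This is only a minimal sketch: the model_* structs and helpers
below are hypothetical stand-ins, not the kernel definitions, and only the
if-condition mirrors the xsk_rcv_check() excerpt quoted above. It shows why a
socket bound to bond0 rejects a frame whose rxq belongs to the slave device,
which matches the err=-22 in the xdp_redirect_err trace.

```c
#include <errno.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct model_netdev { const char *name; };
struct model_rxq { const struct model_netdev *dev; unsigned int queue_index; };
struct model_xsk { const struct model_netdev *dev; unsigned int queue_id; };

/* Same comparison as the quoted xsk_rcv_check() excerpt. */
static int model_rcv_check(const struct model_xsk *xs,
                           const struct model_rxq *rxq)
{
    if (xs->dev != rxq->dev || xs->queue_id != rxq->queue_index)
        return -EINVAL;
    return 0;
}

static const struct model_netdev model_bond0 = { "bond0" };
static const struct model_netdev model_ens1f0 = { "ens1f0" };

/* Socket bound to bond0 queue 5, frame received on the slave's queue 5. */
static int model_bond_case(void)
{
    struct model_xsk xs = { &model_bond0, 5 };
    struct model_rxq rxq = { &model_ens1f0, 5 };

    return model_rcv_check(&xs, &rxq);
}

/* Socket bound directly to ens1f0 queue 5, frame received on that queue. */
static int model_plain_case(void)
{
    struct model_xsk xs = { &model_ens1f0, 5 };
    struct model_rxq rxq = { &model_ens1f0, 5 };

    return model_rcv_check(&xs, &rxq);
}
```

Calling model_bond_case() returns -EINVAL because the device pointers differ
even though the queue ids match, while model_plain_case() returns 0. Any real
fix has to decide how a bond master and its slaves should be treated by this
check; the sketch does not attempt that.
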
> > > Here is the perf backtrace for the xdp_redirect_err event.
> > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > ffffffff873afad7 __napi_poll+0x27
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff873affd3 net_rx_action+0x233
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff8762ae27 __do_softirq+0xc7
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff86b2956d kthread+0xdd
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > ffffffff86a02289 ret_from_fork+0x29
> > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > >
> > > I am curious why the xdp program is invoked from the ixgbe driver
> > > (running for slave device) when the xdp program is actually attached
> > > to the bond device? Is this by design?
> > > # xdp-loader status bond0
> > > CURRENT XDP PROGRAM STATUS:
> > > Interface Prio Program name Mode ID Tag
> > > Chain actions
> > > --------------------------------------------------------------------------------------
> > > bond0 xdp_dispatcher native 64 90f686eb86991928
> > > => 20 xsk_def_prog 73
> > > 8f9c40757cb0a6a2 XDP_PASS
> > >
> > > # xdp-loader status ens1f0
> > > CURRENT XDP PROGRAM STATUS:
> > > Interface Prio Program name Mode ID Tag
> > > Chain actions
> > > --------------------------------------------------------------------------------------
> > > ens1f0 <No XDP program loaded!>
> > >
> > > # xdp-loader status ens1f1
> > > CURRENT XDP PROGRAM STATUS:
> > > Interface Prio Program name Mode ID Tag
> > > Chain actions
> > > --------------------------------------------------------------------------------------
> > > ens1f1 <No XDP program loaded!>
> > >
> > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > packets being received in the AF_XDP socket in the driver mode.
> > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > pps pkts 1.00
> > > rx 10,126,924 1,984,092,501
> > > tx 0 0
> > >
> > > I am sure we would not want to skip the device check generally
> > > especially for non-bonded devices, etc. Please guide on how to take
> > > this further and get the issue fixed in the mainline.
> > >
> > > The ZC mode doesn't work. Mostly because of the problem you had
> > > pointed out before.
> > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > >
> > >
> > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > >
> > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > >
> > > > > > > Thanks for your response. My comments inline.
> > > > > > >
> > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Hi,
> > > > > > > > >
> > > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > packets from the bond interface.
> > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > great.
> > > > > > > > >
> > > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > > >
> > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > you using?
> > > > > > > >
> > > > > > > I will trace the packet and get back.
> > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > >
> > > > > > > Bonding algo is 802.3ad
> > > > > > >
> > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > >
> > > > > > > NIC Driver
> > > > > > > # ethtool -i ens1f0
> > > > > > > driver: ixgbe
> > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > >
> > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > a lot of things have happened since then.
> > > > > >
> > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > >
> > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > >
> > > > > > > Features
> > > > > > > # xdp-loader features ens1f0
> > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > >
> > > > > > > CPU is
> > > > > > >
> > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > native/zero-copy mode.
> > > > > >
> > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > zero-copy is not really supported so that support should not be
> > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > >
> > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > err = -EOPNOTSUPP;
> > > > > > goto err_unreg_pool;
> > > > > > }
> > > > > >
> > > > > > So there are some things needed in the bonding driver to make
> > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > >
> > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > {
> > > > > > > ..
> > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > val &= slave->dev->xdp_features;
> > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > }
> > > > > > >
> > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > >
> > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > are not allocating skb?
> > > > > > > >
> > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > not an option.
> > > > > > > >
> > > > > > >
> > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > >
> > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > ZC 14M 65% 35%
> > > > > > > ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > XDP_DRV/COPY 10M 100% 23% ./xdpsock -r
> > > > > > > -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > SKB_MODE 2.2M 100% 62% ./xdpsock
> > > > > > > -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > * CPU receiving the packet
> > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > >
> > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > > >
> > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > > ═══════════════════════════════════════════════
> > > > > > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > > > > > no-touch ens1f0 -e
> > > > > > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > > > > > read-data ens1f0 -e
> > > > > > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > > > > > parse-ip ens1f0 -e
> > > > > > > * CPU receiving the packet
> > > > > > >
> > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > ZC X X X
> > > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > XDP_DRV/COPY X X X
> > > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > SKB_MODE 2M 100% 55% ./xdpsock
> > > > > > > -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > * CPU receiving the packet
> > > > > > >
> > > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > > ═══════════════════════════════════════════════
> > > > > > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > > > > > bond0 -e
> > > > > > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > > > > > read-data bond0 -e
> > > > > > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > > > > > parse-ip bond0 -e
> > > > > > > * CPU receiving the packet
> > > > > > >
> > > > > > >
> > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > >
> > > > > > > > > Thanks,
> > > > > > > > > Prashant
> > > > > > > > >
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-15 9:22 ` Magnus Karlsson
@ 2024-01-16 12:48 ` Prashant Batra
2024-01-16 12:59 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2024-01-16 12:48 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: Fijalkowski, Maciej, xdp-newbies
On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > <magnus.karlsson@gmail.com> wrote:
> > >
> > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > >
> > > > Yes, I found the place where the packet is getting dropped. The check
> > > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > slave device on which the packet is received and the xdp program is
> > > > being invoked from.
> > > >
> > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > {
> > > > --
> > > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > return -EINVAL;
> > > > --
> > > > }
> > >
> > > I am now back from the holidays.
> > >
> > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > and we will get back to you with a solution proposal.
> > >
> > Thanks, will wait for your solution.
>
> FYI, I do not have a good solution for this yet. The one I have is too
> complicated for my taste. I might have to take this to the list to get
> some new ideas on how to tackle it. So this will take longer than
> anticipated.
>
Just to add that AF_XDP TX in native mode is also not performing well
on the bond interface. I am getting only around 2 Mpps.

# ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c

 sock0@bond0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 2,520,587      2,521,152

 sock0@bond0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 2,362,740      4,884,352

 sock0@bond0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 1,814,437      6,698,944

 sock0@bond0:0 txonly xdp-drv
                   pps            pkts           1.00
rx                 0              0
tx                 1,817,913      8,517,120

# xdp-loader status bond0
CURRENT XDP PROGRAM STATUS:

Interface        Prio  Program name      Mode     ID   Tag               Chain actions
--------------------------------------------------------------------------------------
bond0                  xdp_dispatcher    native   671  90f686eb86991928
 =>              20     xsk_def_prog              680  8f9c40757cb0a6a2  XDP_PASS

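To put the TX figures above in context, here is a small sketch that averages
the four reported tx pps samples and computes the theoretical packet rate of
one 10GbE port. The assumptions are mine, not measured: 64-byte frames (as in
the earlier pktgen RX tests) and 20 bytes of per-frame wire overhead
(preamble, start-of-frame delimiter, inter-frame gap). The helper names are
made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* The four tx pps samples from the xdpsock output above. */
static const uint64_t tx_pps_samples[] = {
    2520587, 2362740, 1814437, 1817913,
};

/* Integer mean of n samples (fraction truncated). */
static uint64_t average_pps(const uint64_t *samples, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++)
        sum += samples[i];
    return n ? sum / n : 0;
}

/*
 * Theoretical packets per second on an Ethernet link. Assumes 20 bytes
 * of wire overhead per frame on top of frame_bytes.
 */
static uint64_t line_rate_pps(uint64_t link_bps, uint64_t frame_bytes)
{
    return link_bps / ((frame_bytes + 20) * 8);
}

/* Average as a whole-number percentage of the single-port line rate. */
static uint64_t percent_of_line_rate(void)
{
    uint64_t avg = average_pps(tx_pps_samples,
                               sizeof(tx_pps_samples) / sizeof(tx_pps_samples[0]));

    return avg * 100 / line_rate_pps(10000000000ULL, 64);
}
```

Under these assumptions the average comes to roughly 2.13 Mpps against about
14.88 Mpps for a single 10G port, so around 14% of what one slave could carry
in theory.
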
> > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > ffffffff873afad7 __napi_poll+0x27
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff873affd3 net_rx_action+0x233
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff86b2956d kthread+0xdd
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > ffffffff86a02289 ret_from_fork+0x29
> > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > >
> > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > (running for slave device) when the xdp program is actually attached
> > > > to the bond device? Is this by design?
> > > > # xdp-loader status bond0
> > > > CURRENT XDP PROGRAM STATUS:
> > > > Interface Prio Program name Mode ID Tag
> > > > Chain actions
> > > > --------------------------------------------------------------------------------------
> > > > bond0 xdp_dispatcher native 64 90f686eb86991928
> > > > => 20 xsk_def_prog 73
> > > > 8f9c40757cb0a6a2 XDP_PASS
> > > >
> > > > # xdp-loader status ens1f0
> > > > CURRENT XDP PROGRAM STATUS:
> > > > Interface Prio Program name Mode ID Tag
> > > > Chain actions
> > > > --------------------------------------------------------------------------------------
> > > > ens1f0 <No XDP program loaded!>
> > > >
> > > > # xdp-loader status ens1f1
> > > > CURRENT XDP PROGRAM STATUS:
> > > > Interface Prio Program name Mode ID Tag
> > > > Chain actions
> > > > --------------------------------------------------------------------------------------
> > > > ens1f1 <No XDP program loaded!>
> > > >
> > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > packets being received in the AF_XDP socket in the driver mode.
> > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > pps pkts 1.00
> > > > rx 10,126,924 1,984,092,501
> > > > tx 0 0
> > > >
> > > > I am sure we would not want to skip the device check generally
> > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > this further and get the issue fixed in the mainline.
> > > >
> > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > pointed out before.
> > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > >
> > > >
> > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > >
> > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > >
> > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Thanks for your response. My comments inline.
> > > > > > > >
> > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Hi,
> > > > > > > > > >
> > > > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > packets from the bond interface.
> > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > great.
> > > > > > > > > >
> > > > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > > > >
> > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > you using?
> > > > > > > > >
> > > > > > > > I will trace the packet and get back.
> > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > >
> > > > > > > > Bonding algo is 802.3ad
> > > > > > > >
> > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > >
> > > > > > > > NIC Driver
> > > > > > > > # ethtool -i ens1f0
> > > > > > > > driver: ixgbe
> > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > >
> > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > a lot of things have happened since then.
> > > > > > >
> > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > >
> > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > >
> > > > > > > > Features
> > > > > > > > # xdp-loader features ens1f0
> > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > >
> > > > > > > > CPU is
> > > > > > > >
> > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > native/zero-copy mode.
> > > > > > >
> > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > >
> > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > err = -EOPNOTSUPP;
> > > > > > > goto err_unreg_pool;
> > > > > > > }
> > > > > > >
> > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > >
> > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > {
> > > > > > > > ..
> > > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > > val &= slave->dev->xdp_features;
> > > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > > }
> > > > > > > >
> > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > >
> > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > are not allocating skb?
> > > > > > > > >
> > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > not an option.
> > > > > > > > >
> > > > > > > >
> > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > >
> > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > ZC              14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > XDP_DRV/COPY    10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > SKB_MODE        2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > * CPU receiving the packet
> > > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > > >
> > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > > > >
> > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > xdp-bench         PPS    CPU-SI*   Command
> > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > drop, no-touch    14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > drop, read-data   14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > drop, parse-ip    14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > * CPU receiving the packet
> > > > > > > >
> > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > ZC              X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > XDP_DRV/COPY    X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > SKB_MODE        2M     100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > * CPU receiving the packet
> > > > > > > >
> > > > > > > > xdp-bench         PPS     CPU-SI*   Command
> > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > drop, no-touch    10.9M   33%       ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > drop, read-data   10.9M   44%       ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > drop, parse-ip    10.9M   47%       ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > * CPU receiving the packet
> > > > > > > >
> > > > > > > >
> > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > >
> > > > > > > > > > Thanks,
> > > > > > > > > > Prashant
> > > > > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-16 12:48 ` Prashant Batra
@ 2024-01-16 12:59 ` Magnus Karlsson
2024-01-17 6:07 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2024-01-16 12:59 UTC (permalink / raw)
To: Prashant Batra; +Cc: Fijalkowski, Maciej, xdp-newbies
On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > >
> > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > >
> > > > > Yes, I found the place where the packet is getting dropped. The check
> > > > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > > slave device on which the packet is received and the xdp program is
> > > > > being invoked from.
> > > > >
> > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > {
> > > > > --
> > > > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > return -EINVAL;
> > > > > --
> > > > > }
> > > >
> > > > I am now back from the holidays.
> > > >
> > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > and we will get back to you with a solution proposal.
> > > >
> > > Thanks, will wait for your solution.
> >
> > FYI, I do not have a good solution for this yet. The one I have is too
> > complicated for my taste. I might have to take this to the list to get
> > some new ideas on how to tackle it. So this will take longer than
> > anticipated.
> >
> Just to add that the AF_XDP TX in native mode is also not performing
> well. I am getting around 2Mpps in native mode.

That is expected though. There are only two modes for Tx: SKB mode and
zero-copy mode, and since there is no zero-copy support for the
bonding driver, it will revert to skb mode. I would expect around 3
Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
driver adds overhead.

For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
getting with the -N switch) and zero-copy (that is not supported by
the bonding driver).
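
The fallback described above can be summarised in a small toy model. This
is only a sketch, not kernel code: the enum names and both functions are
hypothetical and made up for illustration. It assumes that a plain -N
request tries zero-copy first and falls back when the device lacks it,
and that forcing zero-copy with -z fails instead of falling back.

```c
#include <stdbool.h>

/* Hypothetical labels for the data paths discussed in this thread. */
enum xsk_path { PATH_SKB, PATH_DRV_COPY, PATH_ZEROCOPY };

/* What the user asks xdpsock for: -S, -N, or -N -z. */
enum xsk_request { REQ_SKB, REQ_DRV, REQ_DRV_FORCE_ZC };

/* Rx has three possible paths. Returns -1 when zero-copy is forced
 * on a device that does not support it (the bind is expected to fail). */
int rx_path(enum xsk_request req, bool dev_has_zc)
{
    switch (req) {
    case REQ_SKB:
        return PATH_SKB;
    case REQ_DRV:
        return dev_has_zc ? PATH_ZEROCOPY : PATH_DRV_COPY;
    case REQ_DRV_FORCE_ZC:
        return dev_has_zc ? PATH_ZEROCOPY : -1;
    }
    return -1;
}

/* Tx has only two paths: without zero-copy it always ends up on the
 * skb path, even if native (XDP_DRV) mode was requested. */
int tx_path(enum xsk_request req, bool dev_has_zc)
{
    switch (req) {
    case REQ_SKB:
        return PATH_SKB;
    case REQ_DRV:
        return dev_has_zc ? PATH_ZEROCOPY : PATH_SKB;
    case REQ_DRV_FORCE_ZC:
        return dev_has_zc ? PATH_ZEROCOPY : -1;
    }
    return -1;
}
```

For bond0 (no zero-copy), rx_path(REQ_DRV, false) gives PATH_DRV_COPY
while tx_path(REQ_DRV, false) gives PATH_SKB, which is the Rx/Tx
asymmetry in question.
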
> # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> sock0@bond0:0 txonly xdp-drv
>
> pps pkts 1.00
> rx 0 0
> tx 2,520,587 2,521,152
>
> sock0@bond0:0 txonly xdp-drv
> pps pkts 1.00
> rx 0 0
> tx 2,362,740 4,884,352
>
> sock0@bond0:0 txonly xdp-drv
> pps pkts 1.00
> rx 0 0
> tx 1,814,437 6,698,944
>
> sock0@bond0:0 txonly xdp-drv
> pps pkts 1.00
> rx 0 0
> tx 1,817,913 8,517,120
>
> # xdp-loader status bond0
> CURRENT XDP PROGRAM STATUS:
>
> Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> --------------------------------------------------------------------------------------
> bond0                  xdp_dispatcher    native   671  90f686eb86991928
>  =>              20     xsk_def_prog              680  8f9c40757cb0a6a2  XDP_PASS
>
> > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > ffffffff873afad7 __napi_poll+0x27
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff873affd3 net_rx_action+0x233
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff86b2956d kthread+0xdd
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > ffffffff86a02289 ret_from_fork+0x29
> > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > >
> > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > (running for slave device) when the xdp program is actually attached
> > > > > to the bond device? Is this by design?
> > > > > # xdp-loader status bond0
> > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > bond0                  xdp_dispatcher    native   64   90f686eb86991928
> > > > > >  =>              20     xsk_def_prog              73   8f9c40757cb0a6a2  XDP_PASS
> > > > > >
> > > > > > # xdp-loader status ens1f0
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > ens1f0                 <No XDP program loaded!>
> > > > > >
> > > > > > # xdp-loader status ens1f1
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > --------------------------------------------------------------------------------------
> > > > > > ens1f1                 <No XDP program loaded!>
> > > > >
> > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > pps pkts 1.00
> > > > > rx 10,126,924 1,984,092,501
> > > > > tx 0 0
> > > > >
> > > > > I am sure we would not want to skip the device check generally
> > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > this further and get the issue fixed in the mainline.
> > > > >
> > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > pointed out before.
> > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > >
> > > > >
> > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > >
> > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > >
> > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Hi,
> > > > > > > > > > >
> > > > > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > > packets from the bond interface.
> > > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > > great.
> > > > > > > > > > >
> > > > > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > > > > >
> > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > you using?
> > > > > > > > > >
> > > > > > > > > I will trace the packet and get back.
> > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > >
> > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > >
> > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > >
> > > > > > > > > NIC Driver
> > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > driver: ixgbe
> > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > >
> > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > a lot of things have happened since then.
> > > > > > > >
> > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > >
> > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > >
> > > > > > > > > Features
> > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > >
> > > > > > > > > CPU is
> > > > > > > > >
> > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > native/zero-copy mode.
> > > > > > > >
> > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > >
> > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > err = -EOPNOTSUPP;
> > > > > > > > goto err_unreg_pool;
> > > > > > > > }
> > > > > > > >
> > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > >
> > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > {
> > > > > > > > > ..
> > > > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > > > val &= slave->dev->xdp_features;
> > > > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > >
> > > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > > are not allocating skb?
> > > > > > > > > >
> > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > not an option.
> > > > > > > > > >
> > > > > > > > >
> > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > >
> > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > ZC              14M    65%       35%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > XDP_DRV/COPY    10M    100%      23%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > SKB_MODE        2.2M   100%      62%           ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > * CPU receiving the packet
> > > > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > > > >
> > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > > > > >
> > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > xdp-bench         PPS    CPU-SI*   Command
> > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > drop, no-touch    14M    41%       ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > > drop, read-data   14M    55%       ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > > drop, parse-ip    14M    58%       ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > > * CPU receiving the packet
> > > > > > > > >
> > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > AF_XDP rxdrop   PPS    CPU-SI*   CPU-xdpsock   Command
> > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > ZC              X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > XDP_DRV/COPY    X      X         X             ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > SKB_MODE        2M     100%      55%           ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > * CPU receiving the packet
> > > > > > > > >
> > > > > > > > > xdp-bench         PPS     CPU-SI*   Command
> > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > drop, no-touch    10.9M   33%       ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > > drop, read-data   10.9M   44%       ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > > drop, parse-ip    10.9M   47%       ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > > * CPU receiving the packet
> > > > > > > > >
> > > > > > > > >
> > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > >
> > > > > > > > > > > Thanks,
> > > > > > > > > > > Prashant
> > > > > > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-16 12:59 ` Magnus Karlsson
@ 2024-01-17 6:07 ` Prashant Batra
2024-01-17 7:41 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2024-01-17 6:07 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: Fijalkowski, Maciej, xdp-newbies
On Tue, Jan 16, 2024 at 6:29 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> > <magnus.karlsson@gmail.com> wrote:
> > >
> > > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > >
> > > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > > <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > >
> > > > > > Yes, I found the place where the packet is getting dropped. The check
> > > > > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > > > slave device on which the packet is received and the xdp program is
> > > > > > being invoked from.
> > > > > >
> > > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > > {
> > > > > > --
> > > > > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > > return -EINVAL;
> > > > > > --
> > > > > > }
> > > > >
> > > > > I am now back from the holidays.
> > > > >
> > > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > > and we will get back to you with a solution proposal.
> > > > >
> > > > Thanks, will wait for your solution.
> > >
> > > FYI, I do not have a good solution for this yet. The one I have is too
> > > complicated for my taste. I might have to take this to the list to get
> > > some new ideas on how to tackle it. So this will take longer than
> > > anticipated.
> > >
> > Just to add that the AF_XDP TX in native mode is also not performing
> > well. I am getting around 2Mpps in native mode.
>
> That is expected though. There are only two modes for Tx: SKB mode and
> zero-copy mode, and since there is no zero-copy support for the
> bonding driver, it will revert to skb mode. I would expect around 3
> Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
> driver adds overhead.
>
> For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
> getting with the -N switch) and zero-copy (that is not supported by
> the bonding driver).
>
Thanks for the quick info. So, when you provide the fix for the bond
driver, can we expect the bond driver to support ZC for Tx (and Rx),
or will Tx remain in SKB mode? At 2 Mpps there is a big gap between
Rx and Tx, which in practice leaves XDP not very useful with bond
devices.

I also see a gap between Rx and Tx for veth devices. In the topology
below, AF_XDP Tx to a veth device (veth1) does not go beyond 1.2 Mpps.
The XDP program on veth2 redirects packets to the phy device ens1f0.
Based on your explanation above, I would assume that this too is
working in SKB mode, hence the lower performance.

veth1 (AF_XDP Tx) -> veth2 (xdp) -> ens1f0

However, in the reverse direction shown below, I can receive close to
5 Mpps on the AF_XDP socket.

ens1f0 (xdp) -> veth2 -> veth1 (AF_XDP Rx)

Looking at the results here-
https://patchwork.ozlabs.org/project/netdev/cover/1533283098-2397-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/
I cannot find a benchmark that would validate my AF_XDP Rx and Tx
results with veth devices. The XDP DROP test results do match my
tests though.
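
As a rough cross-check for pps figures like these, the theoretical
ceiling of a link can be computed from the link speed and the frame
size. The helper below is a hypothetical sketch, not taken from any
tool in this thread. It assumes the frame size already includes the
4-byte FCS and that each frame costs an extra 20 bytes on the wire
(preamble, start-of-frame delimiter and inter-frame gap).

```c
#include <stdint.h>

/* Per-frame wire overhead: 7-byte preamble + 1-byte SFD + 12-byte inter-frame gap. */
#define ETH_WIRE_OVERHEAD_BYTES 20u

/* Theoretical maximum packets per second for a given link speed
 * (bits per second) and Ethernet frame size (bytes, FCS included).
 * Hypothetical helper for sanity-checking benchmark numbers. */
uint64_t line_rate_pps(uint64_t link_bits_per_sec, uint32_t frame_bytes)
{
    uint64_t bits_per_frame =
        ((uint64_t)frame_bytes + ETH_WIRE_OVERHEAD_BYTES) * 8u;

    return link_bits_per_sec / bits_per_frame;
}
```

With a 10G link and 64-byte frames this gives roughly 14.88 Mpps, so
the ~14M seen earlier for ZC and XDP_DROP on ens1f0 is close to line
rate, while the 1.2 Mpps to 5 Mpps veth numbers are far below it and
are therefore limited by software, not by the wire.
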
> > # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> > sock0@bond0:0 txonly xdp-drv
> >
> > pps pkts 1.00
> > rx 0 0
> > tx 2,520,587 2,521,152
> >
> > sock0@bond0:0 txonly xdp-drv
> > pps pkts 1.00
> > rx 0 0
> > tx 2,362,740 4,884,352
> >
> > sock0@bond0:0 txonly xdp-drv
> > pps pkts 1.00
> > rx 0 0
> > tx 1,814,437 6,698,944
> >
> > sock0@bond0:0 txonly xdp-drv
> > pps pkts 1.00
> > rx 0 0
> > tx 1,817,913 8,517,120
> >
> > # xdp-loader status bond0
> > CURRENT XDP PROGRAM STATUS:
> >
> > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > --------------------------------------------------------------------------------------
> > bond0                  xdp_dispatcher    native   671  90f686eb86991928
> >  =>              20     xsk_def_prog              680  8f9c40757cb0a6a2  XDP_PASS
> >
> > > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > ffffffff873afad7 __napi_poll+0x27
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff873affd3 net_rx_action+0x233
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86b2956d kthread+0xdd
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > ffffffff86a02289 ret_from_fork+0x29
> > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > >
> > > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > > (running for slave device) when the xdp program is actually attached
> > > > > > to the bond device? Is this by design?
> > > > > > # xdp-loader status bond0
> > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > bond0                  xdp_dispatcher    native   64   90f686eb86991928
> > > > > > >  =>              20     xsk_def_prog              73   8f9c40757cb0a6a2  XDP_PASS
> > > > > > >
> > > > > > > # xdp-loader status ens1f0
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f0                 <No XDP program loaded!>
> > > > > > >
> > > > > > > # xdp-loader status ens1f1
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface        Prio  Program name      Mode     ID   Tag               Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f1                 <No XDP program loaded!>
> > > > > >
> > > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > > pps pkts 1.00
> > > > > > rx 10,126,924 1,984,092,501
> > > > > > tx 0 0
> > > > > >
> > > > > > I am sure we would not want to skip the device check generally
> > > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > > this further and get the issue fixed in the mainline.
> > > > > >
> > > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > > pointed out before.
> > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > > >
> > > > > >
> > > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > >
> > > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > > >
> > > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Hi,
> > > > > > > > > > > >
> > > > > > > > > > > > I am new to XDP and exploring how it works with the different interface
> > > > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > > > packets from the bond interface.
> > > > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > > > in native mode. The packets are not passed on to the kernel either.
> > > > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > > > great.
> > > > > > > > > > > >
> > > > > > > > > > > > Are AF_XDP sockets on a bond not supported in native mode? Or, since the
> > > > > > > > > > > > packet has to be handed over to the bond driver after reception on
> > > > > > > > > > > > the phy port, is an skb allocation and copy indeed a must?
> > > > > > > > > > >
> > > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > > you using?
> > > > > > > > > > >
> > > > > > > > > > I will trace the packet and get back.
> > > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > >
> > > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > > >
> > > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > > >
> > > > > > > > > > NIC Driver
> > > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > > driver: ixgbe
> > > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > > >
> > > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > > a lot of things have happened since then.
> > > > > > > > >
> > > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > > >
> > > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > > >
> > > > > > > > > > Features
> > > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > >
> > > > > > > > > > CPU is
> > > > > > > > > >
> > > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > > native/zero-copy mode.
> > > > > > > > >
> > > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > > >
> > > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > > err = -EOPNOTSUPP;
> > > > > > > > > goto err_unreg_pool;
> > > > > > > > > }
> > > > > > > > >
> > > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > > >
> > > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > > {
> > > > > > > > > > ..
> > > > > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > > > > val &= slave->dev->xdp_features;
> > > > > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > >
> > > > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > > > are not allocating skb?
> > > > > > > > > > >
> > > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > > not an option.
> > > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > > >
> > > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > > AF_XDP rxdrop  PPS   CPU-SI*  CPU-xdpsock  Command
> > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > ZC             14M   65%      35%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > > XDP_DRV/COPY   10M   100%     23%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > > SKB_MODE       2.2M  100%     62%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > In the above tests with ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > > expected, especially in ZC mode? Is it mainly because the BPF
> > > > > > > > > > program runs in non-HW-offloaded mode? I don't have a NIC that can
> > > > > > > > > > run BPF in offloaded mode, so I cannot compare.
> > > > > > > > >
> > > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > > the XDP_REDIRECT operation, but there are many other time thieves too.
> > > > > > > > >
> > > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > > xdp-bench        PPS    CPU-SI*  Command
> > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > drop, no-touch   14M    41%      ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > > > drop, read-data  14M    55%      ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > > > drop, parse-ip   14M    58%      ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > > AF_XDP rxdrop  PPS   CPU-SI*  CPU-xdpsock  Command
> > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > ZC             X     X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > > XDP_DRV/COPY   X     X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > > SKB_MODE       2M    100%     55%          ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > > xdp-bench        PPS    CPU-SI*  Command
> > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > drop, no-touch   10.9M  33%      ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > > > drop, read-data  10.9M  44%      ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > > > drop, parse-ip   10.9M  47%      ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > > > * CPU receiving the packet
> > > > > > > > > >
> > > > > > > > > >
> > > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks,
> > > > > > > > > > > > Prashant
> > > > > > > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-17 6:07 ` Prashant Batra
@ 2024-01-17 7:41 ` Magnus Karlsson
2024-01-19 12:43 ` Prashant Batra
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2024-01-17 7:41 UTC (permalink / raw)
To: Prashant Batra; +Cc: Fijalkowski, Maciej, xdp-newbies
On Wed, 17 Jan 2024 at 07:07, Prashant Batra <prbatra.mail@gmail.com> wrote:
>
> On Tue, Jan 16, 2024 at 6:29 PM Magnus Karlsson
> <magnus.karlsson@gmail.com> wrote:
> >
> > On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > >
> > > On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> > > <magnus.karlsson@gmail.com> wrote:
> > > >
> > > > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > >
> > > > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > >
> > > > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > >
> > > > > > > Yes, I found the place where the packet is getting dropped. The check
> > > > > > > for a device match between xs and xdp->rxq is failing in xsk_rcv_check().
> > > > > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > > > > slave device on which the packet is received and the xdp program is
> > > > > > > being invoked from.
> > > > > > >
> > > > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > > > {
> > > > > > > --
> > > > > > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > > > return -EINVAL;
> > > > > > > --
> > > > > > > }
> > > > > >
> > > > > > I am now back from the holidays.
> > > > > >
> > > > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > > > and we will get back to you with a solution proposal.
> > > > > >
> > > > > Thanks, will wait for your solution.
> > > >
> > > > FYI, I do not have a good solution for this yet. The one I have is too
> > > > complicated for my taste. I might have to take this to the list to get
> > > > some new ideas on how to tackle it. So this will take longer than
> > > > anticipated.
> > > >
> > > Just to add that the AF_XDP TX in native mode is also not performing
> > > well. I am getting around 2Mpps in native mode.
> >
> > That is expected though. There are only two modes for Tx: SKB mode and
> > zero-copy mode, and since there is no zero-copy support for the
> > bonding driver, it will revert to skb mode. I would expect around 3
> > Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
> > driver adds overhead.
> >
> > For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
> > getting with the -N switch) and zero-copy (that is not supported by
> > the bonding driver).
> >
> Thanks for the quick info. So, when you provide the fix for the bond
> driver, can we expect the bond driver to support ZC for Tx (and Rx),
> or will Tx remain in SKB mode? At 2 Mpps, the gap between Rx and Tx
> is large and in practice leaves XDP of little use with bond devices.
Personally, I do not have the time right now to implement ZC for the
bonding driver. Once I have posted the fix (coding it up right now),
send a mail to the netdev mailing list with the bonding maintainers on
the to-line and state that you are interested in this functionality
and ask if there are any other people interested in it. Or maybe you
would like to implement it :-)?
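To illustrate why a zero-copy bind on the bond is a separate problem: as
quoted further down in this thread, bond_xdp_set_features() ANDs the
xdp_features of all slaves, and AF_XDP tests that mask before enabling
zero-copy. A minimal sketch of that logic follows. The FEAT_* names, their
bit values, the starting value of the AND, and the helper names are
assumptions made for illustration, not the kernel's definitions.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the NETDEV_XDP_ACT_* feature bits. */
#define FEAT_BASIC        (1u << 0)
#define FEAT_REDIRECT     (1u << 1)
#define FEAT_XSK_ZEROCOPY (1u << 3)

/* Assumption: zero-copy needs all three bits set at once. */
#define FEAT_ZC (FEAT_BASIC | FEAT_REDIRECT | FEAT_XSK_ZEROCOPY)

/* AND the feature masks of all slaves, in the spirit of the quoted
 * bond_for_each_slave() loop. Starting from all-ones (or 0 when there
 * are no slaves) is an assumption of this sketch. */
static uint32_t bond_features(const uint32_t *slave_feats, size_t n)
{
    uint32_t val = n ? ~0u : 0u;

    for (size_t i = 0; i < n; i++)
        val &= slave_feats[i];
    return val;
}

/* Same shape as the quoted AF_XDP test: every bit of the mask must be set. */
static int advertises_zc(uint32_t feats)
{
    return (feats & FEAT_ZC) == FEAT_ZC;
}
```

With two slaves that both carry the zero-copy bit, the bond ends up
advertising it too, even though the bonding driver lacks the xsk_wakeup
callback that zero-copy relies on.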
> I also see a gap between Rx and Tx for veth devices.
> In the topology below, AF_XDP Tx to a veth device (veth1) does not go
> beyond 1.2 Mpps. The XDP program on veth2 redirects packets to the
> phy device ens1f0. Based on your explanation above, I assume that
> this too is working in SKB mode, hence the lower performance.
> veth1 (AF_XDP Tx) -> veth2 (xdp) -> ens1f0
Correct. There is no zero-copy for veth either.
> However, in the reverse direction shown below, I can receive close to
> 5 Mpps on the AF_XDP socket.
> ens1f0 (xdp) -> veth2 -> veth1 (AF_XDP Rx)
Yes, since it is using XDP_DRV mode without zero-copy.
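The mode rules discussed in this message can be summarised in a small
sketch: Rx has three modes (skb, XDP_DRV copy, zero-copy), Tx has only two
(skb and zero-copy), so Tx falls back to skb mode whenever the driver has
no zero-copy support. This is an illustration only, not kernel code; the
enum and function names are hypothetical, and the functions return the
best mode a driver could offer rather than what a given socket requested.

```c
#include <stdbool.h>

/* Hypothetical names for illustration; not kernel identifiers. */
enum xsk_mode { MODE_SKB, MODE_DRV_COPY, MODE_ZEROCOPY };

/* Rx: zero-copy if the driver supports it, otherwise XDP_DRV copy mode
 * if the driver runs XDP natively, otherwise skb mode. */
static enum xsk_mode best_rx_mode(bool native_xdp, bool zc_support)
{
    if (native_xdp && zc_support)
        return MODE_ZEROCOPY;
    if (native_xdp)
        return MODE_DRV_COPY;
    return MODE_SKB;
}

/* Tx: there is no XDP_DRV copy mode, so without zero-copy it is skb mode. */
static enum xsk_mode best_tx_mode(bool zc_support)
{
    return zc_support ? MODE_ZEROCOPY : MODE_SKB;
}
```

For bond and veth devices as described here (native XDP, no zero-copy),
best_rx_mode(true, false) gives MODE_DRV_COPY and best_tx_mode(false)
gives MODE_SKB, which matches the Rx/Tx gap reported above.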
> Looking at the results here:
> https://patchwork.ozlabs.org/project/netdev/cover/1533283098-2397-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/
> I cannot find a benchmark that would validate my AF_XDP Rx and Tx
> results with veth devices. The XDP DROP test results do match my
> tests, though.
Your numbers look reasonable. Just note that veth is not fast. If you
want to have a faster veth, you might want to take a look at the new
netkit device.
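For reference, the Rx failure on the bond comes down to the device check
quoted above from xsk_rcv_check(). The user-space model below is a sketch
under simplifying assumptions: the structs are cut down to the fields the
check reads, and model_rcv_check() and deliver() are hypothetical helpers,
not kernel functions. The err=-22 in the perf trace is -EINVAL.

```c
#include <errno.h>

/* Cut-down stand-ins for the kernel structures; only the fields that
 * the check reads are modelled. */
struct net_device { const char *name; };
struct xdp_rxq_info { struct net_device *dev; unsigned int queue_index; };
struct xdp_buff { struct xdp_rxq_info *rxq; };
struct xdp_sock { struct net_device *dev; unsigned int queue_id; };

/* Same comparison as the quoted snippet: the socket's device and queue
 * must match the device and queue the packet was received on. */
static int model_rcv_check(const struct xdp_sock *xs,
                           const struct xdp_buff *xdp)
{
    if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
        return -EINVAL;
    return 0;
}

/* Socket bound to sock_dev, packet received on rx_dev, same queue. */
static int deliver(struct net_device *sock_dev, struct net_device *rx_dev,
                   unsigned int queue)
{
    struct xdp_sock xs = { .dev = sock_dev, .queue_id = queue };
    struct xdp_rxq_info rxq = { .dev = rx_dev, .queue_index = queue };
    struct xdp_buff xdp = { .rxq = &rxq };

    return model_rcv_check(&xs, &xdp);
}
```

A socket bound to bond0 while the packet arrives on the slave ens1f0
fails the check, whereas a socket bound directly to ens1f0 passes it.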
> > > # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> > > sock0@bond0:0 txonly xdp-drv
> > >
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 2,520,587 2,521,152
> > >
> > > sock0@bond0:0 txonly xdp-drv
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 2,362,740 4,884,352
> > >
> > > sock0@bond0:0 txonly xdp-drv
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 1,814,437 6,698,944
> > >
> > > sock0@bond0:0 txonly xdp-drv
> > > pps pkts 1.00
> > > rx 0 0
> > > tx 1,817,913 8,517,120
> > >
> > > # xdp-loader status bond0
> > > CURRENT XDP PROGRAM STATUS:
> > >
> > > Interface Prio Program name Mode ID Tag
> > > Chain actions
> > > --------------------------------------------------------------------------------------
> > > bond0 xdp_dispatcher native 671 90f686eb86991928
> > > => 20 xsk_def_prog 680
> > > 8f9c40757cb0a6a2 XDP_PASS
> > >
> > > > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > ffffffff873afad7 __napi_poll+0x27
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff873affd3 net_rx_action+0x233
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff86b2956d kthread+0xdd
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > ffffffff86a02289 ret_from_fork+0x29
> > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > >
> > > > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > > > (running for slave device) when the xdp program is actually attached
> > > > > > > to the bond device? Is this by design?
> > > > > > > # xdp-loader status bond0
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > bond0 xdp_dispatcher native 64 90f686eb86991928
> > > > > > > => 20 xsk_def_prog 73
> > > > > > > 8f9c40757cb0a6a2 XDP_PASS
> > > > > > >
> > > > > > > # xdp-loader status ens1f0
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f0 <No XDP program loaded!>
> > > > > > >
> > > > > > > # xdp-loader status ens1f1
> > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > Chain actions
> > > > > > > --------------------------------------------------------------------------------------
> > > > > > > ens1f1 <No XDP program loaded!>
> > > > > > >
> > > > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > > > pps pkts 1.00
> > > > > > > rx 10,126,924 1,984,092,501
> > > > > > > tx 0 0
> > > > > > >
> > > > > > > I am sure we would not want to skip the device check generally
> > > > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > > > this further and get the issue fixed in the mainline.
> > > > > > >
> > > > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > > > pointed out before.
> > > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > > > >
> > > > > > >
> > > > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > >
> > > > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > > > >
> > > > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > Hi,
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am new to XDP and exploring how it works with the different interface
> > > > > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > > > > packets from the bond interface.
> > > > > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > > > > in native mode. The packets are not passed on to the kernel either.
> > > > > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > > > > great.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Are AF_XDP sockets on a bond not supported in native mode? Or, since the
> > > > > > > > > > > > > packet has to be handed over to the bond driver after reception on
> > > > > > > > > > > > > the phy port, is an skb allocation and copy indeed a must?
> > > > > > > > > > > >
> > > > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > > > you using?
> > > > > > > > > > > >
> > > > > > > > > > > I will trace the packet and get back.
> > > > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > >
> > > > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > > > >
> > > > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > > > >
> > > > > > > > > > > NIC Driver
> > > > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > > > driver: ixgbe
> > > > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > > > >
> > > > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > > > a lot of things have happened since then.
> > > > > > > > > >
> > > > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > > > >
> > > > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > > > >
> > > > > > > > > > > Features
> > > > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > > >
> > > > > > > > > > > CPU is
> > > > > > > > > > >
> > > > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > > > native/zero-copy mode.
> > > > > > > > > >
> > > > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > > > >
> > > > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > > > err = -EOPNOTSUPP;
> > > > > > > > > > goto err_unreg_pool;
> > > > > > > > > > }
> > > > > > > > > >
> > > > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > > > >
> > > > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > > > {
> > > > > > > > > > > ..
> > > > > > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > > > > > val &= slave->dev->xdp_features;
> > > > > > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > > >
> > > > > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > > > > are not allocating skb?
> > > > > > > > > > > >
> > > > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > > > not an option.
> > > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > > > >
> > > > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > > > AF_XDP rxdrop  PPS   CPU-SI*  CPU-xdpsock  Command
> > > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > > ZC             14M   65%      35%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > > > XDP_DRV/COPY   10M   100%     23%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > > > SKB_MODE       2.2M  100%     62%          ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > > In the above tests with ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > > > expected, especially in ZC mode? Is it mainly because the BPF
> > > > > > > > > > > program runs in non-HW-offloaded mode? I don't have a NIC that can
> > > > > > > > > > > run BPF in offloaded mode, so I cannot compare.
> > > > > > > > > >
> > > > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > > > the XDP_REDIRECT operation, but there are many other time thieves too.
> > > > > > > > > >
> > > > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > > > xdp-bench        PPS    CPU-SI*  Command
> > > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > > drop, no-touch   14M    41%      ./xdp-bench drop -p no-touch ens1f0 -e
> > > > > > > > > > > drop, read-data  14M    55%      ./xdp-bench drop -p read-data ens1f0 -e
> > > > > > > > > > > drop, parse-ip   14M    58%      ./xdp-bench drop -p parse-ip ens1f0 -e
> > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > >
> > > > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > > > AF_XDP rxdrop  PPS   CPU-SI*  CPU-xdpsock  Command
> > > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > > ZC             X     X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > > > XDP_DRV/COPY   X     X        X            ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > > > SKB_MODE       2M    100%     55%          ./xdpsock -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > >
> > > > > > > > > > > xdp-bench        PPS    CPU-SI*  Command
> > > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > > drop, no-touch   10.9M  33%      ./xdp-bench drop -p no-touch bond0 -e
> > > > > > > > > > > drop, read-data  10.9M  44%      ./xdp-bench drop -p read-data bond0 -e
> > > > > > > > > > > drop, parse-ip   10.9M  47%      ./xdp-bench drop -p parse-ip bond0 -e
> > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > >
> > > > > > > > > > >
> > > > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > > > >
> > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > Prashant
> > > > > > > > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-17 7:41 ` Magnus Karlsson
@ 2024-01-19 12:43 ` Prashant Batra
2024-01-19 13:04 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 21+ messages in thread
From: Prashant Batra @ 2024-01-19 12:43 UTC (permalink / raw)
To: Magnus Karlsson; +Cc: Fijalkowski, Maciej, xdp-newbies
On Wed, Jan 17, 2024 at 1:11 PM Magnus Karlsson
<magnus.karlsson@gmail.com> wrote:
>
> On Wed, 17 Jan 2024 at 07:07, Prashant Batra <prbatra.mail@gmail.com> wrote:
> >
> > On Tue, Jan 16, 2024 at 6:29 PM Magnus Karlsson
> > <magnus.karlsson@gmail.com> wrote:
> > >
> > > On Tue, 16 Jan 2024 at 13:48, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > >
> > > > On Mon, Jan 15, 2024 at 2:52 PM Magnus Karlsson
> > > > <magnus.karlsson@gmail.com> wrote:
> > > > >
> > > > > On Thu, 11 Jan 2024 at 11:41, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Jan 2, 2024 at 3:27 PM Magnus Karlsson
> > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > >
> > > > > > > On Fri, 22 Dec 2023 at 12:23, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > >
> > > > > > > > Yes, I found the place where the packet is getting dropped. The check
> > > > > > > > for device match b/w xs and xdp->rxq is failing in xsk_rcv_check() .
> > > > > > > > The device in xs is the bond device whereas the one in xdp->rxq is the
> > > > > > > > slave device on which the packet is received and the xdp program is
> > > > > > > > being invoked from.
> > > > > > > >
> > > > > > > > static int xsk_rcv_check(struct xdp_sock *xs, struct xdp_buff *xdp)
> > > > > > > > {
> > > > > > > > --
> > > > > > > > if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
> > > > > > > > return -EINVAL;
> > > > > > > > --
> > > > > > > > }
> > > > > > >
> > > > > > > I am now back from the holidays.
> > > > > > >
> > > > > > > Perfect! Thank you for finding the root cause. I will rope in Maciej
> > > > > > > and we will get back to you with a solution proposal.
> > > > > > >
> > > > > > Thanks, will wait for your solution.
> > > > >
> > > > > FYI, I do not have a good solution for this yet. The one I have is too
> > > > > complicated for my taste. I might have to take this to the list to get
> > > > > some new ideas on how to tackle it. So this will take longer than
> > > > > anticipated.
> > > > >
> > > > Just to add that the AF_XDP TX in native mode is also not performing
> > > > well. I am getting around 2Mpps in native mode.
> > >
> > > That is expected though. There are only two modes for Tx: SKB mode and
> > > zero-copy mode, and since there is no zero-copy support for the
> > > bonding driver, it will revert to skb mode. I would expect around 3
> > > Mpps for Tx in skb mode, so 2 Mpps seems reasonable as the bonding
> > > driver adds overhead.
> > >
> > > For Rx there are 3 modes: skb, XDP_DRV (which is the one you are
> > > getting with the -N switch) and zero-copy (that is not supported by
> > > the bonding driver).
> > >
> > Thanks for quick info. So, when you provide the fix for the bond
> > driver, can we expect the bond-driver to be able to support ZC in the
> > Tx mode (and Rx mode) or will the Tx remain in SKB mode? At 2M pps,
> > it's a big gap in Rx and Tx and practically leaves xdp not much useful
> > with bond devices.
>
> Personally, I do not have the time right now to implement ZC for the
> bonding driver. Once I have posted the fix (coding it up right now),
> send a mail to the netdev mailing list with the bonding maintainers on
> the to-line and state that you are interested in this functionality
> and ask if there are any other people interested in it. Or maybe you
> would like to implement it :-)?
I will need to understand the current ZC design and its implications
for the bonding driver, both of which are new to me. So I will probably
follow your suggestion and go to the netdev list first.
>
> > I also see a gap in Rx vs Tx for veth drivers-
> > In this below topology, I see AF_XDP TX to a veth device (veth1) is
> > not going beyond 1.2Mpps, The xdp program on veth2 redirects packet to
> > phy device ens1f0. I would assume based on your explanation above,
> > that this too is working in SKB mode, and hence the lower performance.
> > veth1 (AF_XDP Tx) -> veth2 (xdp) -> ens1f0
>
> Correct. There is no zero-copy for veth either.
>
> > However in the reverse direction shown below, I can receive close to
> > 5M pps on AF_XDP socket.
> > ens1f0 (xdp) ->veth2 -> veth1 (AF_XDP Rx)
>
> Yes, since it is using XDP_DRV mode without zero-copy.
>
> > Looking at the results here-
> > https://patchwork.ozlabs.org/project/netdev/cover/1533283098-2397-1-git-send-email-makita.toshiaki@lab.ntt.co.jp/
> > , I don't seem to find the benchmark which would validate my AF_XDP Rx
> > and Tx results with veth devices. The xdp DROP test results do match
> > with my tests though.
>
> Your numbers look reasonable. Just note that veth is not fast. If you
> want to have a faster veth, you might want to take a look at the new
> netkit device.
>
Just to get your expert opinion on this, I am sharing at a very high
level what my objectives are-
For Rx handling:
Demultiplex the packets received on the physical/bond interface based
on the packet's src + dst combination and direct them to the process
(Pn) handling packets for that src + dst pair.
a) One way of doing this is to pick up all packets in userspace in a
central process (Pc) using an AF_XDP socket (ZC with the phy device and
XDP_DRV with the bond); Pc then passes them over shared memory to the
correct process (Pn) for further handling.
ens1f0/bond (AF_XDP) -> Pc -> Pn
b) The other way is to run an XDP program attached to the phy/bond
device which, based on the src + dst, redirects the packets to a veth
pair (one per process Pn). The packet is then forwarded to the other
end of the veth pair, on which an AF_XDP socket is open. The
advantage here is that the packet demultiplexing happens in the kernel
and no Pc process is needed.
ens1f0/bond (XDP_REDIRECT_IF) -> vethext -> vethint (AF_XDP) -> Pn
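The in-kernel demultiplexing in Rx option (b) can be sketched as a
lookup from a (src, dst) pair to the ifindex of the veth device that
serves process Pn. What follows is a plain user-space C model, not BPF
code. The `flow_key` and `flow_entry` types, the `demux_lookup` name and
the choice of IPv4 addresses as the key are assumptions made purely for
illustration. In a real XDP program the table would more likely be a BPF
hash map, and the result would be handed to `bpf_redirect()`.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flow key: IPv4 source and destination address. */
struct flow_key {
	uint32_t src;
	uint32_t dst;
};

/* Hypothetical table entry: one per process Pn. */
struct flow_entry {
	struct flow_key key;
	int ifindex;	/* ifindex of the veth device that leads to Pn */
};

/*
 * Return the ifindex to redirect to for this flow, or -1 when no
 * process has registered the (src, dst) pair.
 */
int demux_lookup(const struct flow_entry *tbl, size_t n, struct flow_key k)
{
	for (size_t i = 0; i < n; i++) {
		if (tbl[i].key.src == k.src && tbl[i].key.dst == k.dst)
			return tbl[i].ifindex;
	}
	return -1;
}
```

A miss (-1) would map to whatever the policy is for unknown flows, for
example passing the packet up the stack or dropping it.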
For Tx handling:
Similar to Rx, here the packets coming from each process Pn need to be
sent out of the phy/bond device.
a) Again one way is to have a central process Pc which opens
AF_XDP sockets on the phy/bond device and multiplexes packets coming
from each Pn process (passed over shared memory).
Pn -> Pc -> ens1f0/bond (AF_XDP)
b) Use a veth pair on which the process Pn creates an
AF_XDP socket. The packets written to the veth device are then
redirected to the phy/bond device by an XDP program attached to the
other end of the veth pair.
Pn -> (AF_XDP) vethint -> vethext (XDP_REDIRECT_IF) -> ens1f0/bond
Given the limitations of veth on the Tx path, another idea I am
exploring is to have two categories of process Pn. Processes in the
first category, which have to handle a high pps, open an AF_XDP
socket directly on the phy device's queue N (each such process
gets a dedicated queue and the corresponding 4 rings) to get the ZC
benefit. Processes in the second category, with a lower pps
requirement, use the veth approach (as we don't have enough queues to
dedicate one to each process Pn).
1st category - Pn (AF_XDP) -> ens1f0
2nd category - As demonstrated above in Tx (b).
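To make the two-category idea concrete, here is a minimal sketch of how
a Tx path could be picked per process. It is only a model under stated
assumptions: the `plan_tx` helper, the `pps_threshold` cut-off and the
simple "next free queue" counter are hypothetical and not part of any
existing tool.

```c
#include <stdint.h>

enum tx_path {
	TX_PATH_PHY_QUEUE,	/* AF_XDP socket directly on a phy queue (ZC) */
	TX_PATH_VETH,		/* AF_XDP socket on a veth, redirected by XDP */
};

struct tx_plan {
	enum tx_path path;
	int queue_id;		/* phy queue for TX_PATH_PHY_QUEUE, else -1 */
};

/*
 * Hypothetical policy: a process whose expected rate reaches
 * pps_threshold gets a dedicated phy queue while one is still free;
 * every other process falls back to the veth path.
 */
struct tx_plan plan_tx(uint64_t expected_pps, uint64_t pps_threshold,
		       int *next_free_queue, int num_queues)
{
	struct tx_plan plan = { TX_PATH_VETH, -1 };

	if (expected_pps >= pps_threshold && *next_free_queue < num_queues) {
		plan.path = TX_PATH_PHY_QUEUE;
		plan.queue_id = (*next_free_queue)++;
	}
	return plan;
}
```

Once the queues are used up, even a high-rate process ends up on the
veth path, which is the limitation described above.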
Note that at the moment my requirement has nothing to do with
containers. veth-pairs are used purely to segregate packets belonging
to different process Pn without needing the central process Pc.
Regarding netkit, I don't see much documentation or samples available
except the selftest in the kernel code. If you have or know
of some samples that might fit the above requirements, that would
surely help. Does netkit fit the above requirements?
Also, it would be really helpful if you can point to any other better
way to achieve the above set of objectives.
Thanks,
Prashant
> > > > # ./xdpsock -t -i bond0 -N -G 0c:c4:7a:bd:13:b2 -H 0c:c4:7a:b7:5f:6c
> > > > sock0@bond0:0 txonly xdp-drv
> > > >
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 2,520,587 2,521,152
> > > >
> > > > sock0@bond0:0 txonly xdp-drv
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 2,362,740 4,884,352
> > > >
> > > > sock0@bond0:0 txonly xdp-drv
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 1,814,437 6,698,944
> > > >
> > > > sock0@bond0:0 txonly xdp-drv
> > > > pps pkts 1.00
> > > > rx 0 0
> > > > tx 1,817,913 8,517,120
> > > >
> > > > # xdp-loader status bond0
> > > > CURRENT XDP PROGRAM STATUS:
> > > >
> > > > Interface Prio Program name Mode ID Tag
> > > > Chain actions
> > > > --------------------------------------------------------------------------------------
> > > > bond0 xdp_dispatcher native 671 90f686eb86991928
> > > > => 20 xsk_def_prog 680
> > > > 8f9c40757cb0a6a2 XDP_PASS
> > > >
> > > > > > > > Here is the perf backtrace for the xdp_redirect_err event.
> > > > > > > > ksoftirqd/0 14 [000] 10956.235960: xdp:xdp_redirect_err: prog_id=69
> > > > > > > > action=REDIRECT ifindex=5 to_ifindex=0 err=-22 map_id=19 map_index=5
> > > > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff873dcbf4 xdp_do_redirect+0x3b4
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffffc05d0f0f ixgbe_run_xdp+0x10f
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > > ffffffffc05d297a ixgbe_clean_rx_irq+0x51a
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > > ffffffffc05d2da0 ixgbe_poll+0xf0
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/kernel/drivers/net/ethernet/intel/ixgbe/ixgbe.ko)
> > > > > > > > ffffffff873afad7 __napi_poll+0x27
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff873affd3 net_rx_action+0x233
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff8762ae27 __do_softirq+0xc7
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff86b04cfe run_ksoftirqd+0x1e
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff86b33d83 smpboot_thread_fn+0xd3
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff86b2956d kthread+0xdd
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > > ffffffff86a02289 ret_from_fork+0x29
> > > > > > > > (/lib/modules/5.14.0-362.13.1.el9_3_asn/build/vmlinux)
> > > > > > > >
> > > > > > > > I am curious why the xdp program is invoked from the ixgbe driver
> > > > > > > > (running for slave device) when the xdp program is actually attached
> > > > > > > > to the bond device? Is this by design?
> > > > > > > > # xdp-loader status bond0
> > > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > > Chain actions
> > > > > > > > --------------------------------------------------------------------------------------
> > > > > > > > bond0 xdp_dispatcher native 64 90f686eb86991928
> > > > > > > > => 20 xsk_def_prog 73
> > > > > > > > 8f9c40757cb0a6a2 XDP_PASS
> > > > > > > >
> > > > > > > > # xdp-loader status ens1f0
> > > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > > Chain actions
> > > > > > > > --------------------------------------------------------------------------------------
> > > > > > > > ens1f0 <No XDP program loaded!>
> > > > > > > >
> > > > > > > > # xdp-loader status ens1f1
> > > > > > > > CURRENT XDP PROGRAM STATUS:
> > > > > > > > Interface Prio Program name Mode ID Tag
> > > > > > > > Chain actions
> > > > > > > > --------------------------------------------------------------------------------------
> > > > > > > > ens1f1 <No XDP program loaded!>
> > > > > > > >
> > > > > > > > Now, if I skip the device check in xsk_rcv_check(), I can see the
> > > > > > > > packets being received in the AF_XDP socket in the driver mode.
> > > > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N
> > > > > > > > sock0@bond0:5 rxdrop xdp-drv poll()
> > > > > > > > pps pkts 1.00
> > > > > > > > rx 10,126,924 1,984,092,501
> > > > > > > > tx 0 0
> > > > > > > >
> > > > > > > > I am sure we would not want to skip the device check generally
> > > > > > > > especially for non-bonded devices, etc. Please guide on how to take
> > > > > > > > this further and get the issue fixed in the mainline.
> > > > > > > >
> > > > > > > > The ZC mode doesn't work. Mostly because of the problem you had
> > > > > > > > pointed out before.
> > > > > > > > # ./xdpsock -r -i bond0 -q 5 -p -n 1 -N -z
> > > > > > > > xdpsock.c:xsk_configure_socket:1068: errno: 22/"Invalid argument"
> > > > > > > >
> > > > > > > >
> > > > > > > > On Thu, Dec 21, 2023 at 7:16 PM Magnus Karlsson
> > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > >
> > > > > > > > > On Thu, 21 Dec 2023 at 13:39, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > >
> > > > > > > > > > On Wed, Dec 20, 2023 at 1:54 PM Magnus Karlsson
> > > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Tue, 19 Dec 2023 at 21:18, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > Thanks for your response. My comments inline.
> > > > > > > > > > > >
> > > > > > > > > > > > On Tue, Dec 19, 2023 at 7:17 PM Magnus Karlsson
> > > > > > > > > > > > <magnus.karlsson@gmail.com> wrote:
> > > > > > > > > > > > >
> > > > > > > > > > > > > On Tue, 19 Dec 2023 at 11:46, Prashant Batra <prbatra.mail@gmail.com> wrote:
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Hi,
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > I am new to XDP and exploring it's working with different interface
> > > > > > > > > > > > > > types supported in linux. One of my use cases is to be able to receive
> > > > > > > > > > > > > > packets from the bond interface.
> > > > > > > > > > > > > > I used xdpsock sample program specifying the bond interface as the
> > > > > > > > > > > > > > input interface. However the packets received on the bond interface
> > > > > > > > > > > > > > are not handed over to the socket by the kernel if the socket is bound
> > > > > > > > > > > > > > in native mode. The packets are neither being passed to the kernel.
> > > > > > > > > > > > > > Note that the socket creation does succeed.
> > > > > > > > > > > > > > In skb mode this works and I am able to receive packets in the
> > > > > > > > > > > > > > userspace. But in skb mode as expected the performance is not that
> > > > > > > > > > > > > > great.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Is AF_XDP sockets on bond not supported in native mode? Or since the
> > > > > > > > > > > > > > packet has be to be handed over to the bond driver post reception on
> > > > > > > > > > > > > > the phy port, a skb allocation and copy to it is indeed a must?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I have never tried a bonding interface with AF_XDP, so it might not
> > > > > > > > > > > > > work. Can you trace the packet to see where it is being dropped in
> > > > > > > > > > > > > native mode? There are no modifications needed to an XDP_REDIRECT
> > > > > > > > > > > > > enabled driver to support AF_XDP in XDP_DRV / copy mode. What NICs are
> > > > > > > > > > > > > you using?
> > > > > > > > > > > > >
> > > > > > > > > > > > I will trace the packet and get back.
> > > > > > > > > > > > The bond is over 2 physical ports part of the Intel NIC card. Those are-
> > > > > > > > > > > > b3:00.0 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > > > b3:00.1 Ethernet controller: Intel Corporation 82599ES 10-Gigabit
> > > > > > > > > > > > SFI/SFP+ Network Connection (rev 01)
> > > > > > > > > > > >
> > > > > > > > > > > > Bonding algo is 802.3ad
> > > > > > > > > > > >
> > > > > > > > > > > > CPU is Intel Xeon Gold 3.40GHz
> > > > > > > > > > > >
> > > > > > > > > > > > NIC Driver
> > > > > > > > > > > > # ethtool -i ens1f0
> > > > > > > > > > > > driver: ixgbe
> > > > > > > > > > > > version: 5.14.0-362.13.1.el9_3
> > > > > > > > > > >
> > > > > > > > > > > Could you please try with the latest kernel 6.7? 5.14 is quite old and
> > > > > > > > > > > a lot of things have happened since then.
> > > > > > > > > > >
> > > > > > > > > > I tried with kernel 6.6.8-1.el9.elrepo.x86_64. I still see the same issue.
> > > > > > > > >
> > > > > > > > > OK, good to know. Have you managed to trace where the packet is lost?
> > > > > > > > >
> > > > > > > > > > > > Features
> > > > > > > > > > > > # xdp-loader features ens1f0
> > > > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > > > >
> > > > > > > > > > > > CPU is
> > > > > > > > > > > >
> > > > > > > > > > > > Interesting thing is that the bond0 does advertise both native and ZC
> > > > > > > > > > > > mode. That's because the features are copied from the slave device.
> > > > > > > > > > > > Which explains why there is no error while binding the socket in
> > > > > > > > > > > > native/zero-copy mode.
> > > > > > > > > > >
> > > > > > > > > > > It is probably the intention that if both the bonded devices support a
> > > > > > > > > > > feature, then the bonding device will too. I just saw that the bonding
> > > > > > > > > > > device did not implement xsk_wakeup which is used by zero-copy, so
> > > > > > > > > > > zero-copy is not really supported so that support should not be
> > > > > > > > > > > advertised. The code in AF_XDP tests for zero-copy support this way:
> > > > > > > > > > >
> > > > > > > > > > > if ((netdev->xdp_features & NETDEV_XDP_ACT_ZC) != NETDEV_XDP_ACT_ZC) {
> > > > > > > > > > > err = -EOPNOTSUPP;
> > > > > > > > > > > goto err_unreg_pool;
> > > > > > > > > > > }
> > > > > > > > > > >
> > > > > > > > > > > So there are some things needed in the bonding driver to make
> > > > > > > > > > > zero-copy work. Might not be much though. But your problem is with
> > > > > > > > > > > XDP_DRV and copy mode, so let us start there.
> > > > > > > > > > >
> > > > > > > > > > > > void bond_xdp_set_features(struct net_device *bond_dev)
> > > > > > > > > > > > {
> > > > > > > > > > > > ..
> > > > > > > > > > > > bond_for_each_slave(bond, slave, iter)
> > > > > > > > > > > > val &= slave->dev->xdp_features;
> > > > > > > > > > > > xdp_set_features_flag(bond_dev, val);
> > > > > > > > > > > > }
> > > > > > > > > > > >
> > > > > > > > > > > > # ../xdp-loader/xdp-loader features bond0
> > > > > > > > > > > > NETDEV_XDP_ACT_BASIC: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_REDIRECT: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT: no
> > > > > > > > > > > > NETDEV_XDP_ACT_XSK_ZEROCOPY: yes
> > > > > > > > > > > > NETDEV_XDP_ACT_HW_OFFLOAD: no
> > > > > > > > > > > > NETDEV_XDP_ACT_RX_SG: no
> > > > > > > > > > > > NETDEV_XDP_ACT_NDO_XMIT_SG: no
> > > > > > > > > > > >
> > > > > > > > > > > > > > Another thing I notice is that other XDP programs attached to bond
> > > > > > > > > > > > > > interface with targets like DROP, REDIRECT to other interface works
> > > > > > > > > > > > > > and perform better than AF_XDP (skb) based. Does this mean that these
> > > > > > > > > > > > > > are not allocating skb?
> > > > > > > > > > > > >
> > > > > > > > > > > > > I am not surprised that AF_XDP in copy is slower than XDP_REDIRECT.
> > > > > > > > > > > > > The packet has to be copied out to user-space then copied into the
> > > > > > > > > > > > > kernel again, something that is not needed in the XDP_REDIRECT case.
> > > > > > > > > > > > > If you were using zero-copy, on the other hand, it would be faster
> > > > > > > > > > > > > with AF_XDP. But the bonding interface does not support zero-copy, so
> > > > > > > > > > > > > not an option.
> > > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > Just to put forth the pps numbers with the above mentioned single port
> > > > > > > > > > > > in different modes and a comparison to the bond interface.
> > > > > > > > > > > > Test is using pktgen pumping 64 byte packets on a single flow.
> > > > > > > > > > > >
> > > > > > > > > > > > Single AF_XDP sock on a single NIC queue-
> > > > > > > > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > > > ZC 14M 65% 35%
> > > > > > > > > > > > ./xdpsock -r -i ens1f0 -q 5 -p -n 1 -N -z
> > > > > > > > > > > > XDP_DRV/COPY 10M 100% 23% ./xdpsock -r
> > > > > > > > > > > > -i ens1f0 -q 5 -p -n 1 -N -c
> > > > > > > > > > > > SKB_MODE 2.2M 100% 62% ./xdpsock
> > > > > > > > > > > > -r -i ens1f0 -q 5 -p -n 1 -S
> > > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > > > In the above tests when using ZC and XDP_DRV/COPY, is this SI usage as
> > > > > > > > > > > > expected? Especially in ZC mode. Is it majorly because of the BPF
> > > > > > > > > > > > program running in non-HW offloaded mode? Don't have a NIC which can
> > > > > > > > > > > > run BPF in offloaded mode so I cannot compare it.
> > > > > > > > > > >
> > > > > > > > > > > I get about 25 - 30 Mpps at 100% CPU load on my system, but I have a
> > > > > > > > > > > 100G card and you are maxing out your 10G card at 65% and 14M. So yes,
> > > > > > > > > > > sounds reasonable. HW offload cannot be used with AF_XDP. You need to
> > > > > > > > > > > do the redirect in the CPU for it to work. If you want to know where
> > > > > > > > > > > time is spent use "perf top". The biggest chunk of time is spent in
> > > > > > > > > > > the XDP_REDIRECT operation, but there are many other time thiefs too.
> > > > > > > > > > >
> > > > > > > > > > > > The XDP_DROP target using xdp-bench tool (from xdp-tools) on the same NIC port-
> > > > > > > > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > > > drop, no-touch 14M 41% ./xdp-bench drop -p
> > > > > > > > > > > > no-touch ens1f0 -e
> > > > > > > > > > > > drop, read-data 14M 55% ./xdp-bench drop -p
> > > > > > > > > > > > read-data ens1f0 -e
> > > > > > > > > > > > drop, parse-ip 14M 58% ./xdp-bench drop -p
> > > > > > > > > > > > parse-ip ens1f0 -e
> > > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > > >
> > > > > > > > > > > > The similar tests on bond interface (above mentioned 2 ports bonded)-
> > > > > > > > > > > > AF_XDP rxdrop PPS CPU-SI* CPU-xdpsock Command
> > > > > > > > > > > > ══════════════════════════════════════════════════════════
> > > > > > > > > > > > ZC X X X
> > > > > > > > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -z
> > > > > > > > > > > > XDP_DRV/COPY X X X
> > > > > > > > > > > > ./xdpsock -r -i bond0 -q 0 -p -n 1 -N -c
> > > > > > > > > > > > SKB_MODE 2M 100% 55% ./xdpsock
> > > > > > > > > > > > -r -i bond0 -q 0 -p -n 1 -S
> > > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > > >
> > > > > > > > > > > > xdp-bench PPS CPU-SI* Command
> > > > > > > > > > > > ═══════════════════════════════════════════════
> > > > > > > > > > > > drop, no-touch 10.9M 33% ./xdp-bench drop -p no-touch
> > > > > > > > > > > > bond0 -e
> > > > > > > > > > > > drop, read-data 10.9M 44% ./xdp-bench drop -p
> > > > > > > > > > > > read-data bond0 -e
> > > > > > > > > > > > drop, parse-ip 10.9M 47% ./xdp-bench drop -p
> > > > > > > > > > > > parse-ip bond0 -e
> > > > > > > > > > > > * CPU receiving the packet
> > > > > > > > > > > >
> > > > > > > > > > > >
> > > > > > > > > > > > > > Kindly share your thoughts and advice.
> > > > > > > > > > > > > >
> > > > > > > > > > > > > > Thanks,
> > > > > > > > > > > > > > Prashant
> > > > > > > > > > > > > >
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-01-19 12:43 ` Prashant Batra
@ 2024-01-19 13:04 ` Toke Høiland-Jørgensen
[not found] ` <CAD0p+fUM5DcG44cxYXU3fMd9PgTjhTaMCH_oy=4iejJ41zxHpA@mail.gmail.com>
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-01-19 13:04 UTC (permalink / raw)
To: Prashant Batra, Magnus Karlsson
Cc: Fijalkowski, Maciej, xdp-newbies, Maryam Tahhan,
Jesper Dangaard Brouer, Lorenzo Bianconi
Prashant Batra <prbatra.mail@gmail.com> writes:
> Just to get your expert opinion on this, I am sharing at a very high
> level what my objectives are-
> For Rx handling:
> Demultiplex the packets received on the physical/bond interface based
> on the packet's src + dst combination and direct it to the process
> (Pn) handling that src + dst packets.
> a) One way of doing this is to pick all packets in userspace in a
> central process (Pc) using AF_XDP socket (ZC with phy device and
> XDP_DRV with bond) and then Pc passes it using shared memory to the
> correct process (Pn) for further handling.
> ens1f0/bond (AF_XDP) -> Pc -> Pn
> b) The other way is to run a xdp code attached to the phy/bond
> device which based on the src + dst redirects the packets to the veth
> pair (one per process Pn). The packet is then forwarded to the other
> end of the veth-pair over which there is an AF_XDP socket opened.The
> advantage here is that the packet demultiplex happens in the kernel
> and there is no Pc process needed.
> ens1f0/bond (XDP_REDIRECT_IF) ->vethext -> vethint ( AF_XDP) -> Pn
Adding in Jesper, Lorenzo and Maryam, as we've had various discussions
around improving AF_XDP support in containers, which seems to have some
overlap with your use case. Basically, what we have been discussing is
that your (b) approach has many desirable properties, also from a
container management PoV, and we believe it is possible to make it
perform reasonably well on both RX and TX.
It's most likely never going to be completely zero-copy because of the
veth traversal, but we should be able to get it down to a single copy at
least.
However, there is some work to be done before we can realise this
potential; but having more people interested in the use case may help
here :)
> Regarding netkit, I don't see much documentation or samples available
> except the selftest available in the kernel code. If you have or know
> of some samples that might fit in the above requirements that will
> surely help. Does it fit in the above requirement?
Netkit does not support XDP at all, and I doubt it ever will. Rather, it
is meant for optimising the use of BPF in the kernel stack (skb) path,
so it doesn't sound like it's a good fit for your use case if you want
to go directly from XDP to userspace.
-Toke
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
[not found] ` <CAD0p+fUM5DcG44cxYXU3fMd9PgTjhTaMCH_oy=4iejJ41zxHpA@mail.gmail.com>
@ 2024-03-18 18:41 ` Christian Deacon
2024-03-19 7:52 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Christian Deacon @ 2024-03-18 18:41 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: xdp-newbies
Resending the following email to the XDP Newbies mailing list since it
was rejected due to HTML contents (I've switched email clients and
forgot to disable HTML, I apologize).
Hey everyone,
I was wondering if there was an update to this. I'm currently running
into the same issue with a similar setup.
When running the XDP program on a bonding device via native mode,
packets redirected to the AF_XDP sockets with `bpf_redirect_map()`
inside the XDP program do not make it to the AF_XDP sockets. Neither
switching between zero-copy and copy mode nor setting the need-wakeup
flag makes a difference.
I've tried the latest mainline kernel `6.8.1-060801`, but that did not
make a difference. If the XDP program is attached with SKB mode,
packets do show up on the AF_XDP sockets as mentioned in this thread
already.
While I haven't confirmed it on my side, I'm assuming the
`xsk_rcv_check()` function is the issue here. I'm unsure if skipping
this check for the time being would work for my needs, but I'm hoping
a better solution will be implemented in the mainline kernel.
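For reference, the suspected failure can be reproduced in a small
user-space model of the check quoted earlier in this thread. The struct
definitions below are simplified stand-ins that carry only the fields
the comparison reads; they are not the kernel's definitions, and
`xsk_rcv_check_model` is a name made up for this sketch.

```c
#include <errno.h>
#include <stdint.h>

/* Simplified stand-ins for the kernel structures (assumption). */
struct net_device { int ifindex; };
struct xdp_rxq_info { struct net_device *dev; uint32_t queue_index; };
struct xdp_buff { struct xdp_rxq_info *rxq; };
struct xdp_sock { struct net_device *dev; uint16_t queue_id; };

/*
 * Same comparison as in the xsk_rcv_check() snippet quoted in this
 * thread: the socket's device and queue must match the device and
 * queue the packet was received on.
 */
int xsk_rcv_check_model(const struct xdp_sock *xs, const struct xdp_buff *xdp)
{
	if (xs->dev != xdp->rxq->dev || xs->queue_id != xdp->rxq->queue_index)
		return -EINVAL;
	return 0;
}
```

With a socket bound to bond0 and a packet whose rxq points at the slave
(ens1f0), the device comparison fails and the model returns -EINVAL,
which matches the err=-22 seen in the xdp_redirect_err trace above.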
I also saw there was another similar issue on this mailing list with
the title "Switching packets between queues in XDP program". However,
judging from the last reply in that thread, the fix implemented
wouldn't help with the bonding driver.
Any help is appreciated and thank you for your time!
On Mon, Mar 18, 2024 at 2:33 PM Christian Deacon
<christian.m.deacon@gmail.com> wrote:
>
> Hey everyone,
>
> I was wondering if there was an update to this. I'm currently running into the same issue with a similar setup.
>
> When running the XDP program on a bonding device via native mode, packets redirected to the AF_XDP sockets with `bpf_redirect_map()` inside the XDP program do not make it to the AF_XDP sockets. Switching between zero copy and copy mode does not make a difference along with setting the need wakeup flag.
>
> I've tried the latest mainline kernel `6.8.1-060801`, but that did not make a difference. If the XDP program is attached with SKB mode, packets do show up on the AF_XDP sockets as mentioned in this thread already.
>
> While I haven't confirmed it on my side, I'm assuming the `xsk_rcv_check()` function is the issue here. I'm unsure if skipping this check for the time being would work for my needs, but I'm hoping a better solution will be implemented to the mainline kernel.
>
> I also saw there was another similar issue on this mailing list with the title "Switching packets between queues in XDP program". However, judging from the last reply in that thread, the fix implemented wouldn't help with the bonding driver.
>
> Any help is appreciated and thank you for your time!
>
> On Fri, Jan 19, 2024 at 8:04 AM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Prashant Batra <prbatra.mail@gmail.com> writes:
>>
>> > Just to get your expert opinion on this, I am sharing at a very high
>> > level what my objectives are-
>> > For Rx handling:
>> > Demultiplex the packets received on the physical/bond interface based
>> > on the packet's src + dst combination and direct it to the process
>> > (Pn) handling that src + dst packets.
>> > a) One way of doing this is to pick all packets in userspace in a
>> > central process (Pc) using AF_XDP socket (ZC with phy device and
>> > XDP_DRV with bond) and then Pc passes it using shared memory to the
>> > correct process (Pn) for further handling.
>> > ens1f0/bond (AF_XDP) -> Pc -> Pn
>> > b) The other way is to run a xdp code attached to the phy/bond
>> > device which based on the src + dst redirects the packets to the veth
>> > pair (one per process Pn). The packet is then forwarded to the other
>> > end of the veth-pair over which there is an AF_XDP socket opened.The
>> > advantage here is that the packet demultiplex happens in the kernel
>> > and there is no Pc process needed.
>> > ens1f0/bond (XDP_REDIRECT_IF) ->vethext -> vethint ( AF_XDP) -> Pn
>>
>> Adding in Jesper, Lorenzo and Maryam, as we've had various discussions
>> around improving AF_XDP support in containers, which seems to have some
>> overlap with your use case. Basically, what we have been discussing is
>> that your (b) approach has many desirable properties, also from a
>> container management PoV, and we believe it is possible to make it
>> perform reasonably well on both RX and TX.
>>
>> It's most likely never going to be completely zero-copy because of the
>> veth traversal, but we should be able to get it down to a single copy at
>> least.
>>
>> However, there is some work to be done before we can realise this
>> potential; but having more people interested in the use case may help
>> here :)
>>
>> > Regarding netkit, I don't see much documentation or samples available
>> > except the selftest available in the kernel code. If you have or know
>> > of some samples that might fit in the above requirements that will
>> > surely help. Does it fit in the above requirement?
>>
>> Netkit does not support XDP at all, and I doubt it ever will. Rather, it
>> is meant for optimising the use of BPF in the kernel stack (skb) path,
>> so it doesn't sound like it's a good fit for your use case if you want
>> to go directly from XDP to userspace.
>>
>> -Toke
>>
>>
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-03-18 18:41 ` Christian Deacon
@ 2024-03-19 7:52 ` Magnus Karlsson
2024-03-19 11:57 ` Toke Høiland-Jørgensen
0 siblings, 1 reply; 21+ messages in thread
From: Magnus Karlsson @ 2024-03-19 7:52 UTC (permalink / raw)
To: Christian Deacon; +Cc: Toke Høiland-Jørgensen, xdp-newbies
On Mon, 18 Mar 2024 at 19:41, Christian Deacon
<christian.m.deacon@gmail.com> wrote:
>
> Resending the following email to the XDP Newbies mailing list since it
> was rejected due to HTML contents (I've switched email clients and
> forgot to disable HTML, I apologize).
>
> Hey everyone,
>
> I was wondering if there was an update to this. I'm currently running
> into the same issue with a similar setup.
>
> When running the XDP program on a bonding device via native mode,
> packets redirected to the AF_XDP sockets with `bpf_redirect_map()`
> inside the XDP program do not make it to the AF_XDP sockets. Switching
> between zero copy and copy mode does not make a difference along with
> setting the need wakeup flag.
>
> I've tried the latest mainline kernel `6.8.1-060801`, but that did not
> make a difference. If the XDP program is attached with SKB mode,
> packets do show up on the AF_XDP sockets as mentioned in this thread
> already.
>
> While I haven't confirmed it on my side, I'm assuming the
> `xsk_rcv_check()` function is the issue here. I'm unsure if skipping
> this check for the time being would work for my needs, but I'm hoping
> a better solution will be implemented to the mainline kernel.
>
> I also saw there was another similar issue on this mailing list with
> the title "Switching packets between queues in XDP program". However,
> judging from the last reply in that thread, the fix implemented
> wouldn't help with the bonding driver.
>
> Any help is appreciated and thank you for your time!
You are correct in that the fix above does not address the bonding
case and that the problem is indeed that XDP reports the device as the
real NIC and that the AF_XDP socket is bound to the bonding device.
Therefore xdp->dev != xsk->dev (in principle, not the actual code) and
all packets will be discarded. I got as far as sketching a solution
but I do not have the bandwidth at the moment to implement it.
Unfortunately it is not a one-liner or even just one hundred lines of
code. Let me know what you think, or if someone can come up with an
easier solution.

*** Suggestion on how to implement AF_XDP for the bond device

Two steps: XDP_DRV mode then zero-copy mode

* XDP_DRV:

For XDP_DRV mode, the problem to overcome is this piece of code
in xsk_rcv_check():

    struct net_device *dev = xdp->rxq->dev;
    u32 qid = xdp->rxq->queue_index;

    if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
            return -EINVAL;

xs is the socket that was bound to the bonding device e.g., bond0. So
xs->dev points to bond0. xdp->rxq->dev, on the other hand, comes from
XDP and the real driver e.g., eth0, thus xs->dev != xdp->rxq->dev. The
problem here is that only _rx[] of bond0 is populated with the pool
pointer at bind time, so dev->_rx[qid].pool is NULL as it refers to
the _rx of eth0 that was never set. The solution here is then to make
sure that the _rx[] of bond0 is propagated to eth0 (and any other device
bonded to bond0).
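
To make the failure concrete, here is a small user-space model of the
check. It is only a sketch: the *_model structs and functions are
made-up stand-ins, not the kernel definitions, and bind_model() assumes
that bind does nothing more than set the pool pointer on the bound
device's queue.

```c
#include <errno.h>
#include <stddef.h>

/* Simplified stand-ins for the kernel types; NOT the real definitions. */
struct umem_model { int id; };
struct pool_model { struct umem_model *umem; };
struct rxq_model  { struct pool_model *pool; };

#define MODEL_NUM_QUEUES 4

struct netdev_model {
	const char *name;
	struct rxq_model _rx[MODEL_NUM_QUEUES];
};

struct xsk_model {
	struct netdev_model *dev;	/* device the socket was bound to */
	struct umem_model *umem;
};

/* Models bind(): only the bound device's queue gets the pool pointer. */
static void bind_model(struct xsk_model *xs, struct netdev_model *dev,
		       unsigned int qid, struct pool_model *pool)
{
	xs->dev = dev;
	xs->umem = pool->umem;
	dev->_rx[qid].pool = pool;
}

/*
 * Same shape as the check quoted above. rx_dev plays the role of
 * xdp->rxq->dev, i.e. the device the packet actually arrived on.
 */
static int rcv_check_model(const struct xsk_model *xs,
			   const struct netdev_model *rx_dev,
			   unsigned int qid)
{
	if (qid >= MODEL_NUM_QUEUES)
		return -EINVAL;
	if (!rx_dev->_rx[qid].pool ||
	    xs->umem != rx_dev->_rx[qid].pool->umem)
		return -EINVAL;
	return 0;
}
```

Binding the model socket to bond0 and then running the check with eth0
as the receiving device returns -EINVAL, because eth0's _rx[] was never
touched. Running it with bond0 as the receiving device passes.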

Two new features are needed to support this:

1) A helper that copies _rx[].pool from one struct net_device to another
2) A new xsk_bind netdev event that a driver can subscribe to. It will be
   called whenever an xsk socket is bound to a device.

In the case the socket is bound to bond0 before eth0 is bonded to
bond0, only 1) needs to be used in the bonding driver.

In the case the socket is bound to bond0 after bonding of eth0 to
bond0, the bonding driver needs to subscribe to 2) and call 1) in the
event handler.
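
The two orderings could be modelled in user space roughly as follows.
This is a sketch under assumptions: none of these names exist in the
kernel, the xsk_bind event is the proposed (not yet existing) one, and
locking and teardown are ignored entirely.

```c
#include <stddef.h>

#define MODEL_NUM_QUEUES 4
#define MODEL_MAX_SLAVES 8

/* Made-up stand-ins; NOT kernel definitions. */
struct pool_model { int id; };

struct netdev_model {
	struct pool_model *rx_pool[MODEL_NUM_QUEUES];
};

struct bond_model {
	struct netdev_model dev;			/* bond0 itself */
	struct netdev_model *slaves[MODEL_MAX_SLAVES];	/* eth0, eth1, ... */
	size_t n_slaves;
};

/* Feature 1): copy every per-queue pool pointer from src to dst. */
static void copy_rx_pools_model(struct netdev_model *dst,
				const struct netdev_model *src)
{
	for (size_t q = 0; q < MODEL_NUM_QUEUES; q++)
		dst->rx_pool[q] = src->rx_pool[q];
}

/*
 * Socket bound before enslaving: the pools already sit on bond0, so
 * copying them at enslave time is enough.
 */
static int bond_enslave_model(struct bond_model *bond,
			      struct netdev_model *slave)
{
	if (bond->n_slaves >= MODEL_MAX_SLAVES)
		return -1;
	bond->slaves[bond->n_slaves++] = slave;
	copy_rx_pools_model(slave, &bond->dev);
	return 0;
}

/*
 * Socket bound after enslaving: feature 2), the hypothetical xsk_bind
 * event handler, re-runs the copy for every existing slave.
 */
static void bond_on_xsk_bind_model(struct bond_model *bond)
{
	for (size_t i = 0; i < bond->n_slaves; i++)
		copy_rx_pools_model(bond->slaves[i], &bond->dev);
}
```

Note that the model copies raw pointers, so a real implementation would
also have to clear them on the slaves at unbind or release time.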

* ZERO-COPY

1) Relay the XDP_SETUP_XSK_POOL command in NDO_BPF through to the
   bonded devices.

2) Relay ndo_xsk_wakeup through to the bonded devices.

Standby mode seems straightforward to support.

How to deal with round-robin mode in the bonding driver? It is not
possible to have multiple bonded devices access the same ring, so this
would require multiple rings and copying to them. It is also not clear
how to propagate the need_wakeup flags of the individual network
devices to that of the bond device. I think this kind of functionality
is much better performed in user-space with a lib. Simpler and faster.
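
As an illustration of why user space is simpler here, a library could
keep one socket (and thus one set of rings and one need_wakeup flag)
per real device and do the round-robin itself. The sketch below shows
only the port selection; all names are hypothetical and no AF_XDP calls
are made.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-port state kept by a user-space bonding library. */
struct port_model {
	bool link_up;		/* would come from link monitoring */
	bool need_wakeup;	/* per port, so nothing to aggregate */
	unsigned long tx_count;
};

struct ubond_model {
	struct port_model *ports;
	size_t n_ports;
	size_t next;		/* round-robin cursor */
};

/*
 * Pick the next port whose link is up, starting at the cursor.
 * Returns the port index, or -1 if no port is currently usable.
 */
static int ubond_pick_tx_port(struct ubond_model *b)
{
	for (size_t tried = 0; tried < b->n_ports; tried++) {
		size_t idx = (b->next + tried) % b->n_ports;

		if (b->ports[idx].link_up) {
			b->next = (idx + 1) % b->n_ports;
			b->ports[idx].tx_count++;
			return (int)idx;
		}
	}
	return -1;
}
```

Each port has its own rings in this model, so no two devices ever touch
the same ring, and a wakeup is issued only for the port that was picked.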
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-03-19 7:52 ` Magnus Karlsson
@ 2024-03-19 11:57 ` Toke Høiland-Jørgensen
2024-03-19 12:29 ` Magnus Karlsson
0 siblings, 1 reply; 21+ messages in thread
From: Toke Høiland-Jørgensen @ 2024-03-19 11:57 UTC (permalink / raw)
To: Magnus Karlsson, Christian Deacon; +Cc: xdp-newbies
Magnus Karlsson <magnus.karlsson@gmail.com> writes:
> On Mon, 18 Mar 2024 at 19:41, Christian Deacon
> <christian.m.deacon@gmail.com> wrote:
>>
>> Resending the following email to the XDP Newbies mailing list since it
>> was rejected due to HTML contents (I've switched email clients and
>> forgot to disable HTML, I apologize).
>>
>> Hey everyone,
>>
>> I was wondering if there was an update to this. I'm currently running
>> into the same issue with a similar setup.
>>
>> When running the XDP program on a bonding device via native mode,
>> packets redirected to the AF_XDP sockets with `bpf_redirect_map()`
>> inside the XDP program do not make it to the AF_XDP sockets. Switching
>> between zero copy and copy mode does not make a difference along with
>> setting the need wakeup flag.
>>
>> I've tried the latest mainline kernel `6.8.1-060801`, but that did not
>> make a difference. If the XDP program is attached with SKB mode,
>> packets do show up on the AF_XDP sockets as mentioned in this thread
>> already.
>>
>> While I haven't confirmed it on my side, I'm assuming the
>> `xsk_rcv_check()` function is the issue here. I'm unsure if skipping
>> this check for the time being would work for my needs, but I'm hoping
>> a better solution will be implemented to the mainline kernel.
>>
>> I also saw there was another similar issue on this mailing list with
>> the title "Switching packets between queues in XDP program". However,
>> judging from the last reply in that thread, the fix implemented
>> wouldn't help with the bonding driver.
>>
>> Any help is appreciated and thank you for your time!
>
> You are correct in that the fix above does not address the bonding
> case and that the problem is indeed that XDP reports the device as the
> real NIC and that the AF_XDP socket is bound to the bonding device.
> Therefore xdp->dev != xsk->dev (in principle, not the actual code) and
> all packets will be discarded. I got as far as sketching on a solution
> but I do not have the bandwidth at the moment to implement it.
> Unfortunately it is not a one-liner or even just one hundred lines of
> code. Let me know what you think, or if someone can come up with an
> easier solution.
>
> *** Suggestion on how to implement AF_XDP for the bond device
>
> Two steps: XDP_DRV mode then zero-copy mode
>
> * XDP_DRV:
>
> For XDP_DRV mode, the problem to overcome is this piece of code
> in xsk_rcv_check():
>
> struct net_device *dev = xdp->rxq->dev;
> u32 qid = xdp->rxq->queue_index;
>
> if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
> return -EINVAL;
>
> xs is the socket that was bound to the bonding device e.g., bond0. So
> xs->dev points to bond0. xdp->rxq->dev, on the other hand, comes from
> XDP and the real driver e.g. eth0, thus xs->dev != xdp->rxq->dev. The
> problem here is that only _rx[] of bond0 is populated with the pool
> pointer at bind time, so dev->_rx[qid].pool is NULL as it refers to
> the _rx of eth0 that was never set. The solution here is then to make
> sure that the _rx[] of bond0 is propagated to eth0 (and any other device
> bonded to bond0).
>
> Two new features are needed to support this:
>
> 1) A helper that copies _rx[].pool from one struct to another
> 2) A new xsk_bind netdev event that a driver can subscribe to. Will be called
> whenever a xsk socket is bound to a device.
>
> In the case the socket is bound to bond0 before eth0 is bonded to
> bond0, only 1) needs to be used in the bonding driver.
>
> In the case the socket is bound to bond0 after bonding of eth0 to
> bond0, the bonding driver need to subscribe to 2) and in the event
> handle call 1).
>
> * ZERO-COPY
>
> 1) Relay through the XDP_SETUP_XSK_POOL command in NDO_BPF to the
> bonded devices.
>
> 2) Relay through the ndo_xsk_wakeup to the bonded devices.
>
> Standby mode seems straight-forward to support.
>
> How to deal with round-robin mode in the bonding driver? Not possible
> to have multiple bonded devices access the same ring. Would require
> multiple rings and copying to them. Also not clear how to propagate
> the need_wakeup flags of the individual network devices to the one of
> the bond device. I think this kind of functionality is much better
> performed in user-space with a lib. Simpler and faster.
I think this goes for all the things you mentioned above. There is no
way we can make this consistent with the in-kernel bond behaviour, so
it's going to be a pretty leaky abstraction anyway. So I don't think we
should add all this complexity, it's better to handle this in userspace
(and just attach to the component interfaces).
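
For what it's worth, finding the component interfaces from user space
is not hard: the bonding driver lists them as a whitespace-separated
string in /sys/class/net/<bond>/bonding/slaves. A minimal sketch of
parsing that string follows; parse_bond_slaves() and MODEL_IFNAMSIZ are
names made up for this example, and reading the file plus opening one
AF_XDP socket per returned name is left out.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_IFNAMSIZ 16	/* same value as IFNAMSIZ in <net/if.h> */

/*
 * Split the contents of .../bonding/slaves into interface names.
 * Names that do not fit in MODEL_IFNAMSIZ are skipped.
 * Returns the number of names stored in names[].
 */
static size_t parse_bond_slaves(const char *text,
				char names[][MODEL_IFNAMSIZ], size_t max)
{
	size_t n = 0;
	const char *p = text;

	while (*p && n < max) {
		size_t len;

		p += strspn(p, " \t\n");	/* skip separators */
		len = strcspn(p, " \t\n");	/* length of next name */
		if (len == 0)
			break;
		if (len < MODEL_IFNAMSIZ) {
			memcpy(names[n], p, len);
			names[n][len] = '\0';
			n++;
		}
		p += len;
	}
	return n;
}
```

Each name can then be handed to if_nametoindex() and used to attach the
XDP program and bind a socket on that device directly, in native mode.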
In fact, I think supporting XDP at all on the bond interface was a
mistake; let's not exacerbate it :/
-Toke
* Re: Redirect to AF_XDP socket not working with bond interface in native mode
2024-03-19 11:57 ` Toke Høiland-Jørgensen
@ 2024-03-19 12:29 ` Magnus Karlsson
0 siblings, 0 replies; 21+ messages in thread
From: Magnus Karlsson @ 2024-03-19 12:29 UTC (permalink / raw)
To: Toke Høiland-Jørgensen; +Cc: Christian Deacon, xdp-newbies
On Tue, 19 Mar 2024 at 12:57, Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>
> Magnus Karlsson <magnus.karlsson@gmail.com> writes:
>
> > On Mon, 18 Mar 2024 at 19:41, Christian Deacon
> > <christian.m.deacon@gmail.com> wrote:
> >>
> >> Resending the following email to the XDP Newbies mailing list since it
> >> was rejected due to HTML contents (I've switched email clients and
> >> forgot to disable HTML, I apologize).
> >>
> >> Hey everyone,
> >>
> >> I was wondering if there was an update to this. I'm currently running
> >> into the same issue with a similar setup.
> >>
> >> When running the XDP program on a bonding device via native mode,
> >> packets redirected to the AF_XDP sockets with `bpf_redirect_map()`
> >> inside the XDP program do not make it to the AF_XDP sockets. Switching
> >> between zero copy and copy mode does not make a difference along with
> >> setting the need wakeup flag.
> >>
> >> I've tried the latest mainline kernel `6.8.1-060801`, but that did not
> >> make a difference. If the XDP program is attached with SKB mode,
> >> packets do show up on the AF_XDP sockets as mentioned in this thread
> >> already.
> >>
> >> While I haven't confirmed it on my side, I'm assuming the
> >> `xsk_rcv_check()` function is the issue here. I'm unsure if skipping
> >> this check for the time being would work for my needs, but I'm hoping
> >> a better solution will be implemented to the mainline kernel.
> >>
> >> I also saw there was another similar issue on this mailing list with
> >> the title "Switching packets between queues in XDP program". However,
> >> judging from the last reply in that thread, the fix implemented
> >> wouldn't help with the bonding driver.
> >>
> >> Any help is appreciated and thank you for your time!
> >
> > You are correct in that the fix above does not address the bonding
> > case and that the problem is indeed that XDP reports the device as the
> > real NIC and that the AF_XDP socket is bound to the bonding device.
> > Therefore xdp->dev != xsk->dev (in principle, not the actual code) and
> > all packets will be discarded. I got as far as sketching on a solution
> > but I do not have the bandwidth at the moment to implement it.
> > Unfortunately it is not a one-liner or even just one hundred lines of
> > code. Let me know what you think, or if someone can come up with an
> > easier solution.
> >
> > *** Suggestion on how to implement AF_XDP for the bond device
> >
> > Two steps: XDP_DRV mode then zero-copy mode
> >
> > * XDP_DRV:
> >
> > For XDP_DRV mode, the problem to overcome is this piece of code
> > in xsk_rcv_check():
> >
> > struct net_device *dev = xdp->rxq->dev;
> > u32 qid = xdp->rxq->queue_index;
> >
> > if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
> > return -EINVAL;
> >
> > xs is the socket that was bound to the bonding device e.g., bond0. So
> > xs->dev points to bond0. xdp->rxq->dev, on the other hand, comes from
> > XDP and the real driver e.g. eth0, thus xs->dev != xdp->rxq->dev. The
> > problem here is that only _rx[] of bond0 is populated with the pool
> > pointer at bind time, so dev->_rx[qid].pool is NULL as it refers to
> > the _rx of eth0 that was never set. The solution here is then to make
> > sure that the _rx[] of bond0 is propagated to eth0 (and any other device
> > bonded to bond0).
> >
> > Two new features are needed to support this:
> >
> > 1) A helper that copies _rx[].pool from one struct to another
> > 2) A new xsk_bind netdev event that a driver can subscribe to. Will be called
> > whenever a xsk socket is bound to a device.
> >
> > In the case the socket is bound to bond0 before eth0 is bonded to
> > bond0, only 1) needs to be used in the bonding driver.
> >
> > In the case the socket is bound to bond0 after bonding of eth0 to
> > bond0, the bonding driver need to subscribe to 2) and in the event
> > handle call 1).
> >
> > * ZERO-COPY
> >
> > 1) Relay through the XDP_SETUP_XSK_POOL command in NDO_BPF to the
> > bonded devices.
> >
> > 2) Relay through the ndo_xsk_wakeup to the bonded devices.
> >
> > Standby mode seems straight-forward to support.
> >
> > How to deal with round-robin mode in the bonding driver? Not possible
> > to have multiple bonded devices access the same ring. Would require
> > multiple rings and copying to them. Also not clear how to propagate
> > the need_wakeup flags of the individual network devices to the one of
> > the bond device. I think this kind of functionality is much better
> > performed in user-space with a lib. Simpler and faster.
>
> I think this goes for all the things you mentioned above. There is no
> way we can make this consistent with the in-kernel bond behaviour, so
> it's going to be a pretty leaky abstraction anyway. So I don't think we
> should add all this complexity, it's better to handle this in userspace
> (and just attach to the component interfaces).
Yes, supporting this in user-space would be much simpler and more of
the bonding scenarios could also be supported. I also do not see any
of this added kernel functionality being useful for any other use
case, except bonding.
> In fact, I think supporting XDP at all on the bond interface was a
> mistake; let's not exacerbate it :/
>
> -Toke
>
end of thread, other threads:[~2024-03-19 12:30 UTC | newest]
Thread overview: 21+ messages
2023-12-19 10:45 Redirect to AF_XDP socket not working with bond interface in native mode Prashant Batra
2023-12-19 10:58 ` Prashant Batra
2023-12-19 13:47 ` Magnus Karlsson
2023-12-19 20:18 ` Prashant Batra
2023-12-20 8:24 ` Magnus Karlsson
2023-12-21 12:39 ` Prashant Batra
2023-12-21 13:45 ` Magnus Karlsson
2023-12-22 11:23 ` Prashant Batra
2024-01-02 9:57 ` Magnus Karlsson
2024-01-11 10:41 ` Prashant Batra
2024-01-15 9:22 ` Magnus Karlsson
2024-01-16 12:48 ` Prashant Batra
2024-01-16 12:59 ` Magnus Karlsson
2024-01-17 6:07 ` Prashant Batra
2024-01-17 7:41 ` Magnus Karlsson
2024-01-19 12:43 ` Prashant Batra
2024-01-19 13:04 ` Toke Høiland-Jørgensen
[not found] ` <CAD0p+fUM5DcG44cxYXU3fMd9PgTjhTaMCH_oy=4iejJ41zxHpA@mail.gmail.com>
2024-03-18 18:41 ` Christian Deacon
2024-03-19 7:52 ` Magnus Karlsson
2024-03-19 11:57 ` Toke Høiland-Jørgensen
2024-03-19 12:29 ` Magnus Karlsson