From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Magnus Karlsson <magnus.karlsson@gmail.com>,
Christian Deacon <christian.m.deacon@gmail.com>
Cc: xdp-newbies@vger.kernel.org
Subject: Re: Redirect to AF_XDP socket not working with bond interface in native mode
Date: Tue, 19 Mar 2024 12:57:34 +0100
Message-ID: <87a5muzb69.fsf@toke.dk>
In-Reply-To: <CAJ8uoz25bTNDXsDDd1J5zKcoTAtEpAf35WTkFBZi6hyMJvRsRA@mail.gmail.com>

Magnus Karlsson <magnus.karlsson@gmail.com> writes:
> On Mon, 18 Mar 2024 at 19:41, Christian Deacon
> <christian.m.deacon@gmail.com> wrote:
>>
>> Resending the following email to the XDP Newbies mailing list since it
>> was rejected due to HTML content (I've switched email clients and
>> forgot to disable HTML; I apologize).
>>
>> Hey everyone,
>>
>> I was wondering if there was an update to this. I'm currently running
>> into the same issue with a similar setup.
>>
>> When running the XDP program on a bonding device in native mode,
>> packets redirected to the AF_XDP sockets with `bpf_redirect_map()`
>> inside the XDP program do not make it to the AF_XDP sockets. Neither
>> switching between zero-copy and copy mode nor setting the need_wakeup
>> flag makes a difference.
>>
>> I've tried the latest mainline kernel `6.8.1-060801`, but that did not
>> make a difference. If the XDP program is attached with SKB mode,
>> packets do show up on the AF_XDP sockets as mentioned in this thread
>> already.
>>
>> While I haven't confirmed it on my side, I'm assuming the
>> `xsk_rcv_check()` function is the issue here. I'm unsure whether
>> skipping this check for the time being would work for my needs, but
>> I'm hoping a better solution will be implemented in the mainline kernel.
>>
>> I also saw another similar issue on this mailing list, titled
>> "Switching packets between queues in XDP program". However, judging
>> from the last reply in that thread, the fix that was implemented
>> wouldn't help with the bonding driver.
>>
>> Any help is appreciated and thank you for your time!
>
> You are correct that the fix above does not address the bonding case,
> and that the problem is indeed that XDP reports the device as the real
> NIC while the AF_XDP socket is bound to the bonding device. Therefore
> xdp->dev != xsk->dev (in principle; this is not the actual code) and
> all packets are discarded. I got as far as sketching a solution, but
> I do not have the bandwidth to implement it at the moment.
> Unfortunately, it is not a one-liner, or even just a hundred lines of
> code. Let me know what you think, or if someone can come up with an
> easier solution.
>
> *** Suggestion on how to implement AF_XDP for the bond device
>
> Two steps: XDP_DRV mode first, then zero-copy mode.
>
> * XDP_DRV:
>
> For XDP_DRV mode, the problem to overcome is this piece of code
> in xsk_rcv_check():
>
> 	struct net_device *dev = xdp->rxq->dev;
> 	u32 qid = xdp->rxq->queue_index;
>
> 	if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
> 		return -EINVAL;
>
> xs is the socket that was bound to the bonding device, e.g. bond0, so
> xs->dev points to bond0. xdp->rxq->dev, on the other hand, comes from
> XDP and the real driver, e.g. eth0; thus xs->dev != xdp->rxq->dev. The
> problem is that only bond0's _rx[] is populated with the pool pointer
> at bind time, so dev->_rx[qid].pool, which refers to eth0's _rx[],
> was never set and is NULL. The solution is therefore to make sure that
> bond0's _rx[] is propagated to eth0 (and to any other device bonded to
> bond0).
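A small user-space model can make this failure mode concrete. It is only a sketch, not kernel code: struct umem, struct pool, struct rx_queue, struct net_device and struct xdp_sock below are simplified hypothetical stand-ins, and model_rcv_check() and model_bind() are illustrative names. The check mirrors the quoted snippet; binding only fills bond0's _rx[] entry, so a frame that arrives on eth0 is rejected until bond0's pool pointer is copied into eth0's _rx[].

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define NUM_QUEUES 4

/* Simplified stand-ins for the kernel objects (hypothetical layouts). */
struct umem { int id; };
struct pool { struct umem *umem; };
struct rx_queue { struct pool *pool; };
struct net_device {
	const char *name;
	struct rx_queue _rx[NUM_QUEUES];
};
struct xdp_sock { struct umem *umem; };

/* Mirrors the quoted test: the queue the frame arrived on must have a
 * pool, and that pool must use the socket's UMEM. */
static int model_rcv_check(const struct xdp_sock *xs,
			   const struct net_device *dev, unsigned int qid)
{
	assert(qid < NUM_QUEUES);
	if (!dev->_rx[qid].pool || xs->umem != dev->_rx[qid].pool->umem)
		return -EINVAL;
	return 0;
}

/* Binding the socket to a device only populates that device's _rx[]. */
static void model_bind(struct net_device *dev, unsigned int qid,
		       struct pool *pool)
{
	assert(qid < NUM_QUEUES);
	dev->_rx[qid].pool = pool;
}
```

With the socket bound to queue 0 of bond0, model_rcv_check(&xs, &eth0, 0) returns -EINVAL; after eth0._rx[0].pool = bond0._rx[0].pool it returns 0, which is the propagation described above.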
>
> Two new features are needed to support this:
>
> 1) A helper that copies _rx[].pool from one struct net_device to another.
> 2) A new xsk_bind netdev event that a driver can subscribe to. It would
>    be called whenever an XSK socket is bound to a device.
>
> If the socket is bound to bond0 before eth0 is bonded to bond0, the
> bonding driver only needs to use 1).
>
> If the socket is bound to bond0 after eth0 has been bonded to bond0,
> the bonding driver needs to subscribe to 2) and call 1) in the event
> handler.
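The two orderings can be sketched with the same kind of user-space model. Everything here is hypothetical: copy_rx_pools() stands in for helper 1), bond_xsk_bind_event() for a handler of the proposed xsk_bind event 2), bond_enslave() for the bonding driver's enslave path, and model_xsk_bind() for the socket bind itself. None of these exist in the kernel today.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_QUEUES 4
#define MAX_SLAVES 4

/* Simplified stand-ins; layouts are hypothetical. */
struct pool { int id; };
struct rx_queue { struct pool *pool; };
struct net_device { struct rx_queue _rx[NUM_QUEUES]; };

struct bond {
	struct net_device dev;                 /* bond0 itself */
	struct net_device *slaves[MAX_SLAVES]; /* eth0, eth1, ... */
	size_t n_slaves;
};

/* Helper 1): copy every _rx[].pool pointer from one device to another. */
static void copy_rx_pools(struct net_device *to, const struct net_device *from)
{
	for (size_t q = 0; q < NUM_QUEUES; q++)
		to->_rx[q].pool = from->_rx[q].pool;
}

/* Socket bound before enslaving: only 1) is needed, at enslave time. */
static int bond_enslave(struct bond *b, struct net_device *slave)
{
	if (b->n_slaves == MAX_SLAVES)
		return -1;
	b->slaves[b->n_slaves++] = slave;
	copy_rx_pools(slave, &b->dev);
	return 0;
}

/* Event 2): propagate the newly bound queue's pool to every slave. */
static void bond_xsk_bind_event(struct bond *b, unsigned int qid)
{
	assert(qid < NUM_QUEUES);
	for (size_t i = 0; i < b->n_slaves; i++)
		b->slaves[i]->_rx[qid].pool = b->dev._rx[qid].pool;
}

/* Models binding a socket to bond0: set bond0's entry, then notify. */
static void model_xsk_bind(struct bond *b, unsigned int qid, struct pool *pool)
{
	assert(qid < NUM_QUEUES);
	b->dev._rx[qid].pool = pool;
	bond_xsk_bind_event(b, qid);
}
```

One design caveat the sketch makes visible: copying the whole _rx[] array at enslave time would overwrite any pool bound directly to the slave, so a real implementation would have to decide how to handle sockets bound to eth0 itself.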
>
> * ZERO-COPY:
>
> 1) Relay the XDP_SETUP_XSK_POOL command of ndo_bpf through to the
> bonded devices.
>
> 2) Relay ndo_xsk_wakeup calls through to the bonded devices.
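A rough model of the two relays, under stated assumptions: struct slave_ops is a hypothetical stand-in for the two driver hooks (the XDP_SETUP_XSK_POOL command of ndo_bpf and ndo_xsk_wakeup), a NULL pool is used to model tearing the pool down, and bond_setup_xsk_pool(), bond_xsk_wakeup() and the fake NIC driver are illustrative names only. The rollback on partial failure is one possible policy, not something the proposal specifies.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SLAVES 4

struct pool { int id; };

/* Hypothetical stand-in for the two per-driver hooks being relayed. */
struct slave_ops {
	int (*setup_xsk_pool)(void *priv, struct pool *pool, unsigned int qid);
	int (*xsk_wakeup)(void *priv, unsigned int qid);
};

struct slave { const struct slave_ops *ops; void *priv; };

struct bond {
	struct slave slaves[MAX_SLAVES];
	size_t n_slaves;
};

/* 1) Relay pool setup to every slave; a NULL pool models teardown.
 * On failure, undo the slaves that were already configured. */
static int bond_setup_xsk_pool(struct bond *b, struct pool *pool,
			       unsigned int qid)
{
	for (size_t i = 0; i < b->n_slaves; i++) {
		struct slave *s = &b->slaves[i];
		int err = s->ops->setup_xsk_pool(s->priv, pool, qid);

		if (err) {
			while (i-- > 0)
				b->slaves[i].ops->setup_xsk_pool(b->slaves[i].priv,
								 NULL, qid);
			return err;
		}
	}
	return 0;
}

/* 2) Relay a wakeup to every slave; report the first error seen. */
static int bond_xsk_wakeup(struct bond *b, unsigned int qid)
{
	int ret = 0;

	for (size_t i = 0; i < b->n_slaves; i++) {
		struct slave *s = &b->slaves[i];
		int err = s->ops->xsk_wakeup(s->priv, qid);

		if (err && !ret)
			ret = err;
	}
	return ret;
}

/* Fake NIC driver so the relay can be exercised in user space. */
struct fake_nic { struct pool *pool; unsigned int wakeups; int fail; };

static int fake_setup(void *priv, struct pool *pool, unsigned int qid)
{
	struct fake_nic *nic = priv;

	(void)qid;
	if (nic->fail && pool)
		return -1;
	nic->pool = pool;
	return 0;
}

static int fake_wakeup(void *priv, unsigned int qid)
{
	struct fake_nic *nic = priv;

	(void)qid;
	nic->wakeups++;
	return 0;
}

static const struct slave_ops fake_ops = { fake_setup, fake_wakeup };
```

Note that this relays one pool to every slave, which matches standby mode at best; it does nothing about the shared-ring problem raised for round-robin mode.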
>
> Standby mode seems straightforward to support.
>
> How should round-robin mode in the bonding driver be handled? It is not
> possible to have multiple bonded devices access the same ring; that
> would require multiple rings and copying into them. It is also not
> clear how to propagate the need_wakeup flags of the individual network
> devices to that of the bond device. I think this kind of functionality
> is much better performed in user space with a library: simpler and
> faster.

I think this goes for all the things you mentioned above. There is no
way we can make this consistent with the in-kernel bond behaviour, so
it's going to be a pretty leaky abstraction anyway. So I don't think we
should add all this complexity, it's better to handle this in userspace
(and just attach to the component interfaces).
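A minimal sketch of what the userspace side could look like: the application opens one AF_XDP socket per component interface and merges their RX rings itself. This is a pure model with no real sockets; struct rx_ring, struct user_bond, user_bond_rx() and user_bond_pending_wakeups() are hypothetical names, and round-robin is just one possible merge policy. It also shows why the need_wakeup question largely goes away: each socket keeps its own flag and is kicked on its own (in a real program, e.g. by checking libxdp's xsk_ring_prod__needs_wakeup() on that socket's fill ring and then polling that socket).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_IFACES 4
#define RING_SIZE 8

/* Hypothetical model of one AF_XDP RX ring on one component NIC. */
struct rx_ring {
	unsigned int descs[RING_SIZE];
	size_t head, tail;  /* consumer and producer counters */
	bool need_wakeup;   /* models this socket's flag (fill ring in AF_XDP) */
};

/* The "bond" lives in the application: one socket per slave NIC. */
struct user_bond {
	struct rx_ring *rings[MAX_IFACES];
	size_t n;
	size_t next;        /* ring where the next round-robin scan starts */
};

/* Simulates the kernel producing one descriptor into a ring. */
static bool ring_push(struct rx_ring *r, unsigned int desc)
{
	if (r->tail - r->head == RING_SIZE)
		return false;
	r->descs[r->tail++ % RING_SIZE] = desc;
	return true;
}

static bool ring_pop(struct rx_ring *r, unsigned int *desc)
{
	if (r->head == r->tail)
		return false;
	*desc = r->descs[r->head++ % RING_SIZE];
	return true;
}

/* Round-robin receive: take one descriptor per ring in turn; stop at max
 * or once every ring has been found empty in a row. */
static size_t user_bond_rx(struct user_bond *b, unsigned int *out, size_t max)
{
	size_t got = 0, idle = 0;

	while (got < max && idle < b->n) {
		struct rx_ring *r = b->rings[b->next];

		b->next = (b->next + 1) % b->n;
		if (ring_pop(r, &out[got])) {
			got++;
			idle = 0;
		} else {
			idle++;
		}
	}
	return got;
}

/* No bond-wide need_wakeup flag: collect the rings whose own flag is set
 * so the caller can kick each socket individually. */
static size_t user_bond_pending_wakeups(const struct user_bond *b, size_t *idx)
{
	size_t count = 0;

	for (size_t i = 0; i < b->n; i++)
		if (b->rings[i]->need_wakeup)
			idx[count++] = i;
	return count;
}
```

With two descriptors queued on the first ring and one on the second, a single user_bond_rx() call returns them interleaved (first ring, second ring, first ring).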

In fact, I think supporting XDP at all on the bond interface was a
mistake; let's not exacerbate it :/

-Toke

Thread overview: 21+ messages
2023-12-19 10:45 Redirect to AF_XDP socket not working with bond interface in native mode Prashant Batra
2023-12-19 10:58 ` Prashant Batra
2023-12-19 13:47 ` Magnus Karlsson
2023-12-19 20:18 ` Prashant Batra
2023-12-20 8:24 ` Magnus Karlsson
2023-12-21 12:39 ` Prashant Batra
2023-12-21 13:45 ` Magnus Karlsson
2023-12-22 11:23 ` Prashant Batra
2024-01-02 9:57 ` Magnus Karlsson
2024-01-11 10:41 ` Prashant Batra
2024-01-15 9:22 ` Magnus Karlsson
2024-01-16 12:48 ` Prashant Batra
2024-01-16 12:59 ` Magnus Karlsson
2024-01-17 6:07 ` Prashant Batra
2024-01-17 7:41 ` Magnus Karlsson
2024-01-19 12:43 ` Prashant Batra
2024-01-19 13:04 ` Toke Høiland-Jørgensen
[not found] ` <CAD0p+fUM5DcG44cxYXU3fMd9PgTjhTaMCH_oy=4iejJ41zxHpA@mail.gmail.com>
2024-03-18 18:41 ` Christian Deacon
2024-03-19 7:52 ` Magnus Karlsson
2024-03-19 11:57 ` Toke Høiland-Jørgensen [this message]
2024-03-19 12:29 ` Magnus Karlsson