From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
Andrii Nakryiko <andrii@kernel.org>,
Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
Yonghong Song <yhs@fb.com>,
John Fastabend <john.fastabend@gmail.com>,
KP Singh <kpsingh@kernel.org>,
"David S. Miller" <davem@davemloft.net>,
Jakub Kicinski <kuba@kernel.org>,
Jesper Dangaard Brouer <hawk@kernel.org>,
Network Development <netdev@vger.kernel.org>,
bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v7 1/3] bpf: Add "live packet" mode for XDP in bpf_prog_run()
Date: Sun, 09 Jan 2022 13:30:00 +0100 [thread overview]
Message-ID: <87ee5h852v.fsf@toke.dk> (raw)
In-Reply-To: <20220109022448.bxgatdsx3obvipbu@ast-mbp.dhcp.thefacebook.com>
Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:
> On Sat, Jan 08, 2022 at 09:19:41PM +0100, Toke Høiland-Jørgensen wrote:
>>
>> Sure, totally fine with documenting it. Just seems to me the most
>> obvious place to put this is in a new
>> Documentation/bpf/prog_test_run.rst file with a short introduction about
>> the general BPF_PROG_RUN mechanism, and then a subsection dedicated to
>> this facility.
>
> sgtm
Great!
>> > I guess it's ok-ish to get stuck with 128.
>> > It will be uapi that we cannot change though.
>> > Are you comfortable with that?
>>
>> UAPI in what sense? I'm thinking of documenting it like:
>>
>> "The packet data being supplied as data_in to BPF_PROG_RUN will be used
>> for the initial run of the XDP program. However, when running the
>> program multiple times (with repeat > 1), only the packet *bounds*
>> (i.e., the data, data_end and data_meta pointers) will be reset on each
>> invocation, the packet data itself won't be rewritten. The pages
>> backing the packets are recycled, but the order depends on the path the
>> packet takes through the kernel, making it hard to predict when a
>> particular modified page makes it back to the XDP program. In practice,
>> this means that if the XDP program modifies the packet payload before
>> sending out the packet, it has to be prepared to deal with subsequent
>> invocations seeing either the initial data or the already-modified
>> packet, in arbitrary order."
>>
>> I don't think this makes any promises about any particular size of the
>> page pool, so how does it constitute UAPI?
>
> Could you explain out-of-order scanario again?
> It's possible only if xdp_redirect is done into different netdevs.
> Then they can xmit at different times and cycle pages back into
> the loop in different order. But TX or REDIRECT into the same netdev
> will keep the pages in the same order. So the program can rely on
> that.
I left that out on purpose: I feel it's exposing an internal
implementation detail as UAPI (as you said). And I'm not convinced it
really needed (or helpful) - see below.
>> >
>> > reinit doesn't feel necessary.
>> > How one would use this interface to send N different packets?
>> > The api provides an interface for only one.
>>
>> By having the XDP program react appropriately. E.g., here is the XDP
>> program used by the trafficgen tool to cycle through UDP ports when
>> sending out the packets - it just reads the current value and updates
>> based on that, so it doesn't matter if it sees the initial page or one
>> it already modified:
>
> Sure. I think there is an untapped potential here.
> With this live packet prog_run anyone can buy 10G or 100G nic equipped
> server and for free transform it into $300k+ IXIA beating machine.
> It could be a game changer. pktgen doesn't come close.
> I'm thinking about generating and consuming test TCP traffic.
> TCP blaster would xmit 1M TCP connections through this live prog_run
> into eth0 and consume the traffic returning from "server under test"
> via a different XDP program attached to eth0.
> The prog_run's xdp prog would need to send SYN, increment sequence number,
> and keep sane data in the packets. It could be HTTP request, for example.
I'm glad you see the potential :)
> To achive this IXIA beating setup the TCP blaster would need a full
> understanding of what page pool is doing with the packets.
> Just saying "in arbitrary order" is a non starter. It diminishes
> this live prog_run into pktgen equivalent which is still useful,
> but lots of potential is lost.
I don't think a detailed knowledge of how the pages are recycled is
needed to implement a TCP stream? Even if you just rely on the packets
being recycled with a fixed period of 128 pages, how does that make your
XDP program simpler? You'll still have to update the packet header for
each packet, with state kept in a map; so why is it helpful to know when
a particular page comes back?
I'll try implementing a TCP stream mode in xdp_trafficgen just to make
sure I'm not missing something. But I believe that sending out a stream
of packets that looks like a coherent TCP stream should be simple
enough, at least. Dealing with the full handshake + CWND control loop
will be harder, though, and right now I think it'll require multiple
trips back to userspace.
>> Another question seeing as the merge window is imminent: How do you feel
>> about merging this before the merge window? I can resubmit before it
>> opens with the updated selftest and documentation, and we can deal with
>> any tweaks during the -rcs; or would you rather postpone the whole
>> thing until the next cycle?
>
> It's already too late for this merge window, but bpf-next is always open.
> Just like it was open for the last year. So please resubmit as soon as
> the tests are green and this discussion is over.
Ah, OK. I was under the impression that the cutoff date was tomorrow;
has that changed? But no worries, I'll spend my Sunday outside instead
of coding, then, and come back to this tomorrow :)
-Toke
next prev parent reply other threads:[~2022-01-09 12:30 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-01-07 21:54 [PATCH bpf-next v7 0/3] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 1/3] bpf: Add "live packet" mode for " Toke Høiland-Jørgensen
2022-01-08 2:44 ` Alexei Starovoitov
2022-01-08 13:19 ` Toke Høiland-Jørgensen
2022-01-08 19:28 ` Alexei Starovoitov
2022-01-08 20:19 ` Toke Høiland-Jørgensen
2022-01-09 2:24 ` Alexei Starovoitov
2022-01-09 12:30 ` Toke Høiland-Jørgensen [this message]
2022-01-13 1:37 ` Alexei Starovoitov
2022-02-11 7:19 ` Martin KaFai Lau
2022-02-11 16:03 ` Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 2/3] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-08 19:32 ` Alexei Starovoitov
2022-01-08 20:29 ` Toke Høiland-Jørgensen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ee5h852v.fsf@toke.dk \
--to=toke@redhat.com \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=hawk@kernel.org \
--cc=john.fastabend@gmail.com \
--cc=kafai@fb.com \
--cc=kpsingh@kernel.org \
--cc=kuba@kernel.org \
--cc=netdev@vger.kernel.org \
--cc=songliubraving@fb.com \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.