All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Toke Høiland-Jørgensen" <toke@redhat.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Alexei Starovoitov <ast@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Andrii Nakryiko <andrii@kernel.org>,
	Martin KaFai Lau <kafai@fb.com>, Song Liu <songliubraving@fb.com>,
	Yonghong Song <yhs@fb.com>,
	John Fastabend <john.fastabend@gmail.com>,
	KP Singh <kpsingh@kernel.org>,
	"David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	Jesper Dangaard Brouer <hawk@kernel.org>,
	Network Development <netdev@vger.kernel.org>,
	bpf <bpf@vger.kernel.org>
Subject: Re: [PATCH bpf-next v7 1/3] bpf: Add "live packet" mode for XDP in bpf_prog_run()
Date: Sat, 08 Jan 2022 14:19:13 +0100	[thread overview]
Message-ID: <87pmp28iwe.fsf@toke.dk> (raw)
In-Reply-To: <CAADnVQ+uftgnRQa5nvG4FTJga_=_FMAGxuiPB3O=AFKfEdOg=A@mail.gmail.com>

Alexei Starovoitov <alexei.starovoitov@gmail.com> writes:

> On Fri, Jan 7, 2022 at 1:54 PM Toke Høiland-Jørgensen <toke@redhat.com> wrote:
>>
>> Because the data pages are recycled by the page pool, and the test runner
>> doesn't re-initialise them for each run, subsequent invocations of the XDP
>> program will see the packet data in the state it was after the last time it
>> ran on that particular page. This means that an XDP program that modifies
>> the packet before redirecting it has to be careful about which assumptions
>> it makes about the packet content, but that is only an issue for the most
>> naively written programs.
>
> This is too vague and partially incorrect.
> The bpf program can do bpf_xdp_adjust_meta() and otherwise change
> packet boundaries. These effects will be seen by subsequent
> XDP_PASS/TX/REDIRECT, but on the next iteration the boundaries
> will get reset to the original values.
> So the test runner actually re-initializes some parts of the data,
> but not the contents of the packet.
> At least that's my understanding of the patch.

Yes, that's correct. Boundaries will be reset, data won't. The boundary
reset was added later, though, so guess I neglected to update the commit
message. Will fix.

> The users shouldn't need to dig into implementation to discover this.
> Please document it.
> The more I think about it the more I believe that it warrants
> a little blurb in Documentation/bpf/ that describes what one can
> do with this "xdp live mode".

Sure, can do. Doesn't look like BPF_PROG_RUN is documented in there at
all, so guess I can start such a document :)

> Another question comes to mind:
> What happens when a program modifies the packet?
> Does it mean that the 2nd frame will see the modified data?
> It will not, right?
> It's the page pool size of packets that will be inited the same way
> at the beginning. Which is NAPI_POLL_WEIGHT * 2 == 128 packets.
> Why this number?

Yes, you're right: the next run won't see the modified packet data. The
128 pages is because we run the program loop in batches of 64 (like NAPI
does, the fact that TEST_XDP_BATCH and NAPI_POLL_WEIGHT are the same is
not a coincidence).

We need 2x because we want enough pages so we can keep running without
allocating more, and the first batch can still be in flight on a
different CPU while we're processing batch 2.

I experimented with different values, and 128 was the minimum size that
didn't have a significant negative impact on performance, and above that
saw diminishing returns.

> Should it be configurable?
> Then the user can say: init N packets with this one pattern
> and the program will know that exactly N invocation will be
> with the same data, but N+1 it will see the 1st packet again
> that potentially was modified by the program.
> Is it accurate?

I thought about making it configurable, but the trouble is that it's not
quite as straight-forward as the first N packets being "pristine": it
depends on what happens to the packet afterwards:

On XDP_DROP, the page will be recycled immediately, whereas on
XDP_{TX,REDIRECT} it will go through the egress driver after sitting in
the bulk queue for a little while, so you can get reordering compared to
the original execution order.

On XDP_PASS the kernel will release the page entirely from the pool when
building an skb, so you'll never see that particular page again (and
eventually page_pool will allocate a new batch that will be
re-initialised to the original value).

If we do want to support a "pristine data" mode, I think the least
cumbersome way would be to add a flag that would make the kernel
re-initialise the packet data before every program invocation. The
reason I didn't do this was because I didn't have a use case for it. The
traffic generator use case only rewrites a tiny bit of the packet
header, and it's just as easy to just keep rewriting it without assuming
a particular previous value. And there's also the possibility of just
calling bpf_prog_run() multiple times from userspace with a lower number
of repetitions...

I'm not opposed to adding such a flag if you think it would be useful,
though. WDYT?

-Toke


  reply	other threads:[~2022-01-08 13:19 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-01-07 21:54 [PATCH bpf-next v7 0/3] Add support for transmitting packets using XDP in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 1/3] bpf: Add "live packet" mode for " Toke Høiland-Jørgensen
2022-01-08  2:44   ` Alexei Starovoitov
2022-01-08 13:19     ` Toke Høiland-Jørgensen [this message]
2022-01-08 19:28       ` Alexei Starovoitov
2022-01-08 20:19         ` Toke Høiland-Jørgensen
2022-01-09  2:24           ` Alexei Starovoitov
2022-01-09 12:30             ` Toke Høiland-Jørgensen
2022-01-13  1:37               ` Alexei Starovoitov
2022-02-11  7:19                 ` Martin KaFai Lau
2022-02-11 16:03                   ` Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 2/3] selftests/bpf: Move open_netns() and close_netns() into network_helpers.c Toke Høiland-Jørgensen
2022-01-07 21:54 ` [PATCH bpf-next v7 3/3] selftests/bpf: Add selftest for XDP_REDIRECT in bpf_prog_run() Toke Høiland-Jørgensen
2022-01-08 19:32   ` Alexei Starovoitov
2022-01-08 20:29     ` Toke Høiland-Jørgensen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pmp28iwe.fsf@toke.dk \
    --to=toke@redhat.com \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=hawk@kernel.org \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=kuba@kernel.org \
    --cc=netdev@vger.kernel.org \
    --cc=songliubraving@fb.com \
    --cc=yhs@fb.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.