From: John Fastabend <john.fastabend@gmail.com>
To: "Toke Høiland-Jørgensen" <toke@redhat.com>,
"Björn Töpel" <bjorn.topel@intel.com>,
"John Fastabend" <john.fastabend@gmail.com>,
"Maciej Fijalkowski" <maciej.fijalkowski@intel.com>,
daniel@iogearbox.net, ast@kernel.org, bpf@vger.kernel.org,
netdev@vger.kernel.org
Cc: andrii@kernel.org, magnus.karlsson@intel.com, ciara.loftus@intel.com
Subject: Re: [PATCH bpf-next 1/3] libbpf: xsk: use bpf_link
Date: Tue, 16 Feb 2021 11:15:41 -0800 [thread overview]
Message-ID: <602c19dd5b1c2_6b719208e6@john-XPS-13-9370.notmuch> (raw)
In-Reply-To: <87mtw4b8k3.fsf@toke.dk>
Toke Høiland-Jørgensen wrote:
> Björn Töpel <bjorn.topel@intel.com> writes:
>
> > On 2021-02-15 21:49, John Fastabend wrote:
> >> Maciej Fijalkowski wrote:
> >>> Currently, if there are multiple xdpsock instances running on a single
> >>> interface and in case one of the instances is terminated, the rest of
> >>> them are left in an inoperable state due to the fact of unloaded XDP
> >>> prog from interface.
> >>>
> >>> To address that, step away from setting bpf prog in favour of bpf_link.
> >>> This means that refcounting of BPF resources will be done automatically
> >>> by bpf_link itself.
> >>>
> >>> When setting up BPF resources during xsk socket creation, check whether
> >>> bpf_link for a given ifindex already exists via set of calls to
> >>> bpf_link_get_next_id -> bpf_link_get_fd_by_id -> bpf_obj_get_info_by_fd
> >>> and comparing the ifindexes from bpf_link and xsk socket.
> >>>
> >>> If there's no bpf_link yet, create one for a given XDP prog and unload
> >>> explicitly existing prog if XDP_FLAGS_UPDATE_IF_NOEXIST is not set.
> >>>
> >>> If bpf_link is already at a given ifindex and underlying program is not
> >>> AF-XDP one, bail out or update the bpf_link's prog given the presence of
> >>> XDP_FLAGS_UPDATE_IF_NOEXIST.
> >>>
> >>> Signed-off-by: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
> >>> ---
[...]
> >>> - ctx->prog_fd = prog_fd;
> >>> + link_fd = bpf_link_create(ctx->prog_fd, xsk->ctx->ifindex, BPF_XDP, &opts);
> >>> + if (link_fd < 0) {
> >>> + pr_warn("bpf_link_create failed: %s\n", strerror(errno));
> >>> + return link_fd;
> >>> + }
> >>> +
> >>
> >> This can leave the system in a bad state where it unloaded the XDP program
> >> above, but then failed to create the link. So we should somehow fix that
> >> if possible or at minimum put a note somewhere so users can't claim they
> >> shouldn't know this.
> >>
> >> Also related, its not good for real systems to let XDP program go missing
> >> for some period of time. I didn't check but we should make
> >> XDP_FLAGS_UPDATE_IF_NOEXIST the default if its not already.
> >>
> >
> > This is the default for XDP sockets library. The
> > "bpf_set_link_xdp_fd(...-1)" way is only when a user sets it explicitly.
> > One could maybe argue that the "force remove" would be out of scope for
> > AF_XDP; Meaning that if an XDP program is running, attached via netlink,
> > the AF_XDP library simply cannot remove it. The user would need to rely
> > on some other mechanism.
>
> Yeah, I'd tend to agree with that. In general, I think the proliferation
> of "just force-remove (or override) the running program" in code and
> instructions has been a mistake; and application should only really be
> adding and removing its own program...
>
> -Toke
>
I'll try to consolidate some of my opinions from a couple threads here. It
looks to me many of these issues are self-inflicted by the API. We built
the API with out the right abstractions or at least different abstractions
from the rest of the BPF APIs. Not too surprising seeing the kernel side
and user side were all being built up at once.
For example this specific issue where the xsk API also deletes the XDP
program seems to be due to merging the xsk with the creation of the XDP
programs.
I'm not a real user of AF_XDP (yet.), but here is how I would expect it
to work based on how the sockmap pieces work, which are somewhat similar
given they also deal with sockets.
Program
(1) load and pin an XDP BPF program
- obj = bpf_object__open(prog);
- bpf_object__load_xattr(&attr);
- bpf_program__pin()
(2) pin the map, find map_xsk using any of the map APIs
- bpf_map__pin(map_xsk, path_to_pin)
(3) attach to XDP
- link = bpf_program__attach_xdp()
- bpf_link__pin()
At this point you have a BPF program loaded, a xsk map, and a link all
pinned and ready. And we can add socks using the process found in
`enter_xsks_into_map` in the sample. This can be the same program that
loaded/pinned the XDP program or some other program it doesn't really
matter.
- create xsk fd
. xsk_umem__create()
. xsk_socket__create
- open map @ pinned path
- bpf_map_update_elem(xsks_map, &key, &fd, 0);
Then it looks like we don't have any conflicts? The XDP program is pinned
and exists in its normal scope. The xsk elements can be added/deleted
as normal. If the XDP program is removed and the map referencing (using
normal ref rules) reaches zero its also deleted. Above is more or less
the same flow we use for any BPF program so looks good to me.
The trouble seems to pop up when using the higher abstraction APIs
xsk_setup_xdp_prog and friends I guess? I just see above as already
fairly easy to use and we have good helpers to create the sockets it looks
like. Maybe I missed some design considerations. IMO higher level
abstractions should go in new libxdp and above should stay in libbpf.
/rant off ;)
Thanks,
John
next prev parent reply other threads:[~2021-02-16 19:16 UTC|newest]
Thread overview: 37+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-02-15 15:46 [PATCH bpf-next 0/3] Introduce bpf_link in libbpf's xsk Maciej Fijalkowski
2021-02-15 15:46 ` [PATCH bpf-next 1/3] libbpf: xsk: use bpf_link Maciej Fijalkowski
2021-02-15 17:07 ` Toke Høiland-Jørgensen
2021-02-15 17:38 ` Björn Töpel
2021-02-15 19:35 ` Toke Høiland-Jørgensen
2021-02-16 2:01 ` Maciej Fijalkowski
2021-02-16 9:15 ` Björn Töpel
2021-02-16 10:27 ` Toke Høiland-Jørgensen
2021-02-16 20:15 ` Maciej Fijalkowski
2021-02-15 20:22 ` John Fastabend
2021-02-15 21:38 ` Toke Høiland-Jørgensen
2021-02-16 0:18 ` John Fastabend
2021-02-16 2:23 ` Maciej Fijalkowski
2021-02-16 9:23 ` Björn Töpel
2021-02-16 10:36 ` Toke Høiland-Jørgensen
2021-02-23 1:15 ` Andrii Nakryiko
2021-02-17 2:23 ` Dan Siemon
2021-02-17 7:16 ` Magnus Karlsson
2021-02-17 7:36 ` Magnus Karlsson
2021-02-16 2:10 ` Maciej Fijalkowski
2021-02-15 20:49 ` John Fastabend
2021-02-16 2:38 ` Maciej Fijalkowski
2021-02-16 18:19 ` John Fastabend
2021-02-16 20:10 ` Maciej Fijalkowski
2021-02-16 9:20 ` Björn Töpel
2021-02-16 10:39 ` Toke Høiland-Jørgensen
2021-02-16 19:15 ` John Fastabend [this message]
2021-02-16 20:50 ` Maciej Fijalkowski
2021-02-16 21:17 ` John Fastabend
2021-02-15 15:46 ` [PATCH bpf-next 2/3] libbpf: clear map_info before each bpf_obj_get_info_by_fd Maciej Fijalkowski
2021-02-15 20:33 ` John Fastabend
2021-02-16 2:42 ` Maciej Fijalkowski
2021-02-15 15:46 ` [PATCH bpf-next 3/3] samples: bpf: do not unload prog within xdpsock Maciej Fijalkowski
2021-02-15 20:24 ` John Fastabend
2021-02-16 9:22 ` Björn Töpel
2021-02-16 14:15 ` Maciej Fijalkowski
2021-02-15 16:07 ` [PATCH bpf-next 0/3] Introduce bpf_link in libbpf's xsk Björn Töpel
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=602c19dd5b1c2_6b719208e6@john-XPS-13-9370.notmuch \
--to=john.fastabend@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bjorn.topel@intel.com \
--cc=bpf@vger.kernel.org \
--cc=ciara.loftus@intel.com \
--cc=daniel@iogearbox.net \
--cc=maciej.fijalkowski@intel.com \
--cc=magnus.karlsson@intel.com \
--cc=netdev@vger.kernel.org \
--cc=toke@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox