From: Maciej Fijalkowski <maciej.fijalkowski@intel.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>, bpf <bpf@vger.kernel.org>,
"Alexei Starovoitov" <ast@kernel.org>,
Daniel Borkmann <daniel@iogearbox.net>,
"Andrii Nakryiko" <andrii@kernel.org>,
Network Development <netdev@vger.kernel.org>,
"Karlsson, Magnus" <magnus.karlsson@intel.com>,
Martin KaFai Lau <martin.lau@linux.dev>,
Amery Hung <ameryhung@gmail.com>
Subject: Re: [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs
Date: Thu, 27 Feb 2025 19:40:55 +0100 [thread overview]
Message-ID: <Z8Cxt+NrrO5L/AgP@boxer> (raw)
In-Reply-To: <CAADnVQ++0o-YoFR=yUcbJvaLX2rmWjC2LHjc3yyCp+bkgWGn1Q@mail.gmail.com>
On Wed, Feb 26, 2025 at 09:30:18AM -0800, Alexei Starovoitov wrote:
> On Tue, Feb 25, 2025 at 6:27 AM Maciej Fijalkowski
> <maciej.fijalkowski@intel.com> wrote:
> >
> > On Fri, Feb 21, 2025 at 05:55:57PM -0800, Alexei Starovoitov wrote:
> > > On Thu, Feb 20, 2025 at 5:45 AM Maciej Fijalkowski
> > > <maciej.fijalkowski@intel.com> wrote:
> > > >
> > > > Hi!
> > > >
> > > > This patchset provides what is needed for storing skbs as kptrs in bpf
> > > > maps. We start with necessary kernel change as discussed at [0] with
> > > > Martin, then next patch adds kfuncs for handling skb refcount and on top
> > > > of that a test case is added where one program stores skbs and then next
> > > > program is able to retrieve them from map.
> > > >
> > > > Martin, regarding the kernel change I decided to go with boolean
> > > > approach instead of what you initially suggested. Let me know if it
> > > > works for you.
> > > >
> > > > Thanks,
> > > > Maciej
> > > >
> > > > [0]: https://lore.kernel.org/bpf/Z0X%2F9PhIhvQwsgfW@boxer/
> > >
> > > Before we go further we need a lot more details on "why" part.
> > > In the old thread I was able to find:
> > >
> > > > On TC egress hook skb is stored in a map ...
> > > > During TC ingress hook on the same interface, the skb that was previously
> > > stored in map is retrieved ...
> > >
> > > This is too cryptic. I see several concerns with such use case
> > > including netns crossing, L2/L3 mismatch, skb_scrub.
> > >
> > > I doubt we can make such "skb stash in a map" safe without
> > > restricting the usage, so please provide detailed
> > > description of the use case.
> >
> > Hi Alexei,
> >
> > We have a system with two nodes: one is a fully fledged Linux system (big node)
> > and the other one a smaller embedded system. The big node runs Linux PTP for
> > time synchronization, the smaller node we have no control over (might run Linux
> > or something else). The problem is that we would like to use the Tx timestamps
> > from the small node in the Linux PTP application on the big node. When a packet
> > is sent out from the big node it arrives at the small node that send it out one
> > of its interfaces. It then replies with another packet back to the big node
> > with the Tx timestamp in it.
> >
> > Our current PoC for attacking this is to store the skb in a map (using this
> > patch set) when it is sent out from the big node then retrieve it from the map
> > when the reply from the small node is received. We then take the timestamp from
> > the packet and put it in the skb and send it up to the socket error queue so
> > that Linux PTP works out of the box.
>
> This sounds like you're actually xmit-ing the skb out of the big node
> and storing it in a map via simple refcnt++.
> That may work in some setups, but in general is not quite correct
> from networking stack pov.
> You need to skb_clone() it and keep the clone, so only cloned skb
> can go into the socket error queue and up to user space.
> xmit-ing the same skb and sending to user space
> is going to cause issues.
This skb only goes to errqueue and then gets its refcount decremented
and dropped, we do not xmit that skb per-se.
>
> Cleaner design would probably involve bpf_clone_redirect()
> and may be some form of bpf-qdisc where packet is waiting in the queue
> until its hwtimestamp is adjusted and its released from the queue
> into user space.
We were working on clone initially but map storage turned out to be much
cleaner approach, but we can re-iterate on that in case explanation
provided above regarding not xmitting stored skb still feels off for you.
>
> What happens when small node doesn't send that timestamped packet?
> The map will eventually overflow, right?
> Overall it feels that stashing skb-s in a map isn't the right approach.
We did not handle that in our demo but I believe BPF timers would be
useful here, wouldn't they?
>
> > Note that for the purpose of setting skb->hwtstamp and sending it up to the
> > error queue we are adding a kfunc (bpf_tx_tstamp) responsible for it, which is
> > not included in this set, so I understand it is not obvious what we achieved
> > with the current form of this patch set being discussed.
> >
> > We did not consider the restrictions that should be implemented from netns POV,
> > so that is a good point. How would you see this being fixed?
> >
prev parent reply other threads:[~2025-02-27 18:41 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2025-02-20 13:45 [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Maciej Fijalkowski
2025-02-20 13:45 ` [PATCH bpf-next 1/3] bpf: call btf_is_projection_of() conditionally Maciej Fijalkowski
2025-02-20 13:45 ` [PATCH bpf-next 2/3] bpf: add kfunc for skb refcounting Maciej Fijalkowski
2025-02-20 23:25 ` Amery Hung
2025-02-21 14:56 ` Maciej Fijalkowski
2025-02-21 19:11 ` Amery Hung
2025-02-21 20:17 ` Maciej Fijalkowski
2025-02-21 23:40 ` Amery Hung
2025-02-21 23:57 ` Amery Hung
2025-02-20 13:45 ` [PATCH bpf-next 3/3] selftests: bpf: implement test case for skb kptr map storage Maciej Fijalkowski
2025-02-22 1:55 ` [PATCH bpf-next 0/3] bpf: introduce skb refcount kfuncs Alexei Starovoitov
2025-02-25 14:27 ` Maciej Fijalkowski
2025-02-26 17:30 ` Alexei Starovoitov
2025-02-27 18:40 ` Maciej Fijalkowski [this message]
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Z8Cxt+NrrO5L/AgP@boxer \
--to=maciej.fijalkowski@intel.com \
--cc=alexei.starovoitov@gmail.com \
--cc=ameryhung@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=kuba@kernel.org \
--cc=magnus.karlsson@intel.com \
--cc=martin.lau@linux.dev \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.