From: Jiri Pirko <jiri@resnulli.us>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
John Fastabend <john.fastabend@gmail.com>,
netdev@vger.kernel.org, deb.chatterjee@intel.com,
anjali.singhai@intel.com, Vipin.Jain@amd.com,
namrata.limaye@intel.com, tom@sipanda.io, mleitner@redhat.com,
Mahesh.Shirshyad@amd.com, tomasz.osinski@intel.com,
xiyou.wangcong@gmail.com, davem@davemloft.net,
edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
vladbu@nvidia.com, horms@kernel.org, bpf@vger.kernel.org,
khalidm@nvidia.com, toke@redhat.com, mattyk@nvidia.com,
dan.daly@intel.com, chris.sommers@keysight.com,
john.andy.fingerhut@intel.com
Subject: Re: [PATCH net-next v8 00/15] Introducing P4TC
Date: Wed, 22 Nov 2023 10:25:57 +0100
Message-ID: <ZV3JJQirPdZpbVIC@nanopsycho>
In-Reply-To: <CAM0EoMmPnCeU2uLph=uwh3JxtE4RQnvcSA2WdZgORywzNFCO6g@mail.gmail.com>
Tue, Nov 21, 2023 at 04:21:44PM CET, jhs@mojatatu.com wrote:
>On Tue, Nov 21, 2023 at 9:19 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Tue, Nov 21, 2023 at 02:47:40PM CET, jhs@mojatatu.com wrote:
>> >On Tue, Nov 21, 2023 at 8:06 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Mon, Nov 20, 2023 at 11:56:50PM CET, jhs@mojatatu.com wrote:
>> >> >On Mon, Nov 20, 2023 at 4:49 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> >> >>
>> >> >> On 11/20/23 8:56 PM, Jamal Hadi Salim wrote:
>> >> >> > On Mon, Nov 20, 2023 at 1:10 PM Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >> >> Mon, Nov 20, 2023 at 03:23:59PM CET, jhs@mojatatu.com wrote:
>> >>
>> >> [...]
>> >>
>> >> >
>> >> >> tc BPF and XDP already have widely used infrastructure and can be developed
>> >> >> against libbpf or other user space libraries for a user space control plane.
>> >> >> With 'control plane' you refer here to the tc / netlink shim you've built,
>> >> >> but looking at the tc command line examples, this doesn't really provide a
>> >> >> good user experience (you call it p4 but people load bpf obj files). If the
>> >> >> expectation is that an operator should run tc commands, then neither it's
>> >> >> a nice experience for p4 nor for BPF folks. From a BPF PoV, we moved over
>> >> >> to bpf_mprog and plan to also extend this for XDP to have a common look and
>> >> >> feel wrt networking for developers. Why can't this be reused?
>> >> >
>> >> >The filter loading which loads the program is considered pipeline
>> >> >instantiation - consider it as "provisioning" more than "control"
>> >> >which runs at runtime. "control" is purely netlink based. The iproute2
>> >> >code we use links libbpf for example for the filter. If we can achieve
>> >> >the same with bpf_mprog then sure - we just dont want to lose
>> >> >functionality though. off top of my head, some sample space:
>> >> >- we could have multiple pipelines with different priorities (which tc
>> >> >provides to us) - and each pipeline may have its own logic with many
>> >> >tables etc (and the choice to iterate the next one is essentially
>> >> >encoded in the tc action codes)
>> >> >- we use tc block to map groups of ports (which i dont think bpf has
>> >> >internal access of)
>> >> >
>> >> >In regards to usability: no i dont expect someone doing things at
>> >> >scale to use command line tc. The APIs are via netlink. But the tc cli
>> >> >is must for the rest of the masses per our traditions. Also i really
>> >>
>> >> I don't follow. You repeatedly mention "the must of the traditional tc
>> >> cli", but what of the existing traditional cli you use for p4tc?
>> >> If I look at the examples, pretty much everything looks new to me.
>> >> Example:
>> >>
>> >> tc p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>> >> action send_to_port param port eno1
>> >>
>> >> This is just TC/RTnetlink used as a channel to pass new things over. If
>> >> that is the case, what's traditional here?
>> >>
>> >
>> >
>> >What is not traditional about it?
>>
>> Okay, so in that case, the following example communicating with
>> userspace daemon using imaginary "p4ctrl" app is equally traditional:
>> $ p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>> action send_to_port param port eno1
>
>Huh? Thats just an application - classical tc which part of iproute2
>that is sending to the kernel, no different than "tc flower.."
>Where do you get the "userspace" daemon part? Yes, you can write a
>daemon but it will use the same APIs as tc.
Okay, so which part is the "tradition"?
>
>>
>> >
>> >>
>> >> >didnt even want to use ebpf at all for operator experience reasons -
>> >> >it requires a compilation of the code and an extra loading compared to
>> >> >what our original u32/pedit code offered.
>> >> >
>> >> >> I don't quite follow why not most of this could be implemented entirely in
>> >> >> user space without the detour of this and you would provide a developer
>> >> >> library which could then be integrated into a p4 runtime/frontend? This
>> >> >> way users never interface with ebpf parts nor tc given they also shouldn't
>> >> >> have to - it's an implementation detail. This is what John was also pointing
>> >> >> out earlier.
>> >> >>
>> >> >
>> >> >Netlink is the API. We will provide a library for object manipulation
>> >> >which abstracts away the need to know netlink. Someone who for their
>> >> >own reasons wants to use p4runtime or TDI could write on top of this.
>> >> >I would not design a kernel interface to just meet p4runtime (we
>> >> >already have TDI which came later which does things differently). So i
>> >> >expect us to support both those two. And if i was to do something on
>> >> >SDN that was more robust i would write my own that still uses these
>> >> >netlink interfaces.
>> >>
>> >> Actually, what Daniel says about the p4 library used as a backend to p4
>> >> frontend is pretty much aligned what I claimed on the p4 calls couple of
>> >> times. If you have this p4 userspace tooling, it is easy for offloads to
>> >> replace the backend by vendor-specific library which allows p4 offload
>> >> suitable for all vendors (your plan of p4tc offload does not work well
>> >> for our hw, as we repeatedly claimed).
>> >>
>> >
>> >That's you - NVIDIA. You have chosen a path away from the kernel
>> >towards DOCA. I understand NVIDIA's frustration with dealing with
>> >upstream process (which has been cited to me as a good reason for
>> >DOCA) but please dont impose these values and your politics on other
>> >vendors(Intel, AMD for example) who are more than willing to invest
>> >into making the kernel interfaces the path forward. Your choice.
>>
>> No, you are missing the point. This has nothing to do with DOCA.
>
>Right Jiri ;->
>
>> This
>> has to do with the simple limitation of your offload assuming there are
>> no runtime changes in the compiled pipeline. For Intel, maybe they
>> aren't, and it's a good fit for them. All I say is, that it is not the
>> good fit for everyone.
>
> a) it is not part of the P4 spec to dynamically make changes to the
>datapath pipeline after it is created and we are discussing a P4
Isn't this up to the implementation? I mean from the p4 perspective,
everything is static. Hw might need to reshuffle the pipeline internally
during rule insertion/removal in order to optimize the layout.
>implementation not an extension that would add more value b) We are
>more than happy to add extensions in the future to accommodate for
>features but first _P4 spec_ must be met c) we had longer discussions
>with Matty, Khalid and the Rice folks who wrote a paper on that topic
>which you probably didnt attend and everything that needs to be done
>can be from user space today for all those optimizations.
>
>Conclusion is: For what you need to do (which i dont believe is a
>limitation in your hardware rather a design decision on your part) run
>your user space daemon, do optimizations and update the datapath.
>Everybody is happy.
Should the userspace daemon listen for inserted rules to be offloaded
over netlink?
>
>>
>> >Nobody is stopping you from offering your customers proprietary
>> >solutions which include a specific ebpf approach alongside DOCA. We
>> >believe that a singular interface regardless of the vendor is the
>> >right way forward. IMHO, this siloing that unfortunately is also added
>> >by eBPF being a double edged sword is not good for the community.
>> >
>> >> As I also said on the p4 call couple of times, I don't see the kernel
>> >> as the correct place to do the p4 abstractions. Why don't you do it in
>> >> userspace and give vendors possibility to have p4 backends with compilers,
>> >> runtime optimizations etc in userspace, talking to the HW in the
>> >> vendor-suitable way too. Then the SW implementation could be easily eBPF
>> >> and the main reason (I believe) why you need to have this is TC
>> >> (offload) is then void.
>> >>
>> >> The "everyone wants to use TC/netlink" claim does not seem correct
>> >> to me. Why not to have one Linux p4 solution that fits everyones needs?
>> >
>> >You mean more fitting to the DOCA world? no, because iam a kernel
>>
>> Again, this has 0 relation to DOCA.
>>
>>
>> >first person and kernel interfaces are good for everyone.
>>
>> Yeah, not really. Not always the kernel is the right answer. Your/Intel
>> plan to handle the offload by:
>> 1) abuse devlink to flash p4 binary
>> 2) parse the binary in kernel to match to the table ids of rules coming
>> from p4tc ndo_setup_tc
>> 3) abuse devlink to flash p4 binary for tc-flower
>> 4) parse the binary in kernel to match to the table ids of rules coming
>> from tc-flower ndo_setup_tc
>> is really something that is making me a little bit nauseous.
>>
>> If you don't have a feasible plan to do the offload, p4tc does not make
>> sense to me to be honest.
>
>You mean if there's no plan to match your (NVIDIA?) point of view.
>For #1 - how's this different from DDP? Wasnt that your suggestion to
I doubt that. Any flashing-blob-parsing-in-kernel is something I'm
opposed to from day 1.
>begin with? For #2 Nobody is proposing to do anything of the sort. The
>ndo is passed IDs for the objects and associated contents. For #3+#4
During offload, you need to parse the blob in the driver to be able to
match the ids with blob entities. That was presented by you/Intel in the
past IIRC.
>tc flower thing has nothing to do with P4TC that was just some random
>proposal someone made seeing if they could ride on top of P4TC.
Yeah, it's not yet merged and already mentally used for abuse. I love
that :)
>
>Besides this nobody really has to satisfy your point of view - like i
>said earlier feel free to provide proprietary solutions. From a
>consumer perspective I would not want to deal with 4 different
>vendors with 4 different proprietary approaches. The kernel is the
>unifying part. You seemed happier with tc flower just not with the
Yeah, that is my point: why can't the unifying part be a userspace
daemon/library with multiple backends (p4tc, bpf, vendorX, vendorY, ..)?
I just don't see the kernel as a good fit for abstraction here,
given the fact that the vendor compilers do not run in the kernel.
That is breaking your model.
>kernel process - which is ironically the same thing we are going
>through here ;->
>
>cheers,
>jamal
>
>>
>> >
>> >cheers,
>> >jamal