public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed
From: Jiri Pirko <jiri@resnulli.us>
To: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Daniel Borkmann <daniel@iogearbox.net>,
	John Fastabend <john.fastabend@gmail.com>,
	netdev@vger.kernel.org, deb.chatterjee@intel.com,
	anjali.singhai@intel.com, Vipin.Jain@amd.com,
	namrata.limaye@intel.com, tom@sipanda.io, mleitner@redhat.com,
	Mahesh.Shirshyad@amd.com, tomasz.osinski@intel.com,
	xiyou.wangcong@gmail.com, davem@davemloft.net,
	edumazet@google.com, kuba@kernel.org, pabeni@redhat.com,
	vladbu@nvidia.com, horms@kernel.org, bpf@vger.kernel.org,
	khalidm@nvidia.com, toke@redhat.com, mattyk@nvidia.com,
	dan.daly@intel.com, chris.sommers@keysight.com,
	john.andy.fingerhut@intel.com
Subject: Re: [PATCH net-next v8 00/15] Introducing P4TC
Date: Wed, 22 Nov 2023 10:25:57 +0100	[thread overview]
Message-ID: <ZV3JJQirPdZpbVIC@nanopsycho> (raw)
In-Reply-To: <CAM0EoMmPnCeU2uLph=uwh3JxtE4RQnvcSA2WdZgORywzNFCO6g@mail.gmail.com>

Tue, Nov 21, 2023 at 04:21:44PM CET, jhs@mojatatu.com wrote:
>On Tue, Nov 21, 2023 at 9:19 AM Jiri Pirko <jiri@resnulli.us> wrote:
>>
>> Tue, Nov 21, 2023 at 02:47:40PM CET, jhs@mojatatu.com wrote:
>> >On Tue, Nov 21, 2023 at 8:06 AM Jiri Pirko <jiri@resnulli.us> wrote:
>> >>
>> >> Mon, Nov 20, 2023 at 11:56:50PM CET, jhs@mojatatu.com wrote:
>> >> >On Mon, Nov 20, 2023 at 4:49 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> >> >>
>> >> >> On 11/20/23 8:56 PM, Jamal Hadi Salim wrote:
>> >> >> > On Mon, Nov 20, 2023 at 1:10 PM Jiri Pirko <jiri@resnulli.us> wrote:
>> >> >> >> Mon, Nov 20, 2023 at 03:23:59PM CET, jhs@mojatatu.com wrote:
>> >>
>> >> [...]
>> >>
>> >> >
>> >> >> tc BPF and XDP already have widely used infrastructure and can be developed
>> >> >> against libbpf or other user space libraries for a user space control plane.
>> >> >> With 'control plane' you refer here to the tc / netlink shim you've built,
>> >> >> but looking at the tc command line examples, this doesn't really provide a
>> >> >> good user experience (you call it p4 but people load bpf obj files). If the
>> >> >> expectation is that an operator should run tc commands, then neither it's
>> >> >> a nice experience for p4 nor for BPF folks. From a BPF PoV, we moved over
>> >> >> to bpf_mprog and plan to also extend this for XDP to have a common look and
>> >> >> feel wrt networking for developers. Why can't this be reused?
>> >> >
>> >> >The filter loading which loads the program is considered pipeline
>> >> >instantiation - consider it as "provisioning" more than "control"
>> >> >which runs at runtime. "control" is purely netlink based. The iproute2
>> >> >code we use links libbpf for example for the filter. If we can achieve
>> >> >the same with bpf_mprog then sure - we just dont want to loose
>> >> >functionality though.  off top of my head, some sample space:
>> >> >- we could have multiple pipelines with different priorities (which tc
>> >> >provides to us) - and each pipeline may have its own logic with many
>> >> >tables etc (and the choice to iterate the next one is essentially
>> >> >encoded in the tc action codes)
>> >> >- we use tc block to map groups of ports (which i dont think bpf has
>> >> >internal access of)
>> >> >
>> >> >In regards to usability: no i dont expect someone doing things at
>> >> >scale to use command line tc. The APIs are via netlink. But the tc cli
>> >> >is must for the rest of the masses per our traditions. Also i really
>> >>
>> >> I don't follow. You repeatedly mention "the must of the traditional tc
>> >> cli", but what of the existing traditional cli you use for p4tc?
>> >> If I look at the examples, pretty much everything looks new to me.
>> >> Example:
>> >>
>> >>   tc p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>> >>     action send_to_port param port eno1
>> >>
>> >> This is just TC/RTnetlink used as a channel to pass new things over. If
>> >> that is the case, what's traditional here?
>> >>
>> >
>> >
>> >What is not traditional about it?
>>
>> Okay, so in that case, the following example communitating with
>> userspace deamon using imaginary "p4ctrl" app is equally traditional:
>>   $ p4ctrl create myprog/table/mytable dstAddr 10.0.1.2/32 \
>>      action send_to_port param port eno1
>
>Huh? Thats just an application - classical tc which part of iproute2
>that is sending to the kernel, no different than "tc flower.."
>Where do you get the "userspace" daemon part? Yes, you can write a
>daemon but it will use the same APIs as tc.

Okay, so which part is the "tradition"?


>
>>
>> >
>> >>
>> >> >didnt even want to use ebpf at all for operator experience reasons -
>> >> >it requires a compilation of the code and an extra loading compared to
>> >> >what our original u32/pedit code offered.
>> >> >
>> >> >> I don't quite follow why not most of this could be implemented entirely in
>> >> >> user space without the detour of this and you would provide a developer
>> >> >> library which could then be integrated into a p4 runtime/frontend? This
>> >> >> way users never interface with ebpf parts nor tc given they also shouldn't
>> >> >> have to - it's an implementation detail. This is what John was also pointing
>> >> >> out earlier.
>> >> >>
>> >> >
>> >> >Netlink is the API. We will provide a library for object manipulation
>> >> >which abstracts away the need to know netlink. Someone who for their
>> >> >own reasons wants to use p4runtime or TDI could write on top of this.
>> >> >I would not design a kernel interface to just meet p4runtime (we
>> >> >already have TDI which came later which does things differently). So i
>> >> >expect us to support both those two. And if i was to do something on
>> >> >SDN that was more robust i would write my own that still uses these
>> >> >netlink interfaces.
>> >>
>> >> Actually, what Daniel says about the p4 library used as a backend to p4
>> >> frontend is pretty much aligned what I claimed on the p4 calls couple of
>> >> times. If you have this p4 userspace tooling, it is easy for offloads to
>> >> replace the backed by vendor-specific library which allows p4 offload
>> >> suitable for all vendors (your plan of p4tc offload does not work well
>> >> for our hw, as we repeatedly claimed).
>> >>
>> >
>> >That's you - NVIDIA. You have chosen a path away from the kernel
>> >towards DOCA. I understand NVIDIA's frustration with dealing with
>> >upstream process (which has been cited to me as a good reason for
>> >DOCA) but please dont impose these values and your politics on other
>> >vendors(Intel, AMD for example) who are more than willing to invest
>> >into making the kernel interfaces the path forward. Your choice.
>>
>> No, you are missing the point. This has nothing to do with DOCA.
>
>Right Jiri ;->
>
>> This
>> has to do with the simple limitation of your offload assuming there are
>> no runtime changes in the compiled pipeline. For Intel, maybe they
>> aren't, and it's a good fit for them. All I say is, that it is not the
>> good fit for everyone.
>
> a) it is not part of the P4 spec to dynamically make changes to the
>datapath pipeline after it is create and we are discussing a P4

Isn't this up to the implementation? I mean from the p4 perspective,
everything is static. Hw might need to reshuffle the pipeline internally
during rule insertion/remove in order to optimize the layout.


>implementation not an extension that would add more value b) We are
>more than happy to add extensions in the future to accomodate for
>features but first _P4 spec_ must be met c) we had longer discussions
>with Matty, Khalid and the Rice folks who wrote a paper on that topic
>which you probably didnt attend and everything that needs to be done
>can be from user space today for all those optimizations.
>
>Conclusion is: For what you need to do (which i dont believe is a
>limitation in your hardware rather a design decision on your part) run
>your user space daemon, do optimizations and update the datapath.
>Everybody is happy.

Should the userspace daemon listen on inserted rules to be offloade
over netlink?


>
>>
>> >Nobody is stopping you from offering your customers proprietary
>> >solutions which include a specific ebpf approach alongside DOCA. We
>> >believe that a singular interface regardless of the vendor is the
>> >right way forward. IMHO, this siloing that unfortunately is also added
>> >by eBPF being a double edged sword is not good for the community.
>> >
>> >> As I also said on the p4 call couple of times, I don't see the kernel
>> >> as the correct place to do the p4 abstractions. Why don't you do it in
>> >> userspace and give vendors possiblity to have p4 backends with compilers,
>> >> runtime optimizations etc in userspace, talking to the HW in the
>> >> vendor-suitable way too. Then the SW implementation could be easily eBPF
>> >> and the main reason (I believe) why you need to have this is TC
>> >> (offload) is then void.
>> >>
>> >> The "everyone wants to use TC/netlink" claim does not seem correct
>> >> to me. Why not to have one Linux p4 solution that fits everyones needs?
>> >
>> >You mean more fitting to the DOCA world? no, because iam a kernel
>>
>> Again, this has 0 relation to DOCA.
>>
>>
>> >first person and kernel interfaces are good for everyone.
>>
>> Yeah, not really. Not always the kernel is the right answer. Your/Intel
>> plan to handle the offload by:
>> 1) abuse devlink to flash p4 binary
>> 2) parse the binary in kernel to match to the table ids of rules coming
>>    from p4tc ndo_setup_tc
>> 3) abuse devlink to flash p4 binary for tc-flower
>> 4) parse the binary in kernel to match to the table ids of rules coming
>>    from tc-flower ndo_setup_tc
>> is really something that is making me a little bit nauseous.
>>
>> If you don't have a feasible plan to do the offload, p4tc does not make
>> sense to me to be honest.
>
>You mean if there's no plan to match your (NVIDIA?)  point of view.
>For #1 - how's this different from DDP? Wasnt that your suggestion to

I doubt that. Any flashing-blob-parsing-in-kernel is something I'm
opposed to from day 1.


>begin with? For #2 Nobody is proposing to do anything of the sort. The
>ndo is passed IDs for the objects and associated contents. For #3+#4

During offload, you need to parse the blob in driver to be able to match
the ids with blob entities. That was presented by you/Intel in the past
IIRC.


>tc flower thing has nothing to do with P4TC that was just some random
>proposal someone made seeing if they could ride on top of P4TC.

Yeah, it's not yet merged and already mentally used for abuse. I love
that :)


>
>Besides this nobody really has to satisfy your point of view - like i
>said earlier feel free to provide proprietary solutions. From a
>consumer perspective  I would not want to deal with 4 different
>vendors with 4 different proprietary approaches. The kernel is the
>unifying part. You seemed happier with tc flower just not with the

Yeah, that is my point, why the unifying part can't be a userspace
daemon/library with multiple backends (p4tc, bpf, vendorX, vendorY, ..)?

I just don't see the kernel as a good fit for abstraction here,
given the fact that the vendor compilers does not run in kernel.
That is breaking your model.


>kernel process - which is ironically the same thing we are going
>through here ;->
>
>cheers,
>jamal
>
>>
>> >
>> >cheers,
>> >jamal

  reply	other threads:[~2023-11-22  9:26 UTC|newest]

Thread overview: 79+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-11-16 14:59 [PATCH net-next v8 00/15] Introducing P4TC Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 01/15] net: sched: act_api: Introduce dynamic actions list Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 02/15] net/sched: act_api: increase action kind string length Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 03/15] net/sched: act_api: Update tc_action_ops to account for dynamic actions Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 04/15] net/sched: act_api: add struct p4tc_action_ops as a parameter to lookup callback Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 05/15] net: sched: act_api: Add support for preallocated dynamic action instances Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 06/15] net: introduce rcu_replace_pointer_rtnl Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 07/15] rtnl: add helper to check if group has listeners Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 08/15] p4tc: add P4 data types Jamal Hadi Salim
2023-11-16 16:03   ` Jiri Pirko
2023-11-17 12:01     ` Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 09/15] p4tc: add template pipeline create, get, update, delete Jamal Hadi Salim
2023-11-16 16:11   ` Jiri Pirko
2023-11-17 12:09     ` Jamal Hadi Salim
2023-11-20  8:18       ` Jiri Pirko
2023-11-20 12:48         ` Jamal Hadi Salim
2023-11-20 13:16           ` Jiri Pirko
2023-11-20 15:30             ` Jamal Hadi Salim
2023-11-20 16:25               ` Jiri Pirko
2023-11-20 18:20       ` David Ahern
2023-11-20 20:12         ` Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 10/15] p4tc: add action template create, update, delete, get, flush and dump Jamal Hadi Salim
2023-11-16 16:28   ` Jiri Pirko
2023-11-17 15:11     ` Jamal Hadi Salim
2023-11-20  8:19       ` Jiri Pirko
2023-11-20 13:45         ` Jamal Hadi Salim
2023-11-20 16:25           ` Jiri Pirko
2023-11-17  6:51   ` John Fastabend
2023-11-16 14:59 ` [PATCH net-next v8 11/15] p4tc: add template table " Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 12/15] p4tc: add runtime table entry create, update, get, delete, " Jamal Hadi Salim
2023-11-16 14:59 ` [PATCH net-next v8 13/15] p4tc: add set of P4TC table kfuncs Jamal Hadi Salim
2023-11-17  7:09   ` John Fastabend
2023-11-19  9:14   ` kernel test robot
2023-11-20 22:28   ` kernel test robot
2023-11-16 14:59 ` [PATCH net-next v8 14/15] p4tc: add P4 classifier Jamal Hadi Salim
2023-11-17  7:17   ` John Fastabend
2023-11-16 14:59 ` [PATCH net-next v8 15/15] p4tc: Add P4 extern interface Jamal Hadi Salim
2023-11-16 16:42   ` Jiri Pirko
2023-11-17 12:14     ` Jamal Hadi Salim
2023-11-20  8:22       ` Jiri Pirko
2023-11-20 14:02         ` Jamal Hadi Salim
2023-11-20 16:27           ` Jiri Pirko
2023-11-20 19:00             ` Jamal Hadi Salim
2023-11-17  6:27 ` [PATCH net-next v8 00/15] Introducing P4TC John Fastabend
2023-11-17 12:49   ` Jamal Hadi Salim
2023-11-17 18:37     ` John Fastabend
2023-11-17 20:46       ` Jamal Hadi Salim
2023-11-20  9:39         ` Jiri Pirko
2023-11-20 14:23           ` Jamal Hadi Salim
2023-11-20 18:10             ` Jiri Pirko
2023-11-20 19:56               ` Jamal Hadi Salim
2023-11-20 20:41                 ` John Fastabend
2023-11-20 22:13                   ` Jamal Hadi Salim
2023-11-20 21:48                 ` Daniel Borkmann
2023-11-20 22:56                   ` Jamal Hadi Salim
2023-11-21 13:06                     ` Jiri Pirko
2023-11-21 13:47                       ` Jamal Hadi Salim
2023-11-21 14:19                         ` Jiri Pirko
2023-11-21 15:21                           ` Jamal Hadi Salim
2023-11-22  9:25                             ` Jiri Pirko [this message]
2023-11-22 15:14                               ` Jamal Hadi Salim
2023-11-22 18:31                                 ` Jiri Pirko
2023-11-22 18:50                                   ` John Fastabend
2023-11-22 19:35                                   ` Jamal Hadi Salim
2023-11-23  6:36                                     ` Jiri Pirko
2023-11-23 13:22                                       ` Jamal Hadi Salim
2023-11-23 13:34                                         ` Jiri Pirko
2023-11-23 13:45                                           ` Jamal Hadi Salim
2023-11-23 14:07                                             ` Jiri Pirko
2023-11-23 14:28                                               ` Jamal Hadi Salim
2023-11-23 15:27                                                 ` Jiri Pirko
2023-11-23 16:30                                                   ` Jamal Hadi Salim
2023-11-23 17:53                                                     ` Edward Cree
2023-11-23 18:09                                                       ` Jiri Pirko
2023-11-23 18:58                                                         ` Jamal Hadi Salim
2023-11-23 18:53                                                       ` Jakub Kicinski
2023-11-23 19:42                                                         ` Tom Herbert
2023-11-24 10:39                                                         ` Jiri Pirko
2023-11-23 18:04                                                     ` Jiri Pirko

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZV3JJQirPdZpbVIC@nanopsycho \
    --to=jiri@resnulli.us \
    --cc=Mahesh.Shirshyad@amd.com \
    --cc=Vipin.Jain@amd.com \
    --cc=anjali.singhai@intel.com \
    --cc=bpf@vger.kernel.org \
    --cc=chris.sommers@keysight.com \
    --cc=dan.daly@intel.com \
    --cc=daniel@iogearbox.net \
    --cc=davem@davemloft.net \
    --cc=deb.chatterjee@intel.com \
    --cc=edumazet@google.com \
    --cc=horms@kernel.org \
    --cc=jhs@mojatatu.com \
    --cc=john.andy.fingerhut@intel.com \
    --cc=john.fastabend@gmail.com \
    --cc=khalidm@nvidia.com \
    --cc=kuba@kernel.org \
    --cc=mattyk@nvidia.com \
    --cc=mleitner@redhat.com \
    --cc=namrata.limaye@intel.com \
    --cc=netdev@vger.kernel.org \
    --cc=pabeni@redhat.com \
    --cc=toke@redhat.com \
    --cc=tom@sipanda.io \
    --cc=tomasz.osinski@intel.com \
    --cc=vladbu@nvidia.com \
    --cc=xiyou.wangcong@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox