netdev.vger.kernel.org archive mirror
From: Tom Herbert <tom@sipanda.io>
To: Jakub Kicinski <kuba@kernel.org>
Cc: "Jamal Hadi Salim" <jhs@mojatatu.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Singhai, Anjali" <anjali.singhai@intel.com>,
	"Paolo Abeni" <pabeni@redhat.com>,
	"Linux Kernel Network Developers" <netdev@vger.kernel.org>,
	"Chatterjee, Deb" <deb.chatterjee@intel.com>,
	"Limaye, Namrata" <namrata.limaye@intel.com>,
	mleitner@redhat.com, Mahesh.Shirshyad@amd.com,
	Vipin.Jain@amd.com, "Osinski, Tomasz" <tomasz.osinski@intel.com>,
	"Jiri Pirko" <jiri@resnulli.us>,
	"Cong Wang" <xiyou.wangcong@gmail.com>,
	"David S . Miller" <davem@davemloft.net>,
	edumazet@google.com, "Vlad Buslov" <vladbu@nvidia.com>,
	horms@kernel.org, khalidm@nvidia.com,
	"Toke Høiland-Jørgensen" <toke@redhat.com>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Victor Nogueira" <victor@mojatatu.com>,
	"Tammela, Pedro" <pctammela@mojatatu.com>,
	"Daly, Dan" <dan.daly@intel.com>,
	andy.fingerhut@gmail.com, "Sommers,
	Chris" <chris.sommers@keysight.com>,
	mattyk@nvidia.com, bpf@vger.kernel.org
Subject: Re: [PATCH net-next v12 00/15] Introducing P4TC (series 1)
Date: Sun, 3 Mar 2024 08:31:11 -0800
Message-ID: <CAOuuhY_senZbdC2cVU9kfDww_bT+a_VkNaDJYRk4_fMbJW17sQ@mail.gmail.com>
In-Reply-To: <20240302191530.22353670@kernel.org>

On Sat, Mar 2, 2024 at 7:15 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 1 Mar 2024 18:20:36 -0800 Tom Herbert wrote:
> > This is configurability versus programmability. The table driven
> > approach as input (configurability) might work fine for generic
> > match-action tables up to the point that tables are expressive enough
> > to satisfy the requirements. But parsing doesn't fall into the table
> > driven paradigm: parsers want to be *programmed*. This is why we
> > removed kParser from this patch set and fell back to eBPF for parsing.
> > But the problem we quickly hit is that eBPF is not offloadable to network
> > devices, for example when we compile P4 in an eBPF parser we've lost
> > the declarative representation that parsers in the devices could
> > consume (they're not CPUs running eBPF).
> >
> > I think the key here is what we mean by kernel offload. When we do
> > kernel offload, is it the kernel implementation or the kernel
> > functionality that's being offloaded? If it's the latter then we have
> > a lot more flexibility. What we'd need is a safe and secure way to
> > synchronize with that offload device that precisely supports the
> > kernel functionality we'd like to offload. This can be done if both
> > the kernel bits and programmed offload are derived from the same
> > source (i.e. tag source code with a sha-1). For example, if someone
> > writes a parser in P4, we can compile that into both eBPF and a P4
> > backend using independent tool chains and program download. At
> > runtime, the kernel can safely offload the functionality of the eBPF
> > parser to the device if its hash matches the one reported by the
> > device.
>
> Good points. If I understand you correctly you're saying that parsers
> are more complex than just a basic parsing tree à la u32.

Yes. Parsing things like TLVs, the GRE flags field, or nested protobufs
isn't a good fit for u32. We also want the benefits of compiler
optimizations: unrolling loops, squashing nodes in the parse graph, etc.
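
To illustrate why TLVs don't fit a fixed-offset match, here is a minimal
C sketch (not code from this thread) that walks TCP-option-style TLVs:
kind 0 ends the list, kind 1 is a one-byte NOP, and every other option
is kind, length, value, where the length covers the two header bytes.
The offset of each option depends on the lengths of all the options
before it, so the parse is a bounded loop rather than a match at a
constant offset. The function name find_option() and the MAX_OPTS bound
are hypothetical; the fixed bound mimics the bounded loops an eBPF
verifier accepts and that a compiler can unroll.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical bound on options visited. A fixed bound keeps the loop
 * verifiable in the style an eBPF verifier requires. */
#define MAX_OPTS 40

/* Return the offset of the first option whose kind equals want, or -1
 * if it is absent, the list ends first, or the options are malformed. */
static int find_option(const uint8_t *buf, size_t len, uint8_t want)
{
	size_t off = 0;

	for (int i = 0; i < MAX_OPTS && off < len; i++) {
		uint8_t kind = buf[off];
		uint8_t olen;

		if (kind == 0)			/* end of option list */
			return -1;
		if (kind == 1) {		/* NOP: no length byte */
			olen = 1;
		} else {
			if (off + 1 >= len)	/* truncated: no length byte */
				return -1;
			olen = buf[off + 1];
			if (olen < 2 || olen > len - off)	/* bad length */
				return -1;
		}
		if (kind == want)
			return (int)off;
		off += olen;	/* next offset depends on this option's length */
	}
	return -1;
}
```

For example, with the options NOP, NOP, MSS (kind 2, length 4) and
window scale (kind 3, length 3), the window-scale option is found at
offset 6 only after the MSS length has been read.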

> Then we can take this argument further. P4 has grown to encompass a lot
> of functionality of quite complex devices. How do we square that with
> the kernel functionality offload model? If the entire device is modeled,
> including f.e. TSO, an offload would mean that the user has to write
> a TSO implementation which they then load into TC? That seems odd.
>
> IOW I don't quite know how to square in my head the "total
> functionality" with being a TC-based "plugin".

Hi Jakub,

I believe the solution is to replace kernel code with eBPF in cases
where we need programmability. This effectively means that we would
ship eBPF code as part of the kernel. So in the case of TSO, the
kernel would include a standard implementation in eBPF that could be
compiled into the kernel by default. The restricted C source code is
tagged with a hash, so if someone wants to offload TSO they could
compile the source into their target and retain the hash. At runtime
it's a matter of querying the driver to see whether the device supports
the TSO program the kernel is running by comparing hash values. Scaling
this, a device could support a catalogue of programs: TSO, LRO,
parser, iptables, etc. If the kernel can match the hash of its eBPF
code to one reported by the driver, then it can assume the functionality
is offloadable. This is an elaboration of "device features", but instead
of the device telling us it thinks it supports an adequate GRO
implementation by reporting NETIF_F_GRO, the device would tell the
kernel that it not only supports GRO but provides functionality
identical to the kernel's GRO (which IMO is the first requirement of
kernel offload).
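
The hash-matching step could look roughly like the C sketch below.
Everything in it is hypothetical: struct offload_entry, can_offload(),
and the idea of the driver handing back a simple array are illustrative
assumptions, not an existing kernel or driver API. FNV-1a is swapped in
for the SHA-1 tag only to keep the example short; a real scheme would
want a cryptographic hash.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a 64-bit, a stand-in for the SHA-1 source tag. Not collision
 * resistant; used here only to keep the sketch self-contained. */
static uint64_t fnv1a64(const char *data, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;	/* FNV offset basis */

	for (size_t i = 0; i < len; i++) {
		h ^= (uint8_t)data[i];
		h *= 0x100000001b3ULL;		/* FNV prime */
	}
	return h;
}

/* Hypothetical entry in a device's catalogue of offloadable programs. */
struct offload_entry {
	const char *name;	/* e.g. "tso", "gro", "parser" */
	uint64_t src_hash;	/* hash of the source the device was built from */
};

/* Return nonzero only if the device reports a program with the same name
 * AND the same source hash as the program the kernel is running. */
static int can_offload(const char *name, const char *src,
		       const struct offload_entry *cat, size_t n)
{
	uint64_t h = fnv1a64(src, strlen(src));

	for (size_t i = 0; i < n; i++) {
		if (strcmp(cat[i].name, name) == 0 && cat[i].src_hash == h)
			return 1;
	}
	return 0;
}
```

In this sketch, if the kernel's copy of the program is later patched at
runtime, its hash no longer matches the catalogue entry and
can_offload() returns 0, so the kernel would keep running that function
in software instead of offloading it.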

Even before considering hardware offload, I think this approach
addresses a more fundamental problem: making the kernel programmable.
Since the code is in eBPF, the kernel can be reprogrammed at runtime,
and that could be controlled by TC. This allows local customization of
kernel features, but it is also the simplest way to "patch" the kernel
with security and bug fixes (nobody is ever excited to do a kernel
rebase in their datacenter!). The flow dissector is a prime candidate for
this, and I am still planning to replace it with an all-eBPF program
(https://netdevconf.info/0x15/slides/16/Flow%20dissector_PANDA%20parser.pdf).

Tom

Thread overview: 71+ messages
2024-02-25 16:54 [PATCH net-next v12 00/15] Introducing P4TC (series 1) Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 01/15] net: sched: act_api: Introduce P4 actions list Jamal Hadi Salim
2024-02-29 15:05   ` Paolo Abeni
2024-02-29 18:21     ` Jamal Hadi Salim
2024-03-01  7:30       ` Paolo Abeni
2024-03-01 12:39         ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 02/15] net/sched: act_api: increase action kind string length Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 03/15] net/sched: act_api: Update tc_action_ops to account for P4 actions Jamal Hadi Salim
2024-02-29 16:19   ` Paolo Abeni
2024-02-29 18:30     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 04/15] net/sched: act_api: add struct p4tc_action_ops as a parameter to lookup callback Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 05/15] net: sched: act_api: Add support for preallocated P4 action instances Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 06/15] p4tc: add P4 data types Jamal Hadi Salim
2024-02-29 15:09   ` Paolo Abeni
2024-02-29 18:31     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 07/15] p4tc: add template API Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 08/15] p4tc: add template pipeline create, get, update, delete Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 09/15] p4tc: add template action create, update, delete, get, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 10/15] p4tc: add runtime action support Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 11/15] p4tc: add template table create, update, delete, get, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 12/15] p4tc: add runtime table entry create and update Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 13/15] p4tc: add runtime table entry get, delete, flush and dump Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 14/15] p4tc: add set of P4TC table kfuncs Jamal Hadi Salim
2024-03-01  6:53   ` Martin KaFai Lau
2024-03-01 12:31     ` Jamal Hadi Salim
2024-03-03  1:32       ` Martin KaFai Lau
2024-03-03 17:20         ` Jamal Hadi Salim
2024-03-05  7:40           ` Martin KaFai Lau
2024-03-05 12:30             ` Jamal Hadi Salim
2024-03-06  7:58               ` Martin KaFai Lau
2024-03-06 20:22                 ` Jamal Hadi Salim
2024-03-06 22:21                   ` Martin KaFai Lau
2024-03-06 23:19                     ` Jamal Hadi Salim
2024-02-25 16:54 ` [PATCH net-next v12 15/15] p4tc: add P4 classifier Jamal Hadi Salim
2024-02-28 17:11 ` [PATCH net-next v12 00/15] Introducing P4TC (series 1) John Fastabend
2024-02-28 18:23   ` Jamal Hadi Salim
2024-02-28 21:13     ` John Fastabend
2024-03-01  7:02   ` Martin KaFai Lau
2024-03-01 12:36     ` Jamal Hadi Salim
2024-02-29 17:13 ` Paolo Abeni
2024-02-29 18:49   ` Jamal Hadi Salim
2024-02-29 20:52     ` John Fastabend
2024-02-29 21:49   ` Singhai, Anjali
2024-02-29 22:33     ` John Fastabend
2024-02-29 22:48       ` Jamal Hadi Salim
     [not found]         ` <CAOuuhY8qbsYCjdUYUZv8J3jz8HGXmtxLmTDP6LKgN5uRVZwMnQ@mail.gmail.com>
2024-03-01 17:00           ` Jakub Kicinski
2024-03-01 17:39             ` Jamal Hadi Salim
2024-03-02  1:32               ` Jakub Kicinski
2024-03-02  2:20                 ` Tom Herbert
2024-03-03  3:15                   ` Jakub Kicinski
2024-03-03 16:31                     ` Tom Herbert [this message]
2024-03-04 20:07                       ` Jakub Kicinski
2024-03-04 20:58                         ` eBPF to implement core functionility WAS " Tom Herbert
2024-03-04 21:19                       ` Stanislav Fomichev
2024-03-04 22:01                         ` Tom Herbert
2024-03-04 23:24                           ` Stanislav Fomichev
2024-03-04 23:50                             ` Tom Herbert
2024-03-02  2:59                 ` Hardware Offload discussion WAS(Re: " Jamal Hadi Salim
2024-03-02 14:36                   ` Jamal Hadi Salim
2024-03-03  3:27                     ` Jakub Kicinski
2024-03-03 17:00                       ` Jamal Hadi Salim
2024-03-03 18:10                         ` Tom Herbert
2024-03-03 19:04                           ` Jamal Hadi Salim
2024-03-04 20:18                             ` Jakub Kicinski
2024-03-04 21:02                               ` Jamal Hadi Salim
2024-03-04 21:23                             ` Stanislav Fomichev
2024-03-04 21:44                               ` Jamal Hadi Salim
2024-03-04 22:23                                 ` Stanislav Fomichev
2024-03-04 22:59                                   ` Jamal Hadi Salim
2024-03-04 23:14                                     ` Stanislav Fomichev
2024-03-01 18:53   ` Chris Sommers