From mboxrd@z Thu Jan  1 00:00:00 1970
From: Jiri Pirko <jiri@resnulli.us>
Subject: Re: Let's do P4
Date: Sat, 29 Oct 2016 16:58:50 +0200
Message-ID: <20161029145850.GJ1692@nanopsycho.orion>
References: <20161029075328.GB1692@nanopsycho.orion>
 <20161029093905.GA1810@pox.localdomain>
 <20161029101003.GC1692@nanopsycho.orion>
 <20161029111548.GB1810@pox.localdomain>
 <20161029112834.GF1692@nanopsycho.orion>
 <20161029120932.GD1810@pox.localdomain>
 <20161029135855.GH1692@nanopsycho.orion>
 <20161029155421.02d81125@jkicinski-Precision-T1700>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Cc: Thomas Graf <tgraf@suug.ch>, netdev@vger.kernel.org,
        davem@davemloft.net, jhs@mojatatu.com, roopa@cumulusnetworks.com,
        john.fastabend@gmail.com, simon.horman@netronome.com,
        ast@kernel.org, daniel@iogearbox.net, prem@barefootnetworks.com,
        hannes@stressinduktion.org, jbenc@redhat.com, tom@herbertland.com,
        mattyk@mellanox.com, idosch@mellanox.com, eladr@mellanox.com,
        yotamg@mellanox.com, nogahf@mellanox.com, ogerlitz@mellanox.com,
        linville@tuxdriver.com, andy@greyhouse.net, f.fainelli@gmail.com,
        dsa@cumulusnetworks.com, vivien.didelot@savoirfairelinux.com,
        andrew@lunn.ch, ivecera@redhat.com
To: Jakub Kicinski <kubakici@wp.pl>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-wm0-f50.google.com ([74.125.82.50]:38527 "EHLO
        mail-wm0-f50.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751075AbcJ2O6x (ORCPT
        <rfc822;netdev@vger.kernel.org>); Sat, 29 Oct 2016 10:58:53 -0400
Received: by mail-wm0-f50.google.com with SMTP id n67so159894851wme.1
        for <netdev@vger.kernel.org>; Sat, 29 Oct 2016 07:58:52 -0700 (PDT)
Content-Disposition: inline
In-Reply-To: <20161029155421.02d81125@jkicinski-Precision-T1700>
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

Sat, Oct 29, 2016 at 04:54:21PM CEST, kubakici@wp.pl wrote:
>On Sat, 29 Oct 2016 15:58:55 +0200, Jiri Pirko wrote:
>> Sat, Oct 29, 2016 at 02:09:32PM CEST, tgraf@suug.ch wrote:
>> >On 10/29/16 at 01:28pm, Jiri Pirko wrote:  
>> >> Sat, Oct 29, 2016 at 01:15:48PM CEST, tgraf@suug.ch wrote:  
>> >> >So given the SKIP_SW flag, the in-kernel compiler is optional anyway.
>> >> >Why even risk including a possibly incomplete compiler? Older kernels
>> >> >must be capable of running alongside newer hardware as long as eBPF can
>> >> >represent the software path. Having to upgrade to latest and greatest
>> >> >kernels is not an option for most people so they would simply have to
>> >> >fall back to SKIP_SW and do it in user space anyway.  
>> >> 
>> >> The thing is, if we need to offload something, it needs to be
>> >> implemented in kernel first. Also, I believe that it is good to have
>> >> in-kernel p4 engine for testing and development purposes.  
>> >
>> >You lost me now :-) In an earlier email you said:
>> >  
>> >> It can be the other way around. The p4>ebpf compiler won't be complete
>> >> at the beginning so it is possible that HW could provide more features.
>> >> I don't think it is a problem. With SKIP_SW and SKIP_HW flags in TC,
>> >> the user can set a different program for each. I think in real life, that
>> >> would be the most common case anyway.  
>> >
>> >If you allow SKIP_SW and setting a different program for each to address
>> >this, then how is this any different?
>> >
>> >I completely agree that kernel must be able to provide the same
>> >functionality as HW with optional additional capabilities on top so
>> >the HW can always bail out and punt to SW.
>> >
>> >[...]
>> >  
>> >> >I'm not seeing how either of them is more or less variable. The main
>> >> >difference is whether to require configuring a single cls with both
>> >> >p4ast + bpf or two separate cls, one for each. I'd prefer the single
>> >> >cls approach simply because it is cleaner with regard to offload
>> >> >directly off bpf vs off p4ast.  
>> >> 
>> >> That's the bundle that you asked me to forget earlier in this email? :)  
>> >
>> >I thought you referred to the "store in same object file" as bundle.
>> >I don't really care about that. What I care about is a single way to
>> >configure this that works for both ASIC and non-ASIC hardware.
>> >  
>> >> >My main point is to not include an IR-to-eBPF compiler in the kernel
>> >> >and let user space handle this instead.  
>> >> 
>> >> If we do it as you describe, we would be using 2 different APIs for
>> >> the offloaded and non-offloaded paths. I don't believe that is acceptable,
>> >> as the offloaded features have to have a kernel implementation. Therefore, I
>> >> believe that p4ast as a kernel API is the only possible option.  
>> >
>> >Yes, the kernel has the SW implementation in eBPF. I thought that is
>> >what you propose as well. The only difference is whether to generate
>> >that eBPF in kernel or user space.
>> >
>> >Not sure I understand the multiple APIs point for offload vs
>> >non-offload. There is a single API: tc. Both models require the user
>> >to provide additional metadata to allow programming ASIC HW: p4ast
>> >IR or whatever we agree on.  
>> 
>> If you do p4>ebpf in userspace, you have 2 apis:
>> 1) to setup sw (in-kernel) p4 datapath, you push bpf.o to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>> 
>> Those are 2 apis. Both wrapped up by TC, but still 2 apis.
>> 
>> What I believe is correct is to have one api:
>> 1) to setup sw (in-kernel) p4 datapath, you push program.p4ast to kernel
>> 2) to setup hw p4 datapath, you push program.p4ast to kernel
>> 
>> In case of 1), the program.p4ast will be either interpreted by a new p4
>> interpreter, or translated to bpf and interpreted by that. But this
>> translation code is part of the kernel.
>
>Option 3) use a well structured subset of eBPF as user space ABI ;)

:( I believe that would not be nice. It would also be confusing and hard
to maintain. Plus we would have to do 2 translations between
incompatible paradigms.
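To make the one-API model concrete, here is a minimal Python sketch. It relies on several assumptions. P4Ast, sw_install, hw_install and install_one_api are hypothetical names and are not kernel or tc APIs. Only the two flag bit values come from TCA_CLS_FLAGS_SKIP_HW and TCA_CLS_FLAGS_SKIP_SW in <linux/pkt_cls.h>. In the sketch, the same program.p4ast object feeds both datapaths, and the flags only select which datapaths get programmed. The handling of a refused offload is loosely modeled on how tc classifiers treat skip_sw.

```python
from dataclasses import dataclass

# Bit values mirror TCA_CLS_FLAGS_SKIP_HW / TCA_CLS_FLAGS_SKIP_SW from
# <linux/pkt_cls.h>; redefined here so the sketch is self-contained.
SKIP_HW = 1 << 0
SKIP_SW = 1 << 1


@dataclass
class P4Ast:
    """Hypothetical stand-in for a program.p4ast blob pushed via tc."""
    name: str
    features: frozenset


@dataclass
class InstallResult:
    in_sw: bool = False
    in_hw: bool = False


def sw_install(prog: P4Ast) -> bool:
    # Hypothetical in-kernel path: interpret the p4ast directly or translate
    # it to eBPF. Under the one-API proposal the kernel implements every
    # feature, so the SW path always succeeds here.
    return True


def hw_install(prog: P4Ast, hw_features: frozenset) -> bool:
    # Hypothetical driver offload: refuses programs that use features the
    # ASIC cannot express.
    return prog.features <= hw_features


def install_one_api(prog: P4Ast, flags: int, hw_features: frozenset) -> InstallResult:
    """One API: the same p4ast feeds both datapaths; flags only select them."""
    if flags & SKIP_HW and flags & SKIP_SW:
        raise ValueError("skip_hw and skip_sw are mutually exclusive")
    result = InstallResult()
    if not flags & SKIP_HW:
        result.in_hw = hw_install(prog, hw_features)
        if not result.in_hw and flags & SKIP_SW:
            # HW-only was requested, but the driver refused the offload.
            raise RuntimeError(f"{prog.name}: offload refused")
    if not flags & SKIP_SW:
        # The SW path gets the very same object, so the semantics match.
        result.in_sw = sw_install(prog)
    return result


if __name__ == "__main__":
    hw = frozenset({"exact_match", "set_field"})
    simple = P4Ast("l2fwd", frozenset({"exact_match"}))
    richer = P4Ast("stateful", frozenset({"exact_match", "register"}))
    print(install_one_api(simple, 0, hw))  # InstallResult(in_sw=True, in_hw=True)
    print(install_one_api(richer, 0, hw))  # InstallResult(in_sw=True, in_hw=False)
```

In the two-API model, the caller would instead push a separately compiled bpf.o with skip_hw and program.p4ast with skip_sw. Nothing in the kernel would then tie the two programs to the same behavior, which is the concern raised above.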


>
>In all seriousness, user space already has to have some knowledge about
>the underlying hardware today, with different vendors picking different
>TC classifiers for offload.  So I humbly agree that 2 APIs may be
>acceptable here.