From mboxrd@z Thu Jan  1 00:00:00 1970
From: John Fastabend <john.fastabend@gmail.com>
Subject: Re: [PATCH v15 ] net/veth/XDP: Line-rate packet forwarding in kernel
Date: Tue, 3 Apr 2018 09:41:08 -0700
Message-ID: <4f0c0f20-ce25-4996-4f28-14a73c988446@gmail.com>
References: <CAFgPn1DX9cOpDRGj=wFwvZq_bpq6VFnEOzR1YbMuC0+=DFEWxA@mail.gmail.com>
 <7cfca503-3e17-6287-8888-92d43ce7a2e7@gmail.com>
 <2ac3c590-8f13-b983-7efb-021f82ee3295@gmail.com>
 <20180402181602.jpdb25ytmffg2gei@ast-mbp.dhcp.thefacebook.com>
 <9cb8a162-3b6a-abfa-4f6e-524995bbfb8d@gmail.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Cc: "Md. Islam" <mislam4@kent.edu>, netdev@vger.kernel.org,
        David Miller <davem@davemloft.net>, stephen@networkplumber.org,
        agaceph@gmail.com, Pavel Emelyanov <xemul@openvz.org>,
        Eric Dumazet <edumazet@google.com>, brouer@redhat.com
To: David Ahern <dsahern@gmail.com>,
        Alexei Starovoitov <alexei.starovoitov@gmail.com>
Return-path: <netdev-owner@vger.kernel.org>
Received: from mail-pl0-f48.google.com ([209.85.160.48]:44717 "EHLO
        mail-pl0-f48.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
        with ESMTP id S1751904AbeDCQlZ (ORCPT
        <rfc822;netdev@vger.kernel.org>); Tue, 3 Apr 2018 12:41:25 -0400
Received: by mail-pl0-f48.google.com with SMTP id b6-v6so7614887pla.11
        for <netdev@vger.kernel.org>; Tue, 03 Apr 2018 09:41:25 -0700 (PDT)
In-Reply-To: <9cb8a162-3b6a-abfa-4f6e-524995bbfb8d@gmail.com>
Content-Language: en-US
Sender: netdev-owner@vger.kernel.org
List-ID: <netdev.vger.kernel.org>

On 04/03/2018 08:07 AM, David Ahern wrote:
> On 4/2/18 12:16 PM, Alexei Starovoitov wrote:
>> On Mon, Apr 02, 2018 at 12:09:44PM -0600, David Ahern wrote:
>>> On 4/2/18 12:03 PM, John Fastabend wrote:
>>>>
>>>> Can the above be a normal BPF helper that returns an
>>>> ifindex? Then something roughly like this pattern would
>>>> work for all drivers with redirect support,
>>>>
>>>>
>>>>      route_ifindex = ip_route_lookup(__daddr, ....)
>>>>      if (!route_ifindex)
>>>>            return do_foo()
>>>>      return xdp_redirect(route_ifindex);
>>>>      
>>>> So my suggestion is,
>>>>
>>>>   1. enable veth xdp (including redirect support)
>>>>   2. add a helper to lookup route from routing table
>>>>
>>>> Alternatively you can skip step (2) and encode the routing
>>>> table in BPF directly. Maybe we need a more efficient data
>>>> structure but that should also work.
>>>>
>>>
>>> That's what I have here:
>>>
>>> https://github.com/dsahern/linux/commit/bab42f158c0925339f7519df7fb2cde8eac33aa8
>>
>> was wondering what's up with the delay and when are you going to
>> submit them officially...
>> The use case came up several times.
>>
> 
> I need to find time to come back to that set. As I recall there are a
> number of outstanding issues:
> 
> 1. you and Daniel had comments about the bpf_func_proto declarations
> 
> 2. Jesper had concerns about xdp redirect to any netdev. e.g., How does
> the lookup know the egress netdev supports xdp? Right now you can try
> and the packet is dropped if it is not supported.
> 

There should probably be a tracepoint there if not already. Otherwise
I think the orchestration/loader layer should be ensuring that xdp
support is sufficient. I don't think we need anything specific in the
XDP/BPF code to handle unsupported devices.

> 3. VLAN devices. I suspect these will affect the final bpf function
> prototype. It would be awkward to have 1 forwarding API for non-vlan
> devices and a second for vlan devices, hence the need to resolve this
> before it goes in.
> 

Interesting. Do we need stacked XDP? I could imagine having the 802.1Q
device simply call the lower dev's XDP xmit routine, possibly adding the
802.1Q header first.

Or alternatively a new dev type could let users query things like
vlan-id from the dev rather than automatically doing the tagging. I
suspect, though, that if you forward to a vlan device, automatically
adding the tag is the correct behavior.
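
To make "adding the tag" concrete, here is a rough userspace sketch of
pushing an 802.1Q header into an Ethernet frame. It is only an
illustration: vlan_push_sketch() and the *_SKETCH constants are
hypothetical names, not existing kernel or BPF APIs, and a real XDP
path would more likely grow headroom and move the MAC addresses rather
than shift the payload.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define ETH_ALEN_SKETCH   6       /* MAC address length in bytes */
#define VLAN_HLEN_SKETCH  4       /* TPID (2 bytes) + TCI (2 bytes) */
#define VLAN_TPID_8021Q   0x8100  /* 802.1Q tag protocol identifier */

/*
 * Insert an 802.1Q tag right after the destination and source MACs.
 * buf holds a frame of len bytes and has room for cap bytes in total.
 * Returns the new frame length, or 0 if the frame is too short, the
 * buffer is too small, or vid/pcp is out of range.
 */
static size_t vlan_push_sketch(uint8_t *buf, size_t len, size_t cap,
                               uint16_t vid, uint8_t pcp)
{
    const size_t off = 2 * ETH_ALEN_SKETCH;   /* tag goes at byte 12 */
    uint16_t tci;

    if (len < off + 2 || len + VLAN_HLEN_SKETCH > cap)
        return 0;
    if (vid > 0x0fff || pcp > 7)
        return 0;

    /* Shift the EtherType and payload up by 4 bytes to open a gap. */
    memmove(buf + off + VLAN_HLEN_SKETCH, buf + off, len - off);

    /* TCI: 3 bits priority, 1 bit DEI (left at 0), 12 bits VLAN id. */
    tci = (uint16_t)((pcp << 13) | vid);
    buf[off]     = (uint8_t)(VLAN_TPID_8021Q >> 8);
    buf[off + 1] = (uint8_t)(VLAN_TPID_8021Q & 0xff);
    buf[off + 2] = (uint8_t)(tci >> 8);
    buf[off + 3] = (uint8_t)(tci & 0xff);

    return len + VLAN_HLEN_SKETCH;
}
```

The original EtherType ends up right after the tag, so the receiver
sees TPID 0x8100, then the TCI, then the inner protocol as usual.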


> 4. What about other stacked devices - bonds and bridges - will those
> just work with the bpf helper? VRF is already handled of course. ;-)
> 

So if we simply handle this like other stacked devices and call the
lower dev's xdp_xmit routine we should get reasonable behavior. For
bonds and bridges I guess some generalization is needed though because
everything at the moment is skb centric. I don't think it's necessary
in the first series though. It can be added later.
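
Roughly what I have in mind for the stacked case, as a toy userspace
model. Everything here (struct toy_dev, toy_stacked_xmit(), and so on)
is made up for illustration and is not the netdev API; it only shows
the idea of walking down to the first lower device that has an xmit
routine and dropping otherwise.

```c
#include <stddef.h>

struct toy_dev;
typedef int (*toy_xdp_xmit_t)(struct toy_dev *dev,
                              const void *frame, size_t len);

/* Hypothetical stand-in for a net device in a stack. */
struct toy_dev {
    const char *name;
    struct toy_dev *lower;      /* NULL for a bottom-level device */
    toy_xdp_xmit_t xdp_xmit;    /* NULL if there is no native XDP xmit */
    unsigned long tx_frames;    /* frames handed to this device */
};

/* Pretend transmit for a physical device: just count the frame. */
static int toy_phys_xmit(struct toy_dev *dev, const void *frame, size_t len)
{
    (void)frame;
    (void)len;
    dev->tx_frames++;
    return 0;
}

/*
 * Walk down the stack and hand the frame to the first device that
 * provides an xmit routine. Returns that routine's result, or -1
 * (treated as a drop) if no device in the stack supports it.
 */
static int toy_stacked_xmit(struct toy_dev *dev, const void *frame, size_t len)
{
    while (dev) {
        if (dev->xdp_xmit)
            return dev->xdp_xmit(dev, frame, len);
        dev = dev->lower;
    }
    return -1;
}
```

A vlan-like upper device with no xmit routine of its own would forward
to its lower device, while a device with nothing underneath ends in the
drop path, which matches the "try and drop if unsupported" behavior
David describes above.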

.John