From: John Fastabend <john.fastabend@gmail.com>
To: Brenden Blanco <bblanco@plumgrid.com>,
	Jesper Dangaard Brouer <brouer@redhat.com>
Cc: Tom Herbert <tom@herbertland.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	"David S. Miller" <davem@davemloft.net>,
	Linux Kernel Network Developers <netdev@vger.kernel.org>,
	Alexei Starovoitov <alexei.starovoitov@gmail.com>,
	ogerlitz@mellanox.com
Subject: Re: [RFC PATCH 1/5] bpf: add PHYS_DEV prog type for early driver filter
Date: Mon, 4 Apr 2016 09:07:03 -0700
Message-ID: <57029127.3040303@gmail.com>
In-Reply-To: <20160404152948.GA495@gmail.com>

On 16-04-04 08:29 AM, Brenden Blanco wrote:
> On Mon, Apr 04, 2016 at 05:12:27PM +0200, Jesper Dangaard Brouer wrote:
>> On Mon, 4 Apr 2016 11:09:57 -0300
>> Tom Herbert <tom@herbertland.com> wrote:
>>
>>> On Mon, Apr 4, 2016 at 10:36 AM, Daniel Borkmann <daniel@iogearbox.net> wrote:
>>>> On 04/04/2016 03:07 PM, Jesper Dangaard Brouer wrote:  
>>>>>
>>>>> On Mon, 04 Apr 2016 10:49:09 +0200 Daniel Borkmann <daniel@iogearbox.net>
>>>>> wrote:  
>>>>>>
>>>>>> On 04/02/2016 03:21 AM, Brenden Blanco wrote:  
>>>>>>>
>>>>>>> Add a new bpf prog type that is intended to run in early stages of the
>>>>>>> packet rx path. Only minimal packet metadata will be available, hence a
>>>>>>> new
>>>>>>> context type, struct xdp_metadata, is exposed to userspace. So far only
>>>>>>> expose the readable packet length, and only in read mode.
>>>>>>>
>>>>>>> The PHYS_DEV name is chosen to represent that the program is meant only
>>>>>>> for physical adapters, rather than all netdevs.
>>>>>>>
>>>>>>> While the user visible struct is new, the underlying context must be
>>>>>>> implemented as a minimal skb in order for the packet load_* instructions
>>>>>>> to work. The skb filled in by the driver must have skb->len, skb->head,
>>>>>>> and skb->data set, and skb->data_len == 0.
>>>>>>>  
>>>>> [...]  
>>>>>>
>>>>>>
>>>>>> Do you plan to support bpf_skb_load_bytes() as well? I like using
>>>>>> this API especially when dealing with larger chunks (>4 bytes) to
>>>>>> load into stack memory, plus content is kept in network byte order.
>>>>>>
>>>>>> What about other helpers such as bpf_skb_store_bytes() et al. that
>>>>>> work on skbs? Do you intend to reuse them as is and thus populate
>>>>>> the per-cpu skb with needed fields (faking linear data), or do you
>>>>>> see larger obstacles that prevent this?
>>>>>
>>>>>
>>>>> Argh... maybe the minimal pseudo/fake SKB is the wrong "signal" to send
>>>>> to users of this API.
>>>>>
>>>>> The whole idea is that an SKB is NOT allocated yet, and not needed at
>>>>> this level.  If we start supporting calling underlying SKB functions,
>>>>> then we will end up in the same place (performance-wise).
>>>>
>>>>
>>>> I'm talking about the current skb-related BPF helper functions we have,
>>>> so the question is how much of that existing code we can reuse under
>>>> these constraints (obviously things like the tunnel helpers are a different
>>>> story) and if that trade-off is acceptable for us. I'm also thinking
>>>> that, for example, if you need to parse the packet data anyway for a drop
>>>> verdict, you might as well pass some meta data (that is set in the real
>>>> skb later on) for those packets that go up the stack.  
>>>
>>> Right, the meta data in this case is an abstracted receive descriptor.
>>> This would include items that we get in a device receive descriptor
>>> (computed checksum, hash, VLAN tag). This is purposely a small
>>> restricted data structure. I'm hoping we can minimize the size of this
>>> to not much more than 32 bytes (including pointers to data and
>>> linkage).
>>
>> I agree.
>>  
>>> How this translates to skb to maintain compatibility with BPF is an
>>> interesting question. One other consideration is that skb's are kernel
>>> specific, we should be able to use the same BPF filter program in
>>> userspace over DPDK for instance-- so an skb interface as the packet
>>> abstraction might not be the right model...
>>
>> I agree.  I don't think reusing the SKB data structure is the right
>> model.  We should drop the SKB pointer from the API.
>>
>> As Tom also points out, making the BPF interface independent of the SKB
>> meta-data structure, would also make the eBPF program more generally
>> applicable.
> The initial approach that I tried went down this path. Alexei advised
> that I use the pseudo skb, and in the future the API between drivers and
> bpf can change to adopt non-skb context. The only user facing ABIs in
> this patchset are the IFLA, the xdp_metadata struct, and the name of the
> new enum.
> 
> The reason to use a pseudo skb for now is that there will be a fair
> amount of churn to get bpf jit and interpreter to understand non-skb
> context in the bpf_load_pointer() code. I don't see the need for
> requiring that for this patchset, as it will be an internal-only change
> if/when we use something else.

Another option would be to have per-driver JIT code to patch up the
skb reads/loads with descriptor reads and metadata. From a strictly
performance standpoint it should be better than pseudo skbs.
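
To make the idea a bit more concrete, here is a rough userspace sketch in
plain C. It is not code from the patchset or from any driver: the
descriptor layout, the field list, and every demo_* name are hypothetical,
and a toy "load" record stands in for a real BPF instruction. It only
shows the shape of the fixup, where a context field access is rewritten
once, at program load time, into a direct read of the driver's descriptor.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Context fields a filter program may read (hypothetical set). */
enum demo_ctx_field {
	DEMO_CTX_LEN,
	DEMO_CTX_HASH,
	DEMO_CTX_VLAN_TCI,
};

/* Made-up rx descriptor layout for one imaginary driver. */
struct demo_rx_desc {
	uint32_t rss_hash;
	uint16_t vlan_tci;
	uint16_t byte_cnt;
};

/* Toy load instruction: value = *(size bytes)(base + off). */
struct demo_load {
	uint16_t off;
	uint8_t size;
};

/*
 * Per-driver fixup, run once at program load time: rewrite an abstract
 * context field access into a direct read of this driver's descriptor.
 * Returns 0 on success, -1 if the driver cannot supply the field.
 */
static int demo_fixup_load(enum demo_ctx_field field, struct demo_load *ld)
{
	switch (field) {
	case DEMO_CTX_LEN:
		ld->off = offsetof(struct demo_rx_desc, byte_cnt);
		ld->size = sizeof(uint16_t);
		return 0;
	case DEMO_CTX_HASH:
		ld->off = offsetof(struct demo_rx_desc, rss_hash);
		ld->size = sizeof(uint32_t);
		return 0;
	case DEMO_CTX_VLAN_TCI:
		ld->off = offsetof(struct demo_rx_desc, vlan_tci);
		ld->size = sizeof(uint16_t);
		return 0;
	}
	return -1;
}

/* Execute a patched load against a descriptor (stands in for JITed code). */
static uint32_t demo_exec_load(const struct demo_rx_desc *desc,
			       const struct demo_load *ld)
{
	const unsigned char *p = (const unsigned char *)desc + ld->off;
	uint16_t v16;
	uint32_t v32;

	if (ld->size == sizeof(v16)) {
		memcpy(&v16, p, sizeof(v16));
		return v16;
	}
	memcpy(&v32, p, sizeof(v32));
	return v32;
}
```

With something along these lines, the rx path would hand the descriptor
itself to the patched program, so nothing skb-shaped has to be filled in
per packet. The cost is that each driver has to carry its own fixup table.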

.John
