netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Chema Gonzalez <chema@google.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: David Miller <davem@davemloft.net>,
	Eric Dumazet <edumazet@google.com>,
	Daniel Borkmann <dborkman@redhat.com>,
	Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH v5 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF
Date: Mon, 19 May 2014 15:23:53 -0700	[thread overview]
Message-ID: <CA+ZOOTPmSPeBrSHAsJtyxkAUnZDpgdhhw2PkrCJ3LhmG+ZSRYA@mail.gmail.com> (raw)
In-Reply-To: <CAMEtUuxBCNqbYctdxvWiJbp0mY__dKjP-RU8Ao-x3osVbFkxVg@mail.gmail.com>

On Fri, May 16, 2014 at 3:00 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> this is way too long for commit log. These details can go after '---' line.
This can be fixed easily ;). I must admit I'm surprised by this
request: The way I see it, a commit log is not part of the code, but
text that justifies a patch. Therefore, being extremely descriptive is
a good thing. But it's got an easy solution.

>> +        * such field has been inited)
>> +        */
>> +       context = (void *)FP - BPF_MEMWORDS * 4 - sizeof(*context);
>> +       context->flow_initted = false;
>> +
>
> Neither this code nor comment belong in interpreter which is striving to
> be generic. Both have a meaning only for your new instructions in
> socket filters. They do not apply to seccomp and to tracing.
> Calling it 'struct sk_run_filter_ctx' is equally misleading.
> 'struct flow_keys' has nothing to do with seccomp and even with
> normal tcpdump filters.
I'd argue specificity already pervades the BPF filter. Almost all the
ancillary loads (now bpf_calls) assume a sk_buff in the ctx field (all
but the get_cpu and random). If the user's filter has a "ld #nla"
insn, the eBPF filter ends up having "mov(ctx, arg1); mov(A, arg2);
...; call(__skb_get_nlattr)", which assumes ctx has an skb, and runs
"nla_find((struct nlattr *) &skb->data[a], ...)".

Now, this specificity is needed: We're using BPF filters to filter
packets. It would make no sense not to use all the packet-related
functions in the kernel.

> We've spent so much time to generalize it and now you want to take
> it back into socket specific territory. As I said in the previous email
> it's totally not ok to customize eBPF just for one use case especially
> when you're not sharing final usage scenario.
Sorry if I haven't mentioned it before.

Right now I use the following filter to detect some packets I'm
interested on ('ip[1] == 0x03 and tcp and dst port 80' in tcpdump
expression-ish):

$ cat tools/net/tos_and_tcp_and_dport.bpf
ldh [12]
jne #0x800, drop
ldb [15]
jne #0x3, drop
ldb [23]
jne #0x6, drop
ldh [20]
jset #0x1fff, drop
ldxb 4*([14]&0xf)
ldh [x + 16]
jne #0x50, drop
ret #-1
drop: ret #0

Now, I want to do the same filtering in encapsulated traffic (vlan and
gre). I'd like to be able to write something like:

$ cat tools/net/tos_and_tcp_and_dport_generic.bpf
ld #nproto
jne #0x800, drop
ld #nhoff
ldh [x + 1]
jne #0x3, drop
...

This will automatically support all the de-encapsulation performed by
the flow dissector.

> Any uapi change must be strongly justified.
I hope I have managed to convince you with the example above. Still,
my proposal only add a couple of ancillary loads.

> Other than design issue there is technical problem too.
> You're not updating eBPF JIT and context->flow_initted will have junk,
> so ld_poff will be returning junk too.
> JITs and interpreter must match. That was always the case even in
> old days of classic BPF. Interpreter initialized A and X to zero,
> JITs had to do the same. If you add stuff like this to interpreter,
> JITs would have to follow. Fast forward few years and this is not scalable.
> You have several JITs around and adding custom stuff to interpreter
> may not even be possible to do in JITs.
This is a fair point. In fact my patch is breaking "ld poff."

I think the previous mechanism where a new insn added to BPF would
cause the BPF JIT engine to refuse to run was a more scalable: New
stuff always went in the non-jitted runner, and people interested in
the jitted version will port when they needed it. We definitely not
want to require people adding new features to provide 6 versions of
the code.

> Therefore I'm strongly against adding custom C code to interpreter.
> It's an interpreter of eBPF instructions. Nothing else.
> At the same time adding new eBPF instructions would be completely fine.
> Like right now we don't have < and <= conditional jumps, which adds
> unnecessary complexity to tracing filters. So we may still expand
> interpreter and JITs in the future.
>
> Going back to your use case… assuming that you share how user
> space will use new anc loads… you can do equivalent initialization
> of 'flow_initted' by emitting extra eBPF insn while converting classic
> to internal. Don't do it unconditionally though. tcpdump filters should
> not suffer. You can scan program for presence of ld_poff, ld_toff,
> ld_tproto and if they're present, emit single eBPF 'st [fp - off], 0'
> ST_MEM instruction as a first instruction of converted filter. All eBPF
> JITs will compile it to native automatically. You'll have good
> performance and all existing filters will not regress.
Sounds fair. I'll think of a good mechanism to allow other
instructions "sharing" state among themselves.

-Chema


>
>>         /* Register for user BPF programs need to be reset first. */
>>         regs[BPF_REG_A] = 0;
>>         regs[BPF_REG_X] = 0;
>> @@ -602,7 +619,10 @@ static unsigned int pkt_type_offset(void)
>>
>>  static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
>>  {
>> -       return __skb_get_poff((struct sk_buff *)(unsigned long) ctx);
>> +       struct sk_run_filter_ctx *context = (void *) r4 - BPF_MEMWORDS * 4 -
>> +                       sizeof(*context);
>> +       return __skb_get_poff((struct sk_buff *)(unsigned long) ctx,
>> +                       &context->flow, &context->flow_initted);
>>  }
>>
>>  static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
>> @@ -783,6 +803,10 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
>>                 *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG3, BPF_REG_X);
>>                 insn++;
>>
>> +               /* arg4 = FP */
>> +               *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG4, BPF_REG_FP);
>> +               insn++;
>> +
>>                 /* Emit call(ctx, arg2=A, arg3=X) */
>>                 insn->code = BPF_JMP | BPF_CALL;
>>                 switch (fp->k) {
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index 107ed12..cefe1d2 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -275,16 +275,20 @@ EXPORT_SYMBOL(__skb_tx_hash);
>>   * truncate packets without needing to push actual payload to the user
>>   * space and can analyze headers only, instead.
>>   */
>> -u32 __skb_get_poff(const struct sk_buff *skb)
>> +u32 __skb_get_poff(const struct sk_buff *skb, struct flow_keys *flow,
>> +               bool *flow_initted)
>>  {
>> -       struct flow_keys keys;
>>         u32 poff = 0;
>>
>> -       if (!skb_flow_dissect(skb, &keys))
>> -               return 0;
>> +       /* check whether the flow dissector has already been run */
>> +       if (!*flow_initted) {
>> +               if (!skb_flow_dissect(skb, flow))
>> +                       return 0;
>> +               *flow_initted = true;
>> +       }
>>
>> -       poff += keys.thoff;
>> -       switch (keys.ip_proto) {
>> +       poff += flow->thoff;
>> +       switch (flow->ip_proto) {
>>         case IPPROTO_TCP: {
>>                 const struct tcphdr *tcph;
>>                 struct tcphdr _tcph;
>> --
>> 1.9.1.423.g4596e3a
>>

  reply	other threads:[~2014-05-19 22:23 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2014-04-30 18:29 [PATCH] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-04-30 22:21 ` Alexei Starovoitov
2014-05-01 10:53 ` Daniel Borkmann
2014-05-01 10:55   ` Daniel Borkmann
2014-05-01 18:44     ` Chema Gonzalez
2014-05-01 18:44 ` [PATCH net-next v2] " Chema Gonzalez
2014-05-02 16:21   ` Ben Hutchings
2014-05-02 21:49   ` David Miller
2014-05-03  0:53     ` Chema Gonzalez
2014-05-03  2:52       ` David Miller
2014-05-05 18:42         ` Chema Gonzalez
2014-05-05 19:12           ` David Miller
2014-05-14 18:42             ` Chema Gonzalez
2014-05-14 18:51               ` Chema Gonzalez
2014-05-14 18:42 ` [PATCH v2 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 18:42   ` [PATCH v2 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:42   ` [PATCH v2 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 20:05   ` Alexei Starovoitov
2014-05-14 21:51     ` Chema Gonzalez
2014-05-14 22:44       ` Alexei Starovoitov
2014-05-16 18:41         ` Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-16 22:00   ` Alexei Starovoitov
2014-05-19 22:23     ` Chema Gonzalez [this message]
2014-05-20  9:58       ` Daniel Borkmann
2014-05-16 18:41 ` [PATCH v5 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:55 ` [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF Chema Gonzalez
2014-05-29 23:54   ` Daniel Borkmann
2014-05-30 17:12     ` Chema Gonzalez
2014-06-02 12:36       ` Daniel Borkmann
2014-06-02 16:48         ` Alexei Starovoitov
2014-06-03  8:33           ` Daniel Borkmann
2014-06-03 20:15             ` Alexei Starovoitov
2014-06-03 21:12               ` Chema Gonzalez
2014-06-04  8:51                 ` Daniel Borkmann
2014-06-05  6:55                   ` David Miller
2014-06-20 21:56                   ` Chema Gonzalez
2014-06-24  8:14                     ` Daniel Borkmann
2014-06-25 22:00                       ` Chema Gonzalez
2014-06-27 10:19                         ` Daniel Borkmann
2014-05-29 18:56 ` [PATCH v6 net-next 2/4] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 3/4] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 4/4] net: filter: minor BPF cleanups Chema Gonzalez

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CA+ZOOTPmSPeBrSHAsJtyxkAUnZDpgdhhw2PkrCJ3LhmG+ZSRYA@mail.gmail.com \
    --to=chema@google.com \
    --cc=ast@plumgrid.com \
    --cc=davem@davemloft.net \
    --cc=dborkman@redhat.com \
    --cc=edumazet@google.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).