From: Chema Gonzalez <chema@google.com>
To: Alexei Starovoitov <ast@plumgrid.com>
Cc: David Miller <davem@davemloft.net>,
Eric Dumazet <edumazet@google.com>,
Daniel Borkmann <dborkman@redhat.com>,
Network Development <netdev@vger.kernel.org>
Subject: Re: [PATCH v5 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF
Date: Mon, 19 May 2014 15:23:53 -0700 [thread overview]
Message-ID: <CA+ZOOTPmSPeBrSHAsJtyxkAUnZDpgdhhw2PkrCJ3LhmG+ZSRYA@mail.gmail.com> (raw)
In-Reply-To: <CAMEtUuxBCNqbYctdxvWiJbp0mY__dKjP-RU8Ao-x3osVbFkxVg@mail.gmail.com>
On Fri, May 16, 2014 at 3:00 PM, Alexei Starovoitov <ast@plumgrid.com> wrote:
> this is way too long for commit log. These details can go after '---' line.
This can be fixed easily ;). I must admit I'm surprised by this
request: The way I see it, a commit log is not part of the code, but
text that justifies a patch. Therefore, being extremely descriptive is
a good thing. But it's got an easy solution.
>> + * such field has been inited)
>> + */
>> + context = (void *)FP - BPF_MEMWORDS * 4 - sizeof(*context);
>> + context->flow_initted = false;
>> +
>
> Neither this code nor comment belong in interpreter which is striving to
> be generic. Both have a meaning only for your new instructions in
> socket filters. They do not apply to seccomp and to tracing.
> Calling it 'struct sk_run_filter_ctx' is equally misleading.
> 'struct flow_keys' has nothing to do with seccomp and even with
> normal tcpdump filters.
I'd argue specificity already pervades the BPF filter. Almost all the
ancillary loads (now bpf_calls) assume a sk_buff in the ctx field (all
but the get_cpu and random). If the user's filter has a "ld #nla"
insn, the eBPF filter ends up having "mov(ctx, arg1); mov(A, arg2);
...; call(__skb_get_nlattr)", which assumes ctx has an skb, and runs
"nla_find((struct nlattr *) &skb->data[a], ...)".
Now, this specificity is needed: We're using BPF filters to filter
packets. It would make no sense not to use all the packet-related
functions in the kernel.
> We've spent so much time to generalize it and now you want to take
> it back into socket specific territory. As I said in the previous email
> it's totally not ok to customize eBPF just for one use case especially
> when you're not sharing final usage scenario.
Sorry if I haven't mentioned it before.
Right now I use the following filter to detect some packets I'm
interested on ('ip[1] == 0x03 and tcp and dst port 80' in tcpdump
expression-ish):
$ cat tools/net/tos_and_tcp_and_dport.bpf
ldh [12]
jne #0x800, drop
ldb [15]
jne #0x3, drop
ldb [23]
jne #0x6, drop
ldh [20]
jset #0x1fff, drop
ldxb 4*([14]&0xf)
ldh [x + 16]
jne #0x50, drop
ret #-1
drop: ret #0
Now, I want to do the same filtering in encapsulated traffic (vlan and
gre). I'd like to be able to write something like:
$ cat tools/net/tos_and_tcp_and_dport_generic.bpf
ld #nproto
jne #0x800, drop
ld #nhoff
ldh [x + 1]
jne #0x3, drop
...
This will automatically support all the de-encapsulation performed by
the flow dissector.
> Any uapi change must be strongly justified.
I hope I have managed to convince you with the example above. Still,
my proposal only add a couple of ancillary loads.
> Other than design issue there is technical problem too.
> You're not updating eBPF JIT and context->flow_initted will have junk,
> so ld_poff will be returning junk too.
> JITs and interpreter must match. That was always the case even in
> old days of classic BPF. Interpreter initialized A and X to zero,
> JITs had to do the same. If you add stuff like this to interpreter,
> JITs would have to follow. Fast forward few years and this is not scalable.
> You have several JITs around and adding custom stuff to interpreter
> may not even be possible to do in JITs.
This is a fair point. In fact my patch is breaking "ld poff."
I think the previous mechanism where a new insn added to BPF would
cause the BPF JIT engine to refuse to run was a more scalable: New
stuff always went in the non-jitted runner, and people interested in
the jitted version will port when they needed it. We definitely not
want to require people adding new features to provide 6 versions of
the code.
> Therefore I'm strongly against adding custom C code to interpreter.
> It's an interpreter of eBPF instructions. Nothing else.
> At the same time adding new eBPF instructions would be completely fine.
> Like right now we don't have < and <= conditional jumps, which adds
> unnecessary complexity to tracing filters. So we may still expand
> interpreter and JITs in the future.
>
> Going back to your use case… assuming that you share how user
> space will use new anc loads… you can do equivalent initialization
> of 'flow_initted' by emitting extra eBPF insn while converting classic
> to internal. Don't do it unconditionally though. tcpdump filters should
> not suffer. You can scan program for presence of ld_poff, ld_toff,
> ld_tproto and if they're present, emit single eBPF 'st [fp - off], 0'
> ST_MEM instruction as a first instruction of converted filter. All eBPF
> JITs will compile it to native automatically. You'll have good
> performance and all existing filters will not regress.
Sounds fair. I'll think of a good mechanism to allow other
instructions "sharing" state among themselves.
-Chema
>
>> /* Register for user BPF programs need to be reset first. */
>> regs[BPF_REG_A] = 0;
>> regs[BPF_REG_X] = 0;
>> @@ -602,7 +619,10 @@ static unsigned int pkt_type_offset(void)
>>
>> static u64 __skb_get_pay_offset(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
>> {
>> - return __skb_get_poff((struct sk_buff *)(unsigned long) ctx);
>> + struct sk_run_filter_ctx *context = (void *) r4 - BPF_MEMWORDS * 4 -
>> + sizeof(*context);
>> + return __skb_get_poff((struct sk_buff *)(unsigned long) ctx,
>> + &context->flow, &context->flow_initted);
>> }
>>
>> static u64 __skb_get_nlattr(u64 ctx, u64 a, u64 x, u64 r4, u64 r5)
>> @@ -783,6 +803,10 @@ static bool convert_bpf_extensions(struct sock_filter *fp,
>> *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG3, BPF_REG_X);
>> insn++;
>>
>> + /* arg4 = FP */
>> + *insn = BPF_ALU64_REG(BPF_MOV, BPF_REG_ARG4, BPF_REG_FP);
>> + insn++;
>> +
>> /* Emit call(ctx, arg2=A, arg3=X) */
>> insn->code = BPF_JMP | BPF_CALL;
>> switch (fp->k) {
>> diff --git a/net/core/flow_dissector.c b/net/core/flow_dissector.c
>> index 107ed12..cefe1d2 100644
>> --- a/net/core/flow_dissector.c
>> +++ b/net/core/flow_dissector.c
>> @@ -275,16 +275,20 @@ EXPORT_SYMBOL(__skb_tx_hash);
>> * truncate packets without needing to push actual payload to the user
>> * space and can analyze headers only, instead.
>> */
>> -u32 __skb_get_poff(const struct sk_buff *skb)
>> +u32 __skb_get_poff(const struct sk_buff *skb, struct flow_keys *flow,
>> + bool *flow_initted)
>> {
>> - struct flow_keys keys;
>> u32 poff = 0;
>>
>> - if (!skb_flow_dissect(skb, &keys))
>> - return 0;
>> + /* check whether the flow dissector has already been run */
>> + if (!*flow_initted) {
>> + if (!skb_flow_dissect(skb, flow))
>> + return 0;
>> + *flow_initted = true;
>> + }
>>
>> - poff += keys.thoff;
>> - switch (keys.ip_proto) {
>> + poff += flow->thoff;
>> + switch (flow->ip_proto) {
>> case IPPROTO_TCP: {
>> const struct tcphdr *tcph;
>> struct tcphdr _tcph;
>> --
>> 1.9.1.423.g4596e3a
>>
next prev parent reply other threads:[~2014-05-19 22:23 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-30 18:29 [PATCH] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-04-30 22:21 ` Alexei Starovoitov
2014-05-01 10:53 ` Daniel Borkmann
2014-05-01 10:55 ` Daniel Borkmann
2014-05-01 18:44 ` Chema Gonzalez
2014-05-01 18:44 ` [PATCH net-next v2] " Chema Gonzalez
2014-05-02 16:21 ` Ben Hutchings
2014-05-02 21:49 ` David Miller
2014-05-03 0:53 ` Chema Gonzalez
2014-05-03 2:52 ` David Miller
2014-05-05 18:42 ` Chema Gonzalez
2014-05-05 19:12 ` David Miller
2014-05-14 18:42 ` Chema Gonzalez
2014-05-14 18:51 ` Chema Gonzalez
2014-05-14 18:42 ` [PATCH v2 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 18:42 ` [PATCH v2 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:42 ` [PATCH v2 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-14 20:05 ` Alexei Starovoitov
2014-05-14 21:51 ` Chema Gonzalez
2014-05-14 22:44 ` Alexei Starovoitov
2014-05-16 18:41 ` Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-14 18:51 ` [PATCH v3 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 1/3] net: flow_dissector: avoid multiple calls in BPF Chema Gonzalez
2014-05-16 22:00 ` Alexei Starovoitov
2014-05-19 22:23 ` Chema Gonzalez [this message]
2014-05-20 9:58 ` Daniel Borkmann
2014-05-16 18:41 ` [PATCH v5 net-next 2/3] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-16 18:41 ` [PATCH v5 net-next 3/3] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:55 ` [PATCH v6 net-next 1/4] net: flow_dissector: avoid multiple calls in eBPF Chema Gonzalez
2014-05-29 23:54 ` Daniel Borkmann
2014-05-30 17:12 ` Chema Gonzalez
2014-06-02 12:36 ` Daniel Borkmann
2014-06-02 16:48 ` Alexei Starovoitov
2014-06-03 8:33 ` Daniel Borkmann
2014-06-03 20:15 ` Alexei Starovoitov
2014-06-03 21:12 ` Chema Gonzalez
2014-06-04 8:51 ` Daniel Borkmann
2014-06-05 6:55 ` David Miller
2014-06-20 21:56 ` Chema Gonzalez
2014-06-24 8:14 ` Daniel Borkmann
2014-06-25 22:00 ` Chema Gonzalez
2014-06-27 10:19 ` Daniel Borkmann
2014-05-29 18:56 ` [PATCH v6 net-next 2/4] net: filter: add insn for loading internal transport header offset Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 3/4] net: filter: add insn for loading internal transport header proto Chema Gonzalez
2014-05-29 18:56 ` [PATCH v6 net-next 4/4] net: filter: minor BPF cleanups Chema Gonzalez
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CA+ZOOTPmSPeBrSHAsJtyxkAUnZDpgdhhw2PkrCJ3LhmG+ZSRYA@mail.gmail.com \
--to=chema@google.com \
--cc=ast@plumgrid.com \
--cc=davem@davemloft.net \
--cc=dborkman@redhat.com \
--cc=edumazet@google.com \
--cc=netdev@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).