Re: [PATCH v3 net-next 2/2] tc: add 'needs_l2' flag to ingress qdisc

netdev.vger.kernel.org archive mirror

From: Alexei Starovoitov <ast@plumgrid.com>
To: David Miller <davem@davemloft.net>
Cc: daniel@iogearbox.net, tgraf@suug.ch, jiri@resnulli.us,
	jhs@mojatatu.com, netdev@vger.kernel.org
Subject: Re: [PATCH v3 net-next 2/2] tc: add 'needs_l2' flag to ingress qdisc
Date: Wed, 08 Apr 2015 22:20:47 -0700
Message-ID: <55260C2F.608@plumgrid.com>
In-Reply-To: <5525F48F.5030108@plumgrid.com>

On 4/8/15 8:39 PM, Alexei Starovoitov wrote:
> On 4/8/15 8:14 PM, David Miller wrote:
>> From: Alexei Starovoitov <ast@plumgrid.com>
>> Date: Wed, 08 Apr 2015 20:05:13 -0700
>>
>>> I'm sure there is a way to propagate the offset into the programs.
>>> It's not about efficiency of programs, but about consistency.
>>> Programs should know nothing about kernel. Sending network offset
>>> into them is exposing this very specific kernel behavior.
>>
>> It can be performed by the data access helpers the JIT'd programs
>> have to invoke anyways.
>
> hmm, not sure what you mean.
> Let's take specific line from sockex1_kern.c:
> int index = load_byte(skb, ETH_HLEN + offsetof(struct iphdr, protocol));
> this C code is compiled into
> R0 = LD_ABS_B 23
> (instruction with fixed offset)
> which is being interpreted as:
> skb_header_pointer(skb, 23, 1, buffer);
> and similarly by the JITs, which do
> r0 = *(char *)(skb->data + 23)
> in this case.
>
> Are you proposing to change semantics of LD_ABS instruction to use
> skb->head + skb->mac_header instead of skb->data in interpreter
> and in all JITs?
> Performance-wise it would be OK, since JITs can cache that pointer.
> But that would be a huge and very risky change.
> I'm not sure yet whether all programs will keep working afterwards.
> Is it really worth taking so much risk vs push/pull of L2?
> If you say, let's take the risk, sure, I can try hacking all the bits
> and see what the cost is.

I have to correct myself.
We cannot change the meaning of ld_abs, since for dgram sockets
offset 0 does not actually point to L2.
af_packet is doing:
    if (sk->sk_type != SOCK_DGRAM)
        skb_push(skb, skb->data - skb_mac_header(skb));
    res = run_filter(skb, sk, snaplen);
So not everything assumes L2, while raw socket taps, ppp, team and cls_bpf
certainly do want to see L2.
I still hate to see ingress and egress cls/act programs behave differently.
af_packet also does:
    skb = skb_share_check(skb, GFP_ATOMIC);

So Dave, would you consider an early_ingress_l2() hook that doesn't need
to do skb_share_check?
If not, the only option I have is to introduce a set of helpers
that read from the packet with offsets always starting at L2.
Then we can say that cls/act_bpf programs should never use ld_abs/ld_ind
instructions if they want to run on both ingress and egress, and
should use these new read helpers instead.

Thread overview: 20+ messages
2015-04-08 23:26 [PATCH v3 net-next 1/2] net: introduce skb_postpush_rcsum() helper Alexei Starovoitov
2015-04-08 23:26 ` [PATCH v3 net-next 2/2] tc: add 'needs_l2' flag to ingress qdisc Alexei Starovoitov
2015-04-09  2:44   ` David Miller
2015-04-09  3:05     ` Alexei Starovoitov
2015-04-09  3:14       ` David Miller
2015-04-09  3:39         ` Alexei Starovoitov
2015-04-09  5:20           ` Alexei Starovoitov [this message]
2015-04-09  5:25             ` David Miller
2015-04-09 15:15             ` Daniel Borkmann
2015-04-09 15:36               ` Eric Dumazet
2015-04-09 17:20                 ` Alexei Starovoitov
2015-04-09 10:49       ` Jamal Hadi Salim
2015-04-09 11:02         ` Jamal Hadi Salim
2015-04-09 15:38         ` Daniel Borkmann
2015-04-10 11:49           ` Jamal Hadi Salim
2015-04-09 17:03         ` Alexei Starovoitov
2015-04-10 12:48           ` Jamal Hadi Salim
2015-04-10 22:35             ` Alexei Starovoitov
2015-04-13 14:28               ` Jamal Hadi Salim
2015-04-13 16:13                 ` Alexei Starovoitov
