From: Alexei Starovoitov <ast@plumgrid.com>
To: Jamal Hadi Salim <jhs@mojatatu.com>,
Daniel Borkmann <daniel@iogearbox.net>,
Jiri Pirko <jiri@resnulli.us>
Cc: Thomas Graf <tgraf@suug.ch>, David Miller <davem@davemloft.net>,
netdev@vger.kernel.org
Subject: Re: [PATCH v2 net-next 2/2] tc: make ingress and egress qdiscs consistent
Date: Wed, 08 Apr 2015 09:26:36 -0700 [thread overview]
Message-ID: <552556BC.8020108@plumgrid.com> (raw)
In-Reply-To: <55253289.8020306@mojatatu.com>
On 4/8/15 6:52 AM, Jamal Hadi Salim wrote:
> On 04/08/15 09:41, Daniel Borkmann wrote:
>> On 04/08/2015 03:34 PM, Jiri Pirko wrote:
>>> Wed, Apr 08, 2015 at 03:27:57PM CEST, daniel@iogearbox.net wrote:
>>>> On 04/08/2015 03:14 PM, Thomas Graf wrote:
>>>>> On 04/08/15 at 08:58am, Jamal Hadi Salim wrote:
>>>>>> On 04/08/15 08:31, Daniel Borkmann wrote:
>>>>>>> That means the tc's cls_u32
>>>>>>> sample selectors a la ip, ip6, udp, tcp, icmp don't work on ingress
>>>>>>> either,so in u32 speak you would need to do that by hand, but that
>>>>>>> doesn't work as you don't have the Ethernet type context available.
>>>>>>> Am I missing something? :)
>>>>>>
>>>>>> u32 works fine. I am sure i have tests which run these on both
>>>>>> in/egress.
>>>>>
>>>>> His point is that an u32 filter written for egress won't work at
>>>>> ingress because the offsets are different. This has always been the
>>>>> case and we can't break this behaviour either. I'm sure you have
>>>>> these weird negative offset u32 egress filters in your repertoire
>>>>> as well ;-)
>>>>
>>>> Okay, you can use negative offsets in cls_u32 to accomodate for
>>>> that; so yeah, you'd need to implement your filter differently
>>>> on ingress. That should also work on cls_bpf et al.
>>>
>>> That is certainly doable. But is that what we want? I don't think so. I
>>> would like to have the same for in/eg.
>>
>> I mean it's certainly a non-obvious hack, where user space has to
>> fix up something that the kernel should have gotten right in the
>> first place. :/
>
> Try something like:
> on ingress
> filter add dev eth0 parent ffff: protocol ip prio 6 u32 match ip src \
> 10.0.0.21/32 flowid 1:16 action ok
> and egress
> filter add dev eth0 parent 1: protocol ip prio 6 u32 match ip src \
> 10.0.0.21/32 flowid 1:16 action ok
>
> and see if you need negative offsets for anything.
>
> If you really must adjust in the kernel then use the indicators which
> tell you where you are at.
>
> Negative offsets are used if you must go beyond the network header. This
> has been the expectation so far (whether it is actions or classifiers),
> example, here is something that mucks around with ethernet
> headers:
>
> tc filter add dev eth0 parent ffff: protocol ip prio 10 u32 \
> match ip src 192.168.1.10/32 flowid 1:2 \
> action pedit munge offset -14 u16 set 0x0000 \
> munge offset -12 u32 set 0x00010100 \
> munge offset -8 u32 set 0x0aaf0100 \
> munge offset -4 u32 set 0x0008ec06 pipe \
> action mirred egress redirect dev eth1
here u32 assumes ethernet L2 and can hard code -14.
bpf programs cannot. Their key feature is parsing the packet.
Look at libpcap it can parse weird protocols that kernel doesn't
even know about.
Theoretically we can pass hard_header_len/network_offset/starting_offset
into bpf program and add this offset everywhere. It would mean that
LD_ABS instruction becomes useless and performance immediately drops.
We cannot use libpcap and pretty much all existing classic and extended
bpf programs, since they assume starting at L2 and parsing its way.
That is awful already, but more importantly this l2 vs l3 or
ingress vs egress qdisc exposes this kernel specific difference to user
space. bpf programs should know nothing about the kernel.
They should only see packet + metadata. So network_offset+-off style of
access for bpf programs is non starter. It's perfectly fine for other
classifiers/actions because their are part of the kernel.
Reading through the thread looks like the only option is to add
'needs_l2' flag to ingress qdisc and if it's there do skb_share_check
and push. Also add similar flags to classifier/actions and only
allow them to be added to such ingress qdisc if flags match.
It's a bit intrusive, but won't cost any performance for
existing users.
Another alternative is in cls_bpf do:
if(skb_shared) return -1;
else push L2 and run the program.
This way if tap is added to the interface with ingress+bpf
the programs will stop working as soon as tap is added.
That's not great, but very small isolated change.
My preference is to add 'needs_l2' flag to ingress qdisc.
next prev parent reply other threads:[~2015-04-08 16:26 UTC|newest]
Thread overview: 25+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-04-08 1:03 [PATCH v2 net-next 1/2] net: introduce skb_postpush_rcsum() helper Alexei Starovoitov
2015-04-08 1:03 ` [PATCH v2 net-next 2/2] tc: make ingress and egress qdiscs consistent Alexei Starovoitov
2015-04-08 2:35 ` David Miller
2015-04-08 3:22 ` Alexei Starovoitov
2015-04-08 4:48 ` Alexei Starovoitov
2015-04-08 8:36 ` Daniel Borkmann
2015-04-08 9:05 ` Jiri Pirko
2015-04-08 10:54 ` Daniel Borkmann
2015-04-08 11:14 ` Daniel Borkmann
2015-04-08 11:47 ` Jamal Hadi Salim
2015-04-08 12:31 ` Daniel Borkmann
2015-04-08 12:58 ` Jamal Hadi Salim
2015-04-08 13:14 ` Thomas Graf
2015-04-08 13:27 ` Daniel Borkmann
2015-04-08 13:34 ` Jiri Pirko
2015-04-08 13:41 ` Daniel Borkmann
2015-04-08 13:47 ` Thomas Graf
2015-04-08 13:52 ` Jamal Hadi Salim
2015-04-08 14:53 ` Daniel Borkmann
2015-04-08 16:26 ` Alexei Starovoitov [this message]
2015-04-08 16:32 ` David Miller
2015-04-08 16:44 ` Alexei Starovoitov
2015-04-08 16:54 ` Daniel Borkmann
2015-04-08 11:43 ` Jamal Hadi Salim
2015-04-08 7:25 ` [PATCH v2 net-next 1/2] net: introduce skb_postpush_rcsum() helper Daniel Borkmann
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=552556BC.8020108@plumgrid.com \
--to=ast@plumgrid.com \
--cc=daniel@iogearbox.net \
--cc=davem@davemloft.net \
--cc=jhs@mojatatu.com \
--cc=jiri@resnulli.us \
--cc=netdev@vger.kernel.org \
--cc=tgraf@suug.ch \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).