RFC: New BGF 'LOOP' instruction

* RFC: New BGF 'LOOP' instruction
@ 2010-08-02 11:03 Paul LeoNerd Evans
  2010-08-02 11:13 ` RFC: New BPF " Paul LeoNerd Evans
                   ` (2 more replies)
  0 siblings, 3 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-02 11:03 UTC (permalink / raw)
  To: netdev

---
 Proposal: Create a new BPF instruction, "LOOP", which can implement a
   specific time-bounded kind of while() loop over packet contents
---

IPv6 packets contain a linked list of headers. Some other network
protocols may also contain linked-list structures.

BPF cannot implement loops.

Currently therefore, it is impossible to efficiently parse IPv6 packets
without resorting to such annoying tricks as statically unrolling a loop
into a long list of instructions. In IPv6's case this gets very large
very quickly, as different header types have different lengths, or
structure layouts.

I propose to add a new instruction, "LOOP", with the following
semantics:

 BPF_JMP|BPF_LOOP, jt

    If A == 0, fallthrough to the next instruction.
      (TODO: Or perhaps this should be considered a hard error which
       immediately aborts the filter, similar to divide by zero?)
    Otherwise:
       X += A.
       If X < len, jump backwards jt instructions.
       Otherwise, fallthrough to the next instruction

The following static checks would be enforced:

 None of the 'jt' instructions preceding the LOOP instruction
 (i.e. the body of the loop) may themselves be LOOP instructions, nor may
 they be STX.
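
A minimal sketch of this static check, assuming the loop body is exactly
the jt instructions immediately before the LOOP and that a program is
modelled as a list of (mnemonic, jt) pairs. The name check_loops and the
banned parameter are illustrative and not part of any existing BPF code.
The banned set defaults to what is named above; it is a parameter so that
other opcodes (for example LDX and TAX, which write X in classic BPF)
could be added.

```python
def check_loops(program, banned=("LOOP", "STX")):
    """Return True if every LOOP in `program` passes the static checks.

    `program` is a list of (mnemonic, jt) pairs; jt is only meaningful
    for LOOP entries. This representation is an assumption of the sketch.
    """
    for i, (op, jt) in enumerate(program):
        if op != "LOOP":
            continue
        if jt > i:
            # The backwards jump would land before the first instruction.
            return False
        body = program[i - jt:i]  # the jt instructions preceding the LOOP
        if any(body_op in banned for body_op, _ in body):
            return False
    return True
```

The check is a single pass over the program, so it adds only linear work
at filter-attach time.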

The intention of this instruction is to be able to implement a loop in
which successive iterations advance the index register along the packet
buffer. By comparing X to the packet length, we can bound the running
time of the loop instruction, avoiding it locking up the kernel. By
banning STX instructions within the body of the loop, we can ensure that
X must be a strictly monotonically increasing sequence. At absolute
worst, X is increased by 1 each time, meaning at worst the body of the
loop must execute for every byte in the packet. By banning further
nested LOOP instructions, we can ensure at worst a linear running time.
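
The bound argued above can be illustrated with a small model of the
proposed instruction. This is a sketch in Python, not kernel code: the
names loop_step and count_iterations are made up, the loop body is
reduced to "leave a fixed stride in A", and it assumes X + A never
overflows the register.

```python
def loop_step(a, x, pkt_len):
    """Model one execution of the proposed LOOP instruction.

    Returns (new_x, jump_back).
    """
    if a == 0:
        return x, False        # A == 0: fall through, X unchanged
    x = x + a                  # X += A (overflow ignored in this sketch)
    return x, x < pkt_len      # jump back only while X is inside the packet


def count_iterations(stride, pkt_len):
    """Count how often a loop body runs if it always leaves `stride` in A."""
    x, runs = 0, 0
    while True:
        runs += 1              # the body executes; it may not write X
        x, again = loop_step(stride, x, pkt_len)
        if not again:
            return runs
```

With a stride of 1 the body runs once per byte of the packet, which is
the worst case described above; larger strides finish sooner.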

I believe this addition should have minimal impact on existing users of
the filter layer, as it simply adds a new instruction and does not
otherwise change the semantics of any existing code. I also believe it
to be useful in writing filters that process IPv6 packets, and that
the semantics and static checks are sufficient to preserve
the termination guarantee of BPF filter programs, ensuring each packet's
fate is decided in a timely fashion to avoid locking up the kernel.

Any comments on this, while I proceed? Barring any major complaints,
I'll have a hack at some code and present a patch in due course...

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BPF 'LOOP' instruction
  2010-08-02 11:03 RFC: New BGF 'LOOP' instruction Paul LeoNerd Evans
@ 2010-08-02 11:13 ` Paul LeoNerd Evans
  2010-08-02 20:16 ` RFC: New BGF " Hagen Paul Pfeifer
  2010-08-03  5:13 ` RFC: New BGF " David Miller
  2 siblings, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-02 11:13 UTC (permalink / raw)
  To: netdev

*ahem* Typo in the subject line there, sorry. I meant "BPF".


-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-02 11:03 RFC: New BGF 'LOOP' instruction Paul LeoNerd Evans
  2010-08-02 11:13 ` RFC: New BPF " Paul LeoNerd Evans
@ 2010-08-02 20:16 ` Hagen Paul Pfeifer
  2010-08-03  5:18   ` David Miller
  2010-08-03  7:18   ` RFC: New BPF " Paul LeoNerd Evans
  2010-08-03  5:13 ` RFC: New BGF " David Miller
  2 siblings, 2 replies; 26+ messages in thread
From: Hagen Paul Pfeifer @ 2010-08-02 20:16 UTC (permalink / raw)
  To: Paul LeoNerd Evans; +Cc: netdev

* Paul LeoNerd Evans | 2010-08-02 12:03:34 [+0100]:

Hello Paul,

>Currently therefore, it is impossible to efficiently parse IPv6 packets
>without resorting to such annoying tricks as statically unrolling a loop
>into a long list of instructions. In IPv6's case this gets very large
>very quickly, as different header types have different lengths, or
>structure layouts.
>
>I propose to add a new instruction, "LOOP", with the following
>semantics:
>
> BPF_JMP|BPF_LOOP, jt
>
>    If A == 0, fallthrough to the next instruction.
>      (TODO: Or perhaps this should be considered a hard error which
>       immediately aborts the filter, similar to divide by zero?)
>    Otherwise:
>       X += A.
>       If X < len, jump backwards jt instructions.
>       Otherwise, fallthrough to the next instruction
>
>The following static checks would be enforced:
>
> None of the 'jt' instructions preceding the LOOP instruction
> (i.e. the body of the loop) may themselves be LOOP instructions, nor may
> they be STX.

[..]

In principle I have no objections to a BPF loop construct. That said,
I question whether it offers significant advantages.

In general: BPF was constructed to address filter rules in a generic manner
and BPF does not contain any special protocol-specific optimization - nor any
sophisticated connection tracking functionality. In general you should
pre-filter unneeded packets and shift the real high-level filtering to some
post-processing step. tcpdump filter capabilities are limited and were never
designed to filter _any_ traffic. For example: you are lost if you want to match
transport layer fields like port number where the underlying IPv{4,6} packet
is fragmented.

Currently tcpdump/libpcap does not generate any IPv4 options or IPv6 
extension header code. The following byte code is generated by applying a 
"tcp" filter:

(000) ldh      [12]
(001) jeq      #0x86dd          jt 2    jf 4
(002) ldb      [20]
(003) jeq      #0x6             jt 7    jf 8
(004) jeq      #0x800           jt 5    jf 8
(005) ldb      [23]
(006) jeq      #0x6             jt 7    jf 8
(007) ret      #96
(008) ret      #0
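
To make the limitation concrete, the nine instructions above can be
hand-translated as below. This is a sketch in Python rather than real
BPF; the function name tcp_filter is made up, and it assumes an Ethernet
frame long enough that every load is in bounds (a real BPF load past the
end of the packet makes the filter return 0 instead).

```python
import struct


def tcp_filter(pkt):
    """Hand translation of the listing; returns the snap length (0 = drop)."""
    ethertype = struct.unpack_from("!H", pkt, 12)[0]  # (000) ldh [12]
    if ethertype == 0x86DD:                           # (001) IPv6?
        proto = pkt[20]                               # (002) IPv6 Next Header
        return 96 if proto == 6 else 0                # (003) -> (007) / (008)
    if ethertype == 0x0800:                           # (004) IPv4?
        proto = pkt[23]                               # (005) IPv4 Protocol
        return 96 if proto == 6 else 0                # (006) -> (007) / (008)
    return 0                                          # (008) ret #0
```

Only the first Next Header byte of an IPv6 packet is examined, so a TCP
segment that sits behind any extension header is not matched.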

Furthermore, I doubt that the loop provides any significant advantages.
IPv6 extension header parsing is not that straightforward. For example,
check the IPsec AH extension header, where the extension header length
must be interpreted differently because of an IPsec AH protocol defect. I assume
that straightforward pcap-encoded BPF opcode (composed of jump and load
instructions) is more efficient than a highly flexible loop construct.

Last but not least, I am interested in an RFC patch as well as a pcap patch (see
pcap-opt.c). You should not underestimate the effort to generate a generic IPv6
extension header opcode optimizer - without this the newly introduced opcode
is pointless.

Hagen

PS: the LOOP opcode must be secure against any resource attack -> the loop
must break after n iterations.

-- 
Hagen Paul Pfeifer <hagen@jauu.net>  ||  http://jauu.net/
Telephone: +49 174 5455209           ||  Key Id: 0x98350C22
Key Fingerprint: 490F 557B 6C48 6D7E 5706 2EA2 4A22 8D45 9835 0C22

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-02 11:03 RFC: New BGF 'LOOP' instruction Paul LeoNerd Evans
  2010-08-02 11:13 ` RFC: New BPF " Paul LeoNerd Evans
  2010-08-02 20:16 ` RFC: New BGF " Hagen Paul Pfeifer
@ 2010-08-03  5:13 ` David Miller
  2010-08-03  7:04   ` Paul LeoNerd Evans
  2 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2010-08-03  5:13 UTC (permalink / raw)
  To: leonerd; +Cc: netdev

From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
Date: Mon, 2 Aug 2010 12:03:34 +0100

> Any comments on this, while I proceed? Barring any major complaints,
> I'll have a hack at some code and present a patch in due course...

We're not adding loop instructions, it's just asking for trouble
since any user can attach BPF filters to a socket and it's just
way too easy to make a loop endless.

There's a reason no loop primitives were added to the original
BPF specification, perhaps you should take a look at what their
reasoning was.

It still applies now.

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-02 20:16 ` RFC: New BGF " Hagen Paul Pfeifer
@ 2010-08-03  5:18   ` David Miller
  2010-08-03  7:07     ` Paul LeoNerd Evans
  2010-08-03  9:03     ` Hagen Paul Pfeifer
  2010-08-03  7:18   ` RFC: New BPF " Paul LeoNerd Evans
  1 sibling, 2 replies; 26+ messages in thread
From: David Miller @ 2010-08-03  5:18 UTC (permalink / raw)
  To: hagen; +Cc: leonerd, netdev

From: Hagen Paul Pfeifer <hagen@jauu.net>
Date: Mon, 2 Aug 2010 22:16:29 +0200

> PS: the LOOP opcode must be secure against any resource attack -> the loop
> must break after n iterations.

Oh yeah, what is an iteration in your definition?  See this is why I
totally refuse to add a looping construct to BPF.

If you just check for a single loop hitting, the user will just use
a chaining of two looping constructs.  And then three levels of
indirection, then four, etc.  He can run up to just before exhausting
the "iteration limit" of one loop, and branch to the next one, and
so on and so forth.

There are probably a million ways to exploit this, and once you come
up with a validation or limiting scheme one of two things will happen:

1) The limiting scheme will make legitimate scripts USELESS

2) Someone will just figure out another hole to punch through and
   exploit

There's a reason no back branching construct was added to BPF to begin
with.  It violates the core design principles of BPF.

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  5:13 ` RFC: New BGF " David Miller
@ 2010-08-03  7:04   ` Paul LeoNerd Evans
  2010-08-03  7:18     ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03  7:04 UTC (permalink / raw)
  To: David Miller, netdev

On Mon, Aug 02, 2010 at 10:13:41PM -0700, David Miller wrote:
> > Any comments on this, while I proceed? Barring any major complaints,
> > I'll have a hack at some code and present a patch in due course...
> 
> We're not adding loop instructions, it's just asking for trouble
> since any user can attach BPF filters to a socket and it's just
> way too easy to make a loop endless.
> 
> There's a reason no loop primitives were added to the original
> BPF specification, perhaps you should take a look at what their
> reasoning was.

Yes. I am very aware of that.

Please read my suggestion carefully. These loops cannot be made endless
- they will be bounded by, at most, the number of bytes in the packet
buffer. The loop is required to increment X by at least 1 at every
iteration, and will not allow it to continue past the end of the packet.
This puts a strict bound on the runtime of the loop.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  5:18   ` David Miller
@ 2010-08-03  7:07     ` Paul LeoNerd Evans
  2010-08-03  7:19       ` David Miller
  2010-08-03  9:03     ` Hagen Paul Pfeifer
  1 sibling, 1 reply; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03  7:07 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: hagen

On Mon, Aug 02, 2010 at 10:18:13PM -0700, David Miller wrote:
> If you just check for a single loop hitting, the user will just use
> a chaining of two looping constructs.  And then three levels of
> indirection, then four, etc.  He can run up to just before exhausting
> the "iteration limit" of one loop, and branch to the next one, and
> so on and so forth.

And this is why part of my suggestion bans the use of a LOOP
instruction within the "body" of another, such that they cannot nest.

> There are probably a million ways to exploit this, and once you come
> up with a validation or limiting scheme one of two things will happen:
> 
> 1) The limiting scheme will make legitimate scripts USELESS

Right now, BPF is all but useless for parsing, say, IPv6. I only pick
IPv6 as one example; I'm sure there must exist a great number more
packet-based protocols that use a "linked-list" style approach to
headers. None of those are currently filterable with the current set of
instructions. LOOP would allow these.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BPF 'LOOP' instruction
  2010-08-02 20:16 ` RFC: New BGF " Hagen Paul Pfeifer
  2010-08-03  5:18   ` David Miller
@ 2010-08-03  7:18   ` Paul LeoNerd Evans
  1 sibling, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03  7:18 UTC (permalink / raw)
  To: Hagen Paul Pfeifer, netdev

On Mon, Aug 02, 2010 at 10:16:29PM +0200, Hagen Paul Pfeifer wrote:
> In general: BPF was constructed to address filter rules in a generic manner
> and BPF does not contain any special protocol-specific optimization - nor any
> sophisticated connection tracking functionality. In general you should
> pre-filter unneeded packets and shift the real high-level filtering to some
> post-processing step. tcpdump filter capabilities are limited and were never
> designed to filter _any_ traffic. For example: you are lost if you want to match
> transport layer fields like port number where the underlying IPv{4,6} packet
> is fragmented.

Oh, I am quite aware of the futility in trying to, for example, match up
IPv4 fragments.

There's nothing about my suggestion that is in any way IPv6-specific. I
used IPv6 simply as an example to motivate the suggestion. It could
quite easily apply to any other sort of protocol that uses a linked-list
of headers.

> Furthermore, I doubt that the loop provides any significant advantages.
> IPv6 extension header parsing is not that straightforward. For example,
> check the IPsec AH extension header, where the extension header length
> must be interpreted differently because of an IPsec AH protocol defect. I assume
> that straightforward pcap-encoded BPF opcode (composed of jump and load
> instructions) is more efficient than a highly flexible loop construct.

I'm not sure I follow your logic here.

By my understanding, pcap's IPv6 header parsing filter is a 6-times
statically-unrolled loop, where each loop body has to parse some
headers. I'm already aware that various headers are hard to parse.

Allow me some pseudocode... Currently, pcap has to do the equivalent of:

X = 0
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
X += A
got:
  ... continue with filter.

That "load A with its length" is the IPv6-specific part; I'm not
suggesting that my LOOP suggestion in any way helps that. It's a
difficult problem, sure. What I _am_ suggesting is that this static
unrolling can be avoided, instead becoming:

X = 0
start:
Look at header at [X]; if it's what we want goto 'got'; else load A with
its length.
LOOP to start
got:
  ... continue with filter.

This also results in a shorter program, which matters because there is a
hard limit on the total number of instructions in a filter.
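
The looped form can be sketched in Python as follows. This is only an
illustration under loud assumptions: every header before the wanted one
is taken to use the generic IPv6 extension layout (byte 0 = Next Header,
byte 1 = length in 8-octet units, not counting the first 8 octets), so
special cases such as AH are ignored, and the names find_header, start
and first_type are invented for the example.

```python
def find_header(pkt, start, first_type, wanted):
    """Walk a chain of headers; return the offset of `wanted`, or None.

    `start` is the offset of the first header in the chain and
    `first_type` its type (taken from the preceding fixed header).
    """
    x, hdr_type = start, first_type      # x plays the role of register X
    while x < len(pkt):                  # same bound as "If X < len"
        if hdr_type == wanted:
            return x                     # the 'got' label
        if x + 2 > len(pkt):
            return None                  # header truncated
        hdr_type = pkt[x]                # type of the following header
        a = (pkt[x + 1] + 1) * 8         # "load A with its length", always >= 8
        x += a                           # the X += A step of LOOP
    return None
```

Because the length is always at least 8, x grows on every pass and the
walk ends after at most len(pkt) / 8 iterations.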

> Last but not least, I am interested in an RFC patch as well as a pcap patch (see
> pcap-opt.c). You should not underestimate the effort to generate a generic IPv6
> extension header opcode optimizer - without this the newly introduced opcode
> is pointless.

As above; I was under the impression that pcap already -does- contain
code to have a reasonable attempt to hunt down the requested IPv6
header, which is what implements "ipv6 protochain". I'll quote from
pcap-filter(7):

       ip6 protochain protocol
              True  if the packet is IPv6 packet, and contains protocol header
              with type protocol in its protocol header chain.  For example,
                   ip6 protochain 6
              matches any IPv6 packet with TCP protocol header in the protocol
              header  chain.  The packet may contain, for example, authentica‐
              tion  header,  routing  header,  or  hop-by-hop  option  header,
              between  IPv6  header  and  TCP header.  The BPF code emitted by
              this primitive is complex and cannot be  optimized  by  the  BPF
              optimizer code, so this can be somewhat slow.

> PS: the LOOP opcode must be secure against any resource attack -> the loop
> must break after n iterations.

Which is -exactly- what it does. I'll quote my original:

       X += A.
       If X < len, jump backwards jt instructions.
       Otherwise, fallthrough to the next instruction
  ...
  The intention of this instruction is to be able to implement a loop in
  which successive iterations advance the index register along the packet
  buffer. By comparing X to the packet length, we can bound the running
  time of the loop instruction, avoiding it locking up the kernel. By
  banning STX instructions within the body of the loop, we can ensure that
  X must be a strictly monotonically increasing sequence. At absolute
  worst, X is increased by 1 each time, meaning at worst the body of the
  loop must execute for every byte in the packet.

Is this sufficiently secure, or do you suggest a further limit is
required?

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  7:04   ` Paul LeoNerd Evans
@ 2010-08-03  7:18     ` David Miller
  2010-08-03 12:58       ` Andi Kleen
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2010-08-03  7:18 UTC (permalink / raw)
  To: leonerd; +Cc: netdev

From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
Date: Tue, 3 Aug 2010 08:04:27 +0100

> On Mon, Aug 02, 2010 at 10:13:41PM -0700, David Miller wrote:
>> > Any comments on this, while I proceed? Barring any major complaints,
>> > I'll have a hack at some code and present a patch in due course...
>> 
>> We're not adding loop instructions, it's just asking for trouble
>> since any user can attach BPF filters to a socket and it's just
>> way too easy to make a loop endless.
>> 
>> There's a reason no loop primitives were added to the original
>> BPF specification, perhaps you should take a look at what their
>> reasoning was.
> 
> Yes. I am very aware of that.
> 
> Please read my suggestion carefully. These loops cannot be made endless
> - they will be bounded by, at most, the number of bytes in the packet
> buffer. The loop is required to increment X by at least 1 at every
> iteration, and will not allow it to continue past the end of the packet.
> This puts a strict bound on the runtime of the loop.

That makes the looping construct largely useless, which I mentioned in
my second reply to this thread.


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  7:07     ` Paul LeoNerd Evans
@ 2010-08-03  7:19       ` David Miller
  2010-08-03  9:10         ` Hagen Paul Pfeifer
  0 siblings, 1 reply; 26+ messages in thread
From: David Miller @ 2010-08-03  7:19 UTC (permalink / raw)
  To: leonerd; +Cc: netdev, hagen

From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
Date: Tue, 3 Aug 2010 08:07:10 +0100

> On Mon, Aug 02, 2010 at 10:18:13PM -0700, David Miller wrote:
>> 1) The limiting scheme will make legitimate scripts USELESS
> 
> Right now, BPF is all but useless for parsing, say, IPv6. I only pick
> IPv6 as one example; I'm sure there must exist a great number more
> packet-based protocols that use a "linked-list" style approach to
> headers. None of those are currently filterable with the current set of
> instructions. LOOP would allow these.

It's not meant for detailed packet protocol header analysis,
it's for stateless straight line matching of masked values
in packet headers.

Nothing more.

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  5:18   ` David Miller
  2010-08-03  7:07     ` Paul LeoNerd Evans
@ 2010-08-03  9:03     ` Hagen Paul Pfeifer
  1 sibling, 0 replies; 26+ messages in thread
From: Hagen Paul Pfeifer @ 2010-08-03  9:03 UTC (permalink / raw)
  To: David Miller; +Cc: leonerd, netdev


On Mon, 02 Aug 2010 22:18:13 -0700 (PDT), David Miller wrote:

> Oh yeah, what is an iteration in your definition?  See this is why I
> totally refuse to add a looping construct to BPF.
> 
> If you just check for a single loop hitting, the user will just use
> a chaining of two looping constructs.  And then three levels of
> indirection, then four, etc.  He can run up to just before exhausting
> the "iteration limit" of one loop, and branch to the next one, and
> so on and so forth.

I am aware of the problems caused by complex instructions. David, I was
rather curious to see an unrecognized and ground-breaking instruction; I
didn't want to scotch any (possible) improvement.

HGN

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  7:19       ` David Miller
@ 2010-08-03  9:10         ` Hagen Paul Pfeifer
  2010-08-03 13:40           ` Paul LeoNerd Evans
  0 siblings, 1 reply; 26+ messages in thread
From: Hagen Paul Pfeifer @ 2010-08-03  9:10 UTC (permalink / raw)
  To: David Miller; +Cc: leonerd, netdev


On Tue, 03 Aug 2010 00:19:04 -0700 (PDT), David Miller wrote:
> From: Paul LeoNerd Evans <leonerd@leonerd.org.uk>
> 
>> Right now, BPF is all but useless for parsing, say, IPv6. I only pick
>> IPv6 as one example; I'm sure there must exist a great number more
>> packet-based protocols that use a "linked-list" style approach to
>> headers. None of those are currently filterable with the current set of
>> instructions. LOOP would allow these.
> 
> It's not meant for detailed packet protocol header analysis,
> it's for stateless straight line matching of masked values
> in packet headers.

David is right, BPF cannot - and will not - keep up with any high-level
connection tracking packet filter. There is a processing trade-off between
packet classification and packet storage with post-processing analysis.

Hagen

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  7:18     ` David Miller
@ 2010-08-03 12:58       ` Andi Kleen
  2010-08-03 13:07         ` David Miller
  0 siblings, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2010-08-03 12:58 UTC (permalink / raw)
  To: David Miller; +Cc: leonerd, netdev

David Miller <davem@davemloft.net> writes:
>
> That makes the looping construct largely useless, which I mentioned in
> my second reply to this thread.

How about simply adding a "skip ipv6 extension headers until header type
X" opcode?

I bet that would solve most of the problems here in practice.

-Andi
-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03 12:58       ` Andi Kleen
@ 2010-08-03 13:07         ` David Miller
  2010-08-03 13:34           ` RFC: New BPF " Paul LeoNerd Evans
  2010-08-03 14:05           ` RFC: New BGF " Andi Kleen
  0 siblings, 2 replies; 26+ messages in thread
From: David Miller @ 2010-08-03 13:07 UTC (permalink / raw)
  To: andi; +Cc: leonerd, netdev

From: Andi Kleen <andi@firstfloor.org>
Date: Tue, 03 Aug 2010 14:58:02 +0200

> David Miller <davem@davemloft.net> writes:
>>
>> That makes the looping construct largely useless, which I mentioned in
>> my second reply to this thread.
> 
> How about simply adding a "skip ipv6 extension headers until header type
> X" opcode?
> 
> I bet that would solve most of the problems here in practice.

BPF really should not have protocol specific opcodes.

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 13:07         ` David Miller
@ 2010-08-03 13:34           ` Paul LeoNerd Evans
  2010-08-03 13:42             ` Paul LeoNerd Evans
  2010-08-03 14:09             ` Rémi Denis-Courmont
  2010-08-03 14:05           ` RFC: New BGF " Andi Kleen
  1 sibling, 2 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 13:34 UTC (permalink / raw)
  To: David Miller, netdev; +Cc: andi

On Tue, Aug 03, 2010 at 06:07:54AM -0700, David Miller wrote:
> > How about simply adding a "skip ipv6 extension headers until header type
> > X" opcode?
> > 
> > I bet that would solve most of the problems here in practice.
> 
> BPF really should not have protocol specific opcodes.

You mean like the LDX MSH instruction, the "load a byte, mask by 0x0f then
shift up two bits" one that's specific to fetching an IPv4 header length?
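
For reference, classic BPF documents the effect of that instruction as
X <- 4*(P[k:1]&0xf). A small sketch in Python (the name ldx_msh is
illustrative, and the packet is assumed to be at least k + 1 bytes long):

```python
def ldx_msh(pkt, k):
    """Return the value the MSH load would place in X for offset k."""
    # Low nibble of the byte is the IPv4 IHL field in 32-bit words;
    # shifting left by two converts it to a length in bytes.
    return (pkt[k] & 0x0F) << 2
```

For an Ethernet frame carrying IPv4 without options, the byte at offset
14 is 0x45, so the result is 20.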

Let's look at this another way around then. Ignore my LOOP instruction
idea.

Already -right now- BPF has the SKF_NET_OFF + SKF_AD_PROTO information.
Let's consider an Ethernet/IP/TCP packet we've received:

[Ethernet header | IP header | TCP header ....]
^                ^
|                |
|                +-- SKF_NET_OFF is here
+-- 0 is here

SKF_AD_PROTO == 0x0800 (IPv4)


What if we added a new constant SKF_TRANS_OFF to store the start address
of the transport header, and a new SKF_AD storage area for the transport
protocol:

[Ethernet header | IP header | TCP header ....]
^                ^           ^
|                |           |
|                |           +--  SKF_TRANS_OFF is here
|                +-- SKF_NET_OFF is here
+-- 0 is here

SKF_AD_PROTO == 0x0800 (IPv4)
SKF_AD_TRANSPROTO == 6 (IPPROTO_TCP)


Now it's easy to see how IPv6 header processing fits into this. No
longer do we have to calculate the length of the IPv6 header, we can
just start off directly looking at the TCP header. I wanted TCP port 80;
no problem:

    LD BYTE[SKF_AD_PROTO]
    JEQ 0x0800, 1, 0
    JEQ 0x86dd, 0, #reject
    LD BYTE[SKF_AD_TRANSPROTO]
    JEQ 6, 0, #reject
    LD BYTE[SKF_NET_OFF+0]
    JEQ 80, #accept, 0
    LD BYTE[SKF_NET_OFF+2]
    JEQ 80, 0, #reject
  accept:
    LD len
    RET A
  reject:
    RET 0

Hey presto; I've just accepted TCP src or dest port 80 on IPv4 or IPv6
without having any code to actually -parse- IPv4 or '6 headers.
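
A sketch of the logic this filter is aiming for, written in Python for
readability. SKF_TRANS_OFF and SKF_AD_TRANSPROTO are the hypothetical
additions proposed above and do not exist; here they are passed in as
plain arguments (trans_off, transproto), as is the link-layer protocol.
The sketch assumes the ports are read as 16-bit big-endian values at the
start of the transport header, and the name accept_port80 is made up.

```python
import struct

ETH_P_IP = 0x0800
ETH_P_IPV6 = 0x86DD
IPPROTO_TCP = 6


def accept_port80(pkt, proto, transproto, trans_off):
    """Return len(pkt) to accept the packet, 0 to reject it."""
    if proto not in (ETH_P_IP, ETH_P_IPV6):
        return 0                      # neither IPv4 nor IPv6
    if transproto != IPPROTO_TCP:
        return 0                      # not TCP
    # Source and destination port: two big-endian halfwords.
    src, dst = struct.unpack_from("!HH", pkt, trans_off)
    if src == 80 or dst == 80:
        return len(pkt)               # LD len; RET A
    return 0                          # RET 0
```

Nothing in the function depends on how the network header was laid out;
all of that work is hidden behind the two assumed inputs.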


Does this sound workable?

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03  9:10         ` Hagen Paul Pfeifer
@ 2010-08-03 13:40           ` Paul LeoNerd Evans
  0 siblings, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 13:40 UTC (permalink / raw)
  To: Hagen Paul Pfeifer, netdev; +Cc: David Miller

On Tue, Aug 03, 2010 at 11:10:28AM +0200, Hagen Paul Pfeifer wrote:
> >> Right now, BPF is all but useless for parsing, say, IPv6. I only pick
> >> IPv6 as one example; I'm sure there must exist a great number more
> >> packet-based protocols that use a "linked-list" style approach to
> >> headers. None of those are currently filterable with the current set of
> >> instructions. LOOP would allow these.
> > 
> > It's not meant for detailed packet protocol header analysis,
> > it's for stateless straight line matching of masked values
> > in packet headers.
> 
> David is right, BPF cannot - and will not - keep up with any high-level
> connection tracking packet filter. There is a processing trade-off between
> packet classification and packet storage with post-processing analysis.

This has nothing to do with high-level connection tracking.

I want to accept all (IPv4 or IPv6) TCP packets concerning port 80.
That's all. No connection tracking. Simply a "stateless straight line
matching of masked values in packet headers". Namely, the TCP source or
destination ports, being 80. 

Should BPF be allowed to implement such a filter?

This is the core question.

If yes, then we either need LOOP, or alternatively my SKF_AD_TRANSPROTO
/ SKF_TRANS_OFF idea (see the other thread fork). Without either LOOP or
TRANSPROTO, it becomes next-to-impossible to -find- the TCP header in an
IPv6 packet, and hence make filtering decisions based on it.

If no, please justify what BPF -is- for then, given that right now
applications like tcpdump/libpcap already use it for this very purpose.
Please further justify why BPF has the "LDX MSH" instruction.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 13:34           ` RFC: New BPF " Paul LeoNerd Evans
@ 2010-08-03 13:42             ` Paul LeoNerd Evans
  2010-08-03 14:09             ` Rémi Denis-Courmont
  1 sibling, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 13:42 UTC (permalink / raw)
  To: netdev; +Cc: andi

On Tue, Aug 03, 2010 at 02:34:43PM +0100, Paul LeoNerd Evans wrote:
>     LD BYTE[SKF_NET_OFF+0]
>     JEQ 80, #accept, 0
>     LD BYTE[SKF_NET_OFF+2]
>     JEQ 80, 0, #reject

My apologies; the above should be "LD HALF" instructions; TCP ports are
16-bit quantities. Hopefully this mistake does not detract from my
argument...

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03 13:07         ` David Miller
  2010-08-03 13:34           ` RFC: New BPF " Paul LeoNerd Evans
@ 2010-08-03 14:05           ` Andi Kleen
  2010-08-03 14:11             ` Paul LeoNerd Evans
  1 sibling, 1 reply; 26+ messages in thread
From: Andi Kleen @ 2010-08-03 14:05 UTC (permalink / raw)
  To: David Miller; +Cc: andi, leonerd, netdev

On Tue, Aug 03, 2010 at 06:07:54AM -0700, David Miller wrote:
> From: Andi Kleen <andi@firstfloor.org>
> Date: Tue, 03 Aug 2010 14:58:02 +0200
> 
> > David Miller <davem@davemloft.net> writes:
> >>
> >> That makes the looping construct largely useless, which I mentioned in
> >> my second reply to this thread.
> > 
> > How about simply adding a "skip ipv6 extension headers until header type
> > X" opcode?
> > 
> > I bet that would solve most of the problems here in practice.
> 
> BPF really should not have protocol specific opcodes.

Well you could generalize it, like "SKIP headers where length 
is at offset X and type at offset Y" 

-Andi

-- 
ak@linux.intel.com -- Speaking for myself only.

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 13:34           ` RFC: New BPF " Paul LeoNerd Evans
  2010-08-03 13:42             ` Paul LeoNerd Evans
@ 2010-08-03 14:09             ` Rémi Denis-Courmont
  2010-08-03 14:13               ` Paul LeoNerd Evans
  1 sibling, 1 reply; 26+ messages in thread
From: Rémi Denis-Courmont @ 2010-08-03 14:09 UTC (permalink / raw)
  To: Paul LeoNerd Evans; +Cc: netdev


On Tue, 3 Aug 2010 14:34:43 +0100, Paul LeoNerd Evans
<leonerd@leonerd.org.uk> wrote:
> What if we added a new constant SKF_TRANS_OFF to store the start address
> of the transport header, and a new SKF_AD storage area for the transport
> protocol:
(...)
> Does this sound workable?

The network header has not been processed by the time the skbuff hits the
packet socket. So the transport header offset is not defined yet.

-- 
Rémi Denis-Courmont
http://www.remlab.net
http://fi.linkedin.com/in/remidenis


* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03 14:05           ` RFC: New BGF " Andi Kleen
@ 2010-08-03 14:11             ` Paul LeoNerd Evans
  2010-08-03 14:34               ` Paul LeoNerd Evans
  0 siblings, 1 reply; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 14:11 UTC (permalink / raw)
  To: Andi Kleen, netdev; +Cc: David Miller

[-- Attachment #1: Type: text/plain, Size: 1706 bytes --]

On Tue, Aug 03, 2010 at 04:05:39PM +0200, Andi Kleen wrote:
> Well you could generalize it, like "SKIP headers where length 
> is at offset X and type at offset Y" 

Except that doesn't work for IPv6.

Some IPv6 headers are implied-length; their length never appears in the
packet. You have to "just know".

Some IPv6 headers store their length somewhere in the header body.
Different headers use different offsets within the body.

Some IPv6 headers do not make their length known on the wire -at all-,
such as IPsec's AH. Only the IPsec endpoints know how long this header
is.

This is what makes IPv6 -really- difficult to actually parse like this.

Ignoring even for a moment the impossible ones (IPsec's AH and ESP), the
rest of the headers end up becoming a giant lookup table, analogous to:

  switch(hdrtype)
  {
     case 1:
       length = someconst; break;
     case 2:
       length = someotherconst; break;
     case 3:
       length = b[someoffset]; break;
     ...
  }

This is why I wanted a LOOP instruction: the above switch code could
then be written -once- in BPF and LOOP'ed over to find the required
header. Instead, that loop must be statically unrolled some number of
times into an n-times-longer program script of less-than-equivalent
power. E.g. tcpdump/libpcap unrolls it a statically-configured 6 times,
meaning that if the packet is particularly large, and the header comes
7th, you'll never see it. My LOOP idea would mean the code is run once
for every header in the packet, regardless of how many there are.
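
The difference between the unrolled and the looped form can be sketched
in ordinary C. This is only an illustration, not libpcap's or the
kernel's code: find_header and max_steps are hypothetical names. It
handles only the extension headers that share the "next header, then
length in 8-octet units" layout (hop-by-hop 0, routing 43, destination
options 60) plus the fixed 8-byte fragment header (44), and gives up on
anything else, including AH and ESP.

```c
#include <stddef.h>
#include <stdint.h>

#define IPV6_FIXED_HDR_LEN 40
#define IPV6_NEXTHDR_OFF    6

/* Walk the IPv6 header chain looking for `wanted`.
 * max_steps models unrolling: 6 behaves like a six-times unrolled filter,
 * a large value behaves like a loop bounded only by the packet length.
 * Returns the offset of the wanted header, or -1 if it was not found. */
static long find_header(const uint8_t *pkt, size_t len,
                        uint8_t wanted, unsigned max_steps)
{
    if (len < IPV6_FIXED_HDR_LEN)
        return -1;

    uint8_t type = pkt[IPV6_NEXTHDR_OFF];
    size_t x = IPV6_FIXED_HDR_LEN;

    for (unsigned step = 0; type != wanted; step++) {
        size_t hdrlen;

        if (step >= max_steps || len - x < 8)
            return -1;  /* out of unrolled copies, or truncated packet */

        switch (type) {
        case 0:   /* hop-by-hop options */
        case 43:  /* routing */
        case 60:  /* destination options */
            hdrlen = ((size_t)pkt[x + 1] + 1) * 8;
            break;
        case 44:  /* fragment: fixed size */
            hdrlen = 8;
            break;
        default:  /* not handled in this sketch (e.g. AH, ESP) */
            return -1;
        }

        if (hdrlen > len - x)
            return -1;
        type = pkt[x];  /* first byte of the header names the next one */
        x += hdrlen;    /* x always grows, so the walk terminates */
    }
    return (long)x;
}
```

With seven chained 8-byte extension headers ahead of TCP, a call with
max_steps of 6 returns -1, while a call with a large max_steps finds
the TCP header at offset 96.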

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 14:09             ` Rémi Denis-Courmont
@ 2010-08-03 14:13               ` Paul LeoNerd Evans
  2010-08-03 14:16                 ` Rémi Denis-Courmont
  0 siblings, 1 reply; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 14:13 UTC (permalink / raw)
  To: Rémi Denis-Courmont, netdev

[-- Attachment #1: Type: text/plain, Size: 1122 bytes --]

On Tue, Aug 03, 2010 at 04:09:53PM +0200, Rémi Denis-Courmont wrote:
> 
> On Tue, 3 Aug 2010 14:34:43 +0100, Paul LeoNerd Evans
> <leonerd@leonerd.org.uk> wrote:
> > What if we added a new constant SKF_TRANS_OFF to store the start address
> > of the transport header, and a new SKF_AD storage area for the transport
> > protocol:
> (...)
> > Does this sound workable?
> 
> The network header has not been processed by the time the skbuff hits the
> packet socket. So the transport header offset is not defined yet.

Is there any way it could be done lazily, at the moment that filter.c
knows it has to provide either SKF_TRANS_OFF or SKF_AD_TRANSPROTO? It
doesn't even need to do a full parse, just enough to find the length and
protocol type.
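
As a rough sketch of what "lazily" could mean here: the lookup runs at
most once per packet, and only if the filter actually asks for one of
the two values. Everything below is hypothetical (struct pkt_view,
parse_transport, get_trans_off, get_trans_proto); none of it is existing
filter.c code, and the placeholder parser assumes an IPv6 packet with no
extension headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-packet state; none of these names exist in filter.c. */
struct pkt_view {
    const uint8_t *data;   /* start of the network header */
    size_t len;
    bool parsed;           /* has the transport header been looked up yet? */
    long trans_off;        /* cached offset, or -1 if unknown */
    int trans_proto;       /* cached protocol number, or -1 if unknown */
    unsigned parse_calls;  /* counts real parses, for demonstration only */
};

/* Placeholder parser: assumes IPv6 with no extension headers.
 * A real one would need per-protocol-family logic. */
static void parse_transport(struct pkt_view *p)
{
    p->parse_calls++;
    p->parsed = true;
    if (p->len < 40) {
        p->trans_off = -1;
        p->trans_proto = -1;
        return;
    }
    p->trans_off = 40;
    p->trans_proto = p->data[6];
}

/* What a load relative to SKF_TRANS_OFF would consult. */
static long get_trans_off(struct pkt_view *p)
{
    if (!p->parsed)
        parse_transport(p);
    return p->trans_off;
}

/* What a load of SKF_AD_TRANSPROTO would consult. */
static int get_trans_proto(struct pkt_view *p)
{
    if (!p->parsed)
        parse_transport(p);
    return p->trans_proto;
}
```

A filter that never touches either value pays nothing; one that uses
both triggers a single parse.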

That processing -has- to be done one way or another, if BPF is ever to
support filtering on TCP headers in IPv6 in a sane way. Either encode
the algorithm in BPF, or in compiled C code - the latter would be more
performant.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 14:13               ` Paul LeoNerd Evans
@ 2010-08-03 14:16                 ` Rémi Denis-Courmont
  2010-08-03 14:19                   ` Paul LeoNerd Evans
  0 siblings, 1 reply; 26+ messages in thread
From: Rémi Denis-Courmont @ 2010-08-03 14:16 UTC (permalink / raw)
  To: Paul LeoNerd Evans; +Cc: netdev


On Tue, 3 Aug 2010 15:13:13 +0100, Paul LeoNerd Evans
<leonerd@leonerd.org.uk> wrote:
> Is there any way it could be done lazily, at the moment that filter.c
> knows it has to provide either SKF_TRANS_OFF or SKF_AD_TRANSPROTO?

You would essentially need to implement dedicated parsing for each
protocol family.
So you might as well add an opcode dedicated to IPv6...

-- 
Rémi Denis-Courmont
http://www.remlab.net
http://fi.linkedin.com/in/remidenis


* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 14:16                 ` Rémi Denis-Courmont
@ 2010-08-03 14:19                   ` Paul LeoNerd Evans
  2010-08-03 15:17                     ` Rémi Denis-Courmont
  0 siblings, 1 reply; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 14:19 UTC (permalink / raw)
  To: Rémi Denis-Courmont, netdev

[-- Attachment #1: Type: text/plain, Size: 1082 bytes --]

On Tue, Aug 03, 2010 at 04:16:02PM +0200, Rémi Denis-Courmont wrote:
> > Is there any way it could be done lazily, at the moment that filter.c
> > knows it has to provide either SKF_TRANS_OFF or SKF_AD_TRANSPROTO?
> 
> You would essentially need to implement dedicated parsing for each
> protocol family.
> So you might as well add an opcode dedicated to IPv6...

And what happens when IPv8 comes along? Or we want to parse IPX/SPX or
any of those thousands of other network protocols?

I'd prefer to keep the BPF layer (relatively) protocol-neutral. Yes, we
have LDX MSH, which with hindsight I'd say looks like a very
protocol-specific instruction. But there's nothing protocol-specific
about asking where the transport header is and what type it is, any
more so than asking where the network offset and type are, out of the
link-level header. This just extends that layer model.

How to -implement- it is quite another matter.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

* Re: RFC: New BGF 'LOOP' instruction
  2010-08-03 14:11             ` Paul LeoNerd Evans
@ 2010-08-03 14:34               ` Paul LeoNerd Evans
  0 siblings, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 14:34 UTC (permalink / raw)
  To: Andi Kleen, netdev; +Cc: David Miller

[-- Attachment #1: Type: text/plain, Size: 1287 bytes --]

On Tue, Aug 03, 2010 at 03:11:10PM +0100, Paul LeoNerd Evans wrote:
>   switch(hdrtype)
>   {
>      case 1:
>        length = someconst; break;
>      case 2:
>        length = someotherconst; break;
>      case 3:
>        length = b[someoffset]; break;
>      ...
>   }

Of course, I completely forgot about also finding the offset of the
'next header' from the current header. That results in some code which,
in C, would look like:

  int hdrtype = b[IPv6_nexthdr];

  int x = size_of_IPv6_header;

  while(hdrtype != hdr_wanted) {
       int len;
       switch(hdrtype) {
         case 1:
           len = someconst;
           hdrtype = someotherconst;
           break;
         case 2:
           len = b[x+someoffs];
           hdrtype = b[x+someotheroffs];
           break;
         /* other IPv6 header types here */
       }

       x += len;
  }

You can't compile that idea into BPF without using a scratch memory
cell, because you can't have both 'len' and 'hdrtype' live in A at the
same time, nor can you atomically x += b[x+someoffs].
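
To illustrate the register pressure, here is a C sketch that restricts
itself to one accumulator (A), one index register (X) and one scratch
cell (M0), with the roughly corresponding BPF operation noted beside
each step. It is a simplification under stated assumptions: the name
walk_with_scratch is hypothetical, and every header is assumed to keep
its next-header type in byte 0 and its length, in 8-octet units minus
one, in byte 1, so the per-type switch is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns the offset of the `wanted` header, or -1 if it is not found. */
static long walk_with_scratch(const uint8_t *b, size_t len, uint8_t wanted)
{
    uint32_t A, X, M0;

    if (len < 40)
        return -1;
    A = b[6];                /* LD  BYTE[6]    first next-header value */
    M0 = A;                  /* ST  M[0]       park the type */
    X = 40;                  /* LDX #40        end of fixed IPv6 header */

    for (;;) {
        A = M0;              /* LD  M[0] */
        if (A == wanted)     /* JEQ wanted */
            return (long)X;
        if (len - X < 2)     /* bounds check done by the loads themselves */
            return -1;
        A = b[X];            /* LD  BYTE[X+0]  type of the following header */
        M0 = A;              /* ST  M[0]       A is needed for the length */
        A = b[X + 1];        /* LD  BYTE[X+1]  length field */
        A = (A + 1) * 8;     /* ADD #1; MUL #8 */
        A += X;              /* ADD X */
        if (A > len)
            return -1;
        X = A;               /* TAX            X grows by at least 8 */
    }
}
```

The type has to be parked in M0 because A is overwritten by the length
computation before X can be advanced.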

In short, much much easier if the C code did this part...

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 14:19                   ` Paul LeoNerd Evans
@ 2010-08-03 15:17                     ` Rémi Denis-Courmont
  2010-08-03 15:27                       ` Paul LeoNerd Evans
  0 siblings, 1 reply; 26+ messages in thread
From: Rémi Denis-Courmont @ 2010-08-03 15:17 UTC (permalink / raw)
  To: netdev

On Tuesday 3 August 2010 17:19:24 Paul LeoNerd Evans, you wrote:
> On Tue, Aug 03, 2010 at 04:16:02PM +0200, Rémi Denis-Courmont wrote:
> > > Is there any way it could be done lazily, at the moment that filter.c
> > > knows it has to provide either SKF_TRANS_OFF or SKF_AD_TRANSPROTO?
> > 
> > You would essentially need to implement dedicated parsing for each
> > protocol family.
> > So you might as well add an opcode dedicated to IPv6...
> 
> And what happens when IPv8 comes along?
> Or we want to parse IPX/SPX or
> any of those thousands of other network protocols?

It does not work. That's why your SKF_TRANS_OFF proposal sucks totally: it
is not implementable. On the other hand, an IPv6-specific opcode sucks only
a little, due to its ugliness and lack of forward compatibility.

-- 
Rémi Denis-Courmont
http://www.remlab.net/
http://fi.linkedin.com/in/remidenis

* Re: RFC: New BPF 'LOOP' instruction
  2010-08-03 15:17                     ` Rémi Denis-Courmont
@ 2010-08-03 15:27                       ` Paul LeoNerd Evans
  0 siblings, 0 replies; 26+ messages in thread
From: Paul LeoNerd Evans @ 2010-08-03 15:27 UTC (permalink / raw)
  To: Rémi Denis-Courmont, netdev

[-- Attachment #1: Type: text/plain, Size: 1134 bytes --]

On Tue, Aug 03, 2010 at 06:17:40PM +0300, Rémi Denis-Courmont wrote:
> > And what happens when IPv8 comes along?
> > Or we want to parse IPX/SPX or
> > any of those thousands of other network protocols?
> 
> It does not work. That's why your SKF_TRANS_OFF proposal sucks totally: it
> is not implementable. On the other hand, an IPv6-specific opcode sucks only
> a little, due to its ugliness and lack of forward compatibility.

Huh? So now you want to make every BPF program IPv6-specific, so we've
no hope in hell of making them cope with The Next Big Thing? As opposed
to my idea, which makes them neutral on the subject, and puts all the
knowledge of the protocol in the -kernel-, where we can easily implement
new things?

When some brand-new protocol comes along that we'd like to filter on, the
kernel is going to have to know about it. That is -exactly- the same as the
current situation with regard to SKF_NET_OFF / SKF_AD_PROTO, so I don't
really see what difference that makes.

-- 
Paul "LeoNerd" Evans

leonerd@leonerd.org.uk
ICQ# 4135350       |  Registered Linux# 179460
http://www.leonerd.org.uk/

[-- Attachment #2: Digital signature --]
[-- Type: application/pgp-signature, Size: 190 bytes --]

end of thread, other threads:[~2010-08-03 15:28 UTC | newest]

Thread overview: 26+ messages
2010-08-02 11:03 RFC: New BGF 'LOOP' instruction Paul LeoNerd Evans
2010-08-02 11:13 ` RFC: New BPF " Paul LeoNerd Evans
2010-08-02 20:16 ` RFC: New BGF " Hagen Paul Pfeifer
2010-08-03  5:18   ` David Miller
2010-08-03  7:07     ` Paul LeoNerd Evans
2010-08-03  7:19       ` David Miller
2010-08-03  9:10         ` Hagen Paul Pfeifer
2010-08-03 13:40           ` Paul LeoNerd Evans
2010-08-03  9:03     ` Hagen Paul Pfeifer
2010-08-03  7:18   ` RFC: New BPF " Paul LeoNerd Evans
2010-08-03  5:13 ` RFC: New BGF " David Miller
2010-08-03  7:04   ` Paul LeoNerd Evans
2010-08-03  7:18     ` David Miller
2010-08-03 12:58       ` Andi Kleen
2010-08-03 13:07         ` David Miller
2010-08-03 13:34           ` RFC: New BPF " Paul LeoNerd Evans
2010-08-03 13:42             ` Paul LeoNerd Evans
2010-08-03 14:09             ` Rémi Denis-Courmont
2010-08-03 14:13               ` Paul LeoNerd Evans
2010-08-03 14:16                 ` Rémi Denis-Courmont
2010-08-03 14:19                   ` Paul LeoNerd Evans
2010-08-03 15:17                     ` Rémi Denis-Courmont
2010-08-03 15:27                       ` Paul LeoNerd Evans
2010-08-03 14:05           ` RFC: New BGF " Andi Kleen
2010-08-03 14:11             ` Paul LeoNerd Evans
2010-08-03 14:34               ` Paul LeoNerd Evans
