Re: [bpf-next PATCH 05/16] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed

From: John Fastabend <john.fastabend@gmail.com>
To: David Miller <davem@davemloft.net>
Cc: ast@kernel.org, daniel@iogearbox.net, netdev@vger.kernel.org,
	davejwatson@fb.com
Subject: Re: [bpf-next PATCH 05/16] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
Date: Tue, 6 Mar 2018 10:18:12 -0800	[thread overview]
Message-ID: <f30d24e5-5578-4804-d425-37f26f819972@gmail.com> (raw)
In-Reply-To: <20180306.104735.201608464583585143.davem@davemloft.net>

On 03/06/2018 07:47 AM, David Miller wrote:
> From: John Fastabend <john.fastabend@gmail.com>
> Date: Mon, 5 Mar 2018 23:06:01 -0800
> 
>> On 03/05/2018 10:42 PM, David Miller wrote:
>>> From: John Fastabend <john.fastabend@gmail.com>
>>> Date: Mon, 5 Mar 2018 22:22:21 -0800
>>>
>>>> All I meant by this is if an application uses sendfile() call
>>>> there is no good way to know when/if the kernel side will copy or
>>>> xmit the  data. So a reliable user space application will need to
>>>> only modify the data if it "knows" there are no outstanding sends
>>>> in-flight. So if we assume applications follow this then it
>>>> is OK to avoid the copy. Of course this is not good enough for
>>>> security, but for monitoring/statistics (my use case 1 it works).
>>>
>>> For an application implementing a networking file system, it's pretty
>>> legitimate for file contents to change before the page gets DMA's to
>>> the networking card.
>>>
>>
>> Still there are useful BPF programs that can tolerate this. So I
>> would prefer to allow BPF programs to operate in the no-copy mode
>> if wanted. It doesn't have to be the default though as it currently
>> is. A l7 load balancer is a good example of this.
> 
> Maybe I'd be ok if it were not the default.  But do you really want to
> expose a potential attack vector, even if the app gets to choose and
> say "I'm ok"?
> 

Yes, because I have use cases where I don't need to read the data, but
have already "approved" the data. One example applications like
nginx can serve static http data. Just reading over the code what they
do, when sendfile is enabled, is a sendmsg call with the header. We want
to enforce the policy on the header. Then we know the next N bytes are
OK. Nginx will then send the payload over sendfile syscall. We already
know the data is good from initial sendmsg call the next N bytes can
get the verdict SK_PASS without even touching the data. If we do a
copy in this case we see significant performance degradation.

The other use case is the L7 load balancer mentioned above. If we are
using RR policies or some other heuristic if the user modifies the
payload after the BPF verdict that is also fine. A malicious user
could rewrite the header and try to game the load balancer but the
BPF program can always just dev/null (SK_DROP) the application when
it detects this. This also assumes the load balancer is using the
header for its heuristic some interesting heuristics may not use
the header at all.

>>> And that's perfectly fine, and we everything such that this will work
>>> properly.
>>>
>>> The card checksums what ends up being DMA'd so nothing from the
>>> networking side is broken.
>>
>> Assuming the card has checksum support correct? Which is why we have
>> the SKBTX_SHARED_FRAG checked in skb_has_shared_frag() and the checksum
>> helpers called by the drivers when they do not support the protocol
>> being used. So probably OK assumption if using supported protocols and
>> hardware? Perhaps in general folks just use normal protocols and
>> hardware so it works.
> 
> If the hardware doesn't support the checksums, we linearize the SKB
> (therefore obtain a snapshot of the data), and checksum.  Exactly what
> would happen if the hardware did the checksum.
> 
> So OK in that case too.
> 
> We always guarantee that you will always get a correct checksum on
> outgoing packets, even if you modify the page contents meanwhile.
> 

Agreed the checksum is correct, but the user doesn't know if the linearize
happened while it was modifying the data, potentially creating data with
a partial update. Because the user modifying the data doesn't block the
linearize operation in the kernel and vice versa the linearize operation
can happen in parallel with the user side data modification. So maybe
I'm still missing something but it seems the data can be in some unknown
state on the wire.

Either way though I think its fine to make the default sendpage hook do
the copy. A flag to avoid the copy can be added later to resolve my use
cases above. I'll code this up in a v2 today/tomorrow.

>> So the "I need at least X more bytes" is the msg_cork_bytes() in patch
>> 7. I could handle the sendpage case the same as I handle the sendmsg
>> case and copy the data into the buffer until N bytes are received. I
>> had planned to add this mode in a follow up series but could add it in
>> this series so we have all the pieces in one submission.
>>
>> Although I used a scatterlist instead of a linear buffer. I was
>> planning to add a helper to pull in next sg list item if needed
>> rather than try to allocate a large linear block up front.
> 
> For non-deep packet inspection cases this re-running of the parser case
> will probably not trigger at all.
> 

Agreed, its mostly there to handle cases where the sendmsg call
only sent part of a application (kafka, http, etc) header. This can
happen if user is sending multiple messages in a single sendmsg/sendfile
call. But, yeah I see it rarely in practice its mostly there for
completeness and to handle these edge cases.

next prev parent reply	other threads:[~2018-03-06 18:18 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-05 19:50 [bpf-next PATCH 00/16] bpf,sockmap: sendmsg/sendfile ULP John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 01/16] sock: make static tls function alloc_sg generic sock helper John Fastabend
2018-03-05 21:32   ` David Miller
2018-03-05 19:51 ` [bpf-next PATCH 02/16] sockmap: convert refcnt to an atomic refcnt John Fastabend
2018-03-05 21:34   ` David Miller
2018-03-05 19:51 ` [bpf-next PATCH 03/16] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 04/16] net: generalize sk_alloc_sg to work with scatterlist rings John Fastabend
2018-03-05 21:35   ` David Miller
2018-03-05 19:51 ` [bpf-next PATCH 05/16] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
2018-03-05 21:40   ` David Miller
2018-03-05 22:53     ` John Fastabend
2018-03-06  5:42       ` David Miller
2018-03-06  6:22         ` John Fastabend
2018-03-06  6:42           ` David Miller
2018-03-06  7:06             ` John Fastabend
2018-03-06 15:47               ` David Miller
2018-03-06 18:18                 ` John Fastabend [this message]
2018-03-07  3:25                   ` John Fastabend
2018-03-07  4:41                     ` David Miller
2018-03-07 13:03                     ` Daniel Borkmann
2018-03-05 19:51 ` [bpf-next PATCH 06/16] bpf: sockmap, add bpf_msg_apply_bytes() helper John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 07/16] bpf: sockmap, add msg_cork_bytes() helper John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 08/16] bpf: add map tests for BPF_PROG_TYPE_SK_MSG John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 09/16] bpf: add verifier " John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 10/16] bpf: sockmap sample, add option to attach SK_MSG program John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 11/16] bpf: sockmap sample, add sendfile test John Fastabend
2018-03-05 19:51 ` [bpf-next PATCH 12/16] bpf: sockmap sample, add data verification option John Fastabend
2018-03-05 19:52 ` [bpf-next PATCH 13/16] bpf: sockmap, add sample option to test apply_bytes helper John Fastabend
2018-03-05 19:52 ` [bpf-next PATCH 14/16] bpf: sockmap sample support for bpf_msg_cork_bytes() John Fastabend
2018-03-05 19:52 ` [bpf-next PATCH 15/16] sockmap: add SK_DROP tests John Fastabend
2018-03-05 19:52 ` [bpf-next PATCH 16/16] bpf: sockmap test script John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f30d24e5-5578-4804-d425-37f26f819972@gmail.com \
    --to=john.fastabend@gmail.com \
    --cc=ast@kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=davejwatson@fb.com \
    --cc=davem@davemloft.net \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).