From: John Fastabend <john.fastabend@gmail.com>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: davejwatson@fb.com, davem@davemloft.net, daniel@iogearbox.net,
ast@kernel.org, netdev@vger.kernel.org
Subject: Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
Date: Mon, 19 Mar 2018 22:54:28 -0700
Message-ID: <b3d504e8-2fc2-e520-f6ce-bbaa72c35037@gmail.com>
In-Reply-To: <20180319202400.unsb3wjr546ew4sb@ast-mbp.dhcp.thefacebook.com>
On 03/19/2018 01:24 PM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
>> Currently, if a bpf sk msg program is run the program
>> can only parse data that the (start,end) pointers already
>> consumed. For sendmsg hooks this is likely the first
>> scatterlist element. For sendpage this will be the range
>> (0,0) because the data is shared with userspace and by
>> default we want to avoid allowing userspace to modify
>> data while (or after) BPF verdict is being decided.
>>
>> To support pulling in additional bytes for parsing use
>> a new helper bpf_sk_msg_pull(start, end, flags) which
>> works similar to cls tc logic. This helper will attempt
>> to point the data start pointer at 'start' bytes offset
>> into msg and data end pointer at 'end' bytes offset into
>> message.
>>
>> After basic sanity checks to ensure 'start' <= 'end' and
>> 'end' <= msg_length there are a few cases we need to
>> handle.
>>
>> First the sendmsg hook has already copied the data from
>> userspace and has exclusive access to it. Therefore, it
>> is not necessary to copy the data. However, it may
>> be required. After finding the scatterlist element with
>> 'start' offset byte in it there are two cases. One the
>> range (start,end) is entirely contained in the sg element
>> and is already linear. All that is needed is to update the
>> data pointers, no allocate/copy is needed. The other case
>> is (start, end) crosses sg element boundaries. In this
>> case we allocate a block of size 'end - start' and copy
>> the data to linearize it.
>>
>> Next sendpage hook has not copied any data in initial
>> state so that data pointers are (0,0). In this case we
>> handle it similar to the above sendmsg case except the
>> allocation/copy must always happen. Then when sending
>> the data we have possibly three memory regions that
>> need to be sent, (0, start - 1), (start, end), and
>> (end + 1, msg_length). This is required to ensure any
>> writes by the BPF program are correctly transmitted.
>>
>> Lastly this operation will invalidate any previous
>> data checks so BPF programs will have to revalidate
>> pointers after making this BPF call.
>>
>> Signed-off-by: John Fastabend <john.fastabend@gmail.com>
> ..
>> +
>> + page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>> + if (unlikely(!page))
>> + return -ENOMEM;
>
> I think that's fine. Just curious what order do you see in practice?
At the moment I'm mostly reading headers so this only
happens when a header is split across multiple scatterlist
elements. In these cases a copy size of less than 4k is good
enough.
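To put the sizes above in terms of allocation order, here is a rough
userspace sketch of the computation. It assumes 4k pages, and
sketch_get_order() is a hypothetical stand-in written for this
illustration, not the kernel's get_order() itself.

```c
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumption: 4k pages */

/*
 * Hypothetical stand-in for the kernel's get_order(): the smallest
 * order such that (page size << order) >= size. Size 0 is treated
 * as order 0 here purely for simplicity.
 */
static int sketch_get_order(size_t size)
{
	size_t page = (size_t)1 << SKETCH_PAGE_SHIFT;
	size_t pages = (size + page - 1) >> SKETCH_PAGE_SHIFT;
	int order = 0;

	/* Round the page count up to the next power of two. */
	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Under these assumptions a split header of a few hundred bytes stays at
order 0, while pulling a full 128 * 1024 byte sendfile chunk would be
order 5.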
Some of the nginx configurations I have use a max sendfile
size of 128KB. These are larger, but unless we look
at the payload we can avoid reading/writing it. If
that becomes commonplace we could look at optimizing it.
Should be doable without changing the user-facing API.
>
> Acked-by: Alexei Starovoitov <ast@kernel.org>
>
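For readers following the quoted commit message, the sendmsg-side
decision (update pointers when the range sits inside one element,
allocate and copy when it crosses elements) can be sketched in plain
userspace C. This is only an illustration under stated assumptions:
struct seg, sketch_pull() and the PULL_* codes are hypothetical names,
the range is treated as half-open [start, end) to match the
'end - start' allocation size, and malloc() stands in for the page
allocation. It is not the code from the patch.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for one scatterlist element. */
struct seg {
	const unsigned char *data;
	size_t len;
};

enum { PULL_EINVAL = -1, PULL_NOCOPY = 0, PULL_COPIED = 1, PULL_ENOMEM = -2 };

/*
 * Make bytes [start, end) of the message linear.
 * On PULL_NOCOPY, *out points into an existing element.
 * On PULL_COPIED, *out == *copy and the caller must free(*copy).
 */
static int sketch_pull(const struct seg *sg, size_t n, size_t start,
		       size_t end, const unsigned char **out,
		       unsigned char **copy)
{
	size_t total = 0, off = 0, i, pos, need, done = 0;

	*out = NULL;
	*copy = NULL;
	for (i = 0; i < n; i++)
		total += sg[i].len;

	/* Sanity checks: start <= end and end <= message length. */
	if (start > end || end > total)
		return PULL_EINVAL;
	if (start == end)
		return PULL_NOCOPY;	/* empty range, nothing to pull */

	/* Find the element that contains the 'start' byte. */
	for (i = 0; i < n && start >= off + sg[i].len; i++)
		off += sg[i].len;
	pos = start - off;

	/* Case 1: range is inside one element, already linear. */
	if (end <= off + sg[i].len) {
		*out = sg[i].data + pos;
		return PULL_NOCOPY;
	}

	/* Case 2: range crosses elements, allocate and copy. */
	need = end - start;
	*copy = malloc(need);
	if (!*copy)
		return PULL_ENOMEM;
	for (; i < n && done < need; i++, pos = 0) {
		size_t take = sg[i].len - pos;

		if (take > need - done)
			take = need - done;
		memcpy(*copy + done, sg[i].data + pos, take);
		done += take;
	}
	*out = *copy;
	return PULL_COPIED;
}
```

With two elements "abcd" and "efgh", pulling [1, 3) only adjusts the
pointer, while pulling [2, 6) allocates 4 bytes and copies "cdef". In
the sendpage case described in the commit message the copy branch
would always be taken, since the data is shared with userspace.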