Subject: Re: [bpf-next PATCH v3 08/18] bpf: sk_msg program helper bpf_sk_msg_pull_data
To: Alexei Starovoitov
Cc: davejwatson@fb.com, davem@davemloft.net, daniel@iogearbox.net, ast@kernel.org, netdev@vger.kernel.org
From: John Fastabend
Date: Mon, 19 Mar 2018 22:54:28 -0700

On 03/19/2018 01:24 PM, Alexei Starovoitov wrote:
> On Sun, Mar 18, 2018 at 12:57:25PM -0700, John Fastabend wrote:
>> Currently, if a bpf sk msg program is run the program
>> can only parse data that the (start,end) pointers already
>> consumed. For sendmsg hooks this is likely the first
>> scatterlist element. For sendpage this will be the range
>> (0,0) because the data is shared with userspace and by
>> default we want to avoid allowing userspace to modify
>> data while (or after) BPF verdict is being decided.
>>
>> To support pulling in additional bytes for parsing use
>> a new helper bpf_sk_msg_pull(start, end, flags) which
>> works similar to cls tc logic. This helper will attempt
>> to point the data start pointer at 'start' bytes offset
>> into msg and data end pointer at 'end' bytes offset into
>> message.
>>
>> After basic sanity checks to ensure 'start' <= 'end' and
>> 'end' <= msg_length there are a few cases we need to
>> handle.
>>
>> First the sendmsg hook has already copied the data from
>> userspace and has exclusive access to it. Therefore, it
>> is not necessary to copy the data. However, it may
>> be required. After finding the scatterlist element with
>> 'start' offset byte in it there are two cases. One, the
>> range (start,end) is entirely contained in the sg element
>> and is already linear. All that is needed is to update the
>> data pointers, no allocate/copy is needed. The other case
>> is (start, end) crosses sg element boundaries. In this
>> case we allocate a block of size 'end - start' and copy
>> the data to linearize it.
>>
>> Next sendpage hook has not copied any data in initial
>> state so that data pointers are (0,0). In this case we
>> handle it similar to the above sendmsg case except the
>> allocation/copy must always happen. Then when sending
>> the data we have possibly three memory regions that
>> need to be sent, (0, start - 1), (start, end), and
>> (end + 1, msg_length). This is required to ensure any
>> writes by the BPF program are correctly transmitted.
>>
>> Lastly this operation will invalidate any previous
>> data checks so BPF programs will have to revalidate
>> pointers after making this BPF call.
>>
>> Signed-off-by: John Fastabend
> ..
>> +
>> +	page = alloc_pages(__GFP_NOWARN | GFP_ATOMIC, get_order(copy));
>> +	if (unlikely(!page))
>> +		return -ENOMEM;
>
> I think that's fine. Just curious what order do you see in practice?

At the moment I'm mostly reading headers so this only happens
when a header is split across multiple scatterlist elements. In
these cases a copy size of less than 4k is good enough. Some
of the nginx configurations I have use a max sendfile size of 128kb.
So these are larger, but unless we look at the payload we can
avoid reading/writing this. If it becomes commonplace we could
look at optimizing it.
Should be doable without changing the user facing API.

>
> Acked-by: Alexei Starovoitov
>
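
To put numbers on the order question, here is a minimal userspace sketch of the arithmetic behind get_order(copy). It is not the kernel implementation: sketch_get_order() and SKETCH_PAGE_SIZE are hypothetical names used only for illustration, and the 4 KiB page size is an assumption that holds on common configurations but not everywhere.

```c
#include <stddef.h>

/* Assumption: 4 KiB pages. Real kernels take this from PAGE_SIZE. */
#define SKETCH_PAGE_SIZE 4096UL

/*
 * Hypothetical userspace stand-in for the kernel's get_order():
 * returns the smallest order such that
 * (SKETCH_PAGE_SIZE << order) >= size.
 *
 * A size of 0 is treated as order 0 here purely for convenience;
 * callers of the real helper should not rely on that.
 */
static unsigned int sketch_get_order(size_t size)
{
	unsigned int order = 0;
	size_t pages;

	if (size == 0)
		return 0;

	/* Index of the last page touched by a 'size'-byte block. */
	pages = (size - 1) / SKETCH_PAGE_SIZE;

	/* Count the bits needed to cover that index. */
	while (pages) {
		pages >>= 1;
		order++;
	}
	return order;
}

/*
 * Order needed to linearize the byte range (start, end), where the
 * copy size is 'end - start' as described in the commit message.
 * Assumes start <= end, which the helper checks before this point.
 */
static unsigned int sketch_pull_order(size_t start, size_t end)
{
	return sketch_get_order(end - start);
}
```

Under these assumptions, a header pull of less than 4k stays at order 0, while linearizing a full 128kb sendfile range would need an order 5 allocation (32 contiguous pages), which is the case that might be worth optimizing if payload inspection becomes common.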