From mboxrd@z Thu Jan 1 00:00:00 1970 From: John Fastabend Subject: Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data Date: Wed, 17 Jan 2018 12:32:03 -0800 Message-ID: <80c4b9fb-7938-782f-6b5f-012db04c41e6@gmail.com> References: <20180112175029.21531.54693.stgit@john-Precision-Tower-5810> <20180112181111.21531.25665.stgit@john-Precision-Tower-5810> <20180117022545.7c2tlipmugcfeyme@ast-mbp> <1df36069-4b7c-5244-8d49-37ad831a298e@gmail.com> <20180117062013.upgi7q5qia6tuawc@ast-mbp> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 7bit Cc: borkmann@iogearbox.net, ast@kernel.org, netdev@vger.kernel.org, kafai@fb.com To: Alexei Starovoitov Return-path: Received: from mail-ot0-f171.google.com ([74.125.82.171]:42854 "EHLO mail-ot0-f171.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1752578AbeAQUcG (ORCPT ); Wed, 17 Jan 2018 15:32:06 -0500 Received: by mail-ot0-f171.google.com with SMTP id s3so18014169otc.9 for ; Wed, 17 Jan 2018 12:32:06 -0800 (PST) In-Reply-To: <20180117062013.upgi7q5qia6tuawc@ast-mbp> Content-Language: en-US Sender: netdev-owner@vger.kernel.org List-ID: On 01/16/2018 10:20 PM, Alexei Starovoitov wrote: > On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote: >> >>> but this program will see only first SG ? >> >> Correct, to read further into the msg we would need to have a helper >> or some other way to catch reads/writes past the first 4k and read >> the next sg. We have the same limitation in cls_bpf. >> >> I have started a patch on top of this series with the current working >> title msg_apply_bytes(int bytes). This would let us apply a verdict to >> a set number of bytes instead of the entire msg. By calling >> msg_apply_bytes(data_end - data) a user who needs to read an entire msg >> could do this in 4k chunks until the entire msg is passed through the >> bpf prog. > > good idea. > I think would be good to add this helper as part of this patch set > to make sure there is a way for user to look through the whole > tcp stream if the program really wants to. Sure I'll add it to this series. > I understand that program cannot examine every byte anyway > due to lack of loops and helpers, but this part of sockmap api > should still provide an interface from day one. > One example would be the program parsing http2 or similar > where in the header it sees length. Then it can do > msg_apply_bytes(length) > to skip the bytes it processed, but still continue within > the same 64Kbyte chunk when 0 < length < 64k > Yep, this is my use case. >>> and it's typically going to be one page only ? >> >> yep >> >>> then what's the value of waiting for MAX_SKB_FRAGS ? >>> >> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS >> pages in one call if possible. It seems better to me to run this loop >> over as much data as we can. > > agree on trying to do MAX_SKB_FRAGS as a 'processing unit', > but program should still be able to skip or redirect > parts of the bytes and not the whole 64k chunk. > From program point of view it should never see or worry about > SG list boundaries whereas right now it seems that below code > is dealing with them (though program doesn't know where sg ends): > The apply_bytes() helper allows for skip and redirect of parts of the bytes and not the whole 64k chunk. >> + >> + switch (eval) { >> + case __SK_PASS: >> + sg_mark_end(sg + sg_num - 1); All this does is tell the TCP sendpage call that after this sg entry we don't have anymore data so go ahead and flush it. >> + err = bpf_tcp_push(sk, sg, &sg_curr, flags, true); >> + if (unlikely(err)) { >> + copied -= free_sg(sk, sg, sg_curr, sg_num); >> + goto out_err; >> + } >> + break; >> + case __SK_REDIRECT: >> + sg_mark_end(sg + sg_num - 1); >> + goto do_redir; > ... >>>> +static int bpf_tcp_ulp_register(void) >>>> +{ >>>> + tcp_bpf_proto = tcp_prot; >>>> + tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg; >>>> + tcp_bpf_proto.sendpage = bpf_tcp_sendpage; >>>> + return tcp_register_ulp(&bpf_tcp_ulp_ops); >>> >>> I don't see corresponding tcp_unregister_ulp(). >>> >> >> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of >> available ULPs and never removes it. To remove it we would have to >> keep a ref count on the reg/unreg calls. This would require a couple >> more patches to the ULP infra and not sure it hurts to leave the ULP >> reference around... >> >> Maybe a follow on patch? Or else it could be a patch in front of this >> patch. > > I see. I'm ok with leaving that for latter. > It doesn't hurt to keep it registered. Please add a comment though. > OK will add comment.