Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data

public inbox for netdev@vger.kernel.org
 help / color / mirror / Atom feed

From: Alexei Starovoitov <alexei.starovoitov@gmail.com>
To: John Fastabend <john.fastabend@gmail.com>
Cc: borkmann@iogearbox.net, ast@kernel.org, netdev@vger.kernel.org,
	kafai@fb.com
Subject: Re: [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data
Date: Tue, 16 Jan 2018 22:20:15 -0800	[thread overview]
Message-ID: <20180117062013.upgi7q5qia6tuawc@ast-mbp> (raw)
In-Reply-To: <1df36069-4b7c-5244-8d49-37ad831a298e@gmail.com>

On Tue, Jan 16, 2018 at 09:49:16PM -0800, John Fastabend wrote:
> 
> > but this program will see only first SG ?
> 
> Correct, to read further into the msg we would need to have a helper
> or some other way to catch reads/writes past the first 4k and read
> the next sg. We have the same limitation in cls_bpf.
> 
> I have started a patch on top of this series with the current working
> title msg_apply_bytes(int bytes). This would let us apply a verdict to
> a set number of bytes instead of the entire msg. By calling
> msg_apply_bytes(data_end - data) a user who needs to read an entire msg
> could do this in 4k chunks until the entire msg is passed through the
> bpf prog.

good idea.
I think would be good to add this helper as part of this patch set
to make sure there is a way for user to look through the whole
tcp stream if the program really wants to.
I understand that program cannot examine every byte anyway
due to lack of loops and helpers, but this part of sockmap api
should still provide an interface from day one.
One example would be the program parsing http2 or similar
where in the header it sees length. Then it can do
msg_apply_bytes(length)
to skip the bytes it processed, but still continue within
the same 64Kbyte chunk when 0 < length < 64k

> > and it's typically going to be one page only ?
> 
> yep
> 
> > then what's the value of waiting for MAX_SKB_FRAGS ?
> > 
> Its not waiting for MAX_SKB_FRAGS its simple copying up to MAX_SKB_FRAGS
> pages in one call if possible. It seems better to me to run this loop
> over as much data as we can.

agree on trying to do MAX_SKB_FRAGS as a 'processing unit',
but program should still be able to skip or redirect 
parts of the bytes and not the whole 64k chunk.
>From program point of view it should never see or worry about
SG list boundaries whereas right now it seems that below code
is dealing with them (though program doesn't know where sg ends):

> +
> +		switch (eval) {
> +		case __SK_PASS:
> +			sg_mark_end(sg + sg_num - 1);
> +			err = bpf_tcp_push(sk, sg, &sg_curr, flags, true);
> +			if (unlikely(err)) {
> +				copied -= free_sg(sk, sg, sg_curr, sg_num);
> +				goto out_err;
> +			}
> +			break;
> +		case __SK_REDIRECT:
> +			sg_mark_end(sg + sg_num - 1);
> +			goto do_redir;
...
> >> +static int bpf_tcp_ulp_register(void)
> >> +{
> >> +	tcp_bpf_proto = tcp_prot;
> >> +	tcp_bpf_proto.sendmsg = bpf_tcp_sendmsg;
> >> +	tcp_bpf_proto.sendpage = bpf_tcp_sendpage;
> >> +	return tcp_register_ulp(&bpf_tcp_ulp_ops);
> > 
> > I don't see corresponding tcp_unregister_ulp().
> > 
> 
> There is none. tcp_register_ulp() adds the bpf_tcp_ulp to the list of
> available ULPs and never removes it. To remove it we would have to
> keep a ref count on the reg/unreg calls. This would require a couple
> more patches to the ULP infra and not sure it hurts to leave the ULP
> reference around...
> 
> Maybe a follow on patch? Or else it could be a patch in front of this
> patch.

I see. I'm ok with leaving that for latter.
It doesn't hurt to keep it registered. Please add a comment though.

next prev parent reply	other threads:[~2018-01-17  6:20 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-01-12 18:09 [bpf-next PATCH 0/7] Add BPF_PROG_TYPE_SK_MSG and attach pt John Fastabend
2018-01-12 18:10 ` [bpf-next PATCH 1/7] net: add a UID to use for ULP socket assignment John Fastabend
2018-01-12 18:10 ` [bpf-next PATCH 2/7] sock: make static tls function alloc_sg generic sock helper John Fastabend
2018-01-12 18:10 ` [bpf-next PATCH 3/7] sockmap: convert refcnt to an atomic refcnt John Fastabend
2018-01-12 18:10 ` [bpf-next PATCH 4/7] net: do_tcp_sendpages flag to avoid SKBTX_SHARED_FRAG John Fastabend
2018-01-12 20:10   ` Eric Dumazet
2018-01-12 20:26     ` John Fastabend
2018-01-12 20:46       ` Eric Dumazet
2018-01-12 21:11         ` John Fastabend
2018-01-12 18:11 ` [bpf-next PATCH 5/7] bpf: create tcp_bpf_ulp allowing BPF to monitor socket TX/RX data John Fastabend
2018-01-17  2:25   ` Alexei Starovoitov
2018-01-17  5:49     ` John Fastabend
2018-01-17  6:20       ` Alexei Starovoitov [this message]
2018-01-17 20:32         ` John Fastabend
2018-01-17 22:04   ` Martin KaFai Lau
2018-01-18 17:27     ` John Fastabend
2018-01-12 18:11 ` [bpf-next PATCH 6/7] bpf: add map tests for BPF_PROG_TYPE_SK_MSG John Fastabend
2018-01-12 18:11 ` [bpf-next PATCH 7/7] bpf: add verifier " John Fastabend

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180117062013.upgi7q5qia6tuawc@ast-mbp \
    --to=alexei.starovoitov@gmail.com \
    --cc=ast@kernel.org \
    --cc=borkmann@iogearbox.net \
    --cc=john.fastabend@gmail.com \
    --cc=kafai@fb.com \
    --cc=netdev@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox