netdev.vger.kernel.org archive mirror
From: Yonghong Song <yhs@fb.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: Andrii Nakryiko <andriin@fb.com>, bpf <bpf@vger.kernel.org>,
	Martin KaFai Lau <kafai@fb.com>,
	Networking <netdev@vger.kernel.org>,
	Alexei Starovoitov <ast@fb.com>,
	Daniel Borkmann <daniel@iogearbox.net>,
	Kernel Team <kernel-team@fb.com>
Subject: Re: [PATCH bpf-next v3 05/21] bpf: implement bpf_seq_read() for bpf iterator
Date: Fri, 8 May 2020 18:41:03 -0700
Message-ID: <62858d10-0200-592f-1bf4-e97f462a9c68@fb.com>
In-Reply-To: <CAEf4BzZ_TnCdvTucUpr1CRiGqnf7GZfdyXmszToTTLYyQxbk4Q@mail.gmail.com>



On 5/8/20 11:52 AM, Andrii Nakryiko wrote:
> On Wed, May 6, 2020 at 10:39 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> bpf iterator uses seq_file to provide a lossless
>> way to transfer data to user space. But we want to call
>> bpf program after all objects have been traversed, and
>> bpf program may write additional data to the
>> seq_file buffer. The current seq_read() does not work
>> for this use case.
>>
>> Besides allowing the stop() function to write to the buffer,
>> bpf_seq_read() also fixes the buffer size to one page.
>> If any single call of show() or stop() emits more than one
>> page of data and overflows the buffer, the -E2BIG error code
>> is returned to user space.
>>
>> Signed-off-by: Yonghong Song <yhs@fb.com>
>> ---
> 
> This loop is much simpler and more streamlined now, thanks a lot! I
> think it's correct, see below about one confusing (but apparently
> correct) bit, though. Either way:
> 
> Acked-by: Andrii Nakryiko <andriin@fb.com>
> 
>>   kernel/bpf/bpf_iter.c | 118 ++++++++++++++++++++++++++++++++++++++++++
>>   1 file changed, 118 insertions(+)
>>
>> diff --git a/kernel/bpf/bpf_iter.c b/kernel/bpf/bpf_iter.c
>> index 0542a243b78c..f198597b0ea4 100644
>> --- a/kernel/bpf/bpf_iter.c
>> +++ b/kernel/bpf/bpf_iter.c
>> @@ -26,6 +26,124 @@ static DEFINE_MUTEX(targets_mutex);
>>   /* protect bpf_iter_link changes */
>>   static DEFINE_MUTEX(link_mutex);
>>
>> +/* bpf_seq_read, a customized and simpler version for bpf iterator.
>> + * no_llseek is assumed for this file.
>> + * The following are differences from seq_read():
>> + *  . fixed buffer size (PAGE_SIZE)
>> + *  . assuming no_llseek
>> + *  . stop() may call bpf program, handling potential overflow there
>> + */
>> +static ssize_t bpf_seq_read(struct file *file, char __user *buf, size_t size,
>> +                           loff_t *ppos)
>> +{
>> +       struct seq_file *seq = file->private_data;
>> +       size_t n, offs, copied = 0;
>> +       int err = 0;
>> +       void *p;
>> +
>> +       mutex_lock(&seq->lock);
>> +
>> +       if (!seq->buf) {
>> +               seq->size = PAGE_SIZE;
>> +               seq->buf = kmalloc(seq->size, GFP_KERNEL);
>> +               if (!seq->buf) {
>> +                       err = -ENOMEM;
>> +                       goto done;
> 
> oh, thank you for converting to all lower-case label names! :)
> 
>> +               }
>> +       }
>> +
>> +       if (seq->count) {
>> +               n = min(seq->count, size);
>> +               err = copy_to_user(buf, seq->buf + seq->from, n);
>> +               if (err) {
>> +                       err = -EFAULT;
>> +                       goto done;
>> +               }
>> +               seq->count -= n;
>> +               seq->from += n;
>> +               copied = n;
>> +               goto done;
>> +       }
>> +
>> +       seq->from = 0;
>> +       p = seq->op->start(seq, &seq->index);
>> +       if (IS_ERR_OR_NULL(p))
>> +               goto stop;
> 
> if start() returns IS_ERR(p), stop(p) below won't produce any output
> (because BPF program is called only for p == NULL), so we'll just
> return 0 with no error, do I interpret the code correctly? I think
> seq_file's read actually returns PTR_ERR(p) as a result in this case.
> 
> so I think you need err = PTR_ERR(p); before goto stop here?

Thanks for catching this!
Yes, seq_read() indeed returns PTR_ERR(p) to user space here.
Will make the change.
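
For readers less familiar with the error-pointer convention under discussion, here is a minimal userspace sketch. The macros are simplified stand-ins for the kernel's <linux/err.h> helpers, and start_result_to_err() is a hypothetical name used only for illustration; it is not the change that was eventually applied.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified userspace stand-ins for the kernel's <linux/err.h> helpers. */
#define MAX_ERRNO 4095

static inline void *ERR_PTR(long error)
{
	return (void *)(intptr_t)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	/* Error pointers occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

static inline int IS_ERR_OR_NULL(const void *ptr)
{
	return !ptr || IS_ERR(ptr);
}

/*
 * Hypothetical helper: the error a read should report for a given
 * start() result. NULL (end of iteration) and valid objects map to 0;
 * an error pointer maps to its negative errno.
 */
static long start_result_to_err(const void *p)
{
	if (IS_ERR(p))
		return PTR_ERR(p);
	return 0;
}
```

Because PTR_ERR(NULL) evaluates to 0, assigning err = PTR_ERR(p) ahead of the jump covers both the NULL and the error-pointer case without an extra branch.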

> 
>> +
>> +       err = seq->op->show(seq, p);
>> +       if (err > 0) {
>> +               seq->count = 0;
>> +       } else if (err < 0 || seq_has_overflowed(seq)) {
>> +               if (!err)
>> +                       err = -E2BIG;
>> +               seq->count = 0;
>> +               seq->op->stop(seq, p);
>> +               goto done;
>> +       }
>> +
>> +       while (1) {
>> +               loff_t pos = seq->index;
>> +
>> +               offs = seq->count;
>> +               p = seq->op->next(seq, p, &seq->index);
>> +               if (pos == seq->index) {
>> +                       pr_info_ratelimited("buggy seq_file .next function %ps "
>> +                               "did not update position index\n",
>> +                               seq->op->next);
>> +                       seq->index++;
>> +               }
>> +
>> +               if (IS_ERR_OR_NULL(p)) {
>> +                       err = PTR_ERR(p);
>> +                       break;
>> +               }
>> +               if (seq->count >= size)
>> +                       break;
>> +
>> +               err = seq->op->show(seq, p);
>> +               if (err > 0) {
>> +                       seq->count = offs;
>> +               } else if (err < 0 || seq_has_overflowed(seq)) {
>> +                       seq->count = offs;
>> +                       if (!err)
>> +                               err = -E2BIG;
> 
> nit: this -E2BIG is set unconditionally even for 2nd+ show(). This
> will work, because it will get ignored on next iteration, but I think
> it will be much more obvious if written as:
> 
> if (!err && offs == 0)
>      err = -E2BIG;

Yes, will make the change since it indeed makes code more readable.
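
The rollback rule discussed here can be modelled in plain userspace C. This is a sketch under stated assumptions: struct model_seq, model_write(), model_overflowed() and model_fill() are hypothetical names, BUF_SIZE stands in for PAGE_SIZE, and the separate handling of the first show() call in the patch is folded into a single loop. Overflow is signalled the way seq_file does it, by setting count equal to size.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 16	/* stands in for PAGE_SIZE */

/* Hypothetical, reduced model of the seq_file buffer state. */
struct model_seq {
	char buf[BUF_SIZE];
	size_t size;
	size_t count;
};

/* Append s, or mark overflow (count == size) if it does not fit. */
static void model_write(struct model_seq *m, const char *s)
{
	size_t len = strlen(s);

	if (m->count + len < m->size) {
		memcpy(m->buf + m->count, s, len);
		m->count += len;
	} else {
		m->count = m->size;
	}
}

static int model_overflowed(const struct model_seq *m)
{
	return m->count == m->size;
}

/*
 * Emit records until one does not fit. A record that overflows an
 * empty buffer can never fit, so that case reports -E2BIG. An overflow
 * on a later record is rolled back and left for the next read.
 */
static int model_fill(struct model_seq *m, const char *const *recs,
		      size_t nrecs, size_t *emitted)
{
	size_t i, offs;

	m->size = BUF_SIZE;
	m->count = 0;
	*emitted = 0;

	for (i = 0; i < nrecs; i++) {
		offs = m->count;
		model_write(m, recs[i]);
		if (model_overflowed(m)) {
			m->count = offs;	/* discard the partial record */
			if (offs == 0)
				return -E2BIG;
			break;
		}
		(*emitted)++;
	}
	return 0;
}
```

With a 16-byte buffer, two 4-byte records followed by a 10-byte record keep the first two and return 0, while a lone 16-byte record returns -E2BIG.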

> 
> It took me a few re-readings of code I'm already pretty familiar
> with to realize that this is OK.
> 
> I had to write the piece below to convince myself that this is fine :)
> Leaving it here in case you find it useful:
> 
> else if (err < 0 || seq_has_overflowed(seq)) {
>      if (!err && offs == 0) /* overflow in first show() output */
>          err = -E2BIG;
>      if (err) {             /* overflow in first show() or real error happened */
>          seq->count = 0; /* not strictly necessary, but shows that
>                           * we are truncating output */
>          seq->op->stop(seq, p);
>          goto done; /* done will return err */
>      }
>      /* no error and overflow for 2nd+ show(), roll back output and stop */
>      seq->count = offs;
>      break;
> }
> 
>> +                       if (offs == 0) {
>> +                               seq->op->stop(seq, p);
>> +                               goto done;
>> +                       }
>> +                       break;
>> +               }
>> +       }
>> +stop:
>> +       offs = seq->count;
>> +       /* bpf program called if !p */
>> +       seq->op->stop(seq, p);
>> +       if (!p && seq_has_overflowed(seq)) {
>> +               seq->count = offs;
>> +               if (offs == 0) {
>> +                       err = -E2BIG;
>> +                       goto done;
>> +               }
>> +       }
>> +
>> +       n = min(seq->count, size);
>> +       err = copy_to_user(buf, seq->buf, n);
>> +       if (err) {
>> +               err = -EFAULT;
>> +               goto done;
>> +       }
>> +       copied = n;
>> +       seq->count -= n;
>> +       seq->from = n;
>> +done:
>> +       if (!copied)
>> +               copied = err;
>> +       else
>> +               *ppos += copied;
>> +       mutex_unlock(&seq->lock);
>> +       return copied;
>> +}
>> +
>>   int bpf_iter_reg_target(struct bpf_iter_reg *reg_info)
>>   {
>>          struct bpf_iter_target_info *tinfo;
>> --
>> 2.24.1
>>
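
From the user-space side, the commit message implies that a reader keeps calling read() until it returns 0, and that a negative result such as -E2BIG has to be treated as a hard error. A minimal sketch of such a loop follows; drain_fd() is a hypothetical helper name, and nothing in it is specific to bpf iterator file descriptors.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical helper: read from fd until EOF or until cap bytes have
 * been stored in out. Returns the number of bytes read, or a negative
 * errno value (for example -E2BIG) if read() fails.
 */
static ssize_t drain_fd(int fd, char *out, size_t cap)
{
	size_t total = 0;

	while (total < cap) {
		ssize_t n = read(fd, out + total, cap - total);

		if (n < 0) {
			if (errno == EINTR)
				continue;	/* interrupted, retry */
			return -errno;
		}
		if (n == 0)
			break;			/* EOF: iteration finished */
		total += (size_t)n;
	}
	return (ssize_t)total;
}
```

A caller would pass the descriptor it obtained for the iterator together with a buffer, and check for a negative return before using the data.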
