BPF List
From: Philo Lu <lulie@linux.alibaba.com>
To: Andrii Nakryiko <andrii.nakryiko@gmail.com>
Cc: bpf@vger.kernel.org, song@kernel.org, andrii@kernel.org,
	ast@kernel.org, Daniel Borkmann <daniel@iogearbox.net>,
	xuanzhuo@linux.alibaba.com, dust.li@linux.alibaba.com,
	guwen@linux.alibaba.com, alibuda@linux.alibaba.com,
	hengqi@linux.alibaba.com, Nathan Slingerland <slinger@meta.com>,
	"rihams@meta.com" <rihams@meta.com>,
	Alan Maguire <alan.maguire@oracle.com>
Subject: Re: Question about bpf perfbuf/ringbuf: pinned in backend with overwriting
Date: Fri, 15 Dec 2023 18:10:51 +0800
Message-ID: <23bcab0e-bec1-4edd-b45a-0142ebcda41a@linux.alibaba.com>
In-Reply-To: <CAEf4BzaQv23wzgmmoSFBja7Syp3m3fRrfzWkFobQ4NNisDTEyA@mail.gmail.com>



On 2023/12/14 07:35, Andrii Nakryiko wrote:
> On Mon, Dec 11, 2023 at 4:39 AM Philo Lu <lulie@linux.alibaba.com> wrote:
>>
>>
>>
>> On 2023/12/9 06:32, Andrii Nakryiko wrote:
>>> On Thu, Dec 7, 2023 at 6:49 AM Alan Maguire <alan.maguire@oracle.com> wrote:
>>>>
>>>> On 07/12/2023 13:15, Philo Lu wrote:
>>>>> Hi all. I have a question when using perfbuf/ringbuf in bpf. I will
>>>>> appreciate it if you give me any advice.
>>>>>
>>>>> Imagine a simple case: the bpf program outputs a log (some tcp
>>>>> statistics) to the user every time a packet is received, and the user
>>>>> actively reads the logs when he wants. I do not want to keep a user
>>>>> process alive, waiting for outputs of the buffer. The user can read the
>>>>> buffer as needed. BTW, the order does not matter.
>>>>>
>>>>> To conclude, I hope the buffer performs like relayfs: (1) no need for
>>>>> user process to receive logs, and the user may read at any time (and no
>>>>> wakeup would be better); (2) old data can be overwritten by new ones.
>>>>>
>>>>> Currently, it seems that perfbuf and ringbuf cannot satisfy both: (i)
>>>>> ringbuf: only satisfies (1). However, if data arrive when the buffer is
>>>>> full, the new data will be lost, until the buffer is consumed. (ii)
>>>>> perfbuf: only satisfies (2). But the user cannot access the buffer after
>>>>> the process that creates it (including perf_event.rb via mmap) exits.
>>>>> Specifically, I can use BPF_F_PRESERVE_ELEMS flag to keep the
>>>>> perf_events, but I do not know how to get the buffer again in a new
>>>>> process.
>>>>>
>>>>> In my opinion, this can be solved by either of the following: (a) add
>>>>> overwrite support in ringbuf (maybe a new flag for reserve), but we have
>>>>> to address synchronization between kernel and user, especially under
>>>>> variable data size, because when overwriting occurs, kernel has to
>>>>> update the consumer position too; (b) implement map_fd_sys_lookup_elem for
>>>>> perfbuf to expose fds to user via map_lookup_elem syscall, and a
>>>>> mechanism is needed to preserve perf_event->rb when process exits
>>>>> (otherwise the buffer will be freed by perf_mmap_close). I am not sure
>>>>> if they are feasible, and which is better. If not, perhaps we can
>>>>> develop another mechanism to achieve this?
>>>>>
>>>>
>>>> There was an RFC a while back focused on supporting BPF ringbuf
>>>> over-writing [1]; at the time, Andrii noted some potential issues that
>>>> might be exposed by doing multiple ringbuf reserves to overfill the
>>>> buffer within the same program.
>>>>
>>>
>>> Correct. I don't think it's possible to correctly and safely support
>>> overwriting with BPF ringbuf that has variable-sized elements.
>>>
>>> We'll need to implement MPMC ringbuf (probably with fixed sized
>>> element size) to be able to support this.
>>>
>>
>> Thank you very much!
>>
>> If it is indeed difficult with ringbuf, maybe I can implement a new type
>> of bpf map based on relay interface [1]? e.g., init relay during map
>> creation, write into it with a bpf helper, and then the user can access it
>> in the filesystem. I think it will be a simple but useful map for
>> overwritable data transfer.
> 
> I don't know much about relay, tbh. Give it a try, I guess.
> Alternatively, we need better and faster implementation of
> BPF_MAP_TYPE_QUEUE, which seems like the data structure that can
> support overwriting and generally be a fixed element size
> alternative/complement to BPF ringbuf.
> 

Thank you for your reply. I am afraid BPF_MAP_TYPE_QUEUE cannot, by design,
avoid locking overhead under concurrent reads and writes, so a lockless
buffer like relay fits our case better. I will give it a try :)
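
To illustrate the semantics discussed in this thread (fixed-size
elements, old data overwritten by new data, a reader that may show up at
any time), here is a minimal sketch in plain userspace C. It is not relay
code and not a BPF API: struct tcp_log, struct ow_ring, OW_SLOTS,
ow_ring_push() and ow_ring_snapshot() are all hypothetical names invented
for this example. It is single-threaded on purpose; a real lockless
version would additionally need per-CPU buffers or memory barriers, which
this sketch does not attempt.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define OW_SLOTS 4              /* hypothetical capacity, must be a power of two */

struct tcp_log {                /* hypothetical fixed-size record */
	uint32_t saddr;
	uint32_t rtt_us;
};

struct ow_ring {
	uint64_t head;              /* total number of records ever written */
	struct tcp_log slot[OW_SLOTS];
};

/* Writer side: never fails. When the ring is full, the oldest record is
 * silently overwritten, so no consumer position has to be updated. */
static void ow_ring_push(struct ow_ring *r, const struct tcp_log *rec)
{
	assert(r != NULL && rec != NULL);
	r->slot[r->head & (OW_SLOTS - 1)] = *rec;
	r->head++;
}

/* Reader side: copy the surviving records into out[], oldest first,
 * stopping after max records. Returns the number of records copied.
 * The ring itself is not modified, so reading can happen at any time. */
static size_t ow_ring_snapshot(const struct ow_ring *r,
			       struct tcp_log *out, size_t max)
{
	uint64_t head = r->head;
	uint64_t avail = head < OW_SLOTS ? head : OW_SLOTS;
	size_t n = 0;

	for (uint64_t i = head - avail; i < head && n < max; i++)
		out[n++] = r->slot[i & (OW_SLOTS - 1)];
	return n;
}
```

With fixed-size slots, the writer only has to advance one counter, which
is why the variable-size case raised earlier in the thread (where the
kernel would also have to move the consumer position past partially
overwritten records) is harder.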

>>
>> [1]
>> https://github.com/torvalds/linux/blob/master/Documentation/filesystems/relay.rst
>>
>>>> Alan
>>>>
>>>> [1]
>>>> https://lore.kernel.org/lkml/20220906195656.33021-2-flaniel@linux.microsoft.com/


Thread overview: 17+ messages
2023-12-07 13:15 Question about bpf perfbuf/ringbuf: pinned in backend with overwriting Philo Lu
2023-12-07 14:48 ` Alan Maguire
2023-12-08 22:32   ` Andrii Nakryiko
2023-12-11 12:39     ` Philo Lu
2023-12-13 23:35       ` Andrii Nakryiko
2023-12-15 10:10         ` Philo Lu [this message]
2023-12-15 22:39           ` Andrii Nakryiko
2023-12-16  8:50             ` Dmitry Vyukov
2023-12-18 12:58               ` Philo Lu
2023-12-19 19:25               ` Andrii Nakryiko
2023-12-19  6:23         ` Shung-Hsi Yu
2023-12-19 13:38           ` Steven Rostedt
2023-12-19 17:01             ` Alexei Starovoitov
2023-12-19 17:28             ` Steven Rostedt
2023-12-21 13:00             ` Philo Lu
2023-12-21 14:49               ` Steven Rostedt
2023-12-22 12:25                 ` Philo Lu
