Re: [PATCH bpf-next v3 4/8] bpf: Introduce cgroup iter


From: Yonghong Song <yhs@fb.com>
To: Hao Luo <haoluo@google.com>
Cc: "Yosry Ahmed" <yosryahmed@google.com>,
	"Alexei Starovoitov" <ast@kernel.org>,
	"Daniel Borkmann" <daniel@iogearbox.net>,
	"Andrii Nakryiko" <andrii@kernel.org>,
	"Martin KaFai Lau" <kafai@fb.com>,
	"Song Liu" <songliubraving@fb.com>, "Tejun Heo" <tj@kernel.org>,
	"Zefan Li" <lizefan.x@bytedance.com>,
	"Johannes Weiner" <hannes@cmpxchg.org>,
	"Shuah Khan" <shuah@kernel.org>,
	"Michal Hocko" <mhocko@kernel.org>,
	"KP Singh" <kpsingh@kernel.org>,
	"Benjamin Tissoires" <benjamin.tissoires@redhat.com>,
	"John Fastabend" <john.fastabend@gmail.com>,
	"Michal Koutný" <mkoutny@suse.com>,
	"Roman Gushchin" <roman.gushchin@linux.dev>,
	"David Rientjes" <rientjes@google.com>,
	"Stanislav Fomichev" <sdf@google.com>,
	"Greg Thelen" <gthelen@google.com>,
	"Shakeel Butt" <shakeelb@google.com>
Subject: Re: [PATCH bpf-next v3 4/8] bpf: Introduce cgroup iter
Date: Thu, 21 Jul 2022 09:15:07 -0700
Message-ID: <3f3ffe0e-d2ac-c868-a1bf-cdf1b58fd666@fb.com>
In-Reply-To: <CA+khW7jp+0AadVagqCcV8ELNRphP47vJ6=jGyuMJGnTtYynF+Q@mail.gmail.com>



On 7/20/22 5:40 PM, Hao Luo wrote:
> On Mon, Jul 11, 2022 at 8:45 PM Yonghong Song <yhs@fb.com> wrote:
>>
>> On 7/11/22 5:42 PM, Hao Luo wrote:
> [...]
>>>>>> +
>>>>>> +static void *cgroup_iter_seq_start(struct seq_file *seq, loff_t *pos)
>>>>>> +{
>>>>>> +    struct cgroup_iter_priv *p = seq->private;
>>>>>> +
>>>>>> +    mutex_lock(&cgroup_mutex);
>>>>>> +
>>>>>> +    /* support only one session */
>>>>>> +    if (*pos > 0)
>>>>>> +        return NULL;
>>>>>
>>>>> This might be okay. But want to check what is
>>>>> the practical upper limit for cgroups in a system
>>>>> and whether we may miss some cgroups. If this
>>>>> happens, it will be a surprise to the user.
>>>>>
>>>
>>> Ok. What's the max number of items supported in a single session?
>>
>> The max number of items (cgroups) in a single session is determined
>> by kernel_buffer_size which equals to 8 * PAGE_SIZE. So it really
>> depends on how much data bpf program intends to send to user space.
>> If each bpf program run intends to send 64B to user space, e.g., for
>> cpu, memory, cpu pressure, mem pressure, io pressure, read rate, write
>> rate, read/write rate. Then each session can support 512 cgroups.
>>
> 
> Hi Yonghong,
> 
> Sorry about the late reply. It's possible that the number of cgroup
> can be large, 1000+, in our production environment. But that may not
> be common. Would it be good to leave handling large number of cgroups
> as follow up for this patch? If it turns out to be a problem, to
> alleviate it, we could:
> 
> 1. tell users to write program to skip a certain uninteresting cgroups.
> 2. support requesting large kernel_buffer_size for bpf_iter, maybe as
> a new bpf_iter flag.

Currently, if we want to support multiple read() calls for
cgroup_iter, the following approach is possible but very inefficient:

In the seq_file private data structure, remember the last cgroup
visited. For the second read() syscall, do the traversal again
(without calling the bpf program) until reaching the last cgroup,
and proceed from there. This is inefficient but probably works.
However, if the last cgroup is gone from the hierarchy, the above
approach won't work. One possibility is to remember the last two
cgroups. If the last cgroup is gone, check the 'next' cgroup based
on the one before the last cgroup. If both are gone, we return NULL.
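
To make the "remember the last two cgroups" idea concrete, here is a
minimal userspace sketch. It is not kernel code: the current hierarchy
is modeled as a plain array of ids in traversal order, and the names
find_id() and resume_index() are hypothetical. The linear re-walk in
find_id() is exactly the inefficiency described above.

```c
#include <stddef.h>

/* Re-walk the current traversal order looking for id; -1 if it is gone. */
static long find_id(const int *walk, size_t n, int id)
{
    for (size_t i = 0; i < n; i++)
        if (walk[i] == id)
            return (long)i;
    return -1;
}

/*
 * walk: ids of the cgroups still present, in traversal order
 * last: id of the last cgroup visited by the previous read()
 * prev: id of the cgroup visited just before `last`
 *
 * Returns the index to resume from (equal to n when nothing is left),
 * or -1 when both remembered cgroups are gone.
 */
static long resume_index(const int *walk, size_t n, int last, int prev)
{
    long i = find_id(walk, n, last);

    if (i < 0)                      /* last cgroup was removed */
        i = find_id(walk, n, prev); /* fall back to the one before it */
    return i < 0 ? -1 : i + 1;
}
```

Note that the fallback resumes at whatever now follows `prev`, so a
cgroup created between the two read() calls may show up there as well.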

But in any case, if there are additional cgroups not yet visited,
then in the second read() we should not return NULL, since that
indicates we are done with all cgroups. We may return EOPNOTSUPP
instead, to indicate that some cgroups are missing because
continuing the traversal is not supported.

Once users see EOPNOTSUPP, which indicates there are missing
cgroups, they can do more filtering in the bpf program to avoid
sending a large volume of data to user space.
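
On the user-space side, handling this could look roughly like the
sketch below. It assumes the EOPNOTSUPP behavior proposed above is
actually adopted, which is not the case today; drain_iter() is a
hypothetical helper name, and fd stands for an already opened
iterator fd.

```c
#include <errno.h>
#include <unistd.h>

/*
 * Read fd until EOF. Returns the total number of bytes read, or
 * -errno on failure. Under the proposal above, -EOPNOTSUPP would
 * mean "some cgroups were not visited" rather than "done".
 * buf is reused on every iteration; a real consumer would process
 * or copy each chunk before the next read().
 */
static long drain_iter(int fd, char *buf, size_t cap)
{
    long total = 0;

    for (;;) {
        ssize_t n = read(fd, buf, cap);

        if (n > 0) {
            total += n;
            continue;
        }
        if (n == 0)
            return total;       /* EOF: everything was delivered */
        if (errno == EINTR)
            continue;           /* interrupted, just retry */
        return -errno;          /* e.g. -EOPNOTSUPP: output is partial */
    }
}
```

A caller that gets -EOPNOTSUPP back would treat the data read so far
as partial and tighten the filtering in the bpf program before
retrying.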

To provide a way to truly visit *all* cgroups, we can either use
bpf_iter link_create->flags to increase the buffer size, as you
suggested above, so the user can try to allocate a larger kernel
buffer, or implement a proper second read() traversal, which I
don't have a good idea how to do efficiently.
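
To put rough numbers on the buffer-size option, here is the
arithmetic from earlier in the thread as a small sketch. The helper
names are hypothetical, and PAGE_SIZE is assumed to be 4096, which is
what the 512-cgroup figure quoted above implies.

```c
#include <stddef.h>

/* Cgroups that fit in one session: buffer bytes / bytes emitted per cgroup. */
static size_t cgroups_per_session(size_t page_size, size_t pages,
                                  size_t bytes_per_cgroup)
{
    if (bytes_per_cgroup == 0)
        return 0;
    return (page_size * pages) / bytes_per_cgroup;
}

/* Pages needed to hold the output for n cgroups in one session, rounded up. */
static size_t pages_needed(size_t page_size, size_t n,
                           size_t bytes_per_cgroup)
{
    return (n * bytes_per_cgroup + page_size - 1) / page_size;
}
```

With 4096-byte pages, cgroups_per_session(4096, 8, 64) gives 512,
and pages_needed(4096, 1000, 64) gives 16, so covering the 1000
cgroups you mentioned at 64B each would need about twice the current
8 * PAGE_SIZE buffer.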
> 
> Hao
> 
>>>
> [...]
>>>>> [...]
