Re: [PATCH bpf-next v4 3/7] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs

public inbox for bpf@vger.kernel.org
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Yosry Ahmed <yosryahmed@google.com>
Cc: Yonghong Song <yhs@fb.com>, Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Daniel Borkmann <daniel@iogearbox.net>,
	kernel-team@fb.com, KP Singh <kpsingh@kernel.org>,
	Martin KaFai Lau <martin.lau@kernel.org>,
	Tejun Heo <tj@kernel.org>,
	bpf@vger.kernel.org
Subject: Re: [PATCH bpf-next v4 3/7] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs
Date: Mon, 24 Oct 2022 17:32:31 -0700	[thread overview]
Message-ID: <5dd7b50f-3179-75c2-4125-ee872f225129@linux.dev> (raw)
In-Reply-To: <CAJD7tkafC5BPqUxUWc3UUrphe0wKaRe=HfLvkyPk09+EV8ndCw@mail.gmail.com>

On 10/24/22 5:21 PM, Yosry Ahmed wrote:
> On Mon, Oct 24, 2022 at 2:15 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>
>> On 10/23/22 11:05 AM, Yonghong Song wrote:
>>> +void bpf_cgrp_storage_free(struct cgroup *cgroup)
>>> +{
>>> +     struct bpf_local_storage *local_storage;
>>> +     struct bpf_local_storage_elem *selem;
>>> +     bool free_cgroup_storage = false;
>>> +     struct hlist_node *n;
>>> +     unsigned long flags;
>>> +
>>> +     rcu_read_lock();
>>> +     local_storage = rcu_dereference(cgroup->bpf_cgrp_storage);
>>> +     if (!local_storage) {
>>> +             rcu_read_unlock();
>>> +             return;
>>> +     }
>>> +
>>> +     /* Neither the bpf_prog nor the bpf_map's syscall
>>> +      * could be modifying the local_storage->list now.
>>> +      * Thus, no elem can be added to or deleted from the
>>> +      * local_storage->list by the bpf_prog or by the bpf_map's syscall.
>>> +      *
>>> +      * It is racing with __bpf_local_storage_map_free() alone
>>> +      * when unlinking elem from the local_storage->list and
>>> +      * the map's bucket->list.
>>> +      */
>>> +     bpf_cgrp_storage_lock();
>>> +     raw_spin_lock_irqsave(&local_storage->lock, flags);
>>> +     hlist_for_each_entry_safe(selem, n, &local_storage->list, snode) {
>>> +             bpf_selem_unlink_map(selem);
>>> +             /* If local_storage list has only one element, the
>>> +              * bpf_selem_unlink_storage_nolock() will return true.
>>> +              * Otherwise, it will return false. The current loop iteration
>>> +              * intends to remove all local storage. So the last iteration
>>> +              * of the loop will set the free_cgroup_storage to true.
>>> +              */
>>> +             free_cgroup_storage =
>>> +                     bpf_selem_unlink_storage_nolock(local_storage, selem, false, false);
>>> +     }
>>> +     raw_spin_unlock_irqrestore(&local_storage->lock, flags);
>>> +     bpf_cgrp_storage_unlock();
>>> +     rcu_read_unlock();
>>> +
>>> +     if (free_cgroup_storage)
>>> +             kfree_rcu(local_storage, rcu);
>>> +}
>>
>> [ ... ]
>>
>>> +/* *gfp_flags* is a hidden argument provided by the verifier */
>>> +BPF_CALL_5(bpf_cgrp_storage_get, struct bpf_map *, map, struct cgroup *, cgroup,
>>> +        void *, value, u64, flags, gfp_t, gfp_flags)
>>> +{
>>> +     struct bpf_local_storage_data *sdata;
>>> +
>>> +     WARN_ON_ONCE(!bpf_rcu_lock_held());
>>> +     if (flags & ~(BPF_LOCAL_STORAGE_GET_F_CREATE))
>>> +             return (unsigned long)NULL;
>>> +
>>> +     if (!cgroup)
>>> +             return (unsigned long)NULL;
>>> +
>>> +     if (!bpf_cgrp_storage_trylock())
>>> +             return (unsigned long)NULL;
>>> +
>>> +     sdata = cgroup_storage_lookup(cgroup, map, true);
>>> +     if (sdata)
>>> +             goto unlock;
>>> +
>>> +     /* only allocate new storage, when the cgroup is refcounted */
>>> +     if (!percpu_ref_is_dying(&cgroup->self.refcnt) &&
>>> +         (flags & BPF_LOCAL_STORAGE_GET_F_CREATE))
>>> +             sdata = bpf_local_storage_update(cgroup, (struct bpf_local_storage_map *)map,
>>> +                                              value, BPF_NOEXIST, gfp_flags);
>>> +
>>> +unlock:
>>> +     bpf_cgrp_storage_unlock();
>>> +     return IS_ERR_OR_NULL(sdata) ? (unsigned long)NULL : (unsigned long)sdata->data;
>>> +}
>>
>> [ ... ]
>>
>>> diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
>>> index 764bdd5fd8d1..32145d066a09 100644
>>> --- a/kernel/cgroup/cgroup.c
>>> +++ b/kernel/cgroup/cgroup.c
>>> @@ -5227,6 +5227,10 @@ static void css_free_rwork_fn(struct work_struct *work)
>>>        struct cgroup_subsys *ss = css->ss;
>>>        struct cgroup *cgrp = css->cgroup;
>>>
>>> +#ifdef CONFIG_BPF_SYSCALL
>>> +     bpf_cgrp_storage_free(cgrp);
>>> +#endif
>>
>>
>> After revisiting comment 4bfc0bb2c60e, some of the commit message came to my mind:
>>
>> " ...... it blocks a possibility to implement
>>     the memcg-based memory accounting for bpf objects, because a circular
>>     reference dependency will occur. Charged memory pages are pinning the
>>     corresponding memory cgroup, and if the memory cgroup is pinning
>>     the attached bpf program, nothing will be ever released."
>>
>> Considering the bpf_map_kzalloc() is used in bpf_local_storage_map.c and it can
>> charge the memcg, I wonder if the cgrp_local_storage will have similar refcnt
>> loop issue here.
>>
>> If here is the right place to free the cgrp_local_storage() and enough to break
>> this refcnt loop, it will be useful to add some explanation and its
>> consideration in the commit message.
>>
> 
> I think a similar refcount loop issue can happen here as well. IIUC,
> this function will only be run when the css is released after all
> references are dropped. So if memcg charging is enabled the cgroup
> will never be removed. This one might be trickier to handle though..

How about removing all storage from cgrp->bpf_cgrp_storage in 
cgroup_destroy_locked()?

next prev parent reply	other threads:[~2022-10-25  1:15 UTC|newest]

Thread overview: 31+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-10-23 18:05 [PATCH bpf-next v4 0/7] bpf: Implement cgroup local storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-23 18:05 ` [PATCH bpf-next v4 1/7] bpf: Make struct cgroup btf id global Yonghong Song
2022-10-23 19:59   ` David Vernet
2022-10-23 18:05 ` [PATCH bpf-next v4 2/7] bpf: Refactor inode/task/sk storage map_{alloc,free}() for reuse Yonghong Song
2022-10-24 18:02   ` sdf
2022-10-24 19:08     ` Yonghong Song
2022-10-24 20:34   ` Martin KaFai Lau
2022-10-25  2:28     ` Yonghong Song
2022-10-23 18:05 ` [PATCH bpf-next v4 3/7] bpf: Implement cgroup storage available to non-cgroup-attached bpf progs Yonghong Song
2022-10-23 20:02   ` David Vernet
2022-10-24 19:19   ` Martin KaFai Lau
2022-10-25  0:21     ` Yosry Ahmed
2022-10-25  0:32       ` Martin KaFai Lau [this message]
2022-10-25  0:48         ` Yosry Ahmed
2022-10-25  0:55           ` Yosry Ahmed
2022-10-25  2:38             ` Roman Gushchin
2022-10-25  5:46               ` Yosry Ahmed
2022-10-25  1:36           ` Martin KaFai Lau
2022-10-25  5:44             ` Yosry Ahmed
2022-10-25 19:53               ` Yonghong Song
2022-10-23 18:05 ` [PATCH bpf-next v4 4/7] libbpf: Support new cgroup local storage Yonghong Song
2022-10-23 20:03   ` David Vernet
2022-10-23 18:05 ` [PATCH bpf-next v4 5/7] bpftool: " Yonghong Song
2022-10-23 18:05 ` [PATCH bpf-next v4 6/7] selftests/bpf: Add selftests for " Yonghong Song
2022-10-23 20:14   ` David Vernet
2022-10-24 19:03     ` Yonghong Song
2022-10-24 20:30   ` Martin KaFai Lau
2022-10-25  2:26     ` Yonghong Song
2022-10-23 18:05 ` [PATCH bpf-next v4 7/7] docs/bpf: Add documentation " Yonghong Song
2022-10-23 20:26   ` David Vernet
2022-10-24 19:05     ` Yonghong Song

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5dd7b50f-3179-75c2-4125-ee872f225129@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=daniel@iogearbox.net \
    --cc=kernel-team@fb.com \
    --cc=kpsingh@kernel.org \
    --cc=martin.lau@kernel.org \
    --cc=tj@kernel.org \
    --cc=yhs@fb.com \
    --cc=yosryahmed@google.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox