From: Martin KaFai Lau <martin.lau@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kui-Feng Lee <thinker.li@gmail.com>, bpf <bpf@vger.kernel.org>,
Alexei Starovoitov <ast@kernel.org>,
Andrii Nakryiko <andrii@kernel.org>,
Kui-Feng Lee <sinquersw@gmail.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.
Date: Fri, 6 Sep 2024 21:03:44 -0700 [thread overview]
Message-ID: <769541a2-4009-4035-a327-838ebdfbf258@linux.dev> (raw)
In-Reply-To: <8b61f093-a6a6-4f99-91f8-20f2a7235d76@linux.dev>
On 9/6/24 6:32 PM, Martin KaFai Lau wrote:
> On 9/6/24 4:44 PM, Alexei Starovoitov wrote:
>> On Fri, Sep 6, 2024 at 1:11 PM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>>>
>>> On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
>>>> On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
>>>>>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec,
>>>>>> void *obj)
>>>>>> field->kptr.dtor(xchgd_field);
>>>>>> }
>>>>>> break;
>>>>>> + case BPF_UPTR:
>>>>>> + if (*(void **)field_ptr)
>>>>>> + bpf_obj_unpin_uptr(field, *(void
>>>>>> **)field_ptr);
>>>>>> + *(void **)field_ptr = NULL;
>>>>> This one will be called from
>>>>> task_storage_delete->bpf_selem_free->bpf_obj_free_fields
>>>>>
>>>>> and even if upin was safe to do from that context
>>>>> we cannot just do:
>>>>> *(void **)field_ptr = NULL;
>>>>>
>>>>> since bpf prog might be running in parallel,
>>>>> it could have just read that addr and now is using it.
>>>>>
>>>>> The first thought of a way to fix this was to split
>>>>> bpf_obj_free_fields() into the current one plus
>>>>> bpf_obj_free_fields_after_gp()
>>>>> that will do the above unpin bit.
>>>>> and call the later one from bpf_selem_free_rcu()
>>>>> while bpf_obj_free_fields() from bpf_selem_free()
>>>>> will not touch uptr.
>>>>>
>>>>> But after digging further I realized that task_storage
>>>>> already switched to use bpf_ma, so the above won't work.
>>>>>
>>>>> So we need something similar to BPF_KPTR_REF logic:
>>>>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>>>>> and then delay of uptr unpin for that address into call_rcu.
>>>>>
>>>>> Any better ideas?
>>>>
>>>
>>> I think the existing reuse_now arg in the bpf_selem_free can be used. reuse_now
>>> (renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace
>>> for the common case.
>>>
>>> selem (in type "struct bpf_local_storage_elem") is the one exposed to the bpf
>>> prog.
>>>
>>> bpf_selem_free knows whether a selem can be reused immediately based on the
>>> caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool
>>> reuse_now)".
>>>
>>> If a selem cannot be reuse_now (i.e. == false), it is currently going through
>>> "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do
>>> unpin_user_page() in the rcu callback.
>>>
>>> A selem can be reuse_now (i.e. == true) if the selem is no longer needed because
>>> either its owner (i.e. the task_struct here) is going away in free_task() or the
>>> bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog should
>>> have a hold on the selem at this point. I think for these two cases, the
>>> unpin_user_page() can be directly called in bpf_selem_free().
>>
>> but there is also this path:
>> bpf_task_storage_delete -> task_storage_delete -> bpf_selem_free
>> -> bpf_obj_free_fields
>>
>> In this case bpf prog may still be looking at uptr address
>> and we cannot do unpin right away in bpf_obj_free_fields.
>
> cannot unpin immediately in the bpf_task_storage_delete() path is understood.
> task_storage can be used in sleepable. It needs to wait for the tasks_trace and
> the regular rcu gp before unpin.
>
> I forgot to mention earlier that bpf_task_storage_delete() will have the
> bpf_selem_free(..., reuse_now == false). It will then do the
> "call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu);". The unpin could
> happen in bpf_selem_free_trace_rcu() in this case. I am suggesting to unpin in
> bpf_selem_free_trace_rcu together with the selem free.
>
> I just noticed the map and its btf_record are gone in
> bpf_selem_free_trace_rcu()... so won't work. :(
Thought about it more. Adding a rcu_barrier() to bpf_local_storage_map_free()
may be enough. Then bpf_selem_free_rcu() will have the map->record to unpin.
will need to think about it more.
>
>> All other special fields in map value are ok,
>> since they are either relying on bpf_mem_alloc and
>> have rcu/rcu_tasks_trace gp
>> or extra indirection like timer/wq.
>>
>>> One existing bug is, from looking at patch 6, I don't think the free_task() case
>>> can be "reuse_now == true" anymore because of the bpf_task_release kfunc did not
>>> mark the previously obtained task_storage to be invalid:
>>>
>>> data_task = bpf_task_from_pid(parent_pid);
>>> ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
>>> bpf_task_release(data_task);
>>> if (!ptr)
>>> return 0;
>>> /* The prog still holds a valid task storage ptr. */
>>> udata = ptr->udata;
>>>
>>> It can be fixed by marking the ref_obj_id of the "ptr". Although it is more
>>> correct to make the task storage "ptr" invalid after task_release, it may break
>>> the existing progs.
>>
>> Are you suggesting that bpf_task_release should invalidate all pointers
>> fetched from map value?
>
> I was thinking at least the map value ptr itself needs to be invalidated.
>
>> That will work, but it's not an issue for other special fields in there
>> like kptr.
>> So this invalidation would be need only for uptr which feels
>> weird to special case it and probably will be confusing to users writing
>> such programs.
>
> hmm... I haven't thought about the other pointer fields that read before the
> task_release().
>
> Agreed, it is hard to use if only marks uptr invalid. Thinking about it. Even
> marking the map value ptr invalid while other previously read fields keep
> working is also the same weirdness.
>
>> Above bpf prog example should be ok to use.
>> We only need to delay unpin after rcu/rcu_task_trace gp.
>> Hence my proposal in bpf_obj_free_fields() do:
>> case UPTR:
>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>> call_rcu(...) to unpin.
>
> Agree that call_rcu() here is the only option. It probably needs to go through
> the tasks_trace gp also.
>
> Can the page->rcu_head be used here?
>
next prev parent reply other threads:[~2024-09-07 4:03 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-08-16 19:12 [RFC bpf-next v4 0/6] Share user memory to BPF program through task storage map Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 1/6] bpf: define BPF_UPTR a new enumerator of btf_field_type Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 2/6] bpf: Parse and support "uptr" tag Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 3/6] bpf: Handle BPF_UPTR in verifier Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls Kui-Feng Lee
2024-08-28 23:24 ` Alexei Starovoitov
2024-09-04 22:21 ` Martin KaFai Lau
2024-09-06 20:11 ` Martin KaFai Lau
2024-09-06 23:44 ` Alexei Starovoitov
2024-09-07 1:32 ` Martin KaFai Lau
2024-09-07 4:03 ` Martin KaFai Lau [this message]
2024-08-16 19:12 ` [RFC bpf-next v4 5/6] libbpf: define __uptr Kui-Feng Lee
2024-08-27 23:13 ` Andrii Nakryiko
2024-08-28 17:53 ` Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 6/6] selftests/bpf: test __uptr on the value of a task storage map Kui-Feng Lee
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=769541a2-4009-4035-a327-838ebdfbf258@linux.dev \
--to=martin.lau@linux.dev \
--cc=alexei.starovoitov@gmail.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=sinquersw@gmail.com \
--cc=thinker.li@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.