All of lore.kernel.org
 help / color / mirror / Atom feed
From: Martin KaFai Lau <martin.lau@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kui-Feng Lee <thinker.li@gmail.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kui-Feng Lee <sinquersw@gmail.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.
Date: Fri, 6 Sep 2024 13:11:48 -0700	[thread overview]
Message-ID: <f84e4a86-976a-4fd2-94e7-8026dc3ae56e@linux.dev> (raw)
In-Reply-To: <70a1b24f-84cd-464c-8fb6-a2c52fd3d703@linux.dev>

On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
> On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
>>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, 
>>> void *obj)
>>>                                  field->kptr.dtor(xchgd_field);
>>>                          }
>>>                          break;
>>> +               case BPF_UPTR:
>>> +                       if (*(void **)field_ptr)
>>> +                               bpf_obj_unpin_uptr(field, *(void **)field_ptr);
>>> +                       *(void **)field_ptr = NULL;
>> This one will be called from
>>   task_storage_delete->bpf_selem_free->bpf_obj_free_fields
>>
>> and even if upin was safe to do from that context
>> we cannot just do:
>> *(void **)field_ptr = NULL;
>>
>> since bpf prog might be running in parallel,
>> it could have just read that addr and now is using it.
>>
>> The first thought of a way to fix this was to split
>> bpf_obj_free_fields() into the current one plus
>> bpf_obj_free_fields_after_gp()
>> that will do the above unpin bit.
>> and call the later one from bpf_selem_free_rcu()
>> while bpf_obj_free_fields() from bpf_selem_free()
>> will not touch uptr.
>>
>> But after digging further I realized that task_storage
>> already switched to use bpf_ma, so the above won't work.
>>
>> So we need something similar to BPF_KPTR_REF logic:
>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>> and then delay of uptr unpin for that address into call_rcu.
>>
>> Any better ideas?
> 

I think the existing reuse_now arg in the bpf_selem_free can be used. reuse_now 
(renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace 
for the common case.

selem (in type "struct bpf_local_storage_elem") is the one exposed to the bpf prog.

bpf_selem_free knows whether a selem can be reused immediately based on the 
caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool 
reuse_now)".

If a selem cannot be reuse_now (i.e. == false), it is currently going through 
"call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do 
unpin_user_page() in the rcu callback.

A selem can be reuse_now (i.e. == true) if the selem is no longer needed because 
either its owner (i.e. the task_struct here) is going away in free_task() or the 
bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog should 
have a hold on the selem at this point. I think for these two cases, the 
unpin_user_page() can be directly called in bpf_selem_free().

One existing bug is, from looking at patch 6, I don't think the free_task() case 
can be "reuse_now == true" anymore because of the bpf_task_release kfunc did not 
mark the previously obtained task_storage to be invalid:

data_task = bpf_task_from_pid(parent_pid);
ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
bpf_task_release(data_task);
if (!ptr)
	return 0;
/* The prog still holds a valid task storage ptr. */
udata = ptr->udata;

It can be fixed by marking the ref_obj_id of the "ptr". Although it is more 
correct to make the task storage "ptr" invalid after task_release, it may break 
the existing progs.

The same issue probably is true for cgroup_storage. There is no release kfunc 
for inode and sk, so inode and sk storage should be fine.


  reply	other threads:[~2024-09-06 20:11 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-16 19:12 [RFC bpf-next v4 0/6] Share user memory to BPF program through task storage map Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 1/6] bpf: define BPF_UPTR a new enumerator of btf_field_type Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 2/6] bpf: Parse and support "uptr" tag Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 3/6] bpf: Handle BPF_UPTR in verifier Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls Kui-Feng Lee
2024-08-28 23:24   ` Alexei Starovoitov
2024-09-04 22:21     ` Martin KaFai Lau
2024-09-06 20:11       ` Martin KaFai Lau [this message]
2024-09-06 23:44         ` Alexei Starovoitov
2024-09-07  1:32           ` Martin KaFai Lau
2024-09-07  4:03             ` Martin KaFai Lau
2024-08-16 19:12 ` [RFC bpf-next v4 5/6] libbpf: define __uptr Kui-Feng Lee
2024-08-27 23:13   ` Andrii Nakryiko
2024-08-28 17:53     ` Kui-Feng Lee
2024-08-16 19:12 ` [RFC bpf-next v4 6/6] selftests/bpf: test __uptr on the value of a task storage map Kui-Feng Lee

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f84e4a86-976a-4fd2-94e7-8026dc3ae56e@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sinquersw@gmail.com \
    --cc=thinker.li@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.