Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.

linux-mm.kvack.org archive mirror
 help / color / mirror / Atom feed

From: Martin KaFai Lau <martin.lau@linux.dev>
To: Alexei Starovoitov <alexei.starovoitov@gmail.com>
Cc: Kui-Feng Lee <thinker.li@gmail.com>, bpf <bpf@vger.kernel.org>,
	Alexei Starovoitov <ast@kernel.org>,
	Andrii Nakryiko <andrii@kernel.org>,
	Kui-Feng Lee <sinquersw@gmail.com>, linux-mm <linux-mm@kvack.org>
Subject: Re: [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls.
Date: Fri, 6 Sep 2024 13:11:48 -0700	[thread overview]
Message-ID: <f84e4a86-976a-4fd2-94e7-8026dc3ae56e@linux.dev> (raw)
In-Reply-To: <70a1b24f-84cd-464c-8fb6-a2c52fd3d703@linux.dev>

On 9/4/24 3:21 PM, Martin KaFai Lau wrote:
> On 8/28/24 4:24 PM, Alexei Starovoitov wrote:
>>> @@ -714,6 +869,11 @@ void bpf_obj_free_fields(const struct btf_record *rec, 
>>> void *obj)
>>>                                  field->kptr.dtor(xchgd_field);
>>>                          }
>>>                          break;
>>> +               case BPF_UPTR:
>>> +                       if (*(void **)field_ptr)
>>> +                               bpf_obj_unpin_uptr(field, *(void **)field_ptr);
>>> +                       *(void **)field_ptr = NULL;
>> This one will be called from
>>   task_storage_delete->bpf_selem_free->bpf_obj_free_fields
>>
>> and even if upin was safe to do from that context
>> we cannot just do:
>> *(void **)field_ptr = NULL;
>>
>> since bpf prog might be running in parallel,
>> it could have just read that addr and now is using it.
>>
>> The first thought of a way to fix this was to split
>> bpf_obj_free_fields() into the current one plus
>> bpf_obj_free_fields_after_gp()
>> that will do the above unpin bit.
>> and call the later one from bpf_selem_free_rcu()
>> while bpf_obj_free_fields() from bpf_selem_free()
>> will not touch uptr.
>>
>> But after digging further I realized that task_storage
>> already switched to use bpf_ma, so the above won't work.
>>
>> So we need something similar to BPF_KPTR_REF logic:
>> xchgd_field = (void *)xchg((unsigned long *)field_ptr, 0);
>> and then delay of uptr unpin for that address into call_rcu.
>>
>> Any better ideas?
> 

I think the existing reuse_now arg in the bpf_selem_free can be used. reuse_now 
(renamed from the earlier use_trace_rcu) was added to avoid call_rcu_tasks_trace 
for the common case.

selem (in type "struct bpf_local_storage_elem") is the one exposed to the bpf prog.

bpf_selem_free knows whether a selem can be reused immediately based on the 
caller. It is currently flagged in the reuse_now arg: "bpf_selem_free(...., bool 
reuse_now)".

If a selem cannot be reuse_now (i.e. == false), it is currently going through 
"call_rcu_tasks_trace(&selem->rcu, bpf_selem_free_trace_rcu)". We can do 
unpin_user_page() in the rcu callback.

A selem can be reuse_now (i.e. == true) if the selem is no longer needed because 
either its owner (i.e. the task_struct here) is going away in free_task() or the 
bpf map is being destructed in bpf_local_storage_map_free(). No bpf prog should 
have a hold on the selem at this point. I think for these two cases, the 
unpin_user_page() can be directly called in bpf_selem_free().

One existing bug is, from looking at patch 6, I don't think the free_task() case 
can be "reuse_now == true" anymore because of the bpf_task_release kfunc did not 
mark the previously obtained task_storage to be invalid:

data_task = bpf_task_from_pid(parent_pid);
ptr = bpf_task_storage_get(&datamap, data_task, 0, ...);
bpf_task_release(data_task);
if (!ptr)
	return 0;
/* The prog still holds a valid task storage ptr. */
udata = ptr->udata;

It can be fixed by marking the ref_obj_id of the "ptr". Although it is more 
correct to make the task storage "ptr" invalid after task_release, it may break 
the existing progs.

The same issue probably is true for cgroup_storage. There is no release kfunc 
for inode and sk, so inode and sk storage should be fine.

next prev parent reply	other threads:[~2024-09-06 20:12 UTC|newest]

Thread overview: 7+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <20240816191213.35573-1-thinker.li@gmail.com>
2024-08-16 19:12 ` [RFC bpf-next v4 4/6] bpf: pin, translate, and unpin __uptr from syscalls Kui-Feng Lee
2024-08-28 23:24   ` Alexei Starovoitov
2024-09-04 22:21     ` Martin KaFai Lau
2024-09-06 20:11       ` Martin KaFai Lau [this message]
2024-09-06 23:44         ` Alexei Starovoitov
2024-09-07  1:32           ` Martin KaFai Lau
2024-09-07  4:03             ` Martin KaFai Lau

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=f84e4a86-976a-4fd2-94e7-8026dc3ae56e@linux.dev \
    --to=martin.lau@linux.dev \
    --cc=alexei.starovoitov@gmail.com \
    --cc=andrii@kernel.org \
    --cc=ast@kernel.org \
    --cc=bpf@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=sinquersw@gmail.com \
    --cc=thinker.li@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).