From: David Vernet <void@manifault.com>
To: Matus Jokay <matus.jokay@stuba.sk>
Cc: andrii@kernel.org, ast@kernel.org, bpf@vger.kernel.org,
daniel@iogearbox.net, haoluo@google.com,
john.fastabend@gmail.com, jolsa@kernel.org, kernel-team@fb.com,
kpsingh@kernel.org, linux-kernel@vger.kernel.org,
martin.lau@linux.dev, memxor@gmail.com, sdf@google.com,
song@kernel.org, tj@kernel.org, yhs@fb.com, "Ploszek,
Roderik" <roderik.ploszek@stuba.sk>,
"Eric W. Biederman" <ebiederm@xmission.com>,
Oleg Nesterov <oleg@redhat.com>
Subject: Re: [PATCH bpf-next v9 3/4] bpf: Add kfuncs for storing struct task_struct * as a kptr
Date: Mon, 5 Dec 2022 09:02:57 -0600 [thread overview]
Message-ID: <Y44IITgHrhJf5fWJ@maniforge.lan> (raw)
In-Reply-To: <52f31c6f-7adb-78a4-dec5-8da524b4efa6@stuba.sk>
On Mon, Dec 05, 2022 at 11:11:47AM +0100, Matus Jokay wrote:
> Hello David,
Hi Matus,
>
> Your idea behind this patch is cool, but I'm afraid that the
> implementation is incorrect.
>
> As you can see, the task_struct:rcu_users member shares the same memory
> area with the task_struct:rcu (the head of an RCU CB).
> Consequence: *violated invariant* that the reference counter will
> remain zero after reaching zero!!!
> After reaching zero the task_struct:rcu head is set, so further attempts
> to access the task_struct:rcu_users may lead to a non-zero value.
Yes, you're right. Thanks for explaining this and pointing out the
oversight.
> For more information see
> https://lore.kernel.org/lkml/CAHk-=wjT6LG6sDaZtfeT80B9RaMP-y7RNRM4F5CX2v2Z=o8e=A@mail.gmail.com/
> In my opinion, the decision about task_struct:rcu and
> task_struct:rcu_users union is very bad, but you should probably consult
> the memory separation with authors of the actual implementation.
I expect the reason it's like that is because prior to this change, as
Linus pointed out, nothing ever increments the refcount (other than as
of commit 912616f142bf: ("exit: Guarantee make_task_dead leaks the tsk
when calling do_task_exit"), which similarly increments before the
reference could have ever gone to 0, so I think is fine), so we had the
ability to save a few bytes of memory in struct task_struct. Eric
mentioned this explicitly in the commit summary for commit 3fbd7ee285b2
("tasks: Add a count of task RCU users").
Now that the refcount is actually being used as a proper refcount with
this commit, that space saving is no longer an option (unless we rip out
my changes of course). +cc Eric and Oleg -- would you guys be OK with
separating them out from that union? I guess the alternative would be to
check for p->flags & PF_EXITING in the helper, but using p->rcu_users
feels more natural.
> For now, in our project, we use the following approach:
>
> 1) get a reference to a valid task within RCU read-side with
> get_task_struct()
> 2) in the release function:
> 2.1) enter RCU read-side
> 2.2) if the task state is not TASK_DEAD: use put_task_struct()
> Note: In the case of a race with an exiting task it's OK to
> call put_task_struct(), because task_struct will be freed
> *after* the current RCU GP thanks to the task_struct:rcu_users
> mechanism.
> 2.3) otherwise if test_and_set(my_cb_flag): call_rcu(my_cb)
> Note1: With respect to the RCU CB API you should guarantee that
> your CB will be installed only once within a given RCU GP. For
> that purpose we use my_cb_flag.
> Note2: This code will race with the task_struct:rcu_users
> mechanism [delayed_put_task_struct()], but it's OK. Either the
> delayed_put_task_struct() or my_cb() can be the last to call
> final put_task_struct() after the current RCU GP.
I think this idea would work, but in order for us to do this, I believe
we'd have to add _another_ struct rcu_head to struct task_struct. If we
did that, I don't think there's any reason to not just separate them out
of the union where they live today as it's only like that for
space-saving reasons.
> 2.4) otherwise: call put_task_struct()
> Note: The my_cb() is already installed, so within the current
> RCU GP we can invoke put_task_struct() and the ref counter of
> the task_struct will not reach zero.
> 2.5) release the RCU read-side
> 3) The RCU CB my_cb() should set the my_cb_flag to False and call
> put_task_struct().
>
> If the release function is called within RCU read-side, the task_struct
> is guaranteed to remain valid until the end of the current RCU GP.
>
> Good luck,
> mY
next prev parent reply other threads:[~2022-12-05 15:03 UTC|newest]
Thread overview: 13+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-11-20 5:10 [PATCH bpf-next v9 0/4] Support storing struct task_struct objects as kptrs David Vernet
2022-11-20 5:10 ` [PATCH bpf-next v9 1/4] bpf: Allow multiple modifiers in reg_type_str() prefix David Vernet
2022-11-20 5:10 ` [PATCH bpf-next v9 2/4] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs David Vernet
2022-11-20 17:28 ` Alexei Starovoitov
2022-11-20 17:39 ` David Vernet
2022-11-20 19:45 ` Kumar Kartikeya Dwivedi
2022-11-21 15:31 ` David Vernet
2022-11-21 16:06 ` Kumar Kartikeya Dwivedi
2022-11-20 5:10 ` [PATCH bpf-next v9 3/4] bpf: Add kfuncs for storing struct task_struct * as a kptr David Vernet
2022-12-05 10:11 ` Matus Jokay
2022-12-05 15:02 ` David Vernet [this message]
2022-11-20 5:10 ` [PATCH bpf-next v9 4/4] bpf/selftests: Add selftests for new task kfuncs David Vernet
2022-11-20 17:30 ` [PATCH bpf-next v9 0/4] Support storing struct task_struct objects as kptrs patchwork-bot+netdevbpf
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=Y44IITgHrhJf5fWJ@maniforge.lan \
--to=void@manifault.com \
--cc=andrii@kernel.org \
--cc=ast@kernel.org \
--cc=bpf@vger.kernel.org \
--cc=daniel@iogearbox.net \
--cc=ebiederm@xmission.com \
--cc=haoluo@google.com \
--cc=john.fastabend@gmail.com \
--cc=jolsa@kernel.org \
--cc=kernel-team@fb.com \
--cc=kpsingh@kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=martin.lau@linux.dev \
--cc=matus.jokay@stuba.sk \
--cc=memxor@gmail.com \
--cc=oleg@redhat.com \
--cc=roderik.ploszek@stuba.sk \
--cc=sdf@google.com \
--cc=song@kernel.org \
--cc=tj@kernel.org \
--cc=yhs@fb.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.