From: Suren Baghdasaryan <surenb@google.com>
To: Tejun Heo <tj@kernel.org>
Cc: Christian Brauner <brauner@kernel.org>,
gregkh@linuxfoundation.org, peterz@infradead.org,
lujialin4@huawei.com, lizefan.x@bytedance.com,
hannes@cmpxchg.org, mingo@redhat.com, ebiggers@kernel.org,
oleg@redhat.com, akpm@linux-foundation.org,
viro@zeniv.linux.org.uk, juri.lelli@redhat.com,
vincent.guittot@linaro.org, dietmar.eggemann@arm.com,
rostedt@goodmis.org, bsegall@google.com, mgorman@suse.de,
bristot@redhat.com, vschneid@redhat.com,
linux-kernel@vger.kernel.org, cgroups@vger.kernel.org,
linux-fsdevel@vger.kernel.org, kernel-team@android.com
Subject: Re: [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file
Date: Tue, 27 Jun 2023 20:09:46 -0700 [thread overview]
Message-ID: <CAJuCfpGoNbLOLm08LWKPOgn05+FB1GEqeMTUSJUZpRmDYQSjpA@mail.gmail.com> (raw)
In-Reply-To: <ZJuSzlHfbLj3OjvM@slm.duckdns.org>
On Tue, Jun 27, 2023 at 6:54 PM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Jun 27, 2023 at 02:58:08PM -0700, Suren Baghdasaryan wrote:
> > Ok in kernfs_generic_poll() we are using kernfs_open_node.poll
> > waitqueue head for polling and kernfs_open_node is freed from inside
> > kernfs_unlink_open_file() which is called from kernfs_fop_release().
> > So, it is destroyed only when the last fput() is done, unlike the
> > ops->release() operation which we are using for destroying PSI
> > trigger's waitqueue. So, it seems we still need an operation which
> > would indicate that the file is truly going away.
>
> If we want to stay consistent with how kernfs behaves w.r.t. severing, the
> right thing to do would be preventing any future polling at severing and
> waking up everyone currently waiting, which sounds fine from cgroup behavior
> POV too.
That's actually what we are currently doing for PSI triggers.
->release() is handled by cgroup_pressure_release() which signals the
waiters, waits for RCU grace period to pass (per
https://elixir.bootlin.com/linux/latest/source/include/linux/wait.h#L258)
and then releases all the trigger resources including the waitqueue
head. However as reported in
https://lore.kernel.org/all/20230613062306.101831-1-lujialin4@huawei.com
this does not save us from the synchronous polling case:
do_select
vfs_poll
cgroup_pressure_release
psi_trigger_destroy
wake_up_pollfree(&t->event_wait) -> unblocks vfs_poll
synchronize_rcu()
kfree(t) -> frees waitqueue head
poll_freewait()
-> uses waitqueue head
This happens because we release the resources associated with the file
while there are still file users (the file's refcount is non-zero).
And that happens because kernfs can call ->release() before the last
fput().
>
> Now, the challenge is designing an interface which is difficult to make
> mistake with. IOW, it'd be great if kernfs wraps poll call so that severing
> is implemented without kernfs users doing anything, or at least make it
> pretty obvious what the correct usage pattern is.
>
> > Christian's suggestion to rename current ops->release() operation into
> > ops->drain() (or ops->flush() per Matthew's request) and introduce a
> > "new" ops->release() which is called only when the last fput() is done
> > seems sane to me. Would everyone be happy with that approach?
>
> I'm not sure I'd go there. The contract is that once ->release() is called,
> the code backing that file can go away (e.g. rmmod'd). It really should
> behave just like the last put from kernfs users' POV.
I 100% agree with the above statement.
> For this specific fix,
> it's safe because we know the ops is always built into the kernel and won't
> go away but it'd be really bad if the interface says "this is a normal thing
> to do". We'd be calling into rmmod'd text pages in no time.
>
> So, I mean, even for temporary fix, we have to make it abundantly clear that
> this is not for usual usage and can only be used if the code backing the ops
> is built into the kernel and so on.
I think the root cause of this problem is that ->release() in kernfs
does not adhere to the common rule that ->release() is called only
when the file is going away and has no users left. Am I wrong?
Thanks,
Suren.
>
> Thanks.
>
> --
> tejun
next prev parent reply other threads:[~2023-06-28 3:10 UTC|newest]
Thread overview: 34+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-06-26 20:17 [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file Suren Baghdasaryan
2023-06-26 20:17 ` [PATCH 2/2] sched/psi: tie psi trigger destruction with file's lifecycle Suren Baghdasaryan
2023-06-26 20:21 ` [PATCH 1/2] kernfs: add kernfs_ops.free operation to free resources tied to the file Suren Baghdasaryan
2023-06-26 20:31 ` Tejun Heo
2023-06-26 20:39 ` Suren Baghdasaryan
2023-06-27 8:24 ` Christian Brauner
2023-06-27 17:09 ` Suren Baghdasaryan
2023-06-27 17:30 ` Christian Brauner
2023-06-27 17:36 ` Suren Baghdasaryan
2023-06-27 18:42 ` Tejun Heo
2023-06-27 20:09 ` Suren Baghdasaryan
2023-06-27 21:43 ` Suren Baghdasaryan
2023-06-27 21:58 ` Suren Baghdasaryan
2023-06-28 1:54 ` Tejun Heo
2023-06-28 3:09 ` Suren Baghdasaryan [this message]
2023-06-28 7:26 ` Christian Brauner
2023-06-28 7:46 ` Suren Baghdasaryan
2023-06-28 8:41 ` Christian Brauner
2023-06-28 16:28 ` Suren Baghdasaryan
2023-06-28 17:35 ` Christian Brauner
2023-06-28 18:02 ` Tejun Heo
2023-06-28 18:18 ` Suren Baghdasaryan
2023-06-28 18:42 ` Greg KH
2023-06-28 20:12 ` Suren Baghdasaryan
2023-06-28 20:34 ` Tejun Heo
2023-06-28 21:50 ` Suren Baghdasaryan
2023-06-30 0:59 ` Suren Baghdasaryan
2023-06-30 8:21 ` Christian Brauner
2023-07-10 20:38 ` Tejun Heo
2023-06-28 17:58 ` Tejun Heo
2023-06-27 6:25 ` Greg KH
2023-06-27 17:03 ` Suren Baghdasaryan
2023-06-27 17:23 ` Christian Brauner
2023-06-27 17:36 ` Matthew Wilcox
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJuCfpGoNbLOLm08LWKPOgn05+FB1GEqeMTUSJUZpRmDYQSjpA@mail.gmail.com \
--to=surenb@google.com \
--cc=akpm@linux-foundation.org \
--cc=brauner@kernel.org \
--cc=bristot@redhat.com \
--cc=bsegall@google.com \
--cc=cgroups@vger.kernel.org \
--cc=dietmar.eggemann@arm.com \
--cc=ebiggers@kernel.org \
--cc=gregkh@linuxfoundation.org \
--cc=hannes@cmpxchg.org \
--cc=juri.lelli@redhat.com \
--cc=kernel-team@android.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=lizefan.x@bytedance.com \
--cc=lujialin4@huawei.com \
--cc=mgorman@suse.de \
--cc=mingo@redhat.com \
--cc=oleg@redhat.com \
--cc=peterz@infradead.org \
--cc=rostedt@goodmis.org \
--cc=tj@kernel.org \
--cc=vincent.guittot@linaro.org \
--cc=viro@zeniv.linux.org.uk \
--cc=vschneid@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).