From: Eric Biggers <ebiggers@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
Johannes Weiner <hannes@cmpxchg.org>,
Peter Zijlstra <peterz@infradead.org>,
Juri Lelli <juri.lelli@redhat.com>,
Vincent Guittot <vincent.guittot@linaro.org>,
Ingo Molnar <mingo@redhat.com>, Hillf Danton <hdanton@sina.com>,
syzbot <syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com>,
linux-fsdevel <linux-fsdevel@vger.kernel.org>,
Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
Linux-MM <linux-mm@kvack.org>
Subject: Re: psi_trigger_poll() is completely broken
Date: Thu, 6 Jan 2022 16:13:59 -0800 [thread overview]
Message-ID: <YdeFx9i/LaAC346s@sol.localdomain> (raw)
In-Reply-To: <CAHk-=whhcoeTOZB_de-Nh28Fy4iNTu2DXzKXEPOubzL36+ME=A@mail.gmail.com>
On Thu, Jan 06, 2022 at 02:59:36PM -0800, Linus Torvalds wrote:
>
> So here's a COMPLETELY UNTESTED patch to try to fix the lifetime and locking.
>
> The locking was completely broken, in that psi_trigger_replace()
> expected that the caller would hold some exclusive lock so that it
> would release the correct previous trigger. The cgroup code doesn't
> seem to have any such exclusion.
>
> This (UNTESTED!) patch fixes that breakage by just using a cmpxchg loop.
>
> And the lifetime was completely broken (and that's Eric's email)
> because psi_trigger_replace() would drop the refcount to the old
> trigger - assuming it got the right one - even though the old trigger
> could still have active waiters on the waitqueue due to poll() or
> select().
>
> This (UNTESTED!) patch fixes _that_ breakage by making
> psi_trigger_replace() instead just put the previous trigger on the
> "stale_trigger" linked list, and never release it at all.
>
> It now gets released by "psi_trigger_release()" instead, which walks
> the list at file release time. Doing "psi_trigger_replace(.., NULL)"
> is not valid any more.
>
> And because the reference cannot go away, we now can throw away all
> the incorrect temporary kref_get/put games from psi_trigger_poll(),
> which didn't actually fix the race at all, only limited it to the poll
> waitqueue.
>
> That also means we can remove the "synchronize_rcu()" from
> psi_trigger_destroy(), since that was trying to hide all the problems
> with the "take rcu lock and then do kref_get()" thing not having
> locking. The locking still doesn't exist, but since we don't release
> the old one when replacing it, the issue is moot.
>
> NOTE NOTE NOTE! Not only is this patch entirely untested, there are
> optimizations you could do if there was some sane synchronization
> between psi_trigger_poll() and psi_trigger_replace(). I put comments
> about it in the code, but right now the code just assumes that
> replacing a trigger is fairly rare (and since it requires write
> permissions, it's not something random users can do).
>
> I'm not proud of this patch, but I think it might fix the fundamental
> bugs in the code for now.
>
> It's not lovely, it has room for improvement, and I wish we didn't
> need this kind of thing, but it looks superficially sane as a fix to
> me.
>
> Comments?
>
> And once again: this is UNTESTED. I've compiled-tested it, it looks
> kind of sane to me, but honestly, I don't know the code very well.
>
> Also, I'm not super-happy with how that 'psi_disabled' static branch
> works. If somebody switches it off after it has been on, that will
> also disable the freeing code, so now you'll be leaking memory.
>
> I couldn't find it in myself to care.
I had to make the following changes to Linus's patch:
diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 10430f75f21a..7d5afa89db44 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1255,7 +1255,7 @@ void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
struct psi_trigger *old = *trigger_ptr;
new->stale_trigger = old;
- if (try_cmpxchg(trigger_ptr, old, new))
+ if (try_cmpxchg(trigger_ptr, &old, new))
break;
}
@@ -1270,7 +1270,7 @@ void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
/* No locking needed for final release */
void psi_trigger_release(void **trigger_ptr)
{
- struct psi_trigger *trigger;
+ struct psi_trigger *trigger = *trigger_ptr;
if (static_branch_likely(&psi_disabled))
return;
After that, the two reproducers I gave in
https://lore.kernel.org/r/YbQUSlq76Iv5L4cC@sol.localdomain (the ones at the end
of my mail, not the syzbot-generated ones which I didn't try) no longer crash
the kernel.
This is one way to fix the use-after-free, but the fact that it allows anyone
who can write to a /proc/pressure/* file to cause the kernel to allocate an
unbounded number of 'struct psi_trigger' structs is still really broken.
I think we really need an answer to Linus' question:
> What are the users? Can we make the rule for -EBUSY simply be that you
> can _install_ a trigger, but you can't replace an existing one (except
> with NULL, when you close it).
... since that would be a much better fix. The example in
Documentation/accounting/psi.rst only does a single write; that case wouldn't be
broken if we made multiple writes not work.
- Eric
next prev parent reply other threads:[~2022-01-07 0:14 UTC|newest]
Thread overview: 16+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-16 9:23 [syzbot] KASAN: use-after-free Read in remove_wait_queue (3) syzbot
2021-12-10 22:42 ` syzbot
[not found] ` <20211211015620.1793-1-hdanton@sina.com>
2021-12-11 3:00 ` Eric Biggers
2022-01-05 15:20 ` psi_trigger_poll() is completely broken Eric Biggers
2022-01-05 18:50 ` Linus Torvalds
2022-01-05 19:07 ` Linus Torvalds
2022-01-05 19:13 ` Linus Torvalds
2022-01-06 22:05 ` Tejun Heo
2022-01-06 22:59 ` Linus Torvalds
2022-01-07 0:13 ` Eric Biggers [this message]
2022-01-07 3:02 ` Linus Torvalds
2022-01-10 13:45 ` Johannes Weiner
2022-01-10 17:25 ` Suren Baghdasaryan
2022-01-10 17:42 ` Linus Torvalds
2022-01-10 18:19 ` Suren Baghdasaryan
2022-01-11 3:02 ` Suren Baghdasaryan
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=YdeFx9i/LaAC346s@sol.localdomain \
--to=ebiggers@kernel.org \
--cc=hannes@cmpxchg.org \
--cc=hdanton@sina.com \
--cc=juri.lelli@redhat.com \
--cc=linux-fsdevel@vger.kernel.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mm@kvack.org \
--cc=lizefan.x@bytedance.com \
--cc=mingo@redhat.com \
--cc=peterz@infradead.org \
--cc=syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com \
--cc=syzkaller-bugs@googlegroups.com \
--cc=tj@kernel.org \
--cc=torvalds@linux-foundation.org \
--cc=vincent.guittot@linaro.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.