linux-fsdevel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Eric Biggers <ebiggers@kernel.org>
To: Linus Torvalds <torvalds@linux-foundation.org>
Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Ingo Molnar <mingo@redhat.com>, Hillf Danton <hdanton@sina.com>,
	syzbot <syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com>,
	linux-fsdevel <linux-fsdevel@vger.kernel.org>,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>,
	syzkaller-bugs <syzkaller-bugs@googlegroups.com>,
	Linux-MM <linux-mm@kvack.org>
Subject: Re: psi_trigger_poll() is completely broken
Date: Thu, 6 Jan 2022 16:13:59 -0800	[thread overview]
Message-ID: <YdeFx9i/LaAC346s@sol.localdomain> (raw)
In-Reply-To: <CAHk-=whhcoeTOZB_de-Nh28Fy4iNTu2DXzKXEPOubzL36+ME=A@mail.gmail.com>

On Thu, Jan 06, 2022 at 02:59:36PM -0800, Linus Torvalds wrote:
> 
> So here's a COMPLETELY UNTESTED patch to try to fix the lifetime and locking.
> 
> The locking was completely broken, in that psi_trigger_replace()
> expected that the caller would hold some exclusive lock so that it
> would release the correct previous trigger. The cgroup code doesn't
> seem to have any such exclusion.
> 
> This (UNTESTED!) patch fixes that breakage by just using a cmpxchg loop.
> 
> And the lifetime was completely broken (and that's Eric's email)
> because psi_trigger_replace() would drop the refcount to the old
> trigger - assuming it got the right one - even though the old trigger
> could still have active waiters on the waitqueue due to poll() or
> select().
> 
> This (UNTESTED!) patch fixes _that_ breakage by making
> psi_trigger_replace() instead just put the previous trigger on the
> "stale_trigger" linked list, and never release it at all.
> 
> It now gets released by "psi_trigger_release()" instead, which walks
> the list at file release time. Doing "psi_trigger_replace(.., NULL)"
> is not valid any more.
> 
> And because the reference cannot go away, we now can throw away all
> the incorrect temporary kref_get/put games from psi_trigger_poll(),
> which didn't actually fix the race at all, only limited it to the poll
> waitqueue.
> 
> That also means we can remove the "synchronize_rcu()" from
> psi_trigger_destroy(), since that was trying to hide all the problems
> with the "take rcu lock and then do kref_get()" thing not having
> locking. The locking still doesn't exist, but since we don't release
> the old one when replacing it, the issue is moot.
> 
> NOTE NOTE NOTE! Not only is this patch entirely untested, there are
> optimizations you could do if there was some sane synchronization
> between psi_trigger_poll() and psi_trigger_replace(). I put comments
> about it in the code, but right now the code just assumes that
> replacing a trigger is fairly rare (and since it requires write
> permissions, it's not something random users can do).
> 
> I'm not proud of this patch, but I think it might fix the fundamental
> bugs in the code for now.
> 
> It's not lovely, it has room for improvement, and I wish we didn't
> need this kind of thing, but it looks superficially sane as a fix to
> me.
> 
> Comments?
> 
> And once again: this is UNTESTED. I've compiled-tested it, it looks
> kind of sane to me, but honestly, I don't know the code very well.
> 
> Also, I'm not super-happy with how that 'psi_disabled' static branch
> works.  If somebody switches it off after it has been on, that will
> also disable the freeing code, so now you'll be leaking memory.
> 
> I couldn't find it in myself to care.

I had to make the following changes to Linus's patch:

diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c
index 10430f75f21a..7d5afa89db44 100644
--- a/kernel/sched/psi.c
+++ b/kernel/sched/psi.c
@@ -1255,7 +1255,7 @@ void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
 		struct psi_trigger *old = *trigger_ptr;
 
 		new->stale_trigger = old;
-		if (try_cmpxchg(trigger_ptr, old, new))
+		if (try_cmpxchg(trigger_ptr, &old, new))
 			break;
 	}
 
@@ -1270,7 +1270,7 @@ void psi_trigger_replace(void **trigger_ptr, struct psi_trigger *new)
 /* No locking needed for final release */
 void psi_trigger_release(void **trigger_ptr)
 {
-	struct psi_trigger *trigger;
+	struct psi_trigger *trigger = *trigger_ptr;
 
 	if (static_branch_likely(&psi_disabled))
 		return;


After that, the two reproducers I gave in
https://lore.kernel.org/r/YbQUSlq76Iv5L4cC@sol.localdomain (the ones at the end
of my mail, not the syzbot-generated ones which I didn't try) no longer crash
the kernel.

This is one way to fix the use-after-free, but the fact that it allows anyone
who can write to a /proc/pressure/* file to cause the kernel to allocate an
unbounded number of 'struct psi_trigger' structs is still really broken.

I think we really need an answer to Linus' question:

> What are the users? Can we make the rule for -EBUSY simply be that you
> can _install_ a trigger, but you can't replace an existing one (except
> with NULL, when you close it).

... since that would be a much better fix.  The example in
Documentation/accounting/psi.rst only does a single write; that case wouldn't be
broken if we made multiple writes not work.

- Eric

  reply	other threads:[~2022-01-07  0:14 UTC|newest]

Thread overview: 16+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-16  9:23 [syzbot] KASAN: use-after-free Read in remove_wait_queue (3) syzbot
2021-12-10 22:42 ` syzbot
     [not found] ` <20211211015620.1793-1-hdanton@sina.com>
2021-12-11  3:00   ` Eric Biggers
2022-01-05 15:20     ` psi_trigger_poll() is completely broken Eric Biggers
2022-01-05 18:50       ` Linus Torvalds
2022-01-05 19:07         ` Linus Torvalds
2022-01-05 19:13           ` Linus Torvalds
2022-01-06 22:05             ` Tejun Heo
2022-01-06 22:59               ` Linus Torvalds
2022-01-07  0:13                 ` Eric Biggers [this message]
2022-01-07  3:02                   ` Linus Torvalds
2022-01-10 13:45             ` Johannes Weiner
2022-01-10 17:25               ` Suren Baghdasaryan
2022-01-10 17:42                 ` Linus Torvalds
2022-01-10 18:19                   ` Suren Baghdasaryan
2022-01-11  3:02                     ` Suren Baghdasaryan

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YdeFx9i/LaAC346s@sol.localdomain \
    --to=ebiggers@kernel.org \
    --cc=hannes@cmpxchg.org \
    --cc=hdanton@sina.com \
    --cc=juri.lelli@redhat.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=lizefan.x@bytedance.com \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=syzbot+cdb5dd11c97cc532efad@syzkaller.appspotmail.com \
    --cc=syzkaller-bugs@googlegroups.com \
    --cc=tj@kernel.org \
    --cc=torvalds@linux-foundation.org \
    --cc=vincent.guittot@linaro.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).