public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Will Deacon <will.deacon@arm.com>
To: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	linux-arm-kernel <linux-arm-kernel@lists.infradead.org>,
	Ingo Molnar <mingo@kernel.org>,
	Peter Zijlstra <peterz@infradead.org>
Subject: Re: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch
Date: Thu, 15 Feb 2018 14:22:39 +0000	[thread overview]
Message-ID: <20180215142239.GF16623@arm.com> (raw)
In-Reply-To: <254787533.21950.1518634424009.JavaMail.zimbra@efficios.com>

On Wed, Feb 14, 2018 at 06:53:44PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 14, 2018, at 11:51 AM, Mark Rutland mark.rutland@arm.com wrote:
> > On Wed, Feb 14, 2018 at 03:07:41PM +0000, Will Deacon wrote:
> >> If the exit()ing task had recently migrated from another CPU, then that
> >> CPU could concurrently run context_switch() and take this path:
> >> 
> >> 	if (!prev->mm) {
> >> 		prev->active_mm = NULL;
> >> 		rq->prev_mm = oldmm;
> >> 	}
> > 
> > IIUC, on the prior context_switch, next->mm == NULL, so we set
> > next->active_mm to prev->mm.
> > 
> > Then, in this context_switch we set oldmm = prev->active_mm (where prev
> > is next from the prior context switch).
> > 
> > ... right?
> > 
> >> which then means finish_task_switch will call mmdrop():
> >> 
> >> 	struct mm_struct *mm = rq->prev_mm;
> >> 	[...]
> >> 	if (mm) {
> >> 		membarrier_mm_sync_core_before_usermode(mm);
> >> 		mmdrop(mm);
> >> 	}
> > 
> > ... then here we use what was prev->active_mm in the most recent context
> > switch.
> > 
> > So AFAICT, we're never concurrently accessing a task_struct::mm field
> > here, only prev::{mm,active_mm} while prev is current...
> > 
> > [...]
> > 
> >> diff --git a/kernel/exit.c b/kernel/exit.c
> >> index 995453d9fb55..f91e8d56b03f 100644
> >> --- a/kernel/exit.c
> >> +++ b/kernel/exit.c
> >> @@ -534,8 +534,9 @@ static void exit_mm(void)
> >>         }
> >>         mmgrab(mm);
> >>         BUG_ON(mm != current->active_mm);
> >> -       /* more a memory barrier than a real lock */
> >>         task_lock(current);
> >> +       /* Ensure we've grabbed the mm before setting current->mm to NULL */
> >> +       smp_mb__after_spin_lock();
> >>         current->mm = NULL;
> > 
> > ... and thus I don't follow why we would need to order these with
> > anything more than a compiler barrier (if we're preemptible here).
> > 
> > What have I completely misunderstood? ;)
> 
> The compiler barrier would not change anything, because task_lock()
> already implies a compiler barrier (provided by the arch spin lock
> inline asm memory clobber). So compiler-wise, it cannot move the
> mmgrab(mm) after the store "current->mm = NULL".
> 
> However, given the scenario involves multiples CPUs (one doing exit_mm(),
> the other doing context switch), the actual order of perceived load/store
> can be shuffled. And AFAIU nothing prevents the CPU from ordering the
> atomic_inc() done by mmgrab(mm) _after_ the store to current->mm.

Mark and I have spent most of the morning looking at this and realised I
made a mistake in my original guesswork: prev can't migrate until half way
down finish_task_switch when on_cpu = 0, so the access of prev->mm in
context_switch can't race with exit_mm() for that task.

Furthermore, although the mmgrab() could in theory be reordered with
current->mm = NULL (and the ARMv8 architecture allows this too), it's
pretty unlikely with LL/SC atomics and the backwards branch, where the
CPU would have to pull off quite a few tricks for this to happen.

Instead, we've come up with a more plausible sequence that can in theory
happen on a single CPU:

<task foo calls exit()>

do_exit
	exit_mm
		mmgrab(mm);			// foo's mm has count +1
		BUG_ON(mm != current->active_mm);
		task_lock(current);
		current->mm = NULL;
		task_unlock(current);

<irq and ctxsw to kthread>

context_switch(prev=foo, next=kthread)
	mm = next->mm;
	oldmm = prev->active_mm;

	if (!mm) {				// True for kthread
		next->active_mm = oldmm;
		mmgrab(oldmm);			// foo's mm has count +2
	}

	if (!prev->mm) {			// True for foo
		rq->prev_mm = oldmm;
	}

	finish_task_switch
		mm = rq->prev_mm;
		if (mm) {			// True (foo's mm)
			mmdrop(mm);		// foo's mm has count +1
		}

	[...]

<ctxsw to task bar>

context_switch(prev=kthread, next=bar)
	mm = next->mm;
	oldmm = prev->active_mm;		// foo's mm!

	if (!prev->mm) {			// True for kthread
		rq->prev_mm = oldmm;
	}

	finish_task_switch
		mm = rq->prev_mm;
		if (mm) {			// True (foo's mm)
			mmdrop(mm);		// foo's mm has count +0
		}

	[...]

<ctxsw back to task foo>

context_switch(prev=bar, next=foo)
	mm = next->mm;
	oldmm = prev->active_mm;

	if (!mm) {				// True for foo
		next->active_mm = oldmm;	// This is bar's mm
		mmgrab(oldmm);			// bar's mm has count +1
	}


	[return back to exit_mm]

mmdrop(mm);					// foo's mm has count -1

At this point, we've got an imbalanced count on the mm and could free it
prematurely as seen in the KASAN log. A subsequent context-switch away
from foo would therefore result in a use-after-free.

Assuming others agree with this diagnosis, I'm not sure how to fix it.
It's basically not safe to set current->mm = NULL with preemption enabled.

Will

  parent reply	other threads:[~2018-02-15 14:22 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-14 12:02 arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch Mark Rutland
2018-02-14 15:07 ` Will Deacon
2018-02-14 16:51   ` Mark Rutland
2018-02-14 18:53     ` Mathieu Desnoyers
2018-02-15 11:49       ` Peter Zijlstra
2018-02-15 13:13         ` Mathieu Desnoyers
2018-02-15 14:22       ` Will Deacon [this message]
2018-02-15 15:33         ` Will Deacon
2018-02-15 16:47         ` Peter Zijlstra
2018-02-15 18:21           ` Will Deacon
2018-02-15 22:08             ` Mathieu Desnoyers
2018-02-16  0:02               ` Mathieu Desnoyers
2018-02-16  8:11                 ` Peter Zijlstra
2018-02-16 16:53               ` Mark Rutland
2018-02-16 17:17                 ` Mathieu Desnoyers
2018-02-16 18:33                   ` Mark Rutland
2018-02-19 11:26         ` Catalin Marinas

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180215142239.GF16623@arm.com \
    --to=will.deacon@arm.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.desnoyers@efficios.com \
    --cc=mingo@kernel.org \
    --cc=peterz@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox