arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch

From: will.deacon@arm.com (Will Deacon)
To: linux-arm-kernel@lists.infradead.org
Subject: arm64/v4.16-rc1: KASAN: use-after-free Read in finish_task_switch
Date: Thu, 15 Feb 2018 14:22:39 +0000
Message-ID: <20180215142239.GF16623@arm.com>
In-Reply-To: <254787533.21950.1518634424009.JavaMail.zimbra@efficios.com>

On Wed, Feb 14, 2018 at 06:53:44PM +0000, Mathieu Desnoyers wrote:
> ----- On Feb 14, 2018, at 11:51 AM, Mark Rutland mark.rutland at arm.com wrote:
> > On Wed, Feb 14, 2018 at 03:07:41PM +0000, Will Deacon wrote:
> >> If the exit()ing task had recently migrated from another CPU, then that
> >> CPU could concurrently run context_switch() and take this path:
> >> 
> >> 	if (!prev->mm) {
> >> 		prev->active_mm = NULL;
> >> 		rq->prev_mm = oldmm;
> >> 	}
> > 
> > IIUC, on the prior context_switch, next->mm == NULL, so we set
> > next->active_mm to prev->mm.
> > 
> > Then, in this context_switch we set oldmm = prev->active_mm (where prev
> > is next from the prior context switch).
> > 
> > ... right?
> > 
> >> which then means finish_task_switch will call mmdrop():
> >> 
> >> 	struct mm_struct *mm = rq->prev_mm;
> >> 	[...]
> >> 	if (mm) {
> >> 		membarrier_mm_sync_core_before_usermode(mm);
> >> 		mmdrop(mm);
> >> 	}
> > 
> > ... then here we use what was prev->active_mm in the most recent context
> > switch.
> > 
> > So AFAICT, we're never concurrently accessing a task_struct::mm field
> > here, only prev::{mm,active_mm} while prev is current...
> > 
> > [...]
> > 
> >> diff --git a/kernel/exit.c b/kernel/exit.c
> >> index 995453d9fb55..f91e8d56b03f 100644
> >> --- a/kernel/exit.c
> >> +++ b/kernel/exit.c
> >> @@ -534,8 +534,9 @@ static void exit_mm(void)
> >>         }
> >>         mmgrab(mm);
> >>         BUG_ON(mm != current->active_mm);
> >> -       /* more a memory barrier than a real lock */
> >>         task_lock(current);
> >> +       /* Ensure we've grabbed the mm before setting current->mm to NULL */
> >> +       smp_mb__after_spin_lock();
> >>         current->mm = NULL;
> > 
> > ... and thus I don't follow why we would need to order these with
> > anything more than a compiler barrier (if we're preemptible here).
> > 
> > What have I completely misunderstood? ;)
> 
> The compiler barrier would not change anything, because task_lock()
> already implies a compiler barrier (provided by the arch spin lock
> inline asm memory clobber). So compiler-wise, it cannot move the
> mmgrab(mm) after the store "current->mm = NULL".
> 
> However, given the scenario involves multiple CPUs (one doing exit_mm(),
> the other doing context switch), the actual order of perceived load/store
> can be shuffled. And AFAIU nothing prevents the CPU from ordering the
> atomic_inc() done by mmgrab(mm) _after_ the store to current->mm.

Mark and I have spent most of the morning looking at this and realised that
I made a mistake in my original guesswork: prev can't migrate until halfway
down finish_task_switch, when on_cpu = 0, so the access of prev->mm in
context_switch can't race with exit_mm() for that task.

Furthermore, although the mmgrab() could in theory be reordered with
current->mm = NULL (and the ARMv8 architecture allows this too), it's
pretty unlikely with LL/SC atomics and the backwards branch, where the
CPU would have to pull off quite a few tricks for this to happen.

Instead, we've come up with a more plausible sequence that can in theory
happen on a single CPU:

<task foo calls exit()>

do_exit
	exit_mm
		mmgrab(mm);			// foo's mm has count +1
		BUG_ON(mm != current->active_mm);
		task_lock(current);
		current->mm = NULL;
		task_unlock(current);

<irq and ctxsw to kthread>

context_switch(prev=foo, next=kthread)
	mm = next->mm;
	oldmm = prev->active_mm;

	if (!mm) {				// True for kthread
		next->active_mm = oldmm;
		mmgrab(oldmm);			// foo's mm has count +2
	}

	if (!prev->mm) {			// True for foo
		rq->prev_mm = oldmm;
	}

	finish_task_switch
		mm = rq->prev_mm;
		if (mm) {			// True (foo's mm)
			mmdrop(mm);		// foo's mm has count +1
		}

	[...]

<ctxsw to task bar>

context_switch(prev=kthread, next=bar)
	mm = next->mm;
	oldmm = prev->active_mm;		// foo's mm!

	if (!prev->mm) {			// True for kthread
		rq->prev_mm = oldmm;
	}

	finish_task_switch
		mm = rq->prev_mm;
		if (mm) {			// True (foo's mm)
			mmdrop(mm);		// foo's mm has count +0
		}

	[...]

<ctxsw back to task foo>

context_switch(prev=bar, next=foo)
	mm = next->mm;
	oldmm = prev->active_mm;

	if (!mm) {				// True for foo
		next->active_mm = oldmm;	// This is bar's mm
		mmgrab(oldmm);			// bar's mm has count +1
	}


	[return back to exit_mm]

mmdrop(mm);					// foo's mm has count -1

At this point, we've got an imbalanced count on the mm and could free it
prematurely as seen in the KASAN log. A subsequent context-switch away
from foo would therefore result in a use-after-free.

Assuming others agree with this diagnosis, I'm not sure how to fix it.
It's basically not safe to set current->mm = NULL with preemption enabled.

Will
