Re: [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Brown <broonie@kernel.org>
Cc: "Rick P. Edgecombe" <rick.p.edgecombe@intel.com>,
	Deepak Gupta <debug@rivosinc.com>,
	Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
	"H.J. Lu" <hjl.tools@gmail.com>,
	Florian Weimer <fweimer@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Shuah Khan <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>,
	jannh@google.com, linux-kselftest@vger.kernel.org,
	linux-api@vger.kernel.org, Kees Cook <kees@kernel.org>
Subject: Re: [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3()
Date: Tue, 13 Aug 2024 17:25:47 +0100	[thread overview]
Message-ID: <ZruJCyXDRNhw6U5A@arm.com> (raw)
In-Reply-To: <Zrag5A5K9pv1K9Uz@finisterre.sirena.org.uk>

On Sat, Aug 10, 2024 at 12:06:12AM +0100, Mark Brown wrote:
> On Fri, Aug 09, 2024 at 07:19:26PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 08, 2024 at 09:15:25AM +0100, Mark Brown wrote:
> > > +	/* This should really be an atomic cmpxchg.  It is not. */
> > > +	if (access_remote_vm(mm, addr, &val, sizeof(val),
> > > +			     FOLL_FORCE) != sizeof(val))
> > > +		goto out;
> 
> > If we restrict the shadow stack creation only to the CLONE_VM case, we'd
> > not need the remote vm access, it's in the current mm context already.
> > More on this below.
> 
> The discussion in previous iterations was that it seemed better to allow
> even surprising use cases since it simplifies the analysis of what we
> have covered.  If the user has specified a shadow stack we just do what
> they asked for and let them worry about if it's useful.

Thanks for the summary of the past discussions, the patch makes more
sense now. I guess it's easier to follow a clone*() syscall where one
can set a new stack pointer even in the !CLONE_VM case. Just let it set
the shadow stack as well with the new ABI.

However, the x86 would be slightly inconsistent here between clone() and
clone3(). I guess it depends how you look at it. The classic clone()
syscall, if one doesn't pass CLONE_VM but does set new stack, there's no
new shadow stack allocated which I'd expect since it's a new stack.
Well, I doubt anyone cares about this scenario. Are there real cases of
!CLONE_VM but with a new stack?

> > > +	if (val != expected)
> > > +		goto out;
> 
> > I'm confused that we need to consume the token here. I could not find
> > the default shadow stack allocation doing this, only setting it via
> > create_rstor_token() (or I did not search enough). In the default case,
> > is the user consuming it? To me the only difference should been the
> > default allocation vs the one passed by the user via clone3(), with the
> > latter maybe requiring the user to set the token initially.
> 
> As discussed for a couple of previous versions if we don't have the
> token and userspace can specify any old shadow stack page as the shadow
> stack this allows clone3() to be used to overwrite the shadow stack of
> another thread, you can point to a shadow stack page which is currently
> in use and then run some code that causes shadow stack writes.  This
> could potentially then in turn be used as part of a bigger exploit
> chain, probably it's hard to get anything beyond just causing the other
> thread to fault but won't be impossible.
> 
> With a kernel allocated shadow stack this is not an issue since we are
> placing the shadow stack in new memory, userspace can't control where we
> place it so it can't overwrite an existing shadow stack.

IIUC, the kernel-allocated shadow stack will have the token always set
while the user-allocated one will be cleared. I was looking to
understand the inconsistency between these two cases in terms of the
final layout of the new shadow stack: one with the token, the other
without. I can see the need for checking but maybe start with requiring
it to be 0 and setting the token before returning, for consistency with
clone().

In the kernel-allocated shadow stack, is the token used for anything? I
can see it's used for signal delivery and return but I couldn't figure
out what it is used for in a thread's shadow stack.

Also, can one not use the clone3() to point to the clone()-allocated
shadow stack? Maybe that's unlikely as an app tends to stick to one
syscall flavour or the other.

> > > +		/*
> > > +		 * For CLONE_VFORK the child will share the parents
> > > +		 * shadow stack.  Make sure to clear the internal
> > > +		 * tracking of the thread shadow stack so the freeing
> > > +		 * logic run for child knows to leave it alone.
> > > +		 */
> > > +		if (clone_flags & CLONE_VFORK) {
> > > +			shstk->base = 0;
> > > +			shstk->size = 0;
> > > +			return 0;
> > > +		}
> 
> > I think we should leave the CLONE_VFORK check on its own independent of
> > the clone3() arguments. If one passes both CLONE_VFORK and specific
> > shadow stack address/size, they should be ignored (or maybe return an
> > error if you want to make it stricter).
> 
> This is existing logic from the current x86 code that's been reindented
> due to the addition of explicitly specified shadow stacks, it's not new
> behaviour.  It is needed to stop the child thinking it has the parent's
> shadow stack in the CLONE_VFORK case.

I figured that. But similar to the current !CLONE_VM behaviour where no
new shadow stack is allocated even if a new stack is passed to clone(),
I was thinking of something similar here for consistency: don't set up a
shadow stack in the CLONE_VFORK case or at least allow it only if a new
stack is being set up (if we extend this to clone(), it would be a small
ABI change).

> > > -	/*
> > > -	 * For !CLONE_VM the child will use a copy of the parents shadow
> > > -	 * stack.
> > > -	 */
> > > -	if (!(clone_flags & CLONE_VM))
> > > -		return 0;
> > > +		/*
> > > +		 * For !CLONE_VM the child will use a copy of the
> > > +		 * parents shadow stack.
> > > +		 */
> > > +		if (!(clone_flags & CLONE_VM))
> > > +			return 0;
> 
> > Is the !CLONE_VM case specific only to the default shadow stack
> > allocation? Sorry if this has been discussed already (or I completely
> > forgot) but I thought we'd only implement this for the thread creation
> > case. The typical fork() for a new process should inherit the parent's
> > layout, so applicable to the clone3() with the shadow stack arguments as
> > well (which should be ignored or maybe return an error with !CLONE_VM).
> 
> This is again all existing behaviour for the case where the user has not
> specified a shadow stack reindented, as mentioned above if the user has
> specified one explicitly then we just do what we were asked.  The
> existing behaviour is to only create a new shadow stack for the child in
> the CLONE_VM case and leave the child using the same shadow stack as the
> parent in the copied mm for !CLONE_VM.

I guess I was rather questioning the current choices than the new
clone3() ABI. But even for the new clone3() ABI, does it make sense to
set up a shadow stack if the current stack isn't changed? We'll end up
with a lot of possible combinations that will never get tested but
potentially become obscure ABI. Limiting the options to the sane choices
only helps with validation and unsurprising changes later on.

> > > @@ -2790,6 +2808,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
> > >  	 */
> > >  	trace_sched_process_fork(current, p);
> > >  
> > > +	shstk_post_fork(p, args);
> 
> > Do we need this post fork call? Can we not handle the setup via the
> > copy_thread() path in shstk_alloc_thread_stack()?
> 
> It looks like we do actually have the new mm in the process before we
> call copy_thread() so we could move things into there though we'd loose
> a small bit of factoring out of the error handling (at one point I had
> more code factored out but right now it's quite small, looking again we
> could also factor out the get_task_mm()/mput()).  ISTR having the new
> process' mm was the biggest reason for this initially but looking again
> I'm not sure why that was.  It does still feel like even the small
> amount that's factored out currently is useful though, a bit less
> duplication in the architecture code which feels welcome here.

I think you can probably keep this. My comment was based on the
assumption that we only support the CLONE_VM case where we wouldn't need
the access_remote_vm(), just some direct write similar to
write_user_shstk_64().

I still think we should have limited this ABI to the CLONE_VM and
!CLONE_VFORK cases but I don't have a strong view if the consensus was
to allow it for classic fork() and vfork() like uses (I just think they
won't be used).

-- 
Catalin

next prev parent reply	other threads:[~2024-08-13 16:25 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-08  8:15 [PATCH RFT v8 0/9] fork: Support shadow stacks in clone3() Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 1/9] Documentation: userspace-api: Add shadow stack API documentation Mark Brown
2024-08-14 10:40   ` Catalin Marinas
2024-08-08  8:15 ` [PATCH RFT v8 2/9] selftests: Provide helper header for shadow stack testing Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 3/9] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Mark Brown
2024-08-14 10:41   ` Catalin Marinas
2024-08-08  8:15 ` [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3() Mark Brown
2024-08-09 18:19   ` Catalin Marinas
2024-08-09 23:06     ` Mark Brown
2024-08-13 16:25       ` Catalin Marinas [this message]
2024-08-13 18:58         ` Mark Brown
2024-08-14  9:38           ` Catalin Marinas
2024-08-14 13:20             ` Mark Brown
2024-08-15  0:18   ` Edgecombe, Rick P
2024-08-15 14:24     ` Mark Brown
2024-08-16  8:44     ` Catalin Marinas
2024-08-16 10:51       ` Mark Brown
2024-08-16 15:29         ` Catalin Marinas
2024-08-16 15:46           ` Mark Brown
2024-08-16 14:52       ` Edgecombe, Rick P
2024-08-16 15:30         ` Mark Brown
2024-08-16 15:38         ` Catalin Marinas
2024-08-16 17:06           ` Mark Brown
2024-08-16 17:08     ` Jann Horn
2024-08-16 17:17       ` Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 5/9] selftests/clone3: Remove redundant flushes of output streams Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 6/9] selftests/clone3: Factor more of main loop into test_clone3() Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 7/9] selftests/clone3: Explicitly handle child exits due to signals Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 8/9] selftests/clone3: Allow tests to flag if -E2BIG is a valid error code Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 9/9] selftests/clone3: Test shadow stack support Mark Brown
2024-08-08 17:54 ` [PATCH RFT v8 0/9] fork: Support shadow stacks in clone3() Kees Cook
2024-08-15  0:19   ` Edgecombe, Rick P
2024-08-16 15:52 ` Jann Horn
2024-08-16 16:19   ` Mark Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZruJCyXDRNhw6U5A@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Szabolcs.Nagy@arm.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=broonie@kernel.org \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=debug@rivosinc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=fweimer@redhat.com \
    --cc=hjl.tools@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jannh@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kees@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.