public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Catalin Marinas <catalin.marinas@arm.com>
To: Mark Brown <broonie@kernel.org>
Cc: "Rick P. Edgecombe" <rick.p.edgecombe@intel.com>,
	Deepak Gupta <debug@rivosinc.com>,
	Szabolcs Nagy <Szabolcs.Nagy@arm.com>,
	"H.J. Lu" <hjl.tools@gmail.com>,
	Florian Weimer <fweimer@redhat.com>,
	Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Peter Zijlstra <peterz@infradead.org>,
	Juri Lelli <juri.lelli@redhat.com>,
	Vincent Guittot <vincent.guittot@linaro.org>,
	Dietmar Eggemann <dietmar.eggemann@arm.com>,
	Steven Rostedt <rostedt@goodmis.org>,
	Ben Segall <bsegall@google.com>, Mel Gorman <mgorman@suse.de>,
	Valentin Schneider <vschneid@redhat.com>,
	Christian Brauner <brauner@kernel.org>,
	Shuah Khan <shuah@kernel.org>,
	linux-kernel@vger.kernel.org, Will Deacon <will@kernel.org>,
	jannh@google.com, linux-kselftest@vger.kernel.org,
	linux-api@vger.kernel.org, Kees Cook <kees@kernel.org>
Subject: Re: [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3()
Date: Tue, 13 Aug 2024 17:25:47 +0100	[thread overview]
Message-ID: <ZruJCyXDRNhw6U5A@arm.com> (raw)
In-Reply-To: <Zrag5A5K9pv1K9Uz@finisterre.sirena.org.uk>

On Sat, Aug 10, 2024 at 12:06:12AM +0100, Mark Brown wrote:
> On Fri, Aug 09, 2024 at 07:19:26PM +0100, Catalin Marinas wrote:
> > On Thu, Aug 08, 2024 at 09:15:25AM +0100, Mark Brown wrote:
> > > +	/* This should really be an atomic cmpxchg.  It is not. */
> > > +	if (access_remote_vm(mm, addr, &val, sizeof(val),
> > > +			     FOLL_FORCE) != sizeof(val))
> > > +		goto out;
> 
> > If we restrict the shadow stack creation only to the CLONE_VM case, we'd
> > not need the remote vm access, it's in the current mm context already.
> > More on this below.
> 
> The discussion in previous iterations was that it seemed better to allow
> even surprising use cases since it simplifies the analysis of what we
> have covered.  If the user has specified a shadow stack we just do what
> they asked for and let them worry about if it's useful.

Thanks for the summary of the past discussions, the patch makes more
sense now. I guess it's easier to follow a clone*() syscall where one
can set a new stack pointer even in the !CLONE_VM case. Just let it set
the shadow stack as well with the new ABI.

However, the x86 would be slightly inconsistent here between clone() and
clone3(). I guess it depends how you look at it. The classic clone()
syscall, if one doesn't pass CLONE_VM but does set new stack, there's no
new shadow stack allocated which I'd expect since it's a new stack.
Well, I doubt anyone cares about this scenario. Are there real cases of
!CLONE_VM but with a new stack?

> > > +	if (val != expected)
> > > +		goto out;
> 
> > I'm confused that we need to consume the token here. I could not find
> > the default shadow stack allocation doing this, only setting it via
> > create_rstor_token() (or I did not search enough). In the default case,
> > is the user consuming it? To me the only difference should been the
> > default allocation vs the one passed by the user via clone3(), with the
> > latter maybe requiring the user to set the token initially.
> 
> As discussed for a couple of previous versions if we don't have the
> token and userspace can specify any old shadow stack page as the shadow
> stack this allows clone3() to be used to overwrite the shadow stack of
> another thread, you can point to a shadow stack page which is currently
> in use and then run some code that causes shadow stack writes.  This
> could potentially then in turn be used as part of a bigger exploit
> chain, probably it's hard to get anything beyond just causing the other
> thread to fault but won't be impossible.
> 
> With a kernel allocated shadow stack this is not an issue since we are
> placing the shadow stack in new memory, userspace can't control where we
> place it so it can't overwrite an existing shadow stack.

IIUC, the kernel-allocated shadow stack will have the token always set
while the user-allocated one will be cleared. I was looking to
understand the inconsistency between these two cases in terms of the
final layout of the new shadow stack: one with the token, the other
without. I can see the need for checking but maybe start with requiring
it to be 0 and setting the token before returning, for consistency with
clone().

In the kernel-allocated shadow stack, is the token used for anything? I
can see it's used for signal delivery and return but I couldn't figure
out what it is used for in a thread's shadow stack.

Also, can one not use the clone3() to point to the clone()-allocated
shadow stack? Maybe that's unlikely as an app tends to stick to one
syscall flavour or the other.

> > > +		/*
> > > +		 * For CLONE_VFORK the child will share the parents
> > > +		 * shadow stack.  Make sure to clear the internal
> > > +		 * tracking of the thread shadow stack so the freeing
> > > +		 * logic run for child knows to leave it alone.
> > > +		 */
> > > +		if (clone_flags & CLONE_VFORK) {
> > > +			shstk->base = 0;
> > > +			shstk->size = 0;
> > > +			return 0;
> > > +		}
> 
> > I think we should leave the CLONE_VFORK check on its own independent of
> > the clone3() arguments. If one passes both CLONE_VFORK and specific
> > shadow stack address/size, they should be ignored (or maybe return an
> > error if you want to make it stricter).
> 
> This is existing logic from the current x86 code that's been reindented
> due to the addition of explicitly specified shadow stacks, it's not new
> behaviour.  It is needed to stop the child thinking it has the parent's
> shadow stack in the CLONE_VFORK case.

I figured that. But similar to the current !CLONE_VM behaviour where no
new shadow stack is allocated even if a new stack is passed to clone(),
I was thinking of something similar here for consistency: don't set up a
shadow stack in the CLONE_VFORK case or at least allow it only if a new
stack is being set up (if we extend this to clone(), it would be a small
ABI change).

> > > -	/*
> > > -	 * For !CLONE_VM the child will use a copy of the parents shadow
> > > -	 * stack.
> > > -	 */
> > > -	if (!(clone_flags & CLONE_VM))
> > > -		return 0;
> > > +		/*
> > > +		 * For !CLONE_VM the child will use a copy of the
> > > +		 * parents shadow stack.
> > > +		 */
> > > +		if (!(clone_flags & CLONE_VM))
> > > +			return 0;
> 
> > Is the !CLONE_VM case specific only to the default shadow stack
> > allocation? Sorry if this has been discussed already (or I completely
> > forgot) but I thought we'd only implement this for the thread creation
> > case. The typical fork() for a new process should inherit the parent's
> > layout, so applicable to the clone3() with the shadow stack arguments as
> > well (which should be ignored or maybe return an error with !CLONE_VM).
> 
> This is again all existing behaviour for the case where the user has not
> specified a shadow stack reindented, as mentioned above if the user has
> specified one explicitly then we just do what we were asked.  The
> existing behaviour is to only create a new shadow stack for the child in
> the CLONE_VM case and leave the child using the same shadow stack as the
> parent in the copied mm for !CLONE_VM.

I guess I was rather questioning the current choices than the new
clone3() ABI. But even for the new clone3() ABI, does it make sense to
set up a shadow stack if the current stack isn't changed? We'll end up
with a lot of possible combinations that will never get tested but
potentially become obscure ABI. Limiting the options to the sane choices
only helps with validation and unsurprising changes later on.

> > > @@ -2790,6 +2808,8 @@ pid_t kernel_clone(struct kernel_clone_args *args)
> > >  	 */
> > >  	trace_sched_process_fork(current, p);
> > >  
> > > +	shstk_post_fork(p, args);
> 
> > Do we need this post fork call? Can we not handle the setup via the
> > copy_thread() path in shstk_alloc_thread_stack()?
> 
> It looks like we do actually have the new mm in the process before we
> call copy_thread() so we could move things into there though we'd loose
> a small bit of factoring out of the error handling (at one point I had
> more code factored out but right now it's quite small, looking again we
> could also factor out the get_task_mm()/mput()).  ISTR having the new
> process' mm was the biggest reason for this initially but looking again
> I'm not sure why that was.  It does still feel like even the small
> amount that's factored out currently is useful though, a bit less
> duplication in the architecture code which feels welcome here.

I think you can probably keep this. My comment was based on the
assumption that we only support the CLONE_VM case where we wouldn't need
the access_remote_vm(), just some direct write similar to
write_user_shstk_64().

I still think we should have limited this ABI to the CLONE_VM and
!CLONE_VFORK cases but I don't have a strong view if the consensus was
to allow it for classic fork() and vfork() like uses (I just think they
won't be used).

-- 
Catalin

  reply	other threads:[~2024-08-13 16:25 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-08  8:15 [PATCH RFT v8 0/9] fork: Support shadow stacks in clone3() Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 1/9] Documentation: userspace-api: Add shadow stack API documentation Mark Brown
2024-08-14 10:40   ` Catalin Marinas
2024-08-08  8:15 ` [PATCH RFT v8 2/9] selftests: Provide helper header for shadow stack testing Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 3/9] mm: Introduce ARCH_HAS_USER_SHADOW_STACK Mark Brown
2024-08-14 10:41   ` Catalin Marinas
2024-08-08  8:15 ` [PATCH RFT v8 4/9] fork: Add shadow stack support to clone3() Mark Brown
2024-08-09 18:19   ` Catalin Marinas
2024-08-09 23:06     ` Mark Brown
2024-08-13 16:25       ` Catalin Marinas [this message]
2024-08-13 18:58         ` Mark Brown
2024-08-14  9:38           ` Catalin Marinas
2024-08-14 13:20             ` Mark Brown
2024-08-15  0:18   ` Edgecombe, Rick P
2024-08-15 14:24     ` Mark Brown
2024-08-16  8:44     ` Catalin Marinas
2024-08-16 10:51       ` Mark Brown
2024-08-16 15:29         ` Catalin Marinas
2024-08-16 15:46           ` Mark Brown
2024-08-16 14:52       ` Edgecombe, Rick P
2024-08-16 15:30         ` Mark Brown
2024-08-16 15:38         ` Catalin Marinas
2024-08-16 17:06           ` Mark Brown
2024-08-16 17:08     ` Jann Horn
2024-08-16 17:17       ` Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 5/9] selftests/clone3: Remove redundant flushes of output streams Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 6/9] selftests/clone3: Factor more of main loop into test_clone3() Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 7/9] selftests/clone3: Explicitly handle child exits due to signals Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 8/9] selftests/clone3: Allow tests to flag if -E2BIG is a valid error code Mark Brown
2024-08-08  8:15 ` [PATCH RFT v8 9/9] selftests/clone3: Test shadow stack support Mark Brown
2024-08-08 17:54 ` [PATCH RFT v8 0/9] fork: Support shadow stacks in clone3() Kees Cook
2024-08-15  0:19   ` Edgecombe, Rick P
2024-08-16 15:52 ` Jann Horn
2024-08-16 16:19   ` Mark Brown

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ZruJCyXDRNhw6U5A@arm.com \
    --to=catalin.marinas@arm.com \
    --cc=Szabolcs.Nagy@arm.com \
    --cc=bp@alien8.de \
    --cc=brauner@kernel.org \
    --cc=broonie@kernel.org \
    --cc=bsegall@google.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=debug@rivosinc.com \
    --cc=dietmar.eggemann@arm.com \
    --cc=fweimer@redhat.com \
    --cc=hjl.tools@gmail.com \
    --cc=hpa@zytor.com \
    --cc=jannh@google.com \
    --cc=juri.lelli@redhat.com \
    --cc=kees@kernel.org \
    --cc=linux-api@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-kselftest@vger.kernel.org \
    --cc=mgorman@suse.de \
    --cc=mingo@redhat.com \
    --cc=peterz@infradead.org \
    --cc=rick.p.edgecombe@intel.com \
    --cc=rostedt@goodmis.org \
    --cc=shuah@kernel.org \
    --cc=tglx@linutronix.de \
    --cc=vincent.guittot@linaro.org \
    --cc=vschneid@redhat.com \
    --cc=will@kernel.org \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox