The Linux Kernel Mailing List
From: Lorenzo Stoakes <ljs@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: "Edgecombe, Rick P" <rick.p.edgecombe@intel.com>,
	 "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
	 "Liam.Howlett@oracle.com" <Liam.Howlett@oracle.com>,
	"linux-mm@kvack.org" <linux-mm@kvack.org>,
	 "surenb@google.com" <surenb@google.com>,
	"vbabka@kernel.org" <vbabka@kernel.org>,
	 "shakeel.butt@linux.dev" <shakeel.butt@linux.dev>,
	"akpm@linux-foundation.org" <akpm@linux-foundation.org>
Subject: Re: [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path
Date: Fri, 8 May 2026 21:39:27 +0100	[thread overview]
Message-ID: <af5Iqr7oLukvdJVJ@lucifer> (raw)
In-Reply-To: <7a82a652-f861-4d72-8bd7-4e082af482f2@intel.com>

On Tue, May 05, 2026 at 09:39:09AM -0700, Dave Hansen wrote:
> On 5/4/26 16:15, Edgecombe, Rick P wrote:
> >
> > I guess the problem is the lock ordering. Not sure if there is any slow path
> > avoidance details that could make this splat a false positive. But how about
> > this simpler munmap() case:
> >
> > Shadow stack signal                          munmap()
> > -------------------                          --------
> > vma_start_read() (VM_SHADOW_STACK check)
> >                                              mmap_write_lock()
> > mmap_read_lock() (user fault) <- deadlock
> >                                              vma_start_write() <-deadlock
>
> It's a little more complicated than that in practice, but I think you're
> right.
>
> I'm not sure when this would happen in practice because the fault is
> actually on the VMA that's being held for read. So I think another
> writer would have had to sneak in there and zap the VMA.

Honestly I think any workaround is just going to be more complicated than the
existing code, which rather defeats the purpose of the series.

There's not really a way to speculate with a VMA seqnum, because you'd have to
be able to observe its vma->vm_lock_seq, and to do that you'd have to find the
VMA again immediately afterwards - so you'd have looked it up twice and taken
the lock twice only to confirm it's the same damn thing :)

The issue is that a page fault on the same thread is always going to risk an
mmap read lock being taken (possibly due to I/O waiting and fault retry, for
one). And faults/zaps are inherently racy; neither acquires the write lock, so
the read lock doesn't preclude them.

And you can't really disable page faults, because you're potentially relying on
them to populate what you're touching...

Also, there's some tricky stuff done when the stack is initially set up that
can cause a headache as well; see relocate_vma_down() for added pain.

So I think the existing code is simpler.

That doesn't mean it isn't still useful to move towards having VMA locks
available everywhere, though :) unless Suren or others can find a flaw with that...

>
> The funny thing is that the fault handler is really just trying to find
> the VMA. The thing causing the fault *has* the VMA. So it's as simple as
> just passing the VMA down into the fault handler, right? How hard could
> it be? ;)
>
> There are still games to play, but they all involve dropping locks and
> retrying, like:
>
> retry:
> 	vma = lock_vma_under_rcu()
> 	// muck with VMA
> 	pagefault_disable() // avoid deadlock
> 	ret = copy_from_user()
> 	pagefault_enable()
> 	vma_end_read();
>
> 	if (!ret)
> 		return SUCCESS;
>
> 	mmap_read_lock()
> 	vma = vma_lookup()
> 	mmap_read_unlock() // avoid deadlock before touching userspace
> 	// check for valid VMA to avoid looping when there is no VMA
> 	if (!vma)
> 		return -ERRNO;
>
> 	// uh oh, slow path, something faulted
> 	get_user_pages()??
> 	//or
> 	copy_from_user() without the VMA??
>
> 	goto retry;
>
>
> This also needs some very careful thought, but something like this
> should work, where we avoid fault handling (and lock taking) in the
> actual #PF and do it in a context where the VMA lock is held:
>
> 	vma = lock_vma_under_rcu();
> 	pagefault_disable() // avoid deadlock
>
> 	while (1) {
> 		ret = copy_from_user()
> 		if (!ret)
> 			break;
> 		handle_mm_fault(vma, address, FAULT_FLAG_VMA_LOCK...);
> 	};
>
> 	pagefault_enable()
> 	vma_end_read();
>
> That's effectively just short-circuiting the #PF code which does the:
>
>         vma = lock_vma_under_rcu(mm, address);
> 	...
>         fault = handle_mm_fault(vma, address, ... FAULT_FLAG_VMA_LOCK)
>
> sequence _itself_.

Thread overview: 13+ messages
     [not found] <20260429181954.F50224AE@davehans-spike.ostc.intel.com>
     [not found] ` <20260429182005.00BF70D8@davehans-spike.ostc.intel.com>
2026-05-04 23:15   ` [PATCH 6/6] x86/mm: Avoid mmap lock for shadow stack pop fast path Edgecombe, Rick P
2026-05-05 16:39     ` Dave Hansen
2026-05-08 20:39       ` Lorenzo Stoakes [this message]
     [not found] ` <20260429181955.0C443845@davehans-spike.ostc.intel.com>
2026-05-08 10:12   ` [PATCH 1/6] mm: Make per-VMA locks available universally David Hildenbrand (Arm)
2026-05-08 10:58     ` David Hildenbrand (Arm)
2026-05-08 16:55       ` Lorenzo Stoakes
2026-05-08 16:52 ` [PATCH 0/6] mm: Make per-VMA locks available in all builds Lorenzo Stoakes
2026-05-08 17:01   ` Dave Hansen
     [not found] ` <20260429181957.7511C256@davehans-spike.ostc.intel.com>
2026-05-08 17:06   ` [PATCH 2/6] binder: Make shrinker rely solely on per-VMA lock Lorenzo Stoakes
     [not found] ` <20260429181959.BC9DABC5@davehans-spike.ostc.intel.com>
2026-05-08 17:26   ` [PATCH 3/6] mm: Add RCU-based VMA lookup that waits for writers Lorenzo Stoakes
2026-05-08 20:15     ` Lorenzo Stoakes
     [not found] ` <20260429182000.93887DFB@davehans-spike.ostc.intel.com>
2026-05-08 17:29   ` [PATCH 4/6] binder: Remove mmap_lock fallback Lorenzo Stoakes
     [not found] ` <20260429182002.BB61C7BC@davehans-spike.ostc.intel.com>
2026-05-08 17:32   ` [PATCH 5/6] tcp: Remove mmap_lock fallback path Lorenzo Stoakes
