Re: [PATCH v3 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race

Intel SGX development
 help / color / mirror / Atom feed

From: Dmitrii Kuvaiskii <dmitrii.kuvaiskii@intel.com>
To: dave.hansen@intel.com
Cc: dave.hansen@linux.intel.com, dmitrii.kuvaiskii@intel.com,
	haitao.huang@linux.intel.com, jarkko@kernel.org,
	kai.huang@intel.com, kailun.qin@intel.com,
	linux-kernel@vger.kernel.org, linux-sgx@vger.kernel.org,
	mona.vij@intel.com, reinette.chatre@intel.com,
	stable@vger.kernel.org
Subject: Re: [PATCH v3 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race
Date: Fri,  7 Jun 2024 10:21:46 -0700	[thread overview]
Message-ID: <20240607172146.536993-1-dmitrii.kuvaiskii@intel.com> (raw)
In-Reply-To: <01bb6519-680e-45bf-b8bd-34763658aa17@intel.com>

On Tue, May 28, 2024 at 09:23:13AM -0700, Dave Hansen wrote:
> On 5/17/24 04:06, Dmitrii Kuvaiskii wrote:
> ...
>
> First, why is SGX so special here?  How is the SGX problem different
> than what the core mm code does?

Here is my understanding why SGX is so special and why I have to introduce
a new bit SGX_ENCL_PAGE_BEING_REMOVED.

In SGX's removal of the enclave page, two operations must happen
atomically: the PTE entry must be removed and the page must be EREMOVE'd.

Generally, to guarantee atomicity, encl->lock is acquired. Ideally, if
this encl->lock could be acquired at the beginning of
sgx_encl_remove_pages() and be released at the very end of this function,
there would be no EREMOVE page vs EAUG page data race, and my bug fix
(with SGX_ENCL_PAGE_BEING_REMOVED bit) wouldn't be needed.

However, the current implementation of sgx_encl_remove_pages() has to
release encl->lock before removing the PTE entry. Releasing the lock is
required because the function that removes the PTE entry --
sgx_zap_enclave_ptes() -- acquires another, enclave-MM lock:
mmap_read_lock(encl_mm->mm).

The two locks must be taken in this order:
1. mmap_read_lock(encl_mm->mm)
2. mutex_lock(&encl->lock)

This lock order is apparent from e.g. sgx_encl_add_page(). This order also
seems to make intuitive sense: VMA callbacks are called with the MM lock
being held, so the MM lock should be the first in lock order.

So, if sgx_encl_remove_pages() would _not_ release encl->lock before
calling sgx_zap_enclave_ptes(), this would violate the lock order and
might lead to deadlocks. At the same time, releasing encl->lock in the
middle of the two-operations flow leads to a data race that I found in
this patch series.

Quick summary:
- Removing the enclave page requires two operations: removing the PTE and
  performing EREMOVE.
- The complete flow of removing the enclave page cannot be protected by a
  single encl->lock, because it would violate the lock order and would
  lead to deadlocks.
- The current upstream implementation thus breaks the flow into two
  critical sections, releasing encl->lock before sgx_zap_enclave_ptes()
  and re-acquiring this lock afterwards. This leads to a data race.
- My patch restores "atomicity" of the flow by introducing a new flag
  SGX_ENCL_PAGE_BEING_REMOVED.

> > --- a/arch/x86/kernel/cpu/sgx/encl.h
> > +++ b/arch/x86/kernel/cpu/sgx/encl.h
> > @@ -25,6 +25,9 @@
> >  /* 'desc' bit marking that the page is being reclaimed. */
> >  #define SGX_ENCL_PAGE_BEING_RECLAIMED  BIT(3)
> >
> > +/* 'desc' bit marking that the page is being removed. */
> > +#define SGX_ENCL_PAGE_BEING_REMOVED    BIT(2)
>
> Second, convince me that this _needs_ a new bit.  Why can't we just have
> a bit that effectively means "return EBUSY if you see this bit when
> handling a fault".

As Haitao mentioned in his reply, the bit SGX_ENCL_PAGE_BEING_RECLAIMED is
also used in reclaimer_writing_to_pcmd(). If we would re-use this bit to
mark a page being removed, reclaimer_writing_to_pcmd() would incorrectly
return 1, meaning that the reclaimer is about to write to the PCMD page,
which is not true.

> >  struct sgx_encl_page {
> >     unsigned long desc;
> >     unsigned long vm_max_prot_bits:8;
> > diff --git a/arch/x86/kernel/cpu/sgx/ioctl.c b/arch/x86/kernel/cpu/sgx/ioctl.c
> > index 5d390df21440..de59219ae794 100644
> > --- a/arch/x86/kernel/cpu/sgx/ioctl.c
> > +++ b/arch/x86/kernel/cpu/sgx/ioctl.c
> > @@ -1142,6 +1142,7 @@ static long sgx_encl_remove_pages(struct sgx_encl *encl,
> >          * Do not keep encl->lock because of dependency on
> >          * mmap_lock acquired in sgx_zap_enclave_ptes().
> >          */
> > +       entry->desc |= SGX_ENCL_PAGE_BEING_REMOVED;
>
> This also needs a comment, no matter what.

Ok, I will write something along the lines that we want to prevent a data
race with an EAUG flow, and since we have to release encl->lock (which
would otherwise prevent the data race) we instead set a bit to mark this
enclave page as being in the process of removal, so that the EAUG flow
backs off and retries later.

--
Dmitrii Kuvaiskii

next prev parent reply	other threads:[~2024-06-07 17:30 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-17 11:06 [PATCH v3 0/2] x86/sgx: Fix two data races in EAUG/EREMOVE flows Dmitrii Kuvaiskii
2024-05-17 11:06 ` [PATCH v3 1/2] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS Dmitrii Kuvaiskii
2024-05-17 11:06 ` [PATCH v3 2/2] x86/sgx: Resolve EREMOVE page vs EAUG page data race Dmitrii Kuvaiskii
2024-05-28 16:23   ` Dave Hansen
2024-06-03 18:42     ` Haitao Huang
2024-06-07 17:43       ` Dave Hansen
2024-06-07 17:21     ` Dmitrii Kuvaiskii [this message]
2024-05-28 16:01 ` [PATCH v3 0/2] x86/sgx: Fix two data races in EAUG/EREMOVE flows Dave Hansen
2024-06-07 17:47   ` Dmitrii Kuvaiskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240607172146.536993-1-dmitrii.kuvaiskii@intel.com \
    --to=dmitrii.kuvaiskii@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=haitao.huang@linux.intel.com \
    --cc=jarkko@kernel.org \
    --cc=kai.huang@intel.com \
    --cc=kailun.qin@intel.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=mona.vij@intel.com \
    --cc=reinette.chatre@intel.com \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox