From: "Huang, Kai" <kai.huang@intel.com>
To: "Kuvaiskii, Dmitrii" <dmitrii.kuvaiskii@intel.com>
Cc: "linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"jarkko@kernel.org" <jarkko@kernel.org>,
"Chatre, Reinette" <reinette.chatre@intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"Qin, Kailun" <kailun.qin@intel.com>,
"haitao.huang@linux.intel.com" <haitao.huang@linux.intel.com>,
"stable@vger.kernel.org" <stable@vger.kernel.org>,
"Vij, Mona" <mona.vij@intel.com>
Subject: Re: [PATCH v4 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race
Date: Fri, 9 Aug 2024 11:19:22 +0000 [thread overview]
Message-ID: <8ab0f2d8aaf80e263796e18010e0fa0a4f0686a3.camel@intel.com> (raw)
In-Reply-To: <20240809093520.954552-1-dmitrii.kuvaiskii@intel.com>
On Fri, 2024-08-09 at 02:35 -0700, Dmitrii Kuvaiskii wrote:
> On Thu, Jul 25, 2024 at 01:21:56PM +1200, Huang, Kai wrote:
> >
> > > Two enclave threads may try to add and remove the same enclave page
> > > simultaneously (e.g., if the SGX runtime supports both lazy allocation
> > > and MADV_DONTNEED semantics). Consider some enclave page added to the
> > > enclave. User space decides to temporarily remove this page (e.g.,
> > > emulating the MADV_DONTNEED semantics) on CPU1. At the same time, user
> > > space performs a memory access on the same page on CPU2, which results
> > > in a #PF and ultimately in sgx_vma_fault(). Scenario proceeds as
> > > follows:
> > >
> > > [ ... skipped ... ]
> > >
> > > Here, CPU1 removed the page. However CPU2 installed the PTE entry on the
> > > same page. This enclave page becomes perpetually inaccessible (until
> > > another SGX_IOC_ENCLAVE_REMOVE_PAGES ioctl). This is because the page is
> > > marked accessible in the PTE entry but is not EAUGed, and any subsequent
> > > access to this page raises a fault: with the kernel believing there to
> > > be a valid VMA, the unlikely error code X86_PF_SGX encountered by code
> > > path do_user_addr_fault() -> access_error() causes the SGX driver's
> > > sgx_vma_fault() to be skipped and user space receives a SIGSEGV instead.
> > > The userspace SIGSEGV handler cannot perform EACCEPT because the page
> > > was not EAUGed. Thus, the user space is stuck with the inaccessible
> > > page.
> >
> > Reading the code, it seems the ioctl(sgx_ioc_enclave_modify_types) also zaps
> > EPC mapping when converting a normal page to TSC. Thus IIUC it should also
> > suffer this issue?
>
> Technically yes, sgx_enclave_modify_types() has a similar code path and
> can be patched in a similar way.
>
> Practically though, I can't imagine an SGX program or framework to allow a
> scenario when CPU1 modifies the type of the enclave page from REG to TCS
> and at the same time CPU2 performs a memory access on the same page. This
> would be clearly a bug in the SGX program/framework. For example, Gramine
> always follows the path of: create a new REG enclave page, modify it to
> TCS, only then start using it; i.e., there is never a point in time at
> which the REG page is allocated and ready to be converted to a TCS page,
> and some other thread/CPU accesses it in-between these steps.
I think we need to understand the consequence of such bug (assuming such
behaviour is 100% a bug) both to kernel and to enclave.
To the kernel I don't see any big issue: for the sgx_vma_fault() path it will
find the EPC page is already loaded thus just setup the mapping again; for the
sgx_enclave_modify_types() path the worst case is __emodt() could fail,
resulting in enclave being killed probably.
So if this race is 100% a bug, and will also certainly kill the enclave, then
I guess it is fine not to handle. But there's an "assuming" here.
On the other hand, there's no risk if we apply BUSY flag here too. If it is a
bug in enclave, then it can die anyway; otherwise it may survive.
>
> TLDR: I can add similar handling to sgx_enclave_modify_types() if
> reviewers insist, but I don't see how this data race can ever be
> triggered by benign real-world SGX applications.
>
So as mentioned above, I intend to suggest to also apply the BUSY flag here.
And we can have a consist rule in the kernel:
If an enclave page is under certainly operation by the kernel with the mapping
removed, other threads trying to access that page are temporarily blocked and
should retry.
But this is only my 2cents, and I'll leave to maintainers.
next prev parent reply other threads:[~2024-08-09 11:19 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2024-07-05 7:45 [PATCH v4 0/3] x86/sgx: Fix two data races in EAUG/EREMOVE flows Dmitrii Kuvaiskii
2024-07-05 7:45 ` [PATCH v4 1/3] x86/sgx: Split SGX_ENCL_PAGE_BEING_RECLAIMED into two flags Dmitrii Kuvaiskii
2024-07-10 15:15 ` Haitao Huang
2024-07-17 10:36 ` Jarkko Sakkinen
2024-08-12 8:12 ` Dmitrii Kuvaiskii
2024-08-15 18:29 ` Jarkko Sakkinen
2024-07-17 10:37 ` Jarkko Sakkinen
2024-08-12 8:16 ` Dmitrii Kuvaiskii
2024-08-15 18:30 ` Jarkko Sakkinen
2024-07-25 2:00 ` Huang, Kai
2024-07-05 7:45 ` [PATCH v4 2/3] x86/sgx: Resolve EAUG race where losing thread returns SIGBUS Dmitrii Kuvaiskii
2024-07-17 10:38 ` Jarkko Sakkinen
2024-08-12 8:21 ` Dmitrii Kuvaiskii
2024-08-15 18:31 ` Jarkko Sakkinen
2024-07-25 0:52 ` Huang, Kai
2024-07-05 7:45 ` [PATCH v4 3/3] x86/sgx: Resolve EREMOVE page vs EAUG page data race Dmitrii Kuvaiskii
2024-07-10 15:16 ` Haitao Huang
2024-07-17 10:38 ` Jarkko Sakkinen
2024-08-12 8:25 ` Dmitrii Kuvaiskii
2024-08-15 18:34 ` Jarkko Sakkinen
2024-08-15 18:37 ` Jarkko Sakkinen
2024-07-25 1:21 ` Huang, Kai
2024-08-09 9:35 ` Dmitrii Kuvaiskii
2024-08-09 11:19 ` Huang, Kai [this message]
2024-08-12 8:32 ` Dmitrii Kuvaiskii
2024-08-12 10:34 ` Huang, Kai
2024-07-05 9:03 ` [PATCH v4 0/3] x86/sgx: Fix two data races in EAUG/EREMOVE flows Jarkko Sakkinen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8ab0f2d8aaf80e263796e18010e0fa0a4f0686a3.camel@intel.com \
--to=kai.huang@intel.com \
--cc=dave.hansen@linux.intel.com \
--cc=dmitrii.kuvaiskii@intel.com \
--cc=haitao.huang@linux.intel.com \
--cc=jarkko@kernel.org \
--cc=kailun.qin@intel.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-sgx@vger.kernel.org \
--cc=mona.vij@intel.com \
--cc=reinette.chatre@intel.com \
--cc=stable@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox