From: Jarkko Sakkinen <jarkko@kernel.org>
To: Reinette Chatre <reinette.chatre@intel.com>,
dave.hansen@linux.intel.com, tglx@linutronix.de, bp@alien8.de,
mingo@redhat.com, linux-sgx@vger.kernel.org, x86@kernel.org
Cc: seanjc@google.com, tony.luck@intel.com, hpa@zytor.com,
linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH V2] x86/sgx: Fix free page accounting
Date: Wed, 10 Nov 2021 17:11:50 +0200 [thread overview]
Message-ID: <8e0bb87f05b79317a06ed2d8ab5e2f5cf6132b6a.camel@kernel.org> (raw)
In-Reply-To: <b2e69e9febcae5d98d331de094d9cc7ce3217e66.1636487172.git.reinette.chatre@intel.com>
On Tue, 2021-11-09 at 12:00 -0800, Reinette Chatre wrote:
> The SGX driver maintains a single global free page counter,
> sgx_nr_free_pages, that reflects the number of free pages available
> across all NUMA nodes. Correspondingly, a list of free pages is
> associated with each NUMA node and sgx_nr_free_pages is updated
> every time a page is added or removed from any of the free page
> lists. The main usage of sgx_nr_free_pages is by the reclaimer
> that will run when it (sgx_nr_free_pages) goes below a watermark
> to ensure that there are always some free pages available to, for
> example, support efficient page faults.
>
> With sgx_nr_free_pages accessed and modified from a few places
> it is essential to ensure that these accesses are done safely but
> this is not the case. sgx_nr_free_pages is read without any
> protection and updated with inconsistent protection by any one
> of the spin locks associated with the individual NUMA nodes.
> For example:
>
> CPU_A CPU_B
> ----- -----
> spin_lock(&nodeA->lock); spin_lock(&nodeB->lock);
> ... ...
> sgx_nr_free_pages--; /* NOT SAFE */ sgx_nr_free_pages--;
>
> spin_unlock(&nodeA->lock); spin_unlock(&nodeB->lock);
I don't understand the "NOT SAFE" part here. It is safe to access
the variable, even when it is not atomic...
I don't understand what the sequence above should tell me.
> The consequence of sgx_nr_free_pages not being protected is that
> its value may not accurately reflect the actual number of free
> pages on the system, impacting the availability of free pages in
> support of many flows. The problematic scenario isu when the
In non-atomicity is not a problem, when it is not a problem :-)
> reclaimer does not run because it believes there to be sufficient
> free pages while any attempt to allocate a page fails because there
> are no free pages available.
>
> The worst scenario observed was a user space hang because of
> repeated page faults caused by no free pages made available.
>
> The following flow was encountered:
> asm_exc_page_fault
> ...
> sgx_vma_fault()
> sgx_encl_load_page()
> sgx_encl_eldu() // Encrypted page needs to be loaded from backing
> // storage into newly allocated SGX memory page
> sgx_alloc_epc_page() // Allocate a page of SGX memory
> __sgx_alloc_epc_page() // Fails, no free SGX memory
> ...
> if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
> wake_up(&ksgxd_waitq);
> return -EBUSY; // Return -EBUSY giving reclaimer time to run
> return -EBUSY;
> return -EBUSY;
> return VM_FAULT_NOPAGE;
>
> The reclaimer is triggered in above flow with the following code:
>
> static bool sgx_should_reclaim(unsigned long watermark)
> {
> return sgx_nr_free_pages < watermark &&
> !list_empty(&sgx_active_page_list);
> }
>
> In the problematic scenario there were no free pages available yet the
> value of sgx_nr_free_pages was above the watermark. The allocation of
> SGX memory thus always failed because of a lack of free pages while no
> free pages were made available because the reclaimer is never started
> because of sgx_nr_free_pages' incorrect value. The consequence was that
> user space kept encountering VM_FAULT_NOPAGE that caused the same
> address to be accessed repeatedly with the same result.
That causes sgx_should_reclaim() executed to be multiple times as the
fault is retried. Eventually it should be successful.
> Change the global free page counter to an atomic type that
> ensures simultaneous updates are done safely. While doing so, move
> the updating of the variable outside of the spin lock critical
> section to which it does not belong.
>
> Cc: stable@vger.kernel.org
> Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>
I'm not yet sure if this a bug, even if it'd be a improvement
to the performance.
/Jarkko
next prev parent reply other threads:[~2021-11-10 15:11 UTC|newest]
Thread overview: 9+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-11-09 20:00 [PATCH V2] x86/sgx: Fix free page accounting Reinette Chatre
2021-11-10 15:11 ` Jarkko Sakkinen [this message]
2021-11-10 18:51 ` Reinette Chatre
2021-11-10 19:16 ` Luck, Tony
2021-11-11 2:56 ` Jarkko Sakkinen
2021-11-11 2:55 ` Jarkko Sakkinen
2021-11-11 3:26 ` Luck, Tony
2021-11-11 3:50 ` Jarkko Sakkinen
2021-11-11 4:01 ` Dave Hansen
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=8e0bb87f05b79317a06ed2d8ab5e2f5cf6132b6a.camel@kernel.org \
--to=jarkko@kernel.org \
--cc=bp@alien8.de \
--cc=dave.hansen@linux.intel.com \
--cc=hpa@zytor.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-sgx@vger.kernel.org \
--cc=mingo@redhat.com \
--cc=reinette.chatre@intel.com \
--cc=seanjc@google.com \
--cc=stable@vger.kernel.org \
--cc=tglx@linutronix.de \
--cc=tony.luck@intel.com \
--cc=x86@kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox