Re: [PATCH V2] x86/sgx: Fix free page accounting

public inbox for linux-sgx@vger.kernel.org
 help / color / mirror / Atom feed

From: Jarkko Sakkinen <jarkko@kernel.org>
To: Reinette Chatre <reinette.chatre@intel.com>,
	dave.hansen@linux.intel.com, tglx@linutronix.de, bp@alien8.de,
	mingo@redhat.com, linux-sgx@vger.kernel.org, x86@kernel.org
Cc: seanjc@google.com, tony.luck@intel.com, hpa@zytor.com,
	linux-kernel@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH V2] x86/sgx: Fix free page accounting
Date: Wed, 10 Nov 2021 17:11:50 +0200	[thread overview]
Message-ID: <8e0bb87f05b79317a06ed2d8ab5e2f5cf6132b6a.camel@kernel.org> (raw)
In-Reply-To: <b2e69e9febcae5d98d331de094d9cc7ce3217e66.1636487172.git.reinette.chatre@intel.com>

On Tue, 2021-11-09 at 12:00 -0800, Reinette Chatre wrote:
> The SGX driver maintains a single global free page counter,
> sgx_nr_free_pages, that reflects the number of free pages available
> across all NUMA nodes. Correspondingly, a list of free pages is
> associated with each NUMA node and sgx_nr_free_pages is updated
> every time a page is added or removed from any of the free page
> lists. The main usage of sgx_nr_free_pages is by the reclaimer
> that will run when it (sgx_nr_free_pages) goes below a watermark
> to ensure that there are always some free pages available to, for
> example, support efficient page faults.
> 
> With sgx_nr_free_pages accessed and modified from a few places
> it is essential to ensure that these accesses are done safely but
> this is not the case. sgx_nr_free_pages is read without any
> protection and updated with inconsistent protection by any one
> of the spin locks associated with the individual NUMA nodes.
> For example:
> 
>       CPU_A                                 CPU_B
>       -----                                 -----
>  spin_lock(&nodeA->lock);              spin_lock(&nodeB->lock);
>  ...                                   ...
>  sgx_nr_free_pages--;  /* NOT SAFE */  sgx_nr_free_pages--;
> 
>  spin_unlock(&nodeA->lock);            spin_unlock(&nodeB->lock);

I don't understand the "NOT SAFE" part here. It is safe to access
the variable, even when it is not atomic...

I don't understand what the sequence above should tell me.

> The consequence of sgx_nr_free_pages not being protected is that
> its value may not accurately reflect the actual number of free
> pages on the system, impacting the availability of free pages in
> support of many flows. The problematic scenario isu when the

In non-atomicity is not a problem, when it is not a problem :-)

> reclaimer does not run because it believes there to be sufficient
> free pages while any attempt to allocate a page fails because there
> are no free pages available.
> 
> The worst scenario observed was a user space hang because of
> repeated page faults caused by no free pages made available.
> 
> The following flow was encountered:
> asm_exc_page_fault
>  ...
>    sgx_vma_fault()
>      sgx_encl_load_page()
>        sgx_encl_eldu() // Encrypted page needs to be loaded from backing
>                        // storage into newly allocated SGX memory page
>          sgx_alloc_epc_page() // Allocate a page of SGX memory
>            __sgx_alloc_epc_page() // Fails, no free SGX memory
>            ...
>            if (sgx_should_reclaim(SGX_NR_LOW_PAGES)) // Wake reclaimer
>              wake_up(&ksgxd_waitq);
>            return -EBUSY; // Return -EBUSY giving reclaimer time to run
>        return -EBUSY;
>      return -EBUSY;
>    return VM_FAULT_NOPAGE;
> 
> The reclaimer is triggered in above flow with the following code:
> 
> static bool sgx_should_reclaim(unsigned long watermark)
> {
>         return sgx_nr_free_pages < watermark &&
>                !list_empty(&sgx_active_page_list);
> }
> 
> In the problematic scenario there were no free pages available yet the
> value of sgx_nr_free_pages was above the watermark. The allocation of
> SGX memory thus always failed because of a lack of free pages while no
> free pages were made available because the reclaimer is never started
> because of sgx_nr_free_pages' incorrect value. The consequence was that
> user space kept encountering VM_FAULT_NOPAGE that caused the same
> address to be accessed repeatedly with the same result.

That causes sgx_should_reclaim() executed to be multiple times as the
fault is retried. Eventually it should be successful.

> Change the global free page counter to an atomic type that
> ensures simultaneous updates are done safely. While doing so, move
> the updating of the variable outside of the spin lock critical
> section to which it does not belong.
> 
> Cc: stable@vger.kernel.org
> Fixes: 901ddbb9ecf5 ("x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()")
> Suggested-by: Dave Hansen <dave.hansen@linux.intel.com>
> Reviewed-by: Tony Luck <tony.luck@intel.com>
> Signed-off-by: Reinette Chatre <reinette.chatre@intel.com>

I'm not yet sure if this a bug, even if it'd be a improvement
to the performance.

/Jarkko

next prev parent reply	other threads:[~2021-11-10 15:11 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-09 20:00 [PATCH V2] x86/sgx: Fix free page accounting Reinette Chatre
2021-11-10 15:11 ` Jarkko Sakkinen [this message]
2021-11-10 18:51   ` Reinette Chatre
2021-11-10 19:16     ` Luck, Tony
2021-11-11  2:56       ` Jarkko Sakkinen
2021-11-11  2:55     ` Jarkko Sakkinen
2021-11-11  3:26       ` Luck, Tony
2021-11-11  3:50         ` Jarkko Sakkinen
2021-11-11  4:01           ` Dave Hansen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=8e0bb87f05b79317a06ed2d8ab5e2f5cf6132b6a.camel@kernel.org \
    --to=jarkko@kernel.org \
    --cc=bp@alien8.de \
    --cc=dave.hansen@linux.intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=reinette.chatre@intel.com \
    --cc=seanjc@google.com \
    --cc=stable@vger.kernel.org \
    --cc=tglx@linutronix.de \
    --cc=tony.luck@intel.com \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox