Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Jarkko Sakkinen <jarkko@kernel.org>
To: Dave Hansen <dave.hansen@intel.com>
Cc: linux-sgx@vger.kernel.org, haitao.huang@intel.com,
	dan.j.williams@intel.com, Thomas Gleixner <tglx@linutronix.de>,
	Ingo Molnar <mingo@redhat.com>, Borislav Petkov <bp@alien8.de>,
	x86@kernel.org, "H. Peter Anvin" <hpa@zytor.com>,
	Dave Hansen <dave.hansen@linux.intel.com>,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page()
Date: Tue, 23 Feb 2021 21:17:12 +0200	[thread overview]
Message-ID: <YDVUuF3rqnRCr+Bb@kernel.org> (raw)
In-Reply-To: <7acc3c1c-373e-cfee-e838-2af170e87d98@intel.com>

On Sun, Feb 21, 2021 at 04:54:33PM -0800, Dave Hansen wrote:
> > +/* Nodes with one or more EPC sections. */
> > +static nodemask_t sgx_numa_mask;
> 
> I'd also add that this is for optimization only.
> 
> > +/* Array of lists of EPC sections for each NUMA node. */
> > +struct list_head *sgx_numa_nodes;
> 
> I'd much prefer:
> 
> /*
>  * Array with one list_head for each possible NUMA node.  Each
>  * list contains all the sgx_epc_section's which are on that
>  * node.
>  */
> 
> Otherwise, it's hard to imagine what this structure looks like.

OK.

> >  /*
> >   * These variables are part of the state of the reclaimer, and must be accessed
> >   * with sgx_reclaimer_lock acquired.
> > @@ -473,6 +479,26 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> >  	return page;
> >  }
> >  
> > +static struct sgx_epc_page *__sgx_alloc_epc_page_from_node(int nid)
> > +{
> > +	struct sgx_epc_section *section;
> > +	struct sgx_epc_page *page;
> > +
> > +	if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES))
> > +		return NULL;
> > +
> > +	if (!node_isset(nid, sgx_numa_mask))
> > +		return NULL;
> > +
> > +	list_for_each_entry(section, &sgx_numa_nodes[nid], section_list) {
> > +		page = __sgx_alloc_epc_page_from_section(section);
> > +		if (page)
> > +			return page;
> > +	}
> > +
> > +	return NULL;
> > +}
> > +
> >  /**
> >   * __sgx_alloc_epc_page() - Allocate an EPC page
> >   *
> > @@ -485,13 +511,17 @@ static struct sgx_epc_page *__sgx_alloc_epc_page_from_section(struct sgx_epc_sec
> >   */
> >  struct sgx_epc_page *__sgx_alloc_epc_page(void)
> >  {
> > +	int current_nid = numa_node_id();
> >  	struct sgx_epc_section *section;
> >  	struct sgx_epc_page *page;
> >  	int i;
> >  
> > +	page = __sgx_alloc_epc_page_from_node(current_nid);
> > +	if (page)
> > +		return page;
> 
> Comments, please.
> 
> 	/* Try to allocate EPC from the current node, first: */
> 
> then:
> 
> 	/* Search all EPC sections, ignoring locality: */
> 
> >  	for (i = 0; i < sgx_nr_epc_sections; i++) {
> >  		section = &sgx_epc_sections[i];
> > -
> >  		page = __sgx_alloc_epc_page_from_section(section);
> >  		if (page)
> >  			return page;
> 
> This still has the problem that it exerts too much pressure on the
> low-numbered sgx_epc_sections[].  If a node's sections are full, it
> always tries to go after sgx_epc_sections[0].

I have a better idea. See below.

> It can be in another patch, but I think the *minimal* thing we can do
> here for a NUMA allocator is to try to at least balance the allocations.
> 
> Instead of having a for-each-section loop, I'd make it for-each-node ->
> for-each-section.  Something like:
> 
> 	for (i = 0; i < num_possible_nodes(); i++) {
> 		node = (numa_node_id() + i) % num_possible_nodes()
> 		
> 		if (!node_isset(nid, sgx_numa_mask))
> 			continue;
> 
> 		list_for_each_entry(section, &sgx_numa_nodes[nid],
> 				    section_list) {
> 			__sgx_alloc_epc_page_from_section(section)
> 		}
> 	}
> 	
> Then you have a single loop instead of a "try local then a fall back".
> 
> Also, that "node++" thing might be able to use next_online_node().
> 
> > @@ -665,8 +695,12 @@ static bool __init sgx_page_cache_init(void)
> >  {
> >  	u32 eax, ebx, ecx, edx, type;
> >  	u64 pa, size;
> > +	int nid;
> >  	int i;
> >  
> > +	nodes_clear(sgx_numa_mask);
> > +	sgx_numa_nodes = kmalloc_array(MAX_NUMNODES, sizeof(*sgx_numa_nodes), GFP_KERNEL);
> 
> MAX_NUMNODES will always be the largest compile-time constant.  That's
> 4k, IIRC.  num_possible_nodes() might be as small as 1 if NUMA is off.

Right.

> >  	for (i = 0; i < ARRAY_SIZE(sgx_epc_sections); i++) {
> >  		cpuid_count(SGX_CPUID, i + SGX_CPUID_EPC, &eax, &ebx, &ecx, &edx);
> >  
> > @@ -690,6 +724,22 @@ static bool __init sgx_page_cache_init(void)
> >  		}
> >  
> >  		sgx_nr_epc_sections++;
> > +
> > +		nid = numa_map_to_online_node(phys_to_target_node(pa));
> > +
> > +		if (nid == NUMA_NO_NODE) {
> > +			pr_err(FW_BUG "unable to map EPC section %d to online node.\n", nid);
> > +			nid = 0;
> 
> Could we dump out the physical address there?  I think that's even more
> informative than a section number.

Yes.

> > +		} else if (WARN_ON_ONCE(nid < 0 || nid >= MAX_NUMNODES)) {
> > +			nid = 0;
> > +		}
> 
> I'm not sure we really need to check for these.  If we're worried about
> the firmware returning these, I'd expect numa_map_to_online_node() to
> sanity check them for us.

Yes, let's remove it.

> > +		if (!node_isset(nid, sgx_numa_mask)) {
> > +			INIT_LIST_HEAD(&sgx_numa_nodes[nid]);
> > +			node_set(nid, sgx_numa_mask);
> > +		}
> > +
> > +		list_add_tail(&sgx_epc_sections[i].section_list, &sgx_numa_nodes[nid]);
> >  	}
> >  
> >  	if (!sgx_nr_epc_sections) {
> > diff --git a/arch/x86/kernel/cpu/sgx/sgx.h b/arch/x86/kernel/cpu/sgx/sgx.h
> > index 5fa42d143feb..4bc31bc4bacf 100644
> > --- a/arch/x86/kernel/cpu/sgx/sgx.h
> > +++ b/arch/x86/kernel/cpu/sgx/sgx.h
> > @@ -45,6 +45,7 @@ struct sgx_epc_section {
> >  	spinlock_t lock;
> >  	struct list_head page_list;
> >  	unsigned long free_cnt;
> > +	struct list_head section_list;
> 
> Maybe name this numa_section_list.

Instead, let's just:

1. Have a global sgx_free_epc_list and remove sgx_epc_section.
   Pages from this are allocated from this in LIFO fashion.
2. Instead add struct list_head node_list and use that for node
   associated pages.
3. Replace 'int section' with 'int node'.

This will remove one layer of abstraction and provide better fallback
scheme.

E.g. allocate:

1. Check node_list of current node.
2. As a fallback check sgx_free_epc_list.
3. list_del() for two lists.

/Jarkko

next prev parent reply	other threads:[~2021-02-23 19:18 UTC|newest]

Thread overview: 9+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-21  2:06 [PATCH] x86/sgx: Add a basic NUMA allocation scheme to sgx_alloc_epc_page() Jarkko Sakkinen
2021-02-22  0:54 ` Dave Hansen
2021-02-23 19:14   ` Dave Hansen
2021-02-24 17:29     ` Jarkko Sakkinen
2021-02-23 19:17   ` Jarkko Sakkinen [this message]
2021-02-23 19:20     ` Dave Hansen
2021-02-23 19:33       ` Jarkko Sakkinen
2021-02-23 21:42 ` Dave Hansen
2021-02-24 17:31   ` Jarkko Sakkinen

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=YDVUuF3rqnRCr+Bb@kernel.org \
    --to=jarkko@kernel.org \
    --cc=bp@alien8.de \
    --cc=dan.j.williams@intel.com \
    --cc=dave.hansen@intel.com \
    --cc=dave.hansen@linux.intel.com \
    --cc=haitao.huang@intel.com \
    --cc=hpa@zytor.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-sgx@vger.kernel.org \
    --cc=mingo@redhat.com \
    --cc=tglx@linutronix.de \
    --cc=x86@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.