From: "Huang, Kai" <kai.huang@intel.com>
To: "chenridong@huawei.com" <chenridong@huawei.com>,
"linux-sgx@vger.kernel.org" <linux-sgx@vger.kernel.org>,
"cgroups@vger.kernel.org" <cgroups@vger.kernel.org>,
"mkoutny@suse.com" <mkoutny@suse.com>,
"dave.hansen@linux.intel.com" <dave.hansen@linux.intel.com>,
"haitao.huang@linux.intel.com" <haitao.huang@linux.intel.com>,
"tim.c.chen@linux.intel.com" <tim.c.chen@linux.intel.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"mingo@redhat.com" <mingo@redhat.com>,
"tglx@linutronix.de" <tglx@linutronix.de>,
"tj@kernel.org" <tj@kernel.org>,
"jarkko@kernel.org" <jarkko@kernel.org>,
"Mehta, Sohil" <sohil.mehta@intel.com>,
"hpa@zytor.com" <hpa@zytor.com>, "bp@alien8.de" <bp@alien8.de>,
"x86@kernel.org" <x86@kernel.org>
Cc: "mikko.ylinen@linux.intel.com" <mikko.ylinen@linux.intel.com>,
"seanjc@google.com" <seanjc@google.com>,
"anakrish@microsoft.com" <anakrish@microsoft.com>,
"Zhang, Bo" <zhanb@microsoft.com>,
"kristen@linux.intel.com" <kristen@linux.intel.com>,
"yangjie@microsoft.com" <yangjie@microsoft.com>,
"Li, Zhiquan1" <zhiquan1.li@intel.com>,
"chrisyan@microsoft.com" <chrisyan@microsoft.com>
Subject: Re: [PATCH v15 12/14] x86/sgx: Turn on per-cgroup EPC reclamation
Date: Thu, 20 Jun 2024 23:53:13 +0000
Message-ID: <0c657bb3de56ecd22c67193714a4de45875f36cb.camel@intel.com>
In-Reply-To: <op.2pn6wbdwwjvjmi@hhuan26-mobl.amr.corp.intel.com>
On Thu, 2024-06-20 at 10:06 -0500, Haitao Huang wrote:
> Hi Kai
>
> On Thu, 20 Jun 2024 05:30:16 -0500, Huang, Kai <kai.huang@intel.com> wrote:
>
> >
> > On 18/06/2024 12:53 am, Huang, Haitao wrote:
> > > From: Kristen Carlson Accardi <kristen@linux.intel.com>
> > >
> > > Previous patches have implemented all infrastructure needed for
> > > per-cgroup EPC page tracking and reclaiming. But all reclaimable EPC
> > > pages are still tracked in the global LRU as sgx_epc_page_lru() always
> > > returns reference to the global LRU.
> > >
> > > Change sgx_epc_page_lru() to return the LRU of the cgroup in which the
> > > given EPC page is allocated.
> > >
> > > This makes all EPC pages tracked in per-cgroup LRUs and the global
> > > reclaimer (ksgxd) will not be able to reclaim any pages from the global
> > > LRU. However, in cases of over-committing, i.e., the sum of cgroup
> > > limits greater than the total capacity, cgroups may never reclaim but
> > > the total usage can still be near the capacity. Therefore a global
> > > reclamation is still needed in those cases and it should be performed
> > > from the root cgroup.
> > >
> > > Modify sgx_reclaim_pages_global() to reclaim from the root EPC cgroup
> > > when cgroup is enabled. Similar to sgx_cgroup_reclaim_pages(), return
> > > the next cgroup so callers can use it as the new starting node for the
> > > next round of reclamation if needed.
> > >
> > > Also update sgx_can_reclaim_global() to check emptiness of LRUs of all
> > > cgroups when EPC cgroup is enabled, otherwise only check the global LRU.
> > >
> > > Finally, change sgx_reclaim_direct() to check and ensure there are free
> > > pages at cgroup level so forward progress can be made by the caller.
> >
> > Reading above, it's not clear how the _new_ global reclaim works with
> > multiple LRUs.
> >
> > E.g., the current global reclaim essentially treats all EPC pages equally
> > when scanning those pages. From the above, I don't see how this is
> > achieved in the new global reclaim.
> >
> > The changelog should:
> >
> > 1) describe how the existing global reclaim works, and then describe
> > how to achieve the same behaviour in the new global reclaim which works
> > with multiple LRUs;
> >
> > 2) If there's any behaviour difference between the "existing" vs the
> > "new" global reclaim, the changelog should point out the difference,
> > and explain why such difference is OK.
>
> Sure, I can explain. Here is what I plan to add for v16:
>
> Note the original global reclaimer has
> only one LRU and always scans and reclaims from the head of this global
> LRU. The new global reclaimer always starts the scanning from the root
> node, only moves down to its descendants if more reclamation is needed
> or the root node does not have SGX_NR_TO_SCAN (16) pages in the LRU.
> This makes the enclave pages in the root node more likely to be
> reclaimed if they are not frequently used (not 'young'). Unless we track
> pages in one LRU again, we cannot exactly match the behavior of the
> original global reclaimer. And a one-LRU approach would make per-cgroup
> reclamation scanning and reclaiming too complex. On the other hand, this
> design is acceptable for the following reasons:
>
> 1) For all purposes of using cgroups, enclaves will need to live in
> non-root (leaf for cgroup v2) nodes where limits can be enforced
> per-cgroup.

I don't see how it matters.  If ROOT is empty, then it moves to the first
descendant.

> 2) Global reclamation now only happens in situation mentioned above when
> a lower level cgroup not at its limit can't allocate due to over
> commit at global level.

Really?  In sgx_reclaim_direct() the code says:

	/*
	 * Make sure there are some free pages at both cgroup and global levels.
	 * In both cases, only make one attempt of reclamation to avoid lengthy
	 * block on the caller.
	 */

Yeah, only one attempt will be made at the global level, but it is still
global reclaim.

> 3) The pages in root being slightly penalized are not busily used
> anyway.

Point 1) says that in practice the root will have no enclaves, thus no EPC
at all.

In other words, in practice the global reclaim will always skip the root
and move to the first descendant.

> 4) In cases where multiple rounds of reclamation are needed, the caller of
> sgx_reclaim_pages_global() can still call it again to reclaim from the
> 'next' descendant in a round-robin way; each round scans SGX_NR_TO_SCAN
> pages from the head of the 'next' cgroup's LRU.

"multiple rounds of reclamation" isn't clear enough.  Does it mean
multiple calls to sgx_cgroup_reclaim_pages(), or does it mean each trigger
of global reclaim?

The current patch seems to be the former.  Note that 'next_cg' is reset to
NULL on each iteration of the main loop in ksgxd().

This essentially means each trigger of global reclaim will start from the
ROOT, or in practice the first descendant (based on 1) and 3) above) will
always be the victim of each global reclaim.

I agree it's hard to _EXACTLY_ match the existing global reclaim, but IMHO
we should at least treat all cgroups equally.
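
To make the fairness concern concrete, here is a minimal userspace sketch.
It is not the kernel code: the flat pre-order array of cgroups, the
function names reclaim_round() and simulate(), and the page counts are all
assumptions made up for illustration.  Only the batch size of 16 comes
from the thread (SGX_NR_TO_SCAN).  It compares resetting the starting
cgroup on every trigger with keeping it across triggers.

```c
#include <assert.h>

#define NR_CGROUPS 4	/* hypothetical tree; index 0 models an empty root */
#define NR_TO_SCAN 16	/* batch size, as SGX_NR_TO_SCAN in the changelog */

/*
 * One round of global reclaim in this model: take up to NR_TO_SCAN pages,
 * starting at cgroup 'start' and walking the cgroups in order with
 * wrap-around.  Returns the index the next round would start from.
 */
int reclaim_round(int pages[NR_CGROUPS], int reclaimed[NR_CGROUPS], int start)
{
	int todo = NR_TO_SCAN;
	int i = start;
	int visited;

	assert(start >= 0 && start < NR_CGROUPS);

	for (visited = 0; visited < NR_CGROUPS && todo > 0; visited++) {
		int n = pages[i] < todo ? pages[i] : todo;

		pages[i] -= n;
		reclaimed[i] += n;
		todo -= n;
		i = (i + 1) % NR_CGROUPS;
	}
	return i;
}

/*
 * Run 'triggers' independent triggers of global reclaim, one round each.
 * persist_cursor == 0 models resetting the cursor on every trigger (like
 * resetting 'next_cg' to NULL); persist_cursor != 0 keeps it across
 * triggers.
 */
void simulate(int persist_cursor, int triggers, int reclaimed[NR_CGROUPS])
{
	int pages[NR_CGROUPS] = { 0, 64, 64, 64 };	/* made-up LRU sizes */
	int cursor = 0;
	int t, i;

	for (i = 0; i < NR_CGROUPS; i++)
		reclaimed[i] = 0;

	for (t = 0; t < triggers; t++) {
		if (!persist_cursor)
			cursor = 0;	/* every trigger restarts at the root */
		cursor = reclaim_round(pages, reclaimed, cursor);
	}
}
```

In this model, simulate(0, 3, out) leaves out[] as {0, 48, 0, 0}: the
first descendant pays for every trigger.  simulate(1, 3, out) gives
{0, 16, 16, 16}, i.e. the cost is spread across the cgroups.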
Thread overview: 27+ messages
2024-06-17 12:53 [PATCH v15 00/14] Add Cgroup support for SGX EPC memory Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 01/14] x86/sgx: Replace boolean parameters with enums Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 02/14] cgroup/misc: Add per resource callbacks for CSS events Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 03/14] cgroup/misc: Export APIs for SGX driver Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 04/14] cgroup/misc: Add SGX EPC resource type Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 05/14] x86/sgx: Implement basic EPC misc cgroup functionality Huang, Haitao
2024-06-18 11:31 ` Huang, Kai
2024-06-18 12:56 ` Haitao Huang
2024-06-18 23:15 ` Huang, Kai
2024-06-18 23:23 ` Haitao Huang
2024-06-19 2:00 ` Huang, Kai
2024-06-17 12:53 ` [PATCH v15 06/14] x86/sgx: Add sgx_epc_lru_list to encapsulate LRU list Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 07/14] x86/sgx: Abstract tracking reclaimable pages in LRU Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 08/14] x86/sgx: Add basic EPC reclamation flow for cgroup Huang, Haitao
2024-06-20 13:28 ` Huang, Kai
2024-06-20 16:05 ` Haitao Huang
2024-06-20 22:52 ` Huang, Kai
2024-06-17 12:53 ` [PATCH v15 09/14] x86/sgx: Abstract check for global reclaimable pages Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 10/14] x86/sgx: Implement async reclamation for cgroup Huang, Haitao
2024-06-20 15:39 ` Haitao Huang
2024-06-17 12:53 ` [PATCH v15 11/14] x86/sgx: Charge mem_cgroup for per-cgroup reclamation Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 12/14] x86/sgx: Turn on per-cgroup EPC reclamation Huang, Haitao
2024-06-20 10:30 ` Huang, Kai
2024-06-20 15:06 ` Haitao Huang
2024-06-20 23:53 ` Huang, Kai [this message]
2024-06-17 12:53 ` [PATCH v15 13/14] Docs/x86/sgx: Add description for cgroup support Huang, Haitao
2024-06-17 12:53 ` [PATCH v15 14/14] selftests/sgx: Add scripts for EPC cgroup testing Huang, Haitao