* [PATCH] x86: sgx: Don't track poisoned pages for reclaiming @ 2025-02-11 15:01 Andrew Zaborowski 2025-02-11 16:25 ` Dave Hansen 0 siblings, 1 reply; 11+ messages in thread From: Andrew Zaborowski @ 2025-02-11 15:01 UTC (permalink / raw) To: x86, linux-sgx, linux-kernel Cc: Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg Pages used by an enclave only get page->poison set in arch_memory_failure() but stay on sgx_active_page_list. page->poison is not checked in the reclaimer logic meaning that a page could be reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the firmware receiving and MCE in one of those operations and going into "unbreakable shutdown" and triggering a kernel panic on remaining cores. Remove the affected page from sgx_active_page_list but don't add it immediately to &node->sgx_poison_page_list to keep most of the current semantics. It'll be added to &node->sgx_poison_page_list later in sgx_encl_release()->sgx_free_epc_page() Tested with CONFIG_PROVE_LOCKING as suggested by Tony Luck. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com> --- arch/x86/kernel/cpu/sgx/main.c | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c index 671c26513..7076464d4 100644 --- a/arch/x86/kernel/cpu/sgx/main.c +++ b/arch/x86/kernel/cpu/sgx/main.c @@ -719,6 +719,8 @@ int arch_memory_failure(unsigned long pfn, int flags) goto out; } + sgx_unmark_page_reclaimable(page); + /* * TBD: Add additional plumbing to enable pre-emptive * action for asynchronous poison notification. Until -- 2.43.5 ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 15:01 [PATCH] x86: sgx: Don't track poisoned pages for reclaiming Andrew Zaborowski @ 2025-02-11 16:25 ` Dave Hansen 2025-02-11 21:03 ` Jarkko Sakkinen 2025-02-12 0:22 ` Andrew Zaborowski 0 siblings, 2 replies; 11+ messages in thread From: Dave Hansen @ 2025-02-11 16:25 UTC (permalink / raw) To: Andrew Zaborowski, x86, linux-sgx, linux-kernel Cc: Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg I don't expect everyone to know the rules of every little part of the kernel. But, it's really easy to see a pattern with: git log arch/x86/kernel/cpu/sgx/ That usually works for every little nook and cranny of the kernel and will show you what the subject rules are. Could you do that for this patch for v2, please? Also, this isn't about _tracking_ pages per se. It's avoiding SGX page reclaim, don't you think? On 2/11/25 07:01, Andrew Zaborowski wrote: > Pages used by an enclave only get page->poison set in Could we please call these "epc_page" instead of "page"? > arch_memory_failure() but stay on sgx_active_page_list. > page->poison is not checked in the reclaimer logic meaning that a page could be > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the > firmware receiving and MCE in one of those operations and going into > "unbreakable shutdown" and triggering a kernel panic on remaining cores. This requires low-level SGX implementation knowledge to fully understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, how they are involved in reclaim and also why EREMOVE doesn't lead to the same fate. Can it be written in a more approachable way? During SGX reclaim, the CPU actually touches the SGX data page, encrypting and writing its contents out to normal memory. These "EWB" writeback operations are implemented in what are effectively big, complicated chunks of microcode. Any machine checks encountered during this writeback operation are usually fatal to the entire system. If an epc_page has poison, reclaiming it is highly likely to bring the whole system down. The SGX reclaim code does not currently check for poison. -- > Remove the affected page from sgx_active_page_list but don't add it > immediately to &node->sgx_poison_page_list to keep most of the current > semantics. What semantics are being kept? Are they important? > Tested with CONFIG_PROVE_LOCKING as suggested by Tony Luck. "I tested it with lockdep and it didn't blow up" is definitely better than "I booted this and it didn't blow up" or not testing it at all. But even better would be demonstrating in the changelog that the locking rules were understood and respected in this patch. > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > index 671c26513..7076464d4 100644 > --- a/arch/x86/kernel/cpu/sgx/main.c > +++ b/arch/x86/kernel/cpu/sgx/main.c > @@ -719,6 +719,8 @@ int arch_memory_failure(unsigned long pfn, int flags) > goto out; > } > > + sgx_unmark_page_reclaimable(page); > + > /* > * TBD: Add additional plumbing to enable pre-emptive > * action for asynchronous poison notification. Until I'll 100% buy that this is the most expeditious fix. But is it the _best_ one? In the end, this patch has the semantics of avoiding SGX reclaim on poisoned pages. Wouldn't it be most straightforward to implement that in the SGX *reclaim* code? ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 16:25 ` Dave Hansen @ 2025-02-11 21:03 ` Jarkko Sakkinen 2025-02-11 21:18 ` Huang, Kai 2025-02-12 0:22 ` Andrew Zaborowski 1 sibling, 1 reply; 11+ messages in thread From: Jarkko Sakkinen @ 2025-02-11 21:03 UTC (permalink / raw) To: Dave Hansen Cc: Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On Tue, Feb 11, 2025 at 08:25:58AM -0800, Dave Hansen wrote: > > arch_memory_failure() but stay on sgx_active_page_list. > > page->poison is not checked in the reclaimer logic meaning that a page could be > > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the > > firmware receiving and MCE in one of those operations and going into > > "unbreakable shutdown" and triggering a kernel panic on remaining cores. > > This requires low-level SGX implementation knowledge to fully > understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, > how they are involved in reclaim and also why EREMOVE doesn't lead to > the same fate. Does it? [I'll dig up Intel SDM to check this] BR, Jarkko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 21:03 ` Jarkko Sakkinen @ 2025-02-11 21:18 ` Huang, Kai 2025-02-11 23:24 ` Jarkko Sakkinen 2025-02-11 23:31 ` Dave Hansen 0 siblings, 2 replies; 11+ messages in thread From: Huang, Kai @ 2025-02-11 21:18 UTC (permalink / raw) To: Jarkko Sakkinen, Dave Hansen Cc: Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On 12/02/2025 10:03 am, Jarkko Sakkinen wrote: > On Tue, Feb 11, 2025 at 08:25:58AM -0800, Dave Hansen wrote: >>> arch_memory_failure() but stay on sgx_active_page_list. >>> page->poison is not checked in the reclaimer logic meaning that a page could be >>> reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the >>> firmware receiving and MCE in one of those operations and going into >>> "unbreakable shutdown" and triggering a kernel panic on remaining cores. >> >> This requires low-level SGX implementation knowledge to fully >> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, >> how they are involved in reclaim and also why EREMOVE doesn't lead to >> the same fate. > > Does it? [I'll dig up Intel SDM to check this] > I just did. :-) It seems EREMOVE only reads and updates the EPCM entry for the target EPC page but won't actually access that EPC page. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 21:18 ` Huang, Kai @ 2025-02-11 23:24 ` Jarkko Sakkinen 2025-02-11 23:31 ` Dave Hansen 1 sibling, 0 replies; 11+ messages in thread From: Jarkko Sakkinen @ 2025-02-11 23:24 UTC (permalink / raw) To: Huang, Kai Cc: Dave Hansen, Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On Wed, Feb 12, 2025 at 10:18:11AM +1300, Huang, Kai wrote: > > > On 12/02/2025 10:03 am, Jarkko Sakkinen wrote: > > On Tue, Feb 11, 2025 at 08:25:58AM -0800, Dave Hansen wrote: > > > > arch_memory_failure() but stay on sgx_active_page_list. > > > > page->poison is not checked in the reclaimer logic meaning that a page could be > > > > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the > > > > firmware receiving and MCE in one of those operations and going into > > > > "unbreakable shutdown" and triggering a kernel panic on remaining cores. > > > > > > This requires low-level SGX implementation knowledge to fully > > > understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, > > > how they are involved in reclaim and also why EREMOVE doesn't lead to > > > the same fate. > > > > Does it? [I'll dig up Intel SDM to check this] > > > > I just did. :-) > > It seems EREMOVE only reads and updates the EPCM entry for the target EPC > page but won't actually access that EPC page. That was fast, thank you! This is pretty much also that should be explicitly stated in the commit message. BR, Jarkko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 21:18 ` Huang, Kai 2025-02-11 23:24 ` Jarkko Sakkinen @ 2025-02-11 23:31 ` Dave Hansen 2025-02-12 0:32 ` andrzej zaborowski ` (2 more replies) 1 sibling, 3 replies; 11+ messages in thread From: Dave Hansen @ 2025-02-11 23:31 UTC (permalink / raw) To: Huang, Kai, Jarkko Sakkinen Cc: Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On 2/11/25 13:18, Huang, Kai wrote: >>> This requires low-level SGX implementation knowledge to fully >>> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, >>> how they are involved in reclaim and also why EREMOVE doesn't lead to >>> the same fate. >> >> Does it? [I'll dig up Intel SDM to check this] >> > I just did. 🙂 > > It seems EREMOVE only reads and updates the EPCM entry for the target > EPC page but won't actually access that EPC page. Actually, now that I think about it even more, why would ETRACK or EBLOCK access the page itself? They seem superficially like they'd be metadata-only too. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 23:31 ` Dave Hansen @ 2025-02-12 0:32 ` andrzej zaborowski 2025-02-12 0:37 ` Dave Hansen 2025-02-12 10:38 ` Huang, Kai 2025-02-12 21:25 ` Jarkko Sakkinen 2 siblings, 1 reply; 11+ messages in thread From: andrzej zaborowski @ 2025-02-12 0:32 UTC (permalink / raw) To: Dave Hansen Cc: Huang, Kai, Jarkko Sakkinen, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin On Wed, 12 Feb 2025 at 00:31, Dave Hansen <dave.hansen@intel.com> wrote: > On 2/11/25 13:18, Huang, Kai wrote: > >>> This requires low-level SGX implementation knowledge to fully > >>> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, > >>> how they are involved in reclaim and also why EREMOVE doesn't lead to > >>> the same fate. > >> > >> Does it? [I'll dig up Intel SDM to check this] > >> > > I just did. 🙂 > > > > It seems EREMOVE only reads and updates the EPCM entry for the target > > EPC page but won't actually access that EPC page. > > Actually, now that I think about it even more, why would ETRACK or > EBLOCK access the page itself? They seem superficially like they'd be > metadata-only too. I haven't seen a crash in either of these (always in EWB), I didn't want to imply that. But starting that sequence seems wrong knowing we cannot reclaim the page. Best regards ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-12 0:32 ` andrzej zaborowski @ 2025-02-12 0:37 ` Dave Hansen 0 siblings, 0 replies; 11+ messages in thread From: Dave Hansen @ 2025-02-12 0:37 UTC (permalink / raw) To: andrzej zaborowski Cc: Huang, Kai, Jarkko Sakkinen, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin On 2/11/25 16:32, andrzej zaborowski wrote: >> Actually, now that I think about it even more, why would ETRACK or >> EBLOCK access the page itself? They seem superficially like they'd be >> metadata-only too. > I haven't seen a crash in either of these (always in EWB), I didn't > want to imply that. But starting that sequence seems wrong knowing we > cannot reclaim the page. That's kinda another reason not to delve into the details too deeply. I think you wanted to talk about the "writeback process" as a thing and not really about "ETRACK, EBLOCK and EWB" per se. Writing back an SGX page is the problem. The names of the three instructions that implement the writeback or that there _are_ even three of them isn't super relevant. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 23:31 ` Dave Hansen 2025-02-12 0:32 ` andrzej zaborowski @ 2025-02-12 10:38 ` Huang, Kai 2025-02-12 21:25 ` Jarkko Sakkinen 2 siblings, 0 replies; 11+ messages in thread From: Huang, Kai @ 2025-02-12 10:38 UTC (permalink / raw) To: Dave Hansen, Jarkko Sakkinen Cc: Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On 12/02/2025 12:31 pm, Dave Hansen wrote: > On 2/11/25 13:18, Huang, Kai wrote: >>>> This requires low-level SGX implementation knowledge to fully >>>> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, >>>> how they are involved in reclaim and also why EREMOVE doesn't lead to >>>> the same fate. >>> >>> Does it? [I'll dig up Intel SDM to check this] >>> >> I just did. 🙂 >> >> It seems EREMOVE only reads and updates the EPCM entry for the target >> EPC page but won't actually access that EPC page. > > Actually, now that I think about it even more, why would ETRACK or > EBLOCK access the page itself? They seem superficially like they'd be > metadata-only too. Looking at the pseudo code, AFAICT EBLOCK doesn't touch the EPC page either, but ETRACK actually reads SECS (ETRACK must take SECS page as input): (* All processors must have completed the previous tracking cycle*) IF ( (DS:RCX).TRACKING ≠ 0) ) ...... Here the DS:RCX is the SECS page. I think this also is consistent with what Andrew said: "I haven't seen a crash in either of these (always in EWB), ..." because a poisoned EPC page being regular enclave page has much higher possibility. In fact, ETRACK only takes SECS page but I think the chance that the kernel code can still reach ETRACK while SECS page is poisoned should be just 0. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 23:31 ` Dave Hansen 2025-02-12 0:32 ` andrzej zaborowski 2025-02-12 10:38 ` Huang, Kai @ 2025-02-12 21:25 ` Jarkko Sakkinen 2 siblings, 0 replies; 11+ messages in thread From: Jarkko Sakkinen @ 2025-02-12 21:25 UTC (permalink / raw) To: Dave Hansen Cc: Huang, Kai, Andrew Zaborowski, x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin, balrogg On Tue, Feb 11, 2025 at 03:31:54PM -0800, Dave Hansen wrote: > On 2/11/25 13:18, Huang, Kai wrote: > >>> This requires low-level SGX implementation knowledge to fully > >>> understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, > >>> how they are involved in reclaim and also why EREMOVE doesn't lead to > >>> the same fate. > >> > >> Does it? [I'll dig up Intel SDM to check this] > >> > > I just did. 🙂 > > > > It seems EREMOVE only reads and updates the EPCM entry for the target > > EPC page but won't actually access that EPC page. > > Actually, now that I think about it even more, why would ETRACK or > EBLOCK access the page itself? They seem superficially like they'd be > metadata-only too. Did a sanity check to SDM. I think you're correct, and also there's zero rational reason them use anything but EPCM (no legit reason to access payload itself). BR, Jarkko ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] x86: sgx: Don't track poisoned pages for reclaiming 2025-02-11 16:25 ` Dave Hansen 2025-02-11 21:03 ` Jarkko Sakkinen @ 2025-02-12 0:22 ` Andrew Zaborowski 1 sibling, 0 replies; 11+ messages in thread From: Andrew Zaborowski @ 2025-02-12 0:22 UTC (permalink / raw) To: Dave Hansen Cc: x86, linux-sgx, linux-kernel, Dave Hansen, Tony Luck, Thomas Gleixner, Borislav Petkov, Ingo Molnar, H . Peter Anvin On Tue, 11 Feb 2025 at 17:25, Dave Hansen <dave.hansen@intel.com> wrote: > git log arch/x86/kernel/cpu/sgx/ > > That usually works for every little nook and cranny of the kernel and > will show you what the subject rules are. > > Could you do that for this patch for v2, please? My bad, I'll use x86/sgx: ... > > Also, this isn't about _tracking_ pages per se. It's avoiding SGX page > reclaim, don't you think? Ok, the goal is to avoid the crash so I'll update the subject line. That said clearing SGX_EPC_PAGE_RECLAIMER_TRACKED for a page that cannot be reclaimed seems reasonable on its own. > > On 2/11/25 07:01, Andrew Zaborowski wrote: > > Pages used by an enclave only get page->poison set in > > Could we please call these "epc_page" instead of "page"? Ok, makes sense. > > > arch_memory_failure() but stay on sgx_active_page_list. > > page->poison is not checked in the reclaimer logic meaning that a page could be > > reclaimed and go through ETRACK, EBLOCK and EWB. This can lead to the > > firmware receiving and MCE in one of those operations and going into > > "unbreakable shutdown" and triggering a kernel panic on remaining cores. > > This requires low-level SGX implementation knowledge to fully > understand. Both what "ETRACK, EBLOCK and EWB" are in the first place, > how they are involved in reclaim and also why EREMOVE doesn't lead to > the same fate. > > Can it be written in a more approachable way? > > During SGX reclaim, the CPU actually touches the SGX data page, > encrypting and writing its contents out to normal memory. These "EWB" > writeback operations are implemented in what are effectively big, > complicated chunks of microcode. Any machine checks encountered during > this writeback operation are usually fatal to the entire system. > > If an epc_page has poison, reclaiming it is highly likely to bring the > whole system down. The SGX reclaim code does not currently check for poison. Ok, I agree part of this explanation is fit for the commit message. In a busy area of code I don't think every commit touching it should explain what the code does but here it makes sense. As a side note with a set of enclaves fighting badly enough for the EPC memory the probability of an MCE happening inside one of the EWB operations can become considerable because of the sheer amount of time spent in them and I don't think anything can be done about it. > > -- > > > Remove the affected page from sgx_active_page_list but don't add it > > immediately to &node->sgx_poison_page_list to keep most of the current > > semantics. > > What semantics are being kept? Are they important? That's a good question. An epc_page can be on one of 3 or 4 lists, or none, so the logic is already complicated. My guess is that moving the page to &node->sgx_poison_page_list only after EREMOVE is done adds some order, because functionally it doesn't change anything. (But __sgx_sanitize_pages skips the EREMOVE for poisoned pages) So epc_page->poison is used temporarily while the page is mapped into an enclave. Once the enclave is released a poisoned page is moved to sgx_poison_page_list. The current semantics ensure that by that time the epc_page is not referenced by an encl_page and is not on any other list so for all purposes it's forgotten. > > > Tested with CONFIG_PROVE_LOCKING as suggested by Tony Luck. > > "I tested it with lockdep and it didn't blow up" is definitely better > than "I booted this and it didn't blow up" or not testing it at all. > > But even better would be demonstrating in the changelog that the locking > rules were understood and respected in this patch. > > > diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c > > index 671c26513..7076464d4 100644 > > --- a/arch/x86/kernel/cpu/sgx/main.c > > +++ b/arch/x86/kernel/cpu/sgx/main.c > > @@ -719,6 +719,8 @@ int arch_memory_failure(unsigned long pfn, int flags) > > goto out; > > } > > > > + sgx_unmark_page_reclaimable(page); > > + > > /* > > * TBD: Add additional plumbing to enable pre-emptive > > * action for asynchronous poison notification. Until > > I'll 100% buy that this is the most expeditious fix. > > But is it the _best_ one? > > In the end, this patch has the semantics of avoiding SGX reclaim on > poisoned pages. Wouldn't it be most straightforward to implement that in > the SGX *reclaim* code? I don't know. If we know we cannot reclaim a page should we force the reclaimer to still look at its flags? I'm sure we don't want to have to check epc_page->poison multiple times if we can remove it from a relevant list once. Perhaps it's not great that my patch adds a period when the epc_page is not on *any* list, everywhere else this is done only for very short times. I'm not sure what it would take to eremove the page earlier so it can be added to sgx_poison_page_list, I think the "TBD" comment in arch_memory_failure() refers to that and I hoped to leave it as a TBD. Best regards ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2025-02-12 21:25 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2025-02-11 15:01 [PATCH] x86: sgx: Don't track poisoned pages for reclaiming Andrew Zaborowski 2025-02-11 16:25 ` Dave Hansen 2025-02-11 21:03 ` Jarkko Sakkinen 2025-02-11 21:18 ` Huang, Kai 2025-02-11 23:24 ` Jarkko Sakkinen 2025-02-11 23:31 ` Dave Hansen 2025-02-12 0:32 ` andrzej zaborowski 2025-02-12 0:37 ` Dave Hansen 2025-02-12 10:38 ` Huang, Kai 2025-02-12 21:25 ` Jarkko Sakkinen 2025-02-12 0:22 ` Andrew Zaborowski
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox