public inbox for stable@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
       [not found] <20220831173829.126661-1-jarkko@kernel.org>
@ 2022-08-31 17:38 ` Jarkko Sakkinen
  2022-08-31 20:39   ` Reinette Chatre
  0 siblings, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-08-31 17:38 UTC (permalink / raw)
  To: linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Reinette Chatre, Dave Hansen,
	Paul Menzel, Jarkko Sakkinen, stable, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
v6:
- Address Reinette's feedback:
  https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 42 ++++++++++++++++++++++++++++------
 1 file changed, 35 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..bcd6b64961bd 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,20 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	long left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return -ECANCELED;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	long ret;
+
 	set_freezable();
 
 	/*
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
+	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (ret == -ECANCELED)
+		/* kthread stopped */
+		return 0;
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	switch (ret) {
+	case 0:
+		/* success, no unsanitized pages */
+		break;
+
+	case -ECANCELED:
+		/* kthread stopped */
+		return 0;
+
+	default:
+		/*
+		 * Never expected to happen in a working driver. If it happens
+		 * the bug is expected to be in the sanitization process, but
+		 * successfully sanitized pages are still valid and driver can
+		 * be used and most importantly debugged without issues. To put
+		 * short, the global state of kernel is not corrupted so no
+		 * reason to do any more complicated rollback.
+		 */
+		pr_err("%ld unsanitized pages\n", ret);
+	}
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 17:38 ` [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
@ 2022-08-31 20:39   ` Reinette Chatre
  2022-09-01 10:50     ` Huang, Kai
  2022-09-01 21:53     ` Jarkko Sakkinen
  0 siblings, 2 replies; 15+ messages in thread
From: Reinette Chatre @ 2022-08-31 20:39 UTC (permalink / raw)
  To: Jarkko Sakkinen, linux-sgx
  Cc: Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel, stable,
	Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> In sgx_init(), if misc_register() fails or misc_register() succeeds but
> neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> prematurely stopped. This may leave some unsanitized pages, which does
> not matter, because SGX will be disabled for the whole power cycle.
> 
> This triggers WARN_ON() because sgx_dirty_page_list ends up being
> non-empty, and dumps the call stack:
> 
> [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.268591] ------------[ cut here ]------------
> [    0.268592] WARNING: CPU: 6 PID: 83 at
> arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> [    0.268598] Modules linked in:
> [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> 07/06/2022
> [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> 0000000000000000
> [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> 00000000ffffffff
> [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> ffff8dcd820a4180
> [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> ffffb6c74006bce0
> [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> 0000000000000000
> [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> knlGS:0000000000000000
> [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> 00000000003706e0
> [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> 0000000000000000
> [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> 0000000000000400
> [    0.268622] Call Trace:
> [    0.268624]  <TASK>
> [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.268634]  ? __kthread_parkme+0x36/0x90
> [    0.268637]  kthread+0xe5/0x110
> [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> [    0.268642]  ret_from_fork+0x1f/0x30
> [    0.268647]  </TASK>
> [    0.268648] ---[ end trace 0000000000000000 ]---
> 

Are you still planning to trim this?

> Ultimately this can crash the kernel, if the following is set:
> 
> 	/proc/sys/kernel/panic_on_warn
> 
> In premature stop, print nothing, as the number is by practical means a
> random number. Otherwise, it is an indicator of a bug in the driver, and
> therefore print the number of unsanitized pages with pr_err().

I think that "print the number of unsanitized pages with pr_err()" 
contradicts the patch subject of "Do not consider unsanitized pages
an error".

...

> @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	long ret;
> +
>  	set_freezable();
>  
>  	/*
>  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (ret == -ECANCELED)
> +		/* kthread stopped */
> +		return 0;
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	switch (ret) {
> +	case 0:
> +		/* success, no unsanitized pages */
> +		break;
> +
> +	case -ECANCELED:
> +		/* kthread stopped */
> +		return 0;
> +
> +	default:
> +		/*
> +		 * Never expected to happen in a working driver. If it happens
> +		 * the bug is expected to be in the sanitization process, but
> +		 * successfully sanitized pages are still valid and driver can
> +		 * be used and most importantly debugged without issues. To put
> +		 * short, the global state of kernel is not corrupted so no
> +		 * reason to do any more complicated rollback.
> +		 */
> +		pr_err("%ld unsanitized pages\n", ret);
> +	}
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())


I think I am missing something here. A lot of logic is added here but I
do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
the reclaimer was canceled. I am thus wondering, could the above not be
simplified to something similar to V1:

@@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@ static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty && !kthread_should_stop())
+		pr_err("%lu unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())


Reinette

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 20:39   ` Reinette Chatre
@ 2022-09-01 10:50     ` Huang, Kai
  2022-09-01 21:47       ` jarkko
  2022-09-01 21:53     ` Jarkko Sakkinen
  1 sibling, 1 reply; 15+ messages in thread
From: Huang, Kai @ 2022-09-01 10:50 UTC (permalink / raw)
  To: linux-sgx@vger.kernel.org, Chatre, Reinette, jarkko@kernel.org
  Cc: pmenzel@molgen.mpg.de, bp@alien8.de, dave.hansen@linux.intel.com,
	Dhanraj, Vijay, x86@kernel.org, mingo@redhat.com,
	tglx@linutronix.de, hpa@zytor.com, haitao.huang@linux.intel.com,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org

On Wed, 2022-08-31 at 13:39 -0700, Reinette Chatre wrote:
> >   static int ksgxd(void *p)
> >   {
> > +	long ret;
> > +
> >   	set_freezable();
> >   
> >   	/*
> >   	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >   	 * required for SECS pages, whose child pages blocked EREMOVE.
> >   	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (ret == -ECANCELED)
> > +		/* kthread stopped */
> > +		return 0;
> >   
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	switch (ret) {
> > +	case 0:
> > +		/* success, no unsanitized pages */
> > +		break;
> > +
> > +	case -ECANCELED:
> > +		/* kthread stopped */
> > +		return 0;
> > +
> > +	default:
> > +		/*
> > +		 * Never expected to happen in a working driver. If it
> > happens
> > +		 * the bug is expected to be in the sanitization process,
> > but
> > +		 * successfully sanitized pages are still valid and driver
> > can
> > +		 * be used and most importantly debugged without issues. To
> > put
> > +		 * short, the global state of kernel is not corrupted so no
> > +		 * reason to do any more complicated rollback.
> > +		 */
> > +		pr_err("%ld unsanitized pages\n", ret);
> > +	}
> >   
> >   	while (!kthread_should_stop()) {
> >   		if (try_to_freeze())
> 
> 
> I think I am missing something here. A lot of logic is added here but I
> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> the reclaimer was canceled. I am thus wondering, could the above not be
> simplified to something similar to V1:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);
>  

This basically means driver bug if I understand correctly.  To be consistent
with the behaviour of existing code, how about just WARN()?
	
	...
	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
	WARN_ON(left_dirty && !kthread_should_stop());

It seems there's little value to print out the unsanitized pages here.  The
existing code doesn't print it anyway.

-- 
Thanks,
-Kai



^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 10:50     ` Huang, Kai
@ 2022-09-01 21:47       ` jarkko
  0 siblings, 0 replies; 15+ messages in thread
From: jarkko @ 2022-09-01 21:47 UTC (permalink / raw)
  To: Huang, Kai
  Cc: linux-sgx@vger.kernel.org, Chatre, Reinette,
	pmenzel@molgen.mpg.de, bp@alien8.de, dave.hansen@linux.intel.com,
	Dhanraj, Vijay, x86@kernel.org, mingo@redhat.com,
	tglx@linutronix.de, hpa@zytor.com, haitao.huang@linux.intel.com,
	stable@vger.kernel.org, linux-kernel@vger.kernel.org

On Thu, Sep 01, 2022 at 10:50:07AM +0000, Huang, Kai wrote:
> On Wed, 2022-08-31 at 13:39 -0700, Reinette Chatre wrote:
> > >   static int ksgxd(void *p)
> > >   {
> > > +	long ret;
> > > +
> > >   	set_freezable();
> > >   
> > >   	/*
> > >   	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > >   	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >   	 */
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (ret == -ECANCELED)
> > > +		/* kthread stopped */
> > > +		return 0;
> > >   
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	switch (ret) {
> > > +	case 0:
> > > +		/* success, no unsanitized pages */
> > > +		break;
> > > +
> > > +	case -ECANCELED:
> > > +		/* kthread stopped */
> > > +		return 0;
> > > +
> > > +	default:
> > > +		/*
> > > +		 * Never expected to happen in a working driver. If it
> > > happens
> > > +		 * the bug is expected to be in the sanitization process,
> > > but
> > > +		 * successfully sanitized pages are still valid and driver
> > > can
> > > +		 * be used and most importantly debugged without issues. To
> > > put
> > > +		 * short, the global state of kernel is not corrupted so no
> > > +		 * reason to do any more complicated rollback.
> > > +		 */
> > > +		pr_err("%ld unsanitized pages\n", ret);
> > > +	}
> > >   
> > >   	while (!kthread_should_stop()) {
> > >   		if (try_to_freeze())
> > 
> > 
> > I think I am missing something here. A lot of logic is added here but I
> > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > the reclaimer was canceled. I am thus wondering, could the above not be
> > simplified to something similar to V1:
> > 
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	unsigned long left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty && !kthread_should_stop())
> > +		pr_err("%lu unsanitized pages\n", left_dirty);
> >  
> 
> This basically means driver bug if I understand correctly.  To be consistent
> with the behaviour of existing code, how about just WARN()?
> 	
> 	...
> 	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> 	WARN_ON(left_dirty && !kthread_should_stop());
> 
> It seems there's little value to print out the unsanitized pages here.  The
> existing code doesn't print it anyway.

Using WARN IMHO here is too strong measure, given that
it tear down the whole kernel, if panic_on_warn is enabled.

For debugging, any information is useful information, so
would not make sense not print the number of pages, if 
that is available. That could very well point out the
issue why all pages are not sanitized if there was a bug.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-08-31 20:39   ` Reinette Chatre
  2022-09-01 10:50     ` Huang, Kai
@ 2022-09-01 21:53     ` Jarkko Sakkinen
  2022-09-01 21:56       ` Jarkko Sakkinen
  2022-09-01 22:34       ` Reinette Chatre
  1 sibling, 2 replies; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 21:53 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > prematurely stopped. This may leave some unsanitized pages, which does
> > not matter, because SGX will be disabled for the whole power cycle.
> > 
> > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > non-empty, and dumps the call stack:
> > 
> > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > [    0.268591] ------------[ cut here ]------------
> > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > [    0.268598] Modules linked in:
> > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > 07/06/2022
> > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > 0000000000000000
> > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > 00000000ffffffff
> > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > ffff8dcd820a4180
> > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > ffffb6c74006bce0
> > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > 0000000000000000
> > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > knlGS:0000000000000000
> > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > 00000000003706e0
> > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > 0000000000000000
> > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > 0000000000000400
> > [    0.268622] Call Trace:
> > [    0.268624]  <TASK>
> > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > [    0.268637]  kthread+0xe5/0x110
> > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > [    0.268642]  ret_from_fork+0x1f/0x30
> > [    0.268647]  </TASK>
> > [    0.268648] ---[ end trace 0000000000000000 ]---
> > 
> 
> Are you still planning to trim this?
> 
> > Ultimately this can crash the kernel, if the following is set:
> > 
> > 	/proc/sys/kernel/panic_on_warn
> > 
> > In premature stop, print nothing, as the number is by practical means a
> > random number. Otherwise, it is an indicator of a bug in the driver, and
> > therefore print the number of unsanitized pages with pr_err().
> 
> I think that "print the number of unsanitized pages with pr_err()" 
> contradicts the patch subject of "Do not consider unsanitized pages
> an error".
> 
> ...
> 
> > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	long ret;
> > +
> >  	set_freezable();
> >  
> >  	/*
> >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (ret == -ECANCELED)
> > +		/* kthread stopped */
> > +		return 0;
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	switch (ret) {
> > +	case 0:
> > +		/* success, no unsanitized pages */
> > +		break;
> > +
> > +	case -ECANCELED:
> > +		/* kthread stopped */
> > +		return 0;
> > +
> > +	default:
> > +		/*
> > +		 * Never expected to happen in a working driver. If it happens
> > +		 * the bug is expected to be in the sanitization process, but
> > +		 * successfully sanitized pages are still valid and driver can
> > +		 * be used and most importantly debugged without issues. To put
> > +		 * short, the global state of kernel is not corrupted so no
> > +		 * reason to do any more complicated rollback.
> > +		 */
> > +		pr_err("%ld unsanitized pages\n", ret);
> > +	}
> >  
> >  	while (!kthread_should_stop()) {
> >  		if (try_to_freeze())
> 
> 
> I think I am missing something here. A lot of logic is added here but I
> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> the reclaimer was canceled. I am thus wondering, could the above not be
> simplified to something similar to V1:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);

IMHO, would make sense also to have here:

        if (!kthread_should_stop())
                return 0;

> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);

That would be incorrect, if the function returned
because of kthread stopped.

If you do the check here you already have a window
where kthread could have been stopped anyhow.

So even this would be less correct:

        if (kthreas_should_stop()) {
                return 0;
        }  else if (left_dirty) {
                pr_err("%lu unsanitized pages\n", left_dirty);
        }

So in the end you end as complicated and less correct
fix. This all is explained in the commit message.

If you unconditionally print error, you don't have
a meaning for the number of unsanitized pags.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 21:53     ` Jarkko Sakkinen
@ 2022-09-01 21:56       ` Jarkko Sakkinen
  2022-09-01 22:01         ` Jarkko Sakkinen
  2022-09-01 22:34       ` Reinette Chatre
  1 sibling, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 21:56 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 02, 2022 at 12:53:55AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> > Hi Jarkko,
> > 
> > On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > prematurely stopped. This may leave some unsanitized pages, which does
> > > not matter, because SGX will be disabled for the whole power cycle.
> > > 
> > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > non-empty, and dumps the call stack:
> > > 
> > > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > > [    0.268591] ------------[ cut here ]------------
> > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > > [    0.268598] Modules linked in:
> > > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > > 07/06/2022
> > > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > > 0000000000000000
> > > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > > 00000000ffffffff
> > > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > > ffff8dcd820a4180
> > > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > > ffffb6c74006bce0
> > > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > > 0000000000000000
> > > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > > knlGS:0000000000000000
> > > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > > 00000000003706e0
> > > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > 0000000000000000
> > > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > 0000000000000400
> > > [    0.268622] Call Trace:
> > > [    0.268624]  <TASK>
> > > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > > [    0.268637]  kthread+0xe5/0x110
> > > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > > [    0.268642]  ret_from_fork+0x1f/0x30
> > > [    0.268647]  </TASK>
> > > [    0.268648] ---[ end trace 0000000000000000 ]---
> > > 
> > 
> > Are you still planning to trim this?
> > 
> > > Ultimately this can crash the kernel, if the following is set:
> > > 
> > > 	/proc/sys/kernel/panic_on_warn
> > > 
> > > In premature stop, print nothing, as the number is by practical means a
> > > random number. Otherwise, it is an indicator of a bug in the driver, and
> > > therefore print the number of unsanitized pages with pr_err().
> > 
> > I think that "print the number of unsanitized pages with pr_err()" 
> > contradicts the patch subject of "Do not consider unsanitized pages
> > an error".
> > 
> > ...
> > 
> > > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	long ret;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (ret == -ECANCELED)
> > > +		/* kthread stopped */
> > > +		return 0;
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	switch (ret) {
> > > +	case 0:
> > > +		/* success, no unsanitized pages */
> > > +		break;
> > > +
> > > +	case -ECANCELED:
> > > +		/* kthread stopped */
> > > +		return 0;
> > > +
> > > +	default:
> > > +		/*
> > > +		 * Never expected to happen in a working driver. If it happens
> > > +		 * the bug is expected to be in the sanitization process, but
> > > +		 * successfully sanitized pages are still valid and driver can
> > > +		 * be used and most importantly debugged without issues. To put
> > > +		 * short, the global state of kernel is not corrupted so no
> > > +		 * reason to do any more complicated rollback.
> > > +		 */
> > > +		pr_err("%ld unsanitized pages\n", ret);
> > > +	}
> > >  
> > >  	while (!kthread_should_stop()) {
> > >  		if (try_to_freeze())
> > 
> > 
> > I think I am missing something here. A lot of logic is added here but I
> > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > the reclaimer was canceled. I am thus wondering, could the above not be
> > simplified to something similar to V1:
> > 
> > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >  
> >  static int ksgxd(void *p)
> >  {
> > +	unsigned long left_dirty;
> > +
> >  	set_freezable();
> >  
> >  	/*
> > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >  	 */
> >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> 
> IMHO, would make sense also to have here:
> 
>         if (!kthread_should_stop())
>                 return 0;
> 
> > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >  
> > -	/* sanity check: */
> > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > +	if (left_dirty && !kthread_should_stop())
> > +		pr_err("%lu unsanitized pages\n", left_dirty);
> 
> That would be incorrect, if the function returned
> because of kthread stopped.
> 
> If you do the check here you already have a window
> where kthread could have been stopped anyhow.
> 
> So even this would be less correct:
> 
>         if (kthreas_should_stop()) {
>                 return 0;
>         }  else if (left_dirty) {
>                 pr_err("%lu unsanitized pages\n", left_dirty);
>         }
> 
> So in the end you end as complicated and less correct
> fix. This all is explained in the commit message.
> 
> If you unconditionally print error, you don't have
> a meaning for the number of unsanitized pags.

If you add my long comment explaining the error case, the SLOC
size is almost the same. That takes most space in my patch.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 21:56       ` Jarkko Sakkinen
@ 2022-09-01 22:01         ` Jarkko Sakkinen
  0 siblings, 0 replies; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 22:01 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 02, 2022 at 12:56:34AM +0300, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 12:53:55AM +0300, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> > > Hi Jarkko,
> > > 
> > > On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> > > > In sgx_init(), if misc_register() fails or misc_register() succeeds but
> > > > neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
> > > > prematurely stopped. This may leave some unsanitized pages, which does
> > > > not matter, because SGX will be disabled for the whole power cycle.
> > > > 
> > > > This triggers WARN_ON() because sgx_dirty_page_list ends up being
> > > > non-empty, and dumps the call stack:
> > > > 
> > > > [    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
> > > > [    0.268591] ------------[ cut here ]------------
> > > > [    0.268592] WARNING: CPU: 6 PID: 83 at
> > > > arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
> > > > [    0.268598] Modules linked in:
> > > > [    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
> > > > [    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
> > > > 07/06/2022
> > > > [    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
> > > > [    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
> > > > 84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
> > > > ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
> > > > [    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
> > > > [    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
> > > > 0000000000000000
> > > > [    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
> > > > 00000000ffffffff
> > > > [    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
> > > > ffff8dcd820a4180
> > > > [    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
> > > > ffffb6c74006bce0
> > > > [    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
> > > > 0000000000000000
> > > > [    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
> > > > knlGS:0000000000000000
> > > > [    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > > > [    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
> > > > 00000000003706e0
> > > > [    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
> > > > 0000000000000000
> > > > [    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
> > > > 0000000000000400
> > > > [    0.268622] Call Trace:
> > > > [    0.268624]  <TASK>
> > > > [    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
> > > > [    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> > > > [    0.268634]  ? __kthread_parkme+0x36/0x90
> > > > [    0.268637]  kthread+0xe5/0x110
> > > > [    0.268639]  ? kthread_complete_and_exit+0x20/0x20
> > > > [    0.268642]  ret_from_fork+0x1f/0x30
> > > > [    0.268647]  </TASK>
> > > > [    0.268648] ---[ end trace 0000000000000000 ]---
> > > > 
> > > 
> > > Are you still planning to trim this?
> > > 
> > > > Ultimately this can crash the kernel, if the following is set:
> > > > 
> > > > 	/proc/sys/kernel/panic_on_warn
> > > > 
> > > > In premature stop, print nothing, as the number is by practical means a
> > > > random number. Otherwise, it is an indicator of a bug in the driver, and
> > > > therefore print the number of unsanitized pages with pr_err().
> > > 
> > > I think that "print the number of unsanitized pages with pr_err()" 
> > > contradicts the patch subject of "Do not consider unsanitized pages
> > > an error".
> > > 
> > > ...
> > > 
> > > > @@ -388,17 +393,40 @@ void sgx_reclaim_direct(void)
> > > >  
> > > >  static int ksgxd(void *p)
> > > >  {
> > > > +	long ret;
> > > > +
> > > >  	set_freezable();
> > > >  
> > > >  	/*
> > > >  	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
> > > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > > >  	 */
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	if (ret == -ECANCELED)
> > > > +		/* kthread stopped */
> > > > +		return 0;
> > > >  
> > > > -	/* sanity check: */
> > > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > > +	ret = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > > +	switch (ret) {
> > > > +	case 0:
> > > > +		/* success, no unsanitized pages */
> > > > +		break;
> > > > +
> > > > +	case -ECANCELED:
> > > > +		/* kthread stopped */
> > > > +		return 0;
> > > > +
> > > > +	default:
> > > > +		/*
> > > > +		 * Never expected to happen in a working driver. If it happens
> > > > +		 * the bug is expected to be in the sanitization process, but
> > > > +		 * successfully sanitized pages are still valid and driver can
> > > > +		 * be used and most importantly debugged without issues. To put
> > > > +		 * short, the global state of kernel is not corrupted so no
> > > > +		 * reason to do any more complicated rollback.
> > > > +		 */
> > > > +		pr_err("%ld unsanitized pages\n", ret);
> > > > +	}
> > > >  
> > > >  	while (!kthread_should_stop()) {
> > > >  		if (try_to_freeze())
> > > 
> > > 
> > > I think I am missing something here. A lot of logic is added here but I
> > > do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> > > the reclaimer was canceled. I am thus wondering, could the above not be
> > > simplified to something similar to V1:
> > > 
> > > @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> > >  
> > >  static int ksgxd(void *p)
> > >  {
> > > +	unsigned long left_dirty;
> > > +
> > >  	set_freezable();
> > >  
> > >  	/*
> > > @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> > >  	 * required for SECS pages, whose child pages blocked EREMOVE.
> > >  	 */
> > >  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > 
> > IMHO, would make sense also to have here:
> > 
> >         if (!kthread_should_stop())
> >                 return 0;
> > 
> > > -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > >  
> > > -	/* sanity check: */
> > > -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> > > +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> > > +	if (left_dirty && !kthread_should_stop())
> > > +		pr_err("%lu unsanitized pages\n", left_dirty);
> > 
> > That would be incorrect, if the function returned
> > because of kthread stopped.
> > 
> > If you do the check here you already have a window
> > where kthread could have been stopped anyhow.
> > 
> > So even this would be less correct:
> > 
> >         if (kthreas_should_stop()) {
> >                 return 0;
> >         }  else if (left_dirty) {
> >                 pr_err("%lu unsanitized pages\n", left_dirty);
> >         }
> > 
> > So in the end you end as complicated and less correct
> > fix. This all is explained in the commit message.
> > 
> > If you unconditionally print error, you don't have
> > a meaning for the number of unsanitized pags.
> 
> If you add my long comment explaining the error case, the SLOC
> size is almost the same. That takes most space in my patch.

Printing pages if the thread was stopped is just making
the bug look different and have less output, it does not
fix anything, except prevent kernel panic.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 21:53     ` Jarkko Sakkinen
  2022-09-01 21:56       ` Jarkko Sakkinen
@ 2022-09-01 22:34       ` Reinette Chatre
  2022-09-01 23:56         ` Jarkko Sakkinen
  1 sibling, 1 reply; 15+ messages in thread
From: Reinette Chatre @ 2022-09-01 22:34 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 9/1/2022 2:53 PM, Jarkko Sakkinen wrote:
> On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
>> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:

>> I think I am missing something here. A lot of logic is added here but I
>> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
>> the reclaimer was canceled. I am thus wondering, could the above not be
>> simplified to something similar to V1:
>>
>> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>>  
>>  static int ksgxd(void *p)
>>  {
>> +	unsigned long left_dirty;
>> +
>>  	set_freezable();
>>  
>>  	/*
>> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>>  	 */
>>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> 
> IMHO, would make sense also to have here:
> 
>         if (!kthread_should_stop())
>                 return 0;
> 

Would this not prematurely stop the thread when it should not be?

>> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>>  
>> -	/* sanity check: */
>> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
>> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
>> +	if (left_dirty && !kthread_should_stop())
>> +		pr_err("%lu unsanitized pages\n", left_dirty);
> 
> That would be incorrect, if the function returned
> because of kthread stopped.


I should have highlighted this but in my example I changed
left_dirty to be "unsigned long" with the intention that the
"return -ECANCELED" is replaced with "return 0".

__sgx_sanitize_pages() returns 0 when it exits because of
kthread stopped.

To elaborate I was thinking about:

+static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
+	unsigned long left_dirty = 0;
 	struct sgx_epc_page *page;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return 0;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }


and then with what I had in previous email the checks should work:

@@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
@@ -395,10 +402,10 @@ static int ksgxd(void *p)
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
 	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	if (left_dirty && !kthread_should_stop())
+		pr_err("%lu unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())


> 
> If you do the check here you already have a window
> where kthread could have been stopped anyhow.
> 
> So even this would be less correct:
> 
>         if (kthreas_should_stop()) {
>                 return 0;
>         }  else if (left_dirty) {
>                 pr_err("%lu unsanitized pages\n", left_dirty);
>         }
> 
> So in the end you end as complicated and less correct
> fix. This all is explained in the commit message.
> 
> If you unconditionally print error, you don't have
> a meaning for the number of unsanitized pags.

Understood that the goal is to only print the
number of unsanitized pages if ksgxd has not been
stopped prematurely.

Reinette 


^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 22:34       ` Reinette Chatre
@ 2022-09-01 23:56         ` Jarkko Sakkinen
  2022-09-02 13:26           ` Jarkko Sakkinen
  0 siblings, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-01 23:56 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Thu, Sep 01, 2022 at 03:34:10PM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/1/2022 2:53 PM, Jarkko Sakkinen wrote:
> > On Wed, Aug 31, 2022 at 01:39:53PM -0700, Reinette Chatre wrote:
> >> On 8/31/2022 10:38 AM, Jarkko Sakkinen wrote:
> 
> >> I think I am missing something here. A lot of logic is added here but I
> >> do not see why it is necessary.  ksgxd() knows via kthread_should_stop() if
> >> the reclaimer was canceled. I am thus wondering, could the above not be
> >> simplified to something similar to V1:
> >>
> >> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
> >>  
> >>  static int ksgxd(void *p)
> >>  {
> >> +	unsigned long left_dirty;
> >> +
> >>  	set_freezable();
> >>  
> >>  	/*
> >> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
> >>  	 * required for SECS pages, whose child pages blocked EREMOVE.
> >>  	 */
> >>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> > 
> > IMHO, would make sense also to have here:
> > 
> >         if (!kthread_should_stop())
> >                 return 0;
> > 
> 
> Would this not prematurely stop the thread when it should not be?
> 
> >> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
> >>  
> >> -	/* sanity check: */
> >> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> >> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> >> +	if (left_dirty && !kthread_should_stop())
> >> +		pr_err("%lu unsanitized pages\n", left_dirty);
> > 
> > That would be incorrect, if the function returned
> > because of kthread stopped.
> 
> 
> I should have highlighted this but in my example I changed
> left_dirty to be "unsigned long" with the intention that the
> "return -ECANCELED" is replaced with "return 0".

It wasn't supposed to be, it's an error. Thanks for spotting
that.

> 
> __sgx_sanitize_pages() returns 0 when it exits because of
> kthread stopped.
> 
> To elaborate I was thinking about:
> 
> +static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  {
> +	unsigned long left_dirty = 0;
>  	struct sgx_epc_page *page;
>  	LIST_HEAD(dirty);
>  	int ret;
>  
> -	/* dirty_page_list is thread-local, no need for a lock: */
>  	while (!list_empty(dirty_page_list)) {
>  		if (kthread_should_stop())
> -			return;
> +			return 0;
>  
>  		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
>  
> @@ -92,12 +95,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
>  		} else {
>  			/* The page is not yet clean - move to the dirty list. */
>  			list_move_tail(&page->list, &dirty);
> +			left_dirty++;
>  		}
>  
>  		cond_resched();
>  	}
>  
>  	list_splice(&dirty, dirty_page_list);
> +	return left_dirty;
>  }
> 
> 
> and then with what I had in previous email the checks should work:
> 
> @@ -388,6 +393,8 @@ void sgx_reclaim_direct(void)
>  
>  static int ksgxd(void *p)
>  {
> +	unsigned long left_dirty;
> +
>  	set_freezable();
>  
>  	/*
> @@ -395,10 +402,10 @@ static int ksgxd(void *p)
>  	 * required for SECS pages, whose child pages blocked EREMOVE.
>  	 */
>  	__sgx_sanitize_pages(&sgx_dirty_page_list);
> -	__sgx_sanitize_pages(&sgx_dirty_page_list);
>  
> -	/* sanity check: */
> -	WARN_ON(!list_empty(&sgx_dirty_page_list));
> +	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
> +	if (left_dirty && !kthread_should_stop())
> +		pr_err("%lu unsanitized pages\n", left_dirty);
>  
>  	while (!kthread_should_stop()) {
>  		if (try_to_freeze())
> 
> 
> > 
> > If you do the check here you already have a window
> > where kthread could have been stopped anyhow.
> > 
> > So even this would be less correct:
> > 
> >         if (kthreas_should_stop()) {
> >                 return 0;
> >         }  else if (left_dirty) {
> >                 pr_err("%lu unsanitized pages\n", left_dirty);
> >         }
> > 
> > So in the end you end as complicated and less correct
> > fix. This all is explained in the commit message.
> > 
> > If you unconditionally print error, you don't have
> > a meaning for the number of unsanitized pags.
> 
> Understood that the goal is to only print the
> number of unsanitized pages if ksgxd has not been
> stopped prematurely.

Yeah, and thus give as useful information for sysadmin/developer
as we can.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-01 23:56         ` Jarkko Sakkinen
@ 2022-09-02 13:26           ` Jarkko Sakkinen
  2022-09-02 15:53             ` Jarkko Sakkinen
  0 siblings, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02 13:26 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

[-- Attachment #1: Type: text/plain, Size: 441 bytes --]

On Fri, Sep 02, 2022 at 02:57:01AM +0300, Jarkko Sakkinen wrote:
> > Understood that the goal is to only print the
> > number of unsanitized pages if ksgxd has not been
> > stopped prematurely.
> 
> Yeah, and thus give as useful information for sysadmin/developer
> as we can.

I figured out how to get best of both worlds. See the
attached patch. I'll send formal series later on.

Keeps the accurancy, just postpones the exit.

BR, Jarkko

[-- Attachment #2: tmp.patch --]
[-- Type: text/plain, Size: 6614 bytes --]

From 31c43c3276667cef0a7f0687d489552f26da877b Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko@kernel.org>
Date: Thu, 25 Aug 2022 08:12:30 +0300
Subject: [PATCH] x86/sgx: Do not consider unsanitized pages an error

In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

This triggers WARN_ON() because sgx_dirty_page_list ends up being
non-empty, and dumps the call stack:

[    0.268103] sgx: EPC section 0x40200000-0x45f7ffff
[    0.268591] ------------[ cut here ]------------
[    0.268592] WARNING: CPU: 6 PID: 83 at
arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0
[    0.268598] Modules linked in:
[    0.268600] CPU: 6 PID: 83 Comm: ksgxd Not tainted 6.0.0-rc2 #382
[    0.268603] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0
07/06/2022
[    0.268604] RIP: 0010:ksgxd+0x1b7/0x1d0
[    0.268607] Code: ff e9 f2 fe ff ff 48 89 df e8 75 07 0e 00 84 c0 0f
84 c3 fe ff ff 31 ff e8 e6 07 0e 00 84 c0 0f 85 94 fe ff ff e9 af fe ff
ff <0f> 0b e9 7f fe ff ff e8 dd 9c 95 00 66 66 2e 0f 1f 84 00 00 00 00
[    0.268608] RSP: 0000:ffffb6c7404f3ed8 EFLAGS: 00010287
[    0.268610] RAX: ffffb6c740431a10 RBX: ffff8dcd8117b400 RCX:
0000000000000000
[    0.268612] RDX: 0000000080000000 RSI: ffffb6c7404319d0 RDI:
00000000ffffffff
[    0.268613] RBP: ffff8dcd820a4d80 R08: ffff8dcd820a4180 R09:
ffff8dcd820a4180
[    0.268614] R10: 0000000000000000 R11: 0000000000000006 R12:
ffffb6c74006bce0
[    0.268615] R13: ffff8dcd80e63880 R14: ffffffffa8a60f10 R15:
0000000000000000
[    0.268616] FS:  0000000000000000(0000) GS:ffff8dcf25580000(0000)
knlGS:0000000000000000
[    0.268617] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    0.268619] CR2: 0000000000000000 CR3: 0000000213410001 CR4:
00000000003706e0
[    0.268620] DR0: 0000000000000000 DR1: 0000000000000000 DR2:
0000000000000000
[    0.268621] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7:
0000000000000400
[    0.268622] Call Trace:
[    0.268624]  <TASK>
[    0.268627]  ? _raw_spin_lock_irqsave+0x24/0x60
[    0.268632]  ? _raw_spin_unlock_irqrestore+0x23/0x40
[    0.268634]  ? __kthread_parkme+0x36/0x90
[    0.268637]  kthread+0xe5/0x110
[    0.268639]  ? kthread_complete_and_exit+0x20/0x20
[    0.268642]  ret_from_fork+0x1f/0x30
[    0.268647]  </TASK>
[    0.268648] ---[ end trace 0000000000000000 ]---

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().

Link: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u
Fixes: 51ab30eb2ad4 ("x86/sgx: Replace section->init_laundry_list with sgx_dirty_page_list")
Cc: stable@vger.kernel.org # v5.13+
Reported-by: Paul Menzel <pmenzel@molgen.mpg.de>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>

v6:
- Address Reinette's feedback:
  https://lore.kernel.org/linux-sgx/Yw6%2FiTzSdSw%2FY%2FVO@kernel.org/

v5:
- Add the klog dump and sysctl option to the commit message.

v4:
- Explain expectations for dirty_page_list in the function header, instead
  of an inline comment.
- Improve commit message to explain the conditions better.
- Return the number of pages left dirty to ksgxd() and print warning after
  the 2nd call, if there are any.

v3:
- Remove WARN_ON().
- Tuned comments and the commit message a bit.

v2:
- Replaced WARN_ON() with optional pr_info() inside
  __sgx_sanitize_pages().
- Rewrote the commit message.
- Added the fixes tag.
---
 arch/x86/kernel/cpu/sgx/main.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..f29dcaddd140 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -49,17 +49,23 @@ static LIST_HEAD(sgx_dirty_page_list);
  * Reset post-kexec EPC pages to the uninitialized state. The pages are removed
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
+ *
+ * Contents of the @dirty_page_list must be thread-local, i.e.
+ * not shared by multiple threads.
+ *
+ * Return 0 when sanitization was successful or kthread was stopped, and the
+ * number of unsanitized pages otherwise.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static unsigned long __sgx_sanitize_pages(struct list_head *dirty_page_list)
 {
 	struct sgx_epc_page *page;
+	long left_dirty = 0;
 	LIST_HEAD(dirty);
 	int ret;
 
-	/* dirty_page_list is thread-local, no need for a lock: */
 	while (!list_empty(dirty_page_list)) {
 		if (kthread_should_stop())
-			return;
+			return 0;
 
 		page = list_first_entry(dirty_page_list, struct sgx_epc_page, list);
 
@@ -92,12 +98,14 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 		} else {
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
+			left_dirty++;
 		}
 
 		cond_resched();
 	}
 
 	list_splice(&dirty, dirty_page_list);
+	return left_dirty;
 }
 
 static bool sgx_reclaimer_age(struct sgx_epc_page *epc_page)
@@ -388,17 +396,28 @@ void sgx_reclaim_direct(void)
 
 static int ksgxd(void *p)
 {
+	unsigned long left_dirty;
+
 	set_freezable();
 
 	/*
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	pr_debug("%ld unsanitized pages\n", left_dirty);
 
-	/* sanity check: */
-	WARN_ON(!list_empty(&sgx_dirty_page_list));
+	left_dirty = __sgx_sanitize_pages(&sgx_dirty_page_list);
+	/*
+	 * Never expected to happen in a working driver. If it happens the bug
+	 * is expected to be in the sanitization process, but successfully
+	 * sanitized pages are still valid and driver can be used and most
+	 * importantly debugged without issues. To put short, the global state
+	 * of kernel is not corrupted so no reason to do any more complicated
+	 * rollback.
+	 */
+	if (ret)
+		pr_err("%ld unsanitized pages\n", left_dirty);
 
 	while (!kthread_should_stop()) {
 		if (try_to_freeze())
-- 
2.37.2


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-02 13:26           ` Jarkko Sakkinen
@ 2022-09-02 15:53             ` Jarkko Sakkinen
  2022-09-02 16:08               ` Reinette Chatre
  0 siblings, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02 15:53 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> +	if (ret)
> +		pr_err("%ld unsanitized pages\n", left_dirty);

Yeah, I know, should be 'left_dirty'. I just quickly drafted
the patch for the email.

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-02 15:53             ` Jarkko Sakkinen
@ 2022-09-02 16:08               ` Reinette Chatre
  2022-09-02 16:30                 ` Jarkko Sakkinen
  0 siblings, 1 reply; 15+ messages in thread
From: Reinette Chatre @ 2022-09-02 16:08 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
>> +	if (ret)
>> +		pr_err("%ld unsanitized pages\n", left_dirty);
> 
> Yeah, I know, should be 'left_dirty'. I just quickly drafted
> the patch for the email.
> 

No problem - you did mention that it was an informal patch.

(btw ... also watch out for the long local parameter returned
as an unsigned long and the signed vs unsigned printing
format string.) I also continue to recommend that you trim
that backtrace ... this patch is heading to x86 area where
this is required.

Reinette

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-02 16:08               ` Reinette Chatre
@ 2022-09-02 16:30                 ` Jarkko Sakkinen
  2022-09-02 17:38                   ` Reinette Chatre
  0 siblings, 1 reply; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02 16:30 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> > On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> >> +	if (ret)
> >> +		pr_err("%ld unsanitized pages\n", left_dirty);
> > 
> > Yeah, I know, should be 'left_dirty'. I just quickly drafted
> > the patch for the email.
> > 
> 
> No problem - you did mention that it was an informal patch.
> 
> (btw ... also watch out for the long local parameter returned
> as an unsigned long and the signed vs unsigned printing
> format string.) I also continue to recommend that you trim

Point taken.

> that backtrace ... this patch is heading to x86 area where
> this is required.

Should I just cut the whole stack trace, and leave the
part before it?

BR, Jarkko

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-02 16:30                 ` Jarkko Sakkinen
@ 2022-09-02 17:38                   ` Reinette Chatre
  2022-09-02 19:20                     ` Jarkko Sakkinen
  0 siblings, 1 reply; 15+ messages in thread
From: Reinette Chatre @ 2022-09-02 17:38 UTC (permalink / raw)
  To: Jarkko Sakkinen
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

Hi Jarkko,

On 9/2/2022 9:30 AM, Jarkko Sakkinen wrote:
> On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
>> Hi Jarkko,
>>
>> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
>>> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
>>>> +	if (ret)
>>>> +		pr_err("%ld unsanitized pages\n", left_dirty);
>>>
>>> Yeah, I know, should be 'left_dirty'. I just quickly drafted
>>> the patch for the email.
>>>
>>
>> No problem - you did mention that it was an informal patch.
>>
>> (btw ... also watch out for the long local parameter returned
>> as an unsigned long and the signed vs unsigned printing
>> format string.) I also continue to recommend that you trim
> 
> Point taken.
> 
>> that backtrace ... this patch is heading to x86 area where
>> this is required.
> 
> Should I just cut the whole stack trace, and leave the
> part before it?

The trace is printed because of a WARN_ON() in the code.
I do not think there is anything very helpful in that trace.
I think the only helpful parts are the WARN itself that includes
the line number and then information on which kernel it was
encountered on.

How about something like (please note the FIXME within):

"
Paul reported the following WARN while running kernel vFIXME:
  WARNING: CPU: 6 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

This is the WARN_ON() within ksgxd() that is triggered when
sgx_dirty_page_list (the list containing unsanitized pages) is non-empty.

In sgx_init(), if misc_register() fails or misc_register() succeeds but
neither sgx_drv_init() nor sgx_vepc_init() succeeds, then ksgxd will be
prematurely stopped. This may leave some unsanitized pages, which does
not matter, because SGX will be disabled for the whole power cycle.

Ultimately this can crash the kernel, if the following is set:

	/proc/sys/kernel/panic_on_warn

In premature stop, print nothing, as the number is by practical means a
random number. Otherwise, it is an indicator of a bug in the driver, and
therefore print the number of unsanitized pages with pr_err().
"

Reinette

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error
  2022-09-02 17:38                   ` Reinette Chatre
@ 2022-09-02 19:20                     ` Jarkko Sakkinen
  0 siblings, 0 replies; 15+ messages in thread
From: Jarkko Sakkinen @ 2022-09-02 19:20 UTC (permalink / raw)
  To: Reinette Chatre
  Cc: linux-sgx, Haitao Huang, Vijay Dhanraj, Dave Hansen, Paul Menzel,
	stable, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	maintainer:X86 ARCHITECTURE (32-BIT AND 64-BIT), H. Peter Anvin,
	open list:X86 ARCHITECTURE (32-BIT AND 64-BIT)

On Fri, Sep 02, 2022 at 10:38:34AM -0700, Reinette Chatre wrote:
> Hi Jarkko,
> 
> On 9/2/2022 9:30 AM, Jarkko Sakkinen wrote:
> > On Fri, Sep 02, 2022 at 09:08:23AM -0700, Reinette Chatre wrote:
> >> Hi Jarkko,
> >>
> >> On 9/2/2022 8:53 AM, Jarkko Sakkinen wrote:
> >>> On Fri, Sep 02, 2022 at 04:26:51PM +0300, Jarkko Sakkinen wrote:
> >>>> +	if (ret)
> >>>> +		pr_err("%ld unsanitized pages\n", left_dirty);
> >>>
> >>> Yeah, I know, should be 'left_dirty'. I just quickly drafted
> >>> the patch for the email.
> >>>
> >>
> >> No problem - you did mention that it was an informal patch.
> >>
> >> (btw ... also watch out for the long local parameter returned
> >> as an unsigned long and the signed vs unsigned printing
> >> format string.) I also continue to recommend that you trim
> > 
> > Point taken.
> > 
> >> that backtrace ... this patch is heading to x86 area where
> >> this is required.
> > 
> > Should I just cut the whole stack trace, and leave the
> > part before it?
> 
> The trace is printed because of a WARN_ON() in the code.
> I do not think there is anything very helpful in that trace.
> I think the only helpful parts are the WARN itself that includes
> the line number and then information on which kernel it was
> encountered on.
> 
> How about something like (please note the FIXME within):
> 
> "
> Paul reported the following WARN while running kernel vFIXME:
>   WARNING: CPU: 6 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:401 ksgxd+0x1b7/0x1d0

Yeah, this is a great idea, the use of WARN() is the whole point.
Thank you.

BR, Jarkko


^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-09-02 19:21 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
     [not found] <20220831173829.126661-1-jarkko@kernel.org>
2022-08-31 17:38 ` [PATCH v2 2/6] x86/sgx: Do not consider unsanitized pages an error Jarkko Sakkinen
2022-08-31 20:39   ` Reinette Chatre
2022-09-01 10:50     ` Huang, Kai
2022-09-01 21:47       ` jarkko
2022-09-01 21:53     ` Jarkko Sakkinen
2022-09-01 21:56       ` Jarkko Sakkinen
2022-09-01 22:01         ` Jarkko Sakkinen
2022-09-01 22:34       ` Reinette Chatre
2022-09-01 23:56         ` Jarkko Sakkinen
2022-09-02 13:26           ` Jarkko Sakkinen
2022-09-02 15:53             ` Jarkko Sakkinen
2022-09-02 16:08               ` Reinette Chatre
2022-09-02 16:30                 ` Jarkko Sakkinen
2022-09-02 17:38                   ` Reinette Chatre
2022-09-02 19:20                     ` Jarkko Sakkinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox