linux-sgx.vger.kernel.org archive mirror
* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
       [not found] <ce0b4d26-3a6e-7c5a-5f66-44cba05f9f35@molgen.mpg.de>
@ 2022-08-19 16:02 ` Paul Menzel
  2022-08-19 18:28   ` Dave Hansen
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Menzel @ 2022-08-19 16:02 UTC (permalink / raw)
  To: Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar, Borislav Petkov,
	x86
  Cc: linux-sgx, LKML

[Cc: +linux-sgx@vger.kernel.org]

Am 19.08.22 um 15:19 schrieb Paul Menzel:
> Dear Linux folks,
> 
> 
> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> 
> ```
> [    0.000000] Linux version 5.18.0-4-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> […]
> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> […]
> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> [    0.235853] ------------[ cut here ]------------
> [    0.235855] WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
> [    0.235861] Modules linked in:
> [    0.235862] CPU: 1 PID: 83 Comm: ksgxd Not tainted 5.18.0-4-amd64 #1 Debian 5.18.16-1
> [    0.235865] Hardware name: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> [    0.235866] RIP: 0010:ksgxd+0x1b7/0x1d0
> [    0.235869] Code: ff e9 f2 fe ff ff 48 89 df e8 55 56 0d 00 84 c0 0f 84 c3 fe ff ff 31 ff e8 c6 56 0d 00 84 c0 0f 85 94 fe ff ff e9 af fe ff ff <0f> 0b e9 7f fe ff ff e8 3d dd 93 00 66 66 2e 0f 1f 84 00 00 00 00
> [    0.235870] RSP: 0000:ffffaaed0097bed8 EFLAGS: 00010287
> [    0.235872] RAX: ffffaaed00431890 RBX: ffff9a323ccc8000 RCX: 0000000000000000
> [    0.235873] RDX: 0000000080000000 RSI: ffffaaed00431850 RDI: 00000000ffffffff
> [    0.235875] RBP: ffff9a31416ca080 R08: ffff9a31416cae40 R09: ffff9a31416cae40
> [    0.235876] R10: 0000000000000000 R11: 0000000000000001 R12: ffffaaed0006bce0
> [    0.235877] R13: ffff9a3140e9c480 R14: ffffffff9825ee60 R15: 0000000000000000
> [    0.235878] FS:  0000000000000000(0000) GS:ffff9a32e6640000(0000) knlGS:0000000000000000
> [    0.235880] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [    0.235881] CR2: 0000000000000000 CR3: 00000001fbe10001 CR4: 00000000003706e0
> [    0.235882] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [    0.235883] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [    0.235884] Call Trace:
> [    0.235893]  <TASK>
> [    0.235895]  ? _raw_spin_lock_irqsave+0x24/0x60
> [    0.235900]  ? _raw_spin_unlock_irqrestore+0x23/0x40
> [    0.235902]  ? __kthread_parkme+0x36/0x90
> [    0.235905]  kthread+0xe5/0x110
> [    0.235907]  ? kthread_complete_and_exit+0x20/0x20
> [    0.235909]  ret_from_fork+0x1f/0x30
> [    0.235914]  </TASK>
> [    0.235915] ---[ end trace 0000000000000000 ]---
> ```
> 
> 
> Kind regards,
> 
> Paul

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-19 16:02 ` WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0 Paul Menzel
@ 2022-08-19 18:28   ` Dave Hansen
  2022-08-20  6:13     ` Paul Menzel
  2022-08-25  4:57     ` Jarkko Sakkinen
  0 siblings, 2 replies; 16+ messages in thread
From: Dave Hansen @ 2022-08-19 18:28 UTC (permalink / raw)
  To: Paul Menzel, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Chatre, Reinette
  Cc: linux-sgx, LKML

On 8/19/22 09:02, Paul Menzel wrote:
> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> 
> ```
> [    0.000000] Linux version 5.18.0-4-amd64
> (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> Debian 5.18.16-1 (2022-08-10)
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> […]
> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> […]
> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff

Hi Paul,

Would you be able to send the entire dmesg, along with:

	cat /proc/iomem # (as root)
and
	cpuid -1 --raw

I suspect a BIOS problem.  Reinette (cc'd) also thought this
might be a case of the SGX initialization getting a bit too far along
when it should have been disabled.

We had some bugs where we didn't stop fast enough after spitting out the
"SGX Launch Control is locked..." errors.

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-19 18:28   ` Dave Hansen
@ 2022-08-20  6:13     ` Paul Menzel
  2022-08-23 13:48       ` Paul Menzel
  2022-08-25  4:57     ` Jarkko Sakkinen
  1 sibling, 1 reply; 16+ messages in thread
From: Paul Menzel @ 2022-08-20  6:13 UTC (permalink / raw)
  To: Dave Hansen, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

Dear Dave,


Thank you for your quick reply.


Am 19.08.22 um 20:28 schrieb Dave Hansen:
> On 8/19/22 09:02, Paul Menzel wrote:
>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>
>> ```
>> [    0.000000] Linux version 5.18.0-4-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>> […]
>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>> […]
>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff

> Would you be able to send the entire dmesg, along with:

The log messages are attached to the first message, where I forgot to 
carbon-copy linux-sgx@ [1].

> 	cat /proc/iomem # (as root)
> and
> 	cpuid -1 --raw

I am going to provide that next week. (Side note: Intel might have some 
Dell XPS 9370 test machines in some QA lab.)

> I suspect a BIOS problem.  Reinette (cc'd) also thought this
> might be a case of the SGX initialization getting a bit too far along
> when it should have been disabled.
> 
> We had some bugs where we didn't stop fast enough after spitting out the
> "SGX Launch Control is locked..." errors.


Kind regards,

Paul


[1]: 
https://lore.kernel.org/lkml/ce0b4d26-3a6e-7c5a-5f66-44cba05f9f35@molgen.mpg.de/

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-20  6:13     ` Paul Menzel
@ 2022-08-23 13:48       ` Paul Menzel
  2022-08-23 16:32         ` Dave Hansen
  2022-08-25  2:12         ` Haitao Huang
  0 siblings, 2 replies; 16+ messages in thread
From: Paul Menzel @ 2022-08-23 13:48 UTC (permalink / raw)
  To: Dave Hansen, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

Dear Dave,


Am 20.08.22 um 08:13 schrieb Paul Menzel:

> Am 19.08.22 um 20:28 schrieb Dave Hansen:
>> On 8/19/22 09:02, Paul Menzel wrote:
>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>
>>> ```
>>> [    0.000000] Linux version 5.18.0-4-amd64 (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64 root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>> […]
>>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
>>> […]
>>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> 
>> Would you be able to send the entire dmesg, along with:
> 
> The log messages are attached to the first message, where I forgot to 
> carbon-copy linux-sgx@ [1].
> 
>>     cat /proc/iomem # (as root)
>> and
>>     cpuid -1 --raw
> 
> I am going to provide that next week. (Side note: Intel might have some 
> Dell XPS 9370 test machines in some QA lab.)

Please find both outputs at the end of the file.

>> I suspect a BIOS problem.  Reinette (cc'd) also thought this
>> might be a case of the SGX initialization getting a bit too far along
>> when it should have been disabled.
>>
>> We had some bugs where we didn't stop fast enough after spitting out the
>> "SGX Launch Control is locked..." errors.

Let’s hope it’s something known to you.


Kind regards,

Paul


> [1]: https://lore.kernel.org/lkml/ce0b4d26-3a6e-7c5a-5f66-44cba05f9f35@molgen.mpg.de/


PS:

$ sudo cat /proc/iomem
[sudo] password for molgenit:
00000000-00000fff : Reserved
00001000-00057fff : System RAM
00058000-00058fff : Reserved
00059000-0009dfff : System RAM
0009e000-000fffff : Reserved
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   00000000-00000000 : PCI Bus 0000:00
   000a0000-000dffff : PCI Bus 0000:00
     000c0000-000dffff : 0000:00:02.0
   000f0000-000fffff : System ROM
00100000-2d6c4fff : System RAM
2d6c5000-2d6c5fff : ACPI Non-volatile Storage
2d6c6000-2d6c6fff : Reserved
2d6c7000-3b6acfff : System RAM
3b6ad000-3b720fff : Reserved
3b721000-3ecf1fff : System RAM
3ecf2000-3f0b1fff : Reserved
3f0b2000-3f0fefff : ACPI Tables
3f0ff000-3f7b6fff : ACPI Non-volatile Storage
   3f798000-3f798fff : USBC000:00
3f7b7000-3ff25fff : Reserved
3ff26000-3fffefff : Unknown E820 type
3ffff000-3fffffff : System RAM
40000000-47ffffff : Reserved
   40200000-45f7ffff : INT0E0C:00
48000000-48dfffff : System RAM
48e00000-4f7fffff : Reserved
   4b800000-4f7fffff : Graphics Stolen Memory
4f800000-dfffffff : PCI Bus 0000:00
   50000000-5fffffff : 0000:00:02.0
   60000000-a9ffffff : PCI Bus 0000:03
   ac000000-da0fffff : PCI Bus 0000:03
   db000000-dbffffff : 0000:00:02.0
   dc000000-dc0fffff : PCI Bus 0000:6e
     dc000000-dc003fff : 0000:6e:00.0
       dc000000-dc003fff : nvme
   dc100000-dc1fffff : PCI Bus 0000:02
     dc100000-dc101fff : 0000:02:00.0
       dc100000-dc101fff : iwlwifi
   dc200000-dc2fffff : PCI Bus 0000:01
     dc200000-dc200fff : 0000:01:00.0
       dc200000-dc200fff : rtsx_pci
   dc300000-dc30ffff : 0000:00:1f.3
   dc310000-dc31ffff : 0000:00:14.0
     dc310000-dc31ffff : xhci-hcd
       dc318070-dc31846f : intel_xhci_usb_sw
   dc320000-dc327fff : 0000:00:04.0
     dc320000-dc327fff : proc_thermal
   dc328000-dc32bfff : 0000:00:1f.3
     dc328000-dc32bfff : ICH HD audio
   dc32c000-dc32ffff : 0000:00:1f.2
   dc330000-dc3300ff : 0000:00:1f.4
   dc331000-dc331fff : 0000:00:16.3
   dc332000-dc332fff : 0000:00:16.0
     dc332000-dc332fff : mei_me
   dc333000-dc333fff : 0000:00:15.1
     dc333000-dc3331ff : lpss_dev
       dc333000-dc3331ff : i2c_designware.1 lpss_dev
     dc333200-dc3332ff : lpss_priv
     dc333800-dc333fff : idma64.1
       dc333800-dc333fff : idma64.1 idma64.1
   dc334000-dc334fff : 0000:00:15.0
     dc334000-dc3341ff : lpss_dev
       dc334000-dc3341ff : i2c_designware.0 lpss_dev
     dc334200-dc3342ff : lpss_priv
     dc334800-dc334fff : idma64.0
       dc334800-dc334fff : idma64.0 idma64.0
   dc335000-dc335fff : 0000:00:14.2
     dc335000-dc335fff : Intel PCH thermal driver
   dffe0000-dfffffff : pnp 00:05
e0000000-efffffff : PCI MMCONFIG 0000 [bus 00-ff]
   e0000000-efffffff : Reserved
     e0000000-efffffff : pnp 00:05
fd000000-fe7fffff : PCI Bus 0000:00
   fd000000-fdabffff : pnp 00:06
   fdac0000-fdacffff : INT344B:00
     fdac0000-fdacffff : INT344B:00 INT344B:00
   fdad0000-fdadffff : pnp 00:06
   fdae0000-fdaeffff : INT344B:00
     fdae0000-fdaeffff : INT344B:00 INT344B:00
   fdaf0000-fdafffff : INT344B:00
     fdaf0000-fdafffff : INT344B:00 INT344B:00
   fdb00000-fdffffff : pnp 00:06
     fdc6000c-fdc6000f : iTCO_wdt
       fdc6000c-fdc6000f : iTCO_wdt iTCO_wdt
   fe000000-fe010fff : Reserved
   fe028000-fe028fff : pnp 00:08
   fe029000-fe029fff : pnp 00:08
   fe036000-fe03bfff : pnp 00:06
   fe03d000-fe3fffff : pnp 00:06
   fe410000-fe7fffff : pnp 00:06
fec00000-fec00fff : Reserved
   fec00000-fec003ff : IOAPIC 0
fed00000-fed003ff : HPET 0
   fed00000-fed003ff : PNP0103:00
fed10000-fed17fff : pnp 00:05
fed18000-fed18fff : pnp 00:05
fed19000-fed19fff : pnp 00:05
fed20000-fed3ffff : pnp 00:05
fed40000-fed44fff : MSFT0101:00
   fed40000-fed44fff : MSFT0101:00 MSFT0101:00
fed45000-fed8ffff : pnp 00:05
fed90000-fed90fff : dmar0
fed91000-fed91fff : dmar1
fee00000-fee00fff : Local APIC
   fee00000-fee00fff : Reserved
ff000000-ffffffff : Reserved
   ff000000-ffffffff : INT0800:00
     ff000000-ffffffff : pnp 00:05
100000000-2ae7fffff : System RAM
   190c00000-191801987 : Kernel code
   191a00000-19225ffff : Kernel rodata
   192400000-1926b57bf : Kernel data
   192d2b000-1931fffff : Kernel bss
2ae800000-2afffffff : RAM buffer

$ sudo cpuid -1 --raw
CPU:
    0x00000000 0x00: eax=0x00000016 ebx=0x756e6547 ecx=0x6c65746e edx=0x49656e69
    0x00000001 0x00: eax=0x000806ea ebx=0x00100800 ecx=0x7ffafbff edx=0xbfebfbff
    0x00000002 0x00: eax=0x76036301 ebx=0x00f0b5ff ecx=0x00000000 edx=0x00c30000
    0x00000003 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000004 0x00: eax=0x1c004121 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
    0x00000004 0x01: eax=0x1c004122 ebx=0x01c0003f ecx=0x0000003f edx=0x00000000
    0x00000004 0x02: eax=0x1c004143 ebx=0x00c0003f ecx=0x000003ff edx=0x00000000
    0x00000004 0x03: eax=0x1c03c163 ebx=0x02c0003f ecx=0x00001fff edx=0x00000006
    0x00000004 0x04: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000005 0x00: eax=0x00000040 ebx=0x00000040 ecx=0x00000003 edx=0x11142120
    0x00000006 0x00: eax=0x000027f7 ebx=0x00000002 ecx=0x00000009 edx=0x00000000
    0x00000007 0x00: eax=0x00000000 ebx=0x029c67af ecx=0x00000000 edx=0xbc002e00
    0x00000008 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000009 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x0000000a 0x00: eax=0x07300404 ebx=0x00000000 ecx=0x00000000 edx=0x00000603
    0x0000000b 0x00: eax=0x00000001 ebx=0x00000002 ecx=0x00000100 edx=0x00000000
    0x0000000b 0x01: eax=0x00000004 ebx=0x00000008 ecx=0x00000201 edx=0x00000000
    0x0000000b 0x02: eax=0x00000000 ebx=0x00000000 ecx=0x00000002 edx=0x00000000
    0x0000000c 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x0000000d 0x00: eax=0x0000001f ebx=0x00000440 ecx=0x00000440 edx=0x00000000
    0x0000000d 0x01: eax=0x0000000f ebx=0x000003c0 ecx=0x00000100 edx=0x00000000
    0x0000000d 0x02: eax=0x00000100 ebx=0x00000240 ecx=0x00000000 edx=0x00000000
    0x0000000d 0x03: eax=0x00000040 ebx=0x000003c0 ecx=0x00000000 edx=0x00000000
    0x0000000d 0x04: eax=0x00000040 ebx=0x00000400 ecx=0x00000000 edx=0x00000000
    0x0000000d 0x08: eax=0x00000080 ebx=0x00000000 ecx=0x00000001 edx=0x00000000
    0x0000000e 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x0000000f 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000010 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000011 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000012 0x00: eax=0x00000001 ebx=0x00000000 ecx=0x00000000 edx=0x0000241f
    0x00000012 0x01: eax=0x00000036 ebx=0x00000000 ecx=0x0000001f edx=0x00000000
    0x00000012 0x02: eax=0x40200001 ebx=0x00000000 ecx=0x05d80001 edx=0x00000000
    0x00000012 0x03: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000013 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x00000014 0x00: eax=0x00000001 ebx=0x0000000f ecx=0x00000007 edx=0x00000000
    0x00000014 0x01: eax=0x02490002 ebx=0x003f3fff ecx=0x00000000 edx=0x00000000
    0x00000015 0x00: eax=0x00000002 ebx=0x0000009e ecx=0x00000000 edx=0x00000000
    0x00000016 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
    0x20000000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
    0x80000000 0x00: eax=0x80000008 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x80000001 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000121 edx=0x2c100800
    0x80000002 0x00: eax=0x65746e49 ebx=0x2952286c ecx=0x726f4320 edx=0x4d542865
    0x80000003 0x00: eax=0x35692029 ebx=0x3533382d ecx=0x43205530 edx=0x40205550
    0x80000004 0x00: eax=0x372e3120 ebx=0x7a484730 ecx=0x00000000 edx=0x00000000
    0x80000005 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x80000006 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x01006040 edx=0x00000000
    0x80000007 0x00: eax=0x00000000 ebx=0x00000000 ecx=0x00000000 edx=0x00000100
    0x80000008 0x00: eax=0x00003027 ebx=0x00000000 ecx=0x00000000 edx=0x00000000
    0x80860000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
    0xc0000000 0x00: eax=0x0000076c ebx=0x00000e10 ecx=0x00000064 edx=0x00000000
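The EPC range in the boot log can be cross-checked against this dump. Below is a minimal C sketch, assuming the Intel SDM bit layouts: CPUID leaf 0x7 (subleaf 0) EBX bit 2 enumerates SGX and ECX bit 30 enumerates SGX Launch Control, while leaf 0x12 subleaf 2 describes the first EPC section. The helper names are hypothetical; the section decoding follows the shape of the kernel's sgx_calc_section_metric() (address bits 31:12 from EAX/ECX, bits 51:32 from EBX/EDX).

```c
#include <stdbool.h>
#include <stdint.h>

/* CPUID.(EAX=7,ECX=0): EBX bit 2 = SGX, ECX bit 30 = SGX Launch Control. */
bool cpuid7_has_sgx(uint32_t ebx)    { return ebx & (1u << 2); }
bool cpuid7_has_sgx_lc(uint32_t ecx) { return ecx & (1u << 30); }

/* Combine bits 31:12 of the low register with bits 19:0 of the high
 * register (shifted up to bits 51:32), like sgx_calc_section_metric(). */
static uint64_t epc_metric(uint32_t low, uint32_t high)
{
	return (uint64_t)(low & 0xfffff000u) |
	       ((uint64_t)(high & 0xfffffu) << 32);
}

/* Decode one CPUID.(EAX=0x12,ECX>=2) subleaf. EAX[3:0] == 1 marks a
 * valid EPC section; anything else (e.g. 0, no more sections) is
 * reported as "not a section". */
bool epc_section(uint32_t eax, uint32_t ebx, uint32_t ecx, uint32_t edx,
		 uint64_t *base, uint64_t *size)
{
	if ((eax & 0xfu) != 1)
		return false;
	*base = epc_metric(eax, ebx);
	*size = epc_metric(ecx, edx);
	return true;
}
```

Fed the leaf 0x12 subleaf 2 values above (eax=0x40200001, ecx=0x05d80001, ebx=edx=0), this yields base 0x40200000 and size 0x05d80000, i.e. 0x40200000-0x45f7ffff, matching both the boot log and the INT0E0C:00 entry in /proc/iomem. The leaf 0x7 values (ebx=0x029c67af, ecx=0x00000000) have the SGX bit set and the SGX_LC bit clear.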

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-23 13:48       ` Paul Menzel
@ 2022-08-23 16:32         ` Dave Hansen
  2022-08-23 22:33           ` Paul Menzel
  2022-08-25  2:12         ` Haitao Huang
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Hansen @ 2022-08-23 16:32 UTC (permalink / raw)
  To: Paul Menzel, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

On 8/23/22 06:48, Paul Menzel wrote:
>>> I suspect a BIOS problem.  Reinette (cc'd) also thought this
>>> might be a case of the SGX initialization getting a bit too far along
>>> when it should have been disabled.
>>>
>>> We had some bugs where we didn't stop fast enough after spitting out the
>>> "SGX Launch Control is locked..." errors.
> 
> Let’s hope it’s something known to you.

Thanks for the extra debug info.  Unfortunately, nothing is really
sticking out as an obvious problem.

The EREMOVE return codes would be interesting to know, as well as an
idea what the physical addresses are that fail and the _counts_ of how
many pages get sanitized versus fail.

But, I don't really have a theory about what could be going on yet.
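The numbers asked for here (return codes, failing physical addresses, and counts of sanitized versus failed pages) amount to a small tally over per-page EREMOVE results. A sketch of that bookkeeping in C, assuming the (address, return value) pairs have already been collected somehow, e.g. by an instrumented kernel; the struct and function names are hypothetical and not kernel API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_CODE 256	/* assumed bound for bucketing raw return codes */

/* One sanitization attempt: EPC page address and EREMOVE result. */
struct eremove_result {
	uint64_t paddr;
	int ret;		/* 0 means the page was removed */
};

struct eremove_tally {
	size_t ok;			/* pages EREMOVE accepted */
	size_t failed;			/* pages that stayed dirty */
	size_t per_code[MAX_CODE];	/* failures by return code */
	size_t other;			/* failures with codes outside 1..255 */
	uint64_t lowest_failed;		/* UINT64_MAX if nothing failed */
	uint64_t highest_failed;	/* 0 if nothing failed */
};

void tally_eremove(const struct eremove_result *res, size_t n,
		   struct eremove_tally *t)
{
	memset(t, 0, sizeof(*t));
	t->lowest_failed = UINT64_MAX;

	for (size_t i = 0; i < n; i++) {
		int ret = res[i].ret;

		if (ret == 0) {
			t->ok++;
			continue;
		}
		t->failed++;
		if (ret > 0 && ret < MAX_CODE)
			t->per_code[ret]++;
		else
			t->other++;
		/* Track the span of failing addresses within the EPC. */
		if (res[i].paddr < t->lowest_failed)
			t->lowest_failed = res[i].paddr;
		if (res[i].paddr > t->highest_failed)
			t->highest_failed = res[i].paddr;
	}
}
```

Codes outside 1..255 (for example a fault indication rather than an SGX error code) are counted in `other` instead of being used as an index.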

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-23 16:32         ` Dave Hansen
@ 2022-08-23 22:33           ` Paul Menzel
  2022-08-24 18:39             ` Dave Hansen
  2022-08-25  5:27             ` Jarkko Sakkinen
  0 siblings, 2 replies; 16+ messages in thread
From: Paul Menzel @ 2022-08-23 22:33 UTC (permalink / raw)
  To: Dave Hansen, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

Dear Dave,


Thank you for your reply.

Am 23.08.22 um 18:32 schrieb Dave Hansen:
> On 8/23/22 06:48, Paul Menzel wrote:
>>>> I suspect a BIOS problem.  Reinette (cc'd) also thought this
>>>> might be a case of the SGX initialization getting a bit too far along
>>>> when it should have been disabled.
>>>>
>>>> We had some bugs where we didn't stop fast enough after spitting out the
>>>> "SGX Launch Control is locked..." errors.
>>
>> Let’s hope it’s something known to you.
> 
> Thanks for the extra debug info.  Unfortunately, nothing is really
> sticking out as an obvious problem.
> 
> The EREMOVE return codes would be interesting to know, as well as an
> idea what the physical addresses are that fail and the _counts_ of how
> many pages get sanitized versus fail.

Is there a knob to print out this information? Or way to get this 
information using ftrace? I’d like to avoid rebuilding the Linux kernel.

> But, I don't really have a theory about what could be going on yet.

Kind regards,

Paul

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-23 22:33           ` Paul Menzel
@ 2022-08-24 18:39             ` Dave Hansen
  2022-08-25  5:27             ` Jarkko Sakkinen
  1 sibling, 0 replies; 16+ messages in thread
From: Dave Hansen @ 2022-08-24 18:39 UTC (permalink / raw)
  To: Paul Menzel, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

On 8/23/22 15:33, Paul Menzel wrote:
>> Thanks for the extra debug info.  Unfortunately, nothing is really
>> sticking out as an obvious problem.
>>
>> The EREMOVE return codes would be interesting to know, as well as an
>> idea what the physical addresses are that fail and the _counts_ of how
>> many pages get sanitized versus fail.
> 
> Is there a knob to print out this information? Or way to get this
> information using ftrace? I’d like to avoid rebuilding the Linux kernel.

You can probably do it with a kprobe and ftrace, but it's a little bit
of a pain since the ENCL* instructions are all inlined and don't get
wrapped in actual function calls.

I'd just rebuild the kernel if it were me.

Maybe we should just uninline all of the ENCL* instructions so that we
*can* more easily trace them.  It's not like they are performance-sensitive.
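The uninlining idea can be sketched in ordinary C. Here `model_encls_eremove()` stands in for an inline ENCLS wrapper and `model_eremove()` is a hypothetical out-of-line wrapper; both names and the fake failure rule are illustrative only. Marking the wrapper noinline (the kernel's `noinline` macro expands to the same GCC attribute) keeps a real function entry and return for a tracer to hook:

```c
#include <stdint.h>

/* Stand-in for an inline ENCLS wrapper: the compiler is free to fold it
 * into every caller, leaving no function entry or return to probe. */
static inline int model_encls_eremove(uint64_t paddr)
{
	/* Arbitrary illustrative rule: pages at or above this address
	 * "fail" with 13; everything else succeeds. */
	return paddr >= 0x45000000u ? 13 : 0;
}

/* Out-of-line wrapper: noinline guarantees a symbol of its own, whose
 * return value a return probe can record. */
__attribute__((noinline)) int model_eremove(uint64_t paddr)
{
	return model_encls_eremove(paddr);
}
```

With such a symbol in a kernel build, a tracefs return probe along the lines of `r:eremove_ret model_eremove ret=$retval` written to kprobe_events could log each return code; that probe definition is likewise an untested sketch.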

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-23 13:48       ` Paul Menzel
  2022-08-23 16:32         ` Dave Hansen
@ 2022-08-25  2:12         ` Haitao Huang
  2022-08-25  5:49           ` Jarkko Sakkinen
  2022-08-26  9:54           ` Paul Menzel
  1 sibling, 2 replies; 16+ messages in thread
From: Haitao Huang @ 2022-08-25  2:12 UTC (permalink / raw)
  To: Dave Hansen, Jarkko Sakkinen, Thomas Gleixner, Ingo Molnar,
	Borislav Petkov, x86, Reinette Chatre, Paul Menzel
  Cc: linux-sgx, LKML

Hi Paul

On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel <pmenzel@molgen.mpg.de>  
wrote:

> Dear Dave,
>
>
> Am 20.08.22 um 08:13 schrieb Paul Menzel:
>
>> Am 19.08.22 um 20:28 schrieb Dave Hansen:
>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>>
>>>> ```
>>>> [    0.000000] Linux version 5.18.0-4-amd64  
>>>> (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0,  
>>>> GNU ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP  
>>>> PREEMPT_DYNAMIC Debian 5.18.16-1 (2022-08-10)
>>>> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64  
>>>> root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
>>>> […]
>>>> [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0  
>>>> 07/06/2022
>>>> […]
>>>> [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
>>
>>> Would you be able to send the entire dmesg, along with:
>>  The log messages are attached to the first message, where I forgot to  
>> carbon-copy linux-sgx@ [1].
>>
>>>     cat /proc/iomem # (as root)
>>> and
>>>     cpuid -1 --raw
>>  I am going to provide that next week. (Side note: Intel might have  
>> some Dell XPS 9370 test machines in some QA lab.)
>
> Please find both outputs at the end of the file.
>

Could you also check the output of "sudo rdmsr -x 0x3a"?
Also, was CONFIG_X86_SGX_KVM set?
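For reading that value: MSR 0x3a is IA32_FEATURE_CONTROL, where bit 17 is the SGX launch-control enable (SGX_LC), bit 18 the global SGX enable, and bit 0 the lock bit, matching the kernel's FEAT_CTL_* names in msr-index.h. A small C sketch that decodes the hex string `rdmsr -x 0x3a` prints; the struct and function names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* IA32_FEATURE_CONTROL bits, named after the kernel's FEAT_CTL_* macros. */
#define FEAT_CTL_LOCKED          (1ULL << 0)
#define FEAT_CTL_SGX_LC_ENABLED  (1ULL << 17)
#define FEAT_CTL_SGX_ENABLED     (1ULL << 18)

struct feat_ctl_sgx {
	bool locked;	/* register locked by firmware until reset */
	bool sgx_lc;	/* SGX launch control enabled (bit 17) */
	bool sgx;	/* SGX globally enabled (bit 18) */
};

/* Parse a hex value such as "40005" or "0x40005\n". Returns false on
 * empty input or trailing junk. */
bool decode_feat_ctl(const char *hex, struct feat_ctl_sgx *out)
{
	char *end;
	unsigned long long v = strtoull(hex, &end, 16);

	if (end == hex || (*end != '\0' && *end != '\n'))
		return false;

	out->locked = v & FEAT_CTL_LOCKED;
	out->sgx_lc = v & FEAT_CTL_SGX_LC_ENABLED;
	out->sgx    = v & FEAT_CTL_SGX_ENABLED;
	return true;
}
```

For instance, a made-up reading of 40005 decodes to locked, SGX enabled, and launch control disabled.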

If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3A is not set,
then I think the following sequence during sgx_init() is possible:

sgx_page_cache_init -> sgx_setup_epc_section
                        -> puts all physical EPC pages on sgx_dirty_page_list.
Kick off ksgxd.
Later, sgx_drv_init returns non-zero due to this check:
     if (!cpu_feature_enabled(X86_FEATURE_SGX_LC))
         return -ENODEV;
sgx_vepc_init also returns non-zero if CONFIG_X86_SGX_KVM was not set.

And sgx_init will call kthread_stop(ksgxd_tsk):
     ret = sgx_drv_init();

     if (sgx_vepc_init() && ret)
         goto err_provision;
...
err_provision:
     misc_deregister(&sgx_dev_provision);

err_kthread:
     kthread_stop(ksgxd_tsk);


That makes __sgx_sanitize_pages() return early because of these lines:
     /* dirty_page_list is thread-local, no need for a lock: */
     while (!list_empty(dirty_page_list)) {
         if (kthread_should_stop())
             return;

And that would (depending on timing?) trigger the warning in ksgxd,
because sgx_dirty_page_list is still non-empty at that moment.
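The ordering described above can be reduced to a deterministic, single-threaded C model. It is only a sketch of this hypothesis, not kernel code and not evidence that this is what happened on the XPS 13: `stop_after` is a hypothetical stand-in for the point at which kthread_stop() makes kthread_should_stop() return true, and every EREMOVE is assumed to succeed.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model state; not a kernel structure. */
struct ksgxd_model {
	size_t dirty;		/* pages still on the dirty list */
	size_t processed;	/* EREMOVE calls made so far */
	size_t stop_after;	/* processed count at which the stop lands */
};

/* Stand-in for kthread_should_stop(). */
static bool model_should_stop(const struct ksgxd_model *m)
{
	return m->processed >= m->stop_after;
}

/* Same shape as the loop quoted above: bail out on a stop request
 * even if pages remain on the list. */
static void model_sanitize(struct ksgxd_model *m)
{
	while (m->dirty > 0) {
		if (model_should_stop(m))
			return;
		m->dirty--;		/* EREMOVE succeeded, page freed */
		m->processed++;
	}
}

/* Two passes followed by the sanity check, as in ksgxd(). Returns true
 * when WARN_ON(!list_empty(&sgx_dirty_page_list)) would fire. */
bool model_ksgxd_warns(size_t epc_pages, size_t stop_after)
{
	struct ksgxd_model m = { .dirty = epc_pages, .stop_after = stop_after };

	model_sanitize(&m);
	model_sanitize(&m);
	return m.dirty != 0;
}
```

With the 0x05d80000-byte EPC section from the boot log (23936 pages of 4 KiB), any stop that lands before the last page is processed leaves pages on the list and trips the sanity check, which is consistent with the "depending on timing" caveat.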

Thanks
Haitao

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-19 18:28   ` Dave Hansen
  2022-08-20  6:13     ` Paul Menzel
@ 2022-08-25  4:57     ` Jarkko Sakkinen
  2022-08-25  5:25       ` Jarkko Sakkinen
  1 sibling, 1 reply; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  4:57 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Paul Menzel, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Chatre, Reinette, linux-sgx, LKML

On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
> On 8/19/22 09:02, Paul Menzel wrote:
> > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > 
> > ```
> > [    0.000000] Linux version 5.18.0-4-amd64
> > (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> > ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> > Debian 5.18.16-1 (2022-08-10)
> > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > […]
> > [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> > […]
> > [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> 
> Hi Paul,
> 
> Would you be able to send the entire dmesg, along with:
> 
> 	cat /proc/iomem # (as root)
> and
> 	cpuid -1 --raw
> 
> I suspect a BIOS problem.  Reinette (cc'd) also thought this
> might be a case of the SGX initialization getting a bit too far along
> when it should have been disabled.
> 
> We had some bugs where we didn't stop fast enough after spitting out the
> "SGX Launch Control is locked..." errors.

For some reason the pages do not get properly sanitized:

	/* sanity check: */
	WARN_ON(!list_empty(&sgx_dirty_page_list));

The EPC should be fine, given that EREMOVE does not fail.
If SGX were disabled, EREMOVE would also fail.

BR, Jarkko

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  4:57     ` Jarkko Sakkinen
@ 2022-08-25  5:25       ` Jarkko Sakkinen
  2022-08-25  6:46         ` Paul Menzel
  0 siblings, 1 reply; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  5:25 UTC (permalink / raw)
  To: Dave Hansen, Paul Menzel
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Chatre, Reinette, linux-sgx, LKML

[-- Attachment #1: Type: text/plain, Size: 2051 bytes --]

On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
> On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
> > On 8/19/22 09:02, Paul Menzel wrote:
> > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > > 
> > > ```
> > > [    0.000000] Linux version 5.18.0-4-amd64
> > > (debian-kernel@lists.debian.org) (gcc-11 (Debian 11.3.0-5) 11.3.0, GNU
> > > ld (GNU Binutils for Debian) 2.38.90.20220713) #1 SMP PREEMPT_DYNAMIC
> > > Debian 5.18.16-1 (2022-08-10)
> > > [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-5.18.0-4-amd64
> > > root=UUID=56f398e0-1e25-4fda-aa9f-611dece4b333 ro quiet
> > > […]
> > > [    0.000000] DMI: Dell Inc. XPS 13 9370/0RMYH9, BIOS 1.21.0 07/06/2022
> > > […]
> > > [    0.235418] sgx: EPC section 0x40200000-0x45f7ffff
> > 
> > Hi Paul,
> > 
> > Would you be able to send the entire dmesg, along with:
> > 
> > 	cat /proc/iomem # (as root)
> > and
> > 	cpuid -1 --raw
> > 
> > I suspect a BIOS problem.  Reinette (cc'd) also thought this
> > might be a case of the SGX initialization getting a bit too far along
> > when it should have been disabled.
> > 
> > We had some bugs where we didn't stop fast enough after spitting out the
> > "SGX Launch Control is locked..." errors.
> 
> For some reason the pages do not get properly sanitized:
> 
> 	/* sanity check: */
> 	WARN_ON(!list_empty(&sgx_dirty_page_list));
> 
> The EPC should be fine, given that EREMOVE does not fail.
> If SGX were disabled, EREMOVE would also fail.

Sorry, I forgot that we never print the error code
inside __sgx_sanitize_pages(). I wrote a quick
patch to address this (attached) [*].

Paul,

Any chance you could try the patch out? It's pretty hard to attach,
e.g., a kprobe to grab this info. Does it reproduce every single
time?

Alternatively: what kind of workload is triggering this?
I own a 2020-model XPS 13, which might be able to
reproduce the same issue.

[*] Also: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u

BR, Jarkko

[-- Attachment #2: 0001-x86-sgx-Print-EREMOVE-return-value-in-__sgx_sanitize.patch --]
[-- Type: text/plain, Size: 2077 bytes --]

From ddccefc8e864bd9973a5445202922b59760d3460 Mon Sep 17 00:00:00 2001
From: Jarkko Sakkinen <jarkko@kernel.org>
Date: Thu, 25 Aug 2022 08:12:30 +0300
Subject: [PATCH] x86/sgx: Print EREMOVE return value in __sgx_sanitize_pages()

In the second pass of __sgx_sanitize_pages(), print the EREMOVE
return value. All EREMOVE calls should succeed; if one does not,
the printed value will provide some additional clues.

Cc: Paul Menzel <pmenzel@molgen.mpg.de>
Cc: Dave Hansen <dave.hansen@linux.intel.com>
Cc: Reinette Chatre <reinette.chatre@intel.com>
Signed-off-by: Jarkko Sakkinen <jarkko@kernel.org>
---
 arch/x86/kernel/cpu/sgx/main.c | 9 ++++++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kernel/cpu/sgx/main.c b/arch/x86/kernel/cpu/sgx/main.c
index 515e2a5f25bb..33354921c59f 100644
--- a/arch/x86/kernel/cpu/sgx/main.c
+++ b/arch/x86/kernel/cpu/sgx/main.c
@@ -50,7 +50,7 @@ static LIST_HEAD(sgx_dirty_page_list);
  * from the input list, and made available for the page allocator. SECS pages
  * prepending their children in the input list are left intact.
  */
-static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
+static void __sgx_sanitize_pages(struct list_head *dirty_page_list, bool verbose)
 {
 	struct sgx_epc_page *page;
 	LIST_HEAD(dirty);
@@ -90,6 +90,9 @@ static void __sgx_sanitize_pages(struct list_head *dirty_page_list)
 			list_del(&page->list);
 			sgx_free_epc_page(page);
 		} else {
+			if (verbose)
+				pr_err_ratelimited(EREMOVE_ERROR_MESSAGE, ret, ret);
+
 			/* The page is not yet clean - move to the dirty list. */
 			list_move_tail(&page->list, &dirty);
 		}
@@ -394,8 +397,8 @@ static int ksgxd(void *p)
 	 * Sanitize pages in order to recover from kexec(). The 2nd pass is
 	 * required for SECS pages, whose child pages blocked EREMOVE.
 	 */
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
-	__sgx_sanitize_pages(&sgx_dirty_page_list);
+	__sgx_sanitize_pages(&sgx_dirty_page_list, false);
+	__sgx_sanitize_pages(&sgx_dirty_page_list, true);
 
 	/* sanity check: */
 	WARN_ON(!list_empty(&sgx_dirty_page_list));
-- 
2.37.1
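The comment in ksgxd() explains the two passes: a SECS page cannot be removed while its child pages exist, so it only becomes clean on the second pass, and the second pass is where a remaining failure is worth reporting. A userspace model of that behavior, with hypothetical model_* names; MODEL_CHILD_PRESENT reuses the value of SGX_CHILD_PRESENT (13) from the kernel's asm/sgx.h, and children are assumed to follow their SECS in the list:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

#define MODEL_CHILD_PRESENT 13	/* same value as SGX_CHILD_PRESENT */

/* Hypothetical EPC page: secs < 0 marks a SECS page, otherwise it is
 * the index of the owning SECS page in the same array. */
struct model_page {
	int secs;
	int children;	/* live children (SECS pages only) */
	bool dirty;	/* still on the dirty list */
};

static int model_eremove_page(struct model_page *pages, size_t i)
{
	struct model_page *p = &pages[i];

	if (p->secs < 0)	/* SECS: blocked while children exist */
		return p->children > 0 ? MODEL_CHILD_PRESENT : 0;
	pages[p->secs].children--;	/* child removed, release SECS */
	return 0;
}

/* One pass over the dirty pages; returns how many remain dirty. With
 * verbose set, failures are printed, like the patch's second pass. */
static size_t model_sanitize_pass(struct model_page *pages, size_t n,
				  bool verbose)
{
	size_t left = 0;

	for (size_t i = 0; i < n; i++) {
		if (!pages[i].dirty)
			continue;
		int ret = model_eremove_page(pages, i);
		if (ret == 0) {
			pages[i].dirty = false;
		} else {
			if (verbose)
				fprintf(stderr, "EREMOVE returned %d (0x%x)\n",
					ret, (unsigned int)ret);
			left++;
		}
	}
	return left;
}

size_t model_two_pass(struct model_page *pages, size_t n)
{
	model_sanitize_pass(pages, n, false);
	return model_sanitize_pass(pages, n, true);
}
```

In this model, a SECS page whose child never goes away (for example a child that is not on the dirty list at all) is still dirty after the second pass, and only then is its error code printed.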


* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-23 22:33           ` Paul Menzel
  2022-08-24 18:39             ` Dave Hansen
@ 2022-08-25  5:27             ` Jarkko Sakkinen
  1 sibling, 0 replies; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  5:27 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Reinette Chatre, linux-sgx, LKML

On Wed, Aug 24, 2022 at 12:33:07AM +0200, Paul Menzel wrote:
> Dear Dave,
> 
> 
> Thank you for your reply.
> 
> Am 23.08.22 um 18:32 schrieb Dave Hansen:
> > On 8/23/22 06:48, Paul Menzel wrote:
> > > > > I suspect a BIOS problem.  Reinette (cc'd) also thought this
> > > > > might be a case of the SGX initialization getting a bit too far along
> > > > > when it should have been disabled.
> > > > > 
> > > > > We had some bugs where we didn't stop fast enough after spitting out the
> > > > > "SGX Launch Control is locked..." errors.
> > > 
> > > Let’s hope it’s something known to you.
> > 
> > Thanks for the extra debug info.  Unfortunately, nothing is really
> > sticking out as an obvious problem.
> > 
> > The EREMOVE return codes would be interesting to know, as well as an
> > idea what the physical addresses are that fail and the _counts_ of how
> > many pages get sanitized versus fail.
> 
> Is there a knob to print out this information? Or a way to get this
> information using ftrace? I’d like to avoid rebuilding the Linux kernel.

Since __sgx_sanitize_pages() is a local symbol, it's not possible
to attach a kprobe to it, so we actually do require a code change
to see inside.
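For reference, the instrumentation Dave asked for (return codes, failing physical addresses, and sanitized versus failed counts per pass) could be kept in a small structure and summarized once per pass. The sketch below is standalone C under that assumption. `struct sanitize_stats`, `record_eremove()`, `print_stats()` and `MAX_LOGGED_FAILURES` are hypothetical, and the return code used in the usage note is made up rather than a real ENCLS error code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_LOGGED_FAILURES 8 /* hypothetical cap on per-page detail */

/* Hypothetical per-pass statistics; not an existing kernel structure. */
struct sanitize_stats {
	unsigned long sanitized; /* EREMOVE returned 0 */
	unsigned long failed;    /* EREMOVE returned non-zero */
	uint64_t fail_addr[MAX_LOGGED_FAILURES]; /* physical addresses */
	int fail_ret[MAX_LOGGED_FAILURES];       /* matching return codes */
};

/* Account for one EREMOVE result on the page at physical address @paddr. */
void record_eremove(struct sanitize_stats *st, uint64_t paddr, int ret)
{
	assert(st != NULL);
	if (ret == 0) {
		st->sanitized++;
		return;
	}
	if (st->failed < MAX_LOGGED_FAILURES) {
		st->fail_addr[st->failed] = paddr;
		st->fail_ret[st->failed] = ret;
	}
	st->failed++;
}

/* Print one summary line per pass, then the logged failures. */
void print_stats(const struct sanitize_stats *st, int pass)
{
	printf("pass %d: %lu sanitized, %lu failed\n",
	       pass, st->sanitized, st->failed);
	for (unsigned long i = 0; i < st->failed && i < MAX_LOGGED_FAILURES; i++)
		printf("  paddr 0x%llx: ret %d (0x%x)\n",
		       (unsigned long long)st->fail_addr[i],
		       st->fail_ret[i], (unsigned int)st->fail_ret[i]);
}
```

For example, recording two successes at 0x40200000 and 0x40201000 and one failure with a made-up code 5 at 0x40202000 yields counts of 2 and 1, with only the failure's address and code kept. Logging only the first few failures keeps the output bounded on a large EPC, which is the same concern `pr_err_ratelimited()` addresses in the debug patch.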

BR, Jarkko

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  2:12         ` Haitao Huang
@ 2022-08-25  5:49           ` Jarkko Sakkinen
  2022-08-25  8:34             ` Jarkko Sakkinen
  2022-08-26  9:54           ` Paul Menzel
  1 sibling, 1 reply; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  5:49 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Reinette Chatre, Paul Menzel, linux-sgx, LKML

On Wed, Aug 24, 2022 at 09:12:06PM -0500, Haitao Huang wrote:
> Hi Paul
> 
> On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel <pmenzel@molgen.mpg.de>
> wrote:
> 
> > Dear Dave,
> > 
> > 
> > Am 20.08.22 um 08:13 schrieb Paul Menzel:
> > 
> > > Am 19.08.22 um 20:28 schrieb Dave Hansen:
> > > > On 8/19/22 09:02, Paul Menzel wrote:
> > > > > On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
> > > > > […]
> > > 
> > > > Would you be able to send the entire dmesg, along with:
> > > The log messages are attached to the first message, where I forgot
> > > to Cc linux-sgx@ [1].
> > > 
> > > >     cat /proc/iomem # (as root)
> > > > and
> > > >     cpuid -1 --raw
> > > I am going to provide that next week. (Side note: Intel might have
> > > some Dell XPS 9370 test machines in a QA lab.)
> > 
> > Please find both outputs at the end of the file.
> > 
> 
> Could you also check output of "sudo rdmsr -x 0x3a"?
> Also was CONFIG_X86_SGX_KVM set?
> 
> If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3A is not set,
> then I think the following sequence during sgx_init() is possible:
> 
> sgx_page_cache_init -> sgx_setup_epc_section
>                        -> puts all physical EPC pages on sgx_dirty_page_list.
> Kick off ksgxd.
> Later, sgx_drv_init() returns non-zero due to this check:
>     if (!cpu_feature_enabled(X86_FEATURE_SGX_LC))
>         return -ENODEV;
> sgx_vepc_init() also returns non-zero if CONFIG_X86_SGX_KVM was not set.
> 
> And sgx_init will call kthread_stop(ksgxd_tsk):
>     ret = sgx_drv_init();
> 
>     if (sgx_vepc_init() && ret)
>         goto err_provision;
> ...
> err_provision:
>     misc_deregister(&sgx_dev_provision);
> 
> err_kthread:
>     kthread_stop(ksgxd_tsk);
> 
> 
> That causes __sgx_sanitize_pages() to return early due to these lines:
>     /* dirty_page_list is thread-local, no need for a lock: */
>     while (!list_empty(dirty_page_list)) {
>         if (kthread_should_stop())
>             return;
> 
> And that would trigger (depending on timing?) the warning in ksgxd due to
> the non-empty sgx_dirty_page_list at that moment.

You're correct, and it's not a bug but completely legit behaviour.

And given that a non-empty dirty page list is legit behavior, WARN_ON()
is not what should be used here.
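The sequence Haitao describes can be reproduced deterministically in a small model, where the timing of kthread_stop() is replaced by a fixed number of pages processed before the stop flag is seen. Everything here is hypothetical (`struct model_thread`, `model_should_stop()`, `model_sanitize_until_stop()`, `model_warn_fires()`); it only mirrors the shape of the quoted loop, assuming every EREMOVE succeeds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the ksgxd thread's view of a stop request. */
struct model_thread {
	int stop_after; /* kthread_stop() "lands" after this many pages; -1 = never */
	int processed;  /* pages handled so far */
};

/* Stand-in for kthread_should_stop(). */
static bool model_should_stop(const struct model_thread *t)
{
	return t->stop_after >= 0 && t->processed >= t->stop_after;
}

/*
 * Mirrors the quoted loop: check for a stop request before each page and
 * return early if one is pending. Returns the pages still on the list.
 */
int model_sanitize_until_stop(struct model_thread *t, int dirty)
{
	assert(t != NULL && dirty >= 0);
	while (dirty > 0) {
		if (model_should_stop(t))
			return dirty; /* early return: the list is not empty */
		dirty--;              /* assume EREMOVE succeeded for this page */
		t->processed++;
	}
	return 0;
}

/* The sanity check in ksgxd(): WARN_ON(!list_empty(&sgx_dirty_page_list)). */
bool model_warn_fires(int pages_left)
{
	return pages_left != 0;
}
```

Without a stop request, 100 dirty pages are all cleaned and the check stays quiet. With a stop after 10 pages, 90 remain and the WARN_ON()-style check fires even though nothing went wrong, which is the point made above.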

Fix coming in a bit.

BR, Jarkko

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  5:25       ` Jarkko Sakkinen
@ 2022-08-25  6:46         ` Paul Menzel
  2022-08-25  8:39           ` Jarkko Sakkinen
  0 siblings, 1 reply; 16+ messages in thread
From: Paul Menzel @ 2022-08-25  6:46 UTC (permalink / raw)
  To: Jarkko Sakkinen, Dave Hansen
  Cc: Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Chatre, Reinette, linux-sgx, LKML

Dear Jarkko,


Am 25.08.22 um 07:25 schrieb Jarkko Sakkinen:
> On Thu, Aug 25, 2022 at 07:57:30AM +0300, Jarkko Sakkinen wrote:
>> On Fri, Aug 19, 2022 at 11:28:24AM -0700, Dave Hansen wrote:
>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>> […]

>>> Would you be able to send the entire dmesg, along with:
>>>
>>> 	cat /proc/iomem # (as root)
>>> and
>>> 	cpuid -1 --raw
>>>
>>> I suspect a BIOS problem.  Reinette (cc'd) also thought this
>>> might be a case of the SGX initialization getting a bit too far along
>>> when it should have been disabled.
>>>
>>> We had some bugs where we didn't stop fast enough after spitting out the
>>> "SGX Launch Control is locked..." errors.
>>
>> For some reason the pages do not get properly sanitized:
>>
>> 	/* sanity check: */
>> 	WARN_ON(!list_empty(&sgx_dirty_page_list));
>>
>> EPC should be good, given that EREMOVE does not fail.
>> If SGX were disabled, EREMOVE should also fail.
> 
> Sorry, I forgot that we never print the
> error code inside __sgx_sanitize_pages(). I wrote a quick
> patch to address this (attached) [*].
> 
> Paul,
> 
> Any chance to try the patch out?

Yes, I am going to try it in the next days.

> It's pretty hard to attach e.g. a kprobe to grab this info. Does it
> reproduce every single time?
Yes, on every boot.

> Alternatively: what kind of workload is triggering this?
> I do own a 2020-model XPS 13, which might be able to
> reproduce the same issue.

The Dell XPS 13 9370 is from 2018 (Intel i5-8350U), so no idea if it 
happens with later processors.


Kind regards,

Paul


> [*] Also: https://lore.kernel.org/linux-sgx/20220825051827.246698-1-jarkko@kernel.org/T/#u

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  5:49           ` Jarkko Sakkinen
@ 2022-08-25  8:34             ` Jarkko Sakkinen
  0 siblings, 0 replies; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  8:34 UTC (permalink / raw)
  To: Haitao Huang
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Reinette Chatre, Paul Menzel, linux-sgx, LKML

On Thu, Aug 25, 2022 at 08:49:53AM +0300, Jarkko Sakkinen wrote:
> On Wed, Aug 24, 2022 at 09:12:06PM -0500, Haitao Huang wrote:
> > […]
> > And that would trigger (depending on timing?) the warning in ksgxd due to
> > the non-empty sgx_dirty_page_list at that moment.
> 
> You're correct, and it's not a bug but completely legit behaviour.
> 
> And given that a non-empty dirty page list is legit behavior, WARN_ON()
> is not what should be used here.
> 
> Fix coming in a bit.

https://lore.kernel.org/linux-sgx/20220825080802.259528-1-jarkko@kernel.org/T/#u
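One way to express that a premature stop is legitimate is to have the sanitizer report how many pages it genuinely failed to clean, return 0 when it is asked to stop, and warn only on the count left after the second pass. This is a hedged user-space sketch of that design, not necessarily what the linked patch does; `struct model_epc`, `model_sanitize_counted()` and `model_ksgxd_warns()` are hypothetical, and the fixed 64-page array just keeps the example small.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MODEL_MAX_PAGES 64 /* keeps the example small */

/* Hypothetical EPC state; not a kernel structure. */
struct model_epc {
	int n;                       /* pages in use */
	bool dirty[MODEL_MAX_PAGES]; /* still needs EREMOVE */
	bool fails[MODEL_MAX_PAGES]; /* EREMOVE keeps failing on this page */
	int stop_after;              /* simulated kthread_stop(); -1 = never */
	int processed;               /* EREMOVE attempts so far */
};

/*
 * Returns the number of pages left dirty because EREMOVE failed.
 * A premature stop returns 0: SGX is being torn down, so the pages still
 * on the list are not a sanitization failure.
 */
unsigned long model_sanitize_counted(struct model_epc *e)
{
	unsigned long left_dirty = 0;

	assert(e != NULL && e->n >= 0 && e->n <= MODEL_MAX_PAGES);
	for (int i = 0; i < e->n; i++) {
		if (!e->dirty[i])
			continue;
		if (e->stop_after >= 0 && e->processed >= e->stop_after)
			return 0;
		e->processed++;
		if (e->fails[i])
			left_dirty++;
		else
			e->dirty[i] = false;
	}
	return left_dirty;
}

/* First pass silently; warn only if the second pass still leaves pages. */
bool model_ksgxd_warns(struct model_epc *e)
{
	model_sanitize_counted(e);
	return model_sanitize_counted(e) != 0;
}
```

In the model, a stop after one page produces no warning, two pages that EREMOVE cleans produce none, and a page whose EREMOVE keeps failing still triggers it, so the sanity check keeps its value for real failures.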

BR, Jarkko

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  6:46         ` Paul Menzel
@ 2022-08-25  8:39           ` Jarkko Sakkinen
  0 siblings, 0 replies; 16+ messages in thread
From: Jarkko Sakkinen @ 2022-08-25  8:39 UTC (permalink / raw)
  To: Paul Menzel
  Cc: Dave Hansen, Thomas Gleixner, Ingo Molnar, Borislav Petkov, x86,
	Chatre, Reinette, linux-sgx, LKML

On Thu, Aug 25, 2022 at 08:46:19AM +0200, Paul Menzel wrote:
> Dear Jarkko,
> 
> 
> Am 25.08.22 um 07:25 schrieb Jarkko Sakkinen:
> > […]
> > Sorry, I forgot that we never print the
> > error code inside __sgx_sanitize_pages(). I wrote a quick
> > patch to address this (attached) [*].
> > 
> > Paul,
> > 
> > Any chance to try the patch out?
> 
> Yes, I am going to try it in the next days.
> 
> > It's pretty hard to attach e.g. a kprobe to grab this info. Does it
> > reproduce every single time?
> Yes, on every boot.
> 
> > Alternatively: what kind of workload is triggering this?
> > I do own a 2020-model XPS 13, which might be able to
> > reproduce the same issue.
> 
> The Dell XPS 13 9370 is from 2018 (Intel i5-8350U), so no idea if it happens
> with later processors.

I think this should work out, and actually fix the issue:

https://lore.kernel.org/linux-sgx/20220825080802.259528-1-jarkko@kernel.org/T/#u

Just to add, perhaps for some future issue: I think my laptop and yours
are comparable because their SGX side is pretty much the same. Ice Lake
is less comparable because it uses a different type of encryption engine
in the hardware.

BR, Jarkko

* Re: WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0
  2022-08-25  2:12         ` Haitao Huang
  2022-08-25  5:49           ` Jarkko Sakkinen
@ 2022-08-26  9:54           ` Paul Menzel
  1 sibling, 0 replies; 16+ messages in thread
From: Paul Menzel @ 2022-08-26  9:54 UTC (permalink / raw)
  To: Haitao Huang, Dave Hansen, Jarkko Sakkinen, Thomas Gleixner,
	Ingo Molnar, Borislav Petkov, x86, Reinette Chatre
  Cc: linux-sgx, LKML

Dear Haitao,


Thank you for your reply. Just for the record:

Am 25.08.22 um 04:12 schrieb Haitao Huang:

> On Tue, 23 Aug 2022 08:48:52 -0500, Paul Menzel wrote:

>> Am 20.08.22 um 08:13 schrieb Paul Menzel:
>>
>>> Am 19.08.22 um 20:28 schrieb Dave Hansen:
>>>> On 8/19/22 09:02, Paul Menzel wrote:
>>>>> On the Dell XPS 13 9370, Linux 5.18.16 prints the warning below:
>>>>> […]
>>>
>>>> Would you be able to send the entire dmesg, along with:
>>> The log messages are attached to the first message, where I forgot to
>>> Cc linux-sgx@ [1].
>>>
>>>>     cat /proc/iomem # (as root)
>>>> and
>>>>     cpuid -1 --raw
>>> I am going to provide that next week. (Side note: Intel might have
>>> some Dell XPS 9370 test machines in a QA lab.)
>>
>> Please find both outputs at the end of the file.
> 
> Could you also check output of "sudo rdmsr -x 0x3a"?

40005

> Also was CONFIG_X86_SGX_KVM set?

No, it’s not set in Debian’s Linux kernel configuration.

> If CONFIG_X86_SGX_KVM is not set and bit 17 (SGX_LC) of MSR 0x3A is not set,
> then I think the following sequence during sgx_init() is possible:

rdmsr -x prints hexadecimal, so this is 0x40005 (bits 0, 2 and 18 set),
and bit 17 (if starting from 0) is 0.
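Since `rdmsr -x` prints hexadecimal, the raw value should be parsed with base 16 before testing individual bits. A minimal sketch, assuming the IA32_FEATURE_CONTROL (MSR 0x3A) bit positions for lock (0), VMX outside SMX (2), SGX Launch Control (17) and SGX enable (18). The macro names follow the kernel's `FEAT_CTL_*` naming as far as I recall (check against your tree), and `parse_rdmsr_hex()` / `msr_bit_set()` are hypothetical helpers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* IA32_FEATURE_CONTROL (MSR 0x3A) bits discussed in this thread. */
#define FEAT_CTL_LOCKED                  (1ULL << 0)
#define FEAT_CTL_VMX_ENABLED_OUTSIDE_SMX (1ULL << 2)
#define FEAT_CTL_SGX_LC_ENABLED          (1ULL << 17)
#define FEAT_CTL_SGX_ENABLED             (1ULL << 18)

/* rdmsr -x prints the value in hexadecimal without a 0x prefix. */
uint64_t parse_rdmsr_hex(const char *s)
{
	assert(s != NULL);
	return strtoull(s, NULL, 16);
}

/* True if any bit in @mask is set in @val. */
bool msr_bit_set(uint64_t val, uint64_t mask)
{
	return (val & mask) != 0;
}
```

For the reported value, `parse_rdmsr_hex("40005")` gives 0x40005, which has the lock, VMX-outside-SMX and SGX-enable bits set and bit 17 clear. Reading the same digits as decimal (0x9c45) would also leave bit 17 clear, so the conclusion about SGX_LC holds either way.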

[…]


Kind regards,

Paul

end of thread

Thread overview: 16+ messages
     [not found] <ce0b4d26-3a6e-7c5a-5f66-44cba05f9f35@molgen.mpg.de>
2022-08-19 16:02 ` WARNING: CPU: 1 PID: 83 at arch/x86/kernel/cpu/sgx/main.c:446 ksgxd+0x1b7/0x1d0 Paul Menzel
2022-08-19 18:28   ` Dave Hansen
2022-08-20  6:13     ` Paul Menzel
2022-08-23 13:48       ` Paul Menzel
2022-08-23 16:32         ` Dave Hansen
2022-08-23 22:33           ` Paul Menzel
2022-08-24 18:39             ` Dave Hansen
2022-08-25  5:27             ` Jarkko Sakkinen
2022-08-25  2:12         ` Haitao Huang
2022-08-25  5:49           ` Jarkko Sakkinen
2022-08-25  8:34             ` Jarkko Sakkinen
2022-08-26  9:54           ` Paul Menzel
2022-08-25  4:57     ` Jarkko Sakkinen
2022-08-25  5:25       ` Jarkko Sakkinen
2022-08-25  6:46         ` Paul Menzel
2022-08-25  8:39           ` Jarkko Sakkinen
