Linux CXL
* [ISSUE] memdev cannot be enabled after reboot due to failed dvsec range check [QEMU setup]
From: Fan Ni @ 2025-01-14 20:30 UTC
  To: linux-cxl; +Cc: a.manzanares, fan.ni, anisa.su887, dave

Hi,

Recently, while testing CXL with a QEMU setup, I found that the memdev cannot
be enabled successfully after a reboot.

Here is the setup and the steps I have tried.

QEMU:
https://gitlab.com/qemu-project/qemu.git
branch: master
commit: 8032c78e556cd0baec111740a6c636863f9bd7c8

Kernel:
https://git.kernel.org/pub/scm/linux/kernel/git/cxl/cxl.git/
branch: next
commit: 2f84d072bdcb7d6ec66cc4d0de9f37a3dc394cd2

Steps to reproduce the issue:
1. Start the VM with a CXL pmem device attached directly to the RP.
2. Load the CXL drivers (cxl_acpi, cxl_core, cxl_pci, cxl_port, cxl_mem, etc.).
   Everything works as expected: the memory is correctly enabled and shown
   by cxl list.
3. Reboot the VM (run the reboot command inside the VM, no shutdown).
4. Load the CXL drivers as in step 2. The CXL pmem is not correctly enabled.

dmesg shows the following errors:
-------------------------------
[   17.131729] cxl_core:cxl_hdm_decode_init:443: cxl_pci 0000:0d:00.0: DVSEC Range0 denied by platform
[   17.135267] cxl_pci 0000:0d:00.0: Range register decodes outside platform defined CXL ranges.
[   17.138428] cxl_core:cxl_bus_probe:2073: cxl_port endpoint2: probe: -6
[   17.141104] cxl_core:devm_cxl_add_port:936: cxl_mem mem0: endpoint2 added to port1
[   17.143703] cxl_mem mem0: endpoint2 failed probe
[   17.145324] cxl_core:cxl_bus_probe:2073: cxl_mem mem0: probe: -6
[   17.171416] cxl_core:cxl_detach_ep:1499: cxl_mem mem0: disconnect mem0 from port1
------------------------------
Comparing steps 2 and 4 with debug info, we can see the following.
In step 2, on entry to cxl_hdm_decode_init():

(gdb) p *info
$2 = {mem_enabled = false, ranges = 0, port = 0xffff8881097eac00, dvsec_range = {{start = 0, end = 0}, {start = 0, end = 0}}}

The info struct comes from cxl_dvsec_rr_decode(), which returns directly
without reading the DVSEC range registers if mem_enabled is false, so
ranges == 0.
This is what happened in step 2: no DVSEC ranges are provided to the function for checking.
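
The early return described above can be sketched in userspace C as
follows. This is only an illustration, not the kernel source: the types
fake_dvsec_regs and dvsec_info and the function sketch_dvsec_rr_decode()
are hypothetical stand-ins, and the register layout is simplified to a
base/size pair per range.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for an inclusive address range. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Hypothetical stand-in for the decoded info seen in the gdb dumps. */
struct dvsec_info {
	bool mem_enabled;
	int ranges;
	struct range dvsec_range[2];
};

/* Hypothetical snapshot of device state; not a real kernel or QEMU type. */
struct fake_dvsec_regs {
	bool mem_enable;   /* memory enable bit as left by the previous boot */
	uint64_t base[2];  /* range base addresses */
	uint64_t size[2];  /* range sizes, 0 means the range is unused */
};

/*
 * Simplified model of the behaviour described in this report: if memory
 * is not enabled, return before looking at any range register, so that
 * info->ranges stays 0 and nothing is checked later.
 */
static void sketch_dvsec_rr_decode(const struct fake_dvsec_regs *regs,
				   struct dvsec_info *info)
{
	*info = (struct dvsec_info){ 0 };
	info->mem_enabled = regs->mem_enable;
	if (!info->mem_enabled)
		return;

	for (int i = 0; i < 2; i++) {
		if (regs->size[i] == 0)
			continue;
		info->dvsec_range[info->ranges].start = regs->base[i];
		info->dvsec_range[info->ranges].end =
			regs->base[i] + regs->size[i] - 1;
		info->ranges++;
	}
}
```

With mem_enable clear (first boot), the sketch reports ranges == 0 even
though a 512 MiB range is programmed at base 0. With mem_enable set (as
after the reboot), the same registers yield ranges == 1 and the range
{0, 536870911} that appears in the second gdb dump.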

When the HDM decoder is initialized in cxl_hdm_decode_init(), the memory enable bit is set.

In step 4, after the reboot, the memory enable bit is still set, so the DVSEC range
registers are read from the device in cxl_dvsec_rr_decode().
So on entry to cxl_hdm_decode_init():
------------------------------------
$2 = {mem_enabled = true, ranges = 1, port = 0xffff888103c77400, dvsec_range = {{start = 0, end = 536870911}, {start = 0, end = 0}}}
Breakpoint 2 at 0xffffffffc0657bbe: file drivers/cxl/core/pci.c, line 416.
------------------------------------
This causes dvsec_range_allowed() to fail, because the range from the DVSEC range
registers starts at address zero, [0x0, 0x1fffffff] (512 MiB), which does not match
the HPA range stored in cxld->hpa_range.

------------------------------------
Thread 1 hit Breakpoint 4, dvsec_range_allowed (dev=0xffff888108af9848,
    arg=0xffffc9000059f9b0) at drivers/cxl/core/pci.c:265
265		if (!(cxld->flags & CXL_DECODER_F_RAM))
(gdb) b 268
Breakpoint 5 at 0xffffffffc0657d31: file drivers/cxl/core/pci.c, line 271.
(gdb) p /x cxld->hpa_range
$5 = {start = 0xa90000000, end = 0xb8fffffff}
(gdb) p /x *dev_range
$7 = {start = 0x0, end = 0x1fffffff}
(gdb)
------------------------------------
The hpa_range is set when parsing the CFMWS in __cxl_parse_cfmws().
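
To make the mismatch concrete, here is a minimal containment check
using the two ranges printed by gdb above. It assumes the platform
check amounts to "the device range must lie inside the decoder's HPA
range"; the helper names sketch_range_contains() and sketch_range_len()
and the struct addr_range type are hypothetical, not the kernel's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical inclusive address range, [start, end]. */
struct addr_range {
	uint64_t start;
	uint64_t end;
};

/* True when inner lies entirely within outer. */
static bool sketch_range_contains(const struct addr_range *outer,
				  const struct addr_range *inner)
{
	return outer->start <= inner->start && outer->end >= inner->end;
}

/* Length in bytes of an inclusive range. */
static uint64_t sketch_range_len(const struct addr_range *r)
{
	return r->end - r->start + 1;
}
```

With hpa = {0xa90000000, 0xb8fffffff} and dev = {0x0, 0x1fffffff},
sketch_range_contains(&hpa, &dev) is false, which corresponds to the
"DVSEC Range0 denied by platform" message. sketch_range_len(&dev) is
0x20000000 bytes, i.e. 512 MiB. A range of the same size based at
0xa90000000 would be contained.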

Any thoughts?

Open question: do we need to update the DVSEC range registers after we parse the
CFMWS so that the two ranges above match?
-- 
Fan Ni


* Re: [ISSUE] memdev cannot be enabled after reboot due to failed dvsec range check [QEMU setup]
From: Zhijian Li (Fujitsu) @ 2025-01-15  1:06 UTC
  To: Fan Ni, linux-cxl@vger.kernel.org
  Cc: a.manzanares@samsung.com, fan.ni@samsung.com,
	anisa.su887@gmail.com, dave@stgolabs.net, qemu-devel@nongnu.org

Cc'ed QEMU.

Hi Fan,

I recall we had a reboot issue [1] months ago.
I guess your issue is caused by some registers not being reset during reboot.

[1] https://lore.kernel.org/linux-cxl/20240409075846.85370-1-lizhijian@fujitsu.com/




* Re: [ISSUE] memdev cannot be enabled after reboot due to failed dvsec range check [QEMU setup]
From: Fan Ni @ 2025-01-15 23:02 UTC
  To: Zhijian Li (Fujitsu)
  Cc: Fan Ni, linux-cxl@vger.kernel.org, a.manzanares@samsung.com,
	anisa.su887@gmail.com, dave@stgolabs.net, qemu-devel@nongnu.org

On Wed, Jan 15, 2025 at 01:06:24AM +0000, Zhijian Li (Fujitsu) wrote:
> Cced QEMU,
> 
> Hi Fan,
> 
> I recalled we had a reboot issue[1] months ago
> I guess your issue was caused by some registers not reset during reboot.
> 
> [1] https://lore.kernel.org/linux-cxl/20240409075846.85370-1-lizhijian@fujitsu.com/
> 
Hi Zhijian,
Thanks for the pointer. With the fix applied, the issue goes away.

Fan

-- 
Fan Ni (From gmail)


* Re: [ISSUE] memdev cannot be enabled after reboot due to failed dvsec range check [QEMU setup]
From: Jonathan Cameron @ 2025-01-16 10:46 UTC
  To: Fan Ni
  Cc: Zhijian Li (Fujitsu), linux-cxl@vger.kernel.org,
	a.manzanares@samsung.com, anisa.su887@gmail.com,
	dave@stgolabs.net, qemu-devel@nongnu.org

On Wed, 15 Jan 2025 23:02:32 +0000
Fan Ni <nifan.cxl@gmail.com> wrote:

> On Wed, Jan 15, 2025 at 01:06:24AM +0000, Zhijian Li (Fujitsu) wrote:
> > Cced QEMU,
> > 
> > Hi Fan,
> > 
> > I recalled we had a reboot issue[1] months ago
> > I guess your issue was caused by some registers not reset during reboot.
> > 
> > [1] https://lore.kernel.org/linux-cxl/20240409075846.85370-1-lizhijian@fujitsu.com/
> >   
> Hi Zhijian,
> Thanks for the pointer. With the fix applied, the issue goes away.

Note that, as per the thread above, that fix is not sufficient, which
is why I dropped it again from my trees.

Reset is not currently well handled by the QEMU code.
I'm happy to look at patches to fully support it but that fix needs
to be complete and not break any other cases.

Jonathan



