Linux CXL
From: Alison Schofield <alison.schofield@intel.com>
To: Hongjian Fan <hongjian.fan@seagate.com>
Cc: "linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Subject: Re: Question on deferring dax registration to cxl module for CXL_REGION
Date: Thu, 11 Jan 2024 16:26:48 -0800
Message-ID: <ZaCHSE13Q/KJAZQU@aschofie-mobl2>
In-Reply-To: <CH0PR20MB4250F264AA40C6E9CF1F599C90682@CH0PR20MB4250.namprd20.prod.outlook.com>

On Thu, Jan 11, 2024 at 09:03:48PM +0000, Hongjian Fan wrote:
> Hi CXL experts,
> 
> 
> I have observed the following behavior of iomem_resource when CONFIG_CXL_REGION is enabled in the kernel.
> 
> CXL windows are inserted into iomem_resource based on the CEDT CFMWS entries. If there is only one CXL device attached to the host, the CXL window matches the Soft Reserved memory range, and the CXL window is inserted as a child of PCI mem and as the parent of the Soft Reserved range. But if there are multiple CXL windows, each of which is part of the Soft Reserved memory range, each CXL window is inserted as a child of the Soft Reserved range.
> 
> Function dax_hmem_platform_probe defers the dax region registration for the CXL window to cxl module.
> However, two issues seem to occur:
>         1) If the CXL window is not a direct child of iomem_resource, dax_hmem_platform_probe cannot detect it and therefore does not defer it. This means that if the CFMWS describes multiple CXL windows, no deferral happens.
>         2) If a CXL1.1 device is behind the CXL window and the dax region registration is deferred, the dax region is never created, because a CXL1.1 device lacks the HDM decoder and other features the CXL module needs to create the dax region.
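Issue 1 can be illustrated with a small model. This is a hypothetical Python sketch, not kernel code: it assumes, as reported above, that the deferral check only inspects direct children of iomem_resource, and the names `Res`, `top_level_has_cxl` and `any_level_has_cxl` are made up for illustration. The layout mirrors case 3 in the /proc/iomem output below.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Res:
    """Simplified stand-in for a kernel struct resource (inclusive range)."""
    start: int
    end: int
    name: str
    children: list[Res] = field(default_factory=list)

    def overlaps(self, start: int, end: int) -> bool:
        return self.start <= end and start <= self.end


def top_level_has_cxl(root: Res, start: int, end: int) -> bool:
    # Only direct children of the root are inspected, mirroring the
    # behavior reported for the deferral check (an assumption here).
    return any(c.name.startswith("CXL Window") and c.overlaps(start, end)
               for c in root.children)


def any_level_has_cxl(node: Res, start: int, end: int) -> bool:
    # Recursive variant: a CXL window nested under Soft Reserved is found too.
    for c in node.children:
        if c.overlaps(start, end):
            if c.name.startswith("CXL Window") or any_level_has_cxl(c, start, end):
                return True
    return False


# Case 3 layout: both CXL windows are children of one Soft Reserved range.
iomem = Res(0, 2**64 - 1, "iomem_resource", [
    Res(0x6080000000, 0x807FFFFFFF, "Soft Reserved", [
        Res(0x6080000000, 0x707FFFFFFF, "CXL Window 0"),
        Res(0x7080000000, 0x807FFFFFFF, "CXL Window 1"),
    ]),
])

soft = iomem.children[0]
print(top_level_has_cxl(iomem, soft.start, soft.end))  # False: nothing deferred
print(any_level_has_cxl(iomem, soft.start, soft.end))  # True
```

In the single-device layout (case 1), the CXL window is itself a top-level entry, so the top-level-only check finds it and deferral happens.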
> 
> The DAX (and hmem) module has no visibility into the features of the CXL device behind a CXL window, so it is impossible to defer only the CXL windows that have CXL2.0 devices behind them.
> 
> If I want the dax region to show up when a single CXL1.1 device is attached, I can see two potential approaches:
>         1) Do not defer the CXL window in dax_hmem_platform_probe.
>                 Can we simply not defer? The current code does not defer if multiple CXL windows are present. Have any issues been observed when multiple CXL devices are attached?
>         2) Defer all CXL windows, and let the cxl module create the dax region for the CXL1.1 device.
>                 But where should this creation happen? It would be a long path to handle all the unavailable features on the way from cxl_pci_probe to devm_cxl_add_dax_region.
> 
> Please provide your comments.
> 
> 

Hi Hongjian Fan,

This is familiar. In Aug '23 I stopped work on a patchset [1] aimed at
improving the soft reserved resource handling. From that cover letter:

1) Soft reserved resources were observed as sometimes being the parent
and sometimes being the child of a region resource. Patch 1 clears up
that inconsistency.

2) Soft reserved resources were also observed as stranded after region
teardown, leaving the address space released by the region unavailable
for reallocation. Patch 2 implements soft reserved resource removal.

By v3 of the set, we were rethinking the approach, as Patch 2's juggling
of soft reserved spaces seemed silly and error prone. Also, the folks who
were hitting the soft reserved issue during hotplug were able to use CFMWS
address space outside the Soft Reserved range as a workaround.

Dan offered a couple of new approaches since then:
(I hope I'm not misquoting)

1) Insert CXL-intersecting soft reserved resources into a separate
(non-iomem_resource) resource tree. When/if any CXL region assembly
fails, walk that side tree and move them all over to iomem_resource.

2) It is already the case that the device-dax core waits for cxl_acpi
to mark ranges as IORES_DESC_CXL, and we do not expect that to fail.
That means cxl_acpi can turn around and ask the device-dax core to cache
and delete the soft reserved address ranges. Then, if CXL notices a
region assembly failure, it can signal device-dax to release that cached
range as a new CXL-disconnected DAX region.

3) cxl_acpi walks the resource ranges knowing that, at the beginning of
time, Soft Reserved ranges are unparented, which makes them easier to
delete and to register as "just in case" recovery ranges with device-dax.
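Suggestion 1 boils down to range bookkeeping. The following is a hypothetical Python model of that idea, not a proposed kernel implementation: `Range`, `SoftReservedTracker`, `insert_soft_reserved` and `on_region_assembly_failure` are invented names, and plain lists stand in for iomem_resource and the separate side tree.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class Range:
    start: int
    end: int  # inclusive, like struct resource

    def intersects(self, other: Range) -> bool:
        return self.start <= other.end and other.start <= self.end


@dataclass
class SoftReservedTracker:
    cxl_windows: list[Range]
    iomem: list[Range] = field(default_factory=list)      # stands in for iomem_resource
    side_tree: list[Range] = field(default_factory=list)  # separate, non-iomem tree

    def insert_soft_reserved(self, r: Range) -> None:
        # Soft Reserved ranges that intersect a CXL window are parked in the
        # side tree; all others go straight into iomem_resource.
        if any(r.intersects(w) for w in self.cxl_windows):
            self.side_tree.append(r)
        else:
            self.iomem.append(r)

    def on_region_assembly_failure(self) -> None:
        # If any CXL region fails to assemble, move every parked range
        # over to iomem_resource so it can be handled as before.
        self.iomem.extend(self.side_tree)
        self.side_tree.clear()


# Usage: one CXL window (from case 1) and one hypothetical non-CXL range.
t = SoftReservedTracker(cxl_windows=[Range(0x6080000000, 0x707FFFFFFF)])
t.insert_soft_reserved(Range(0x6080000000, 0x707FFFFFFF))
t.insert_soft_reserved(Range(0x1000, 0x1FFF))
assert len(t.side_tree) == 1 and len(t.iomem) == 1
t.on_region_assembly_failure()
assert not t.side_tree and len(t.iomem) == 2
```

The appeal of this shape is that nothing is deleted from iomem_resource and later re-inserted; the side tree is only consulted on the failure path.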

Can you comment on whether any or all of these suggestions seem to
address what you are seeing?

Others' thoughts on the approach this might take next are welcome.

Thanks,
Alison


[1] https://lore.kernel.org/linux-cxl/cover.1692638817.git.alison.schofield@intel.com/

> 
> Below is the /proc/iomem output from my hardware:
> 
> 1) When there is a single CXL2.0 device on the host, the CXL window is inserted under PCI mem and the Soft Reserved region is a descendant of the CXL window:
> 
>         6080000000-707fffffff : CXL Window 0
>           6080000000-707fffffff : region0
>             6080000000-707fffffff : Soft Reserved
>               6080000000-707fffffff : dax0.0
>                 6080000000-707fffffff : System RAM (kmem)
> 
>         A CXL region is inserted under the CXL window by discover_region, and the dax region is registered by cxl_dax_region_probe.
> 
> 2) When there is a single CXL1.1 device on the host, the layout is similar, but neither a CXL region nor a dax device is created:
> 
>         6080000000-707fffffff : CXL Window 0
>           6080000000-707fffffff : Soft Reserved
> 
>         The HDM decoder and other CXL2.0 features are missing from the CXL1.1 device, so the CXL driver does not create the related CXL structures. Because there is no dax region, no NUMA node is created for the CXL memory, and the memory is not usable from user space.
> 
> 3) When there are multiple CXL devices, regardless of whether they are CXL1.1 or CXL2.0, the CXL windows are created under the Soft Reserved region:
> 
>         6080000000-807fffffff : Soft Reserved
>           6080000000-707fffffff : CXL Window 0
>             6080000000-707fffffff : region0
>               6080000000-707fffffff : dax2.0
>                 6080000000-707fffffff : System RAM (kmem)
>           7080000000-807fffffff : CXL Window 1
>             7080000000-807fffffff : dax3.0
>               7080000000-807fffffff : System RAM (kmem)
> 
>         Both dax regions are registered by dax_hmem_platform_probe. The CXL region is created under the CXL window for the CXL2.0 devices.
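To tell which of these layouts a system has, the nesting can be read back from /proc/iomem, which indents each nested level by two spaces. A minimal sketch, assuming that format and the "CXL Window N" naming shown in these dumps; `cxl_window_depths` is a hypothetical helper, and real addresses are typically only visible to root.

```python
from __future__ import annotations

import re

# One /proc/iomem line: indentation, "start-end : name".
LINE = re.compile(
    r"^(?P<indent> *)(?P<start>[0-9a-fA-F]+)-(?P<end>[0-9a-fA-F]+) : (?P<name>.+)$")


def cxl_window_depths(text: str) -> dict[str, int]:
    """Map each 'CXL Window N' entry to its nesting depth (0 = top level)."""
    depths = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m and m["name"].startswith("CXL Window"):
            depths[m["name"]] = len(m["indent"]) // 2
    return depths


# Case 3 from the report, with /proc/iomem's two-space indentation.
case3 = """\
6080000000-807fffffff : Soft Reserved
  6080000000-707fffffff : CXL Window 0
    6080000000-707fffffff : region0
  7080000000-807fffffff : CXL Window 1
"""

print(cxl_window_depths(case3))  # {'CXL Window 0': 1, 'CXL Window 1': 1}

# On a live system:
# with open("/proc/iomem") as f:
#     print(cxl_window_depths(f.read()))
```

A depth greater than 0 for a CXL window corresponds to the case where the window is not a direct child of iomem_resource.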
> 
> 
> 
> Thanks,
> Hongjian Fan
> 
> 

Thread overview: 5+ messages
2024-01-11 21:03 Question on deferring dax registration to cxl module for CXL_REGION Hongjian Fan
2024-01-12  0:26 ` Alison Schofield [this message]
2024-01-12 22:38   ` Hongjian Fan
2024-02-15 21:33   ` Nathan Fontenot
2024-02-29 21:52     ` Dan Williams
