From: Gregory Price <gregory.price@memverge.com>
To: "Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>
Cc: 'Dan Williams' <dan.j.williams@intel.com>,
"Zhijian Li (Fujitsu)" <lizhijian@fujitsu.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
Jonathan Cameron <Jonathan.Cameron@huawei.com>,
"dave.jiang@intel.com" <dave.jiang@intel.com>,
Fan Ni <fan.ni@samsung.com>
Subject: Re: CXL volatile memory: How to restore the previous region/Interleave set
Date: Wed, 29 May 2024 12:40:41 -0400
Message-ID: <ZldaiYTv1fPcGbCs@memverge.com>
In-Reply-To: <TYWPR01MB10082738078CF843AF514F7E790F22@TYWPR01MB10082.jpnprd01.prod.outlook.com>

On Wed, May 29, 2024 at 11:33:46AM +0000, Yasunori Gotou (Fujitsu) wrote:
> Hi Dan-san,
>
> > > Q3, For CXL volatile memory devices without LSA installed, if users
> > > expect to restore the Interleave set to the previous configuration
> > > after reboot, the questions are:
> > > Q3.1 Where should the Interleave Set information be stored?
> > > Q3.2 Which component is responsible for restoring the Interleave Set?
> >
> > The expectation is that BIOS, or the OS for hotplug devices, deploys a default
> > region configuration policy. That policy in the common is likely one of either
> > maximizing performance (maximize interleave across host-bridges), or
> > maximizing error isolation (create an x1-interleave region per endpoint).
>
> To be honest, I feel CFMWS seems to be something incomplete spec..
>
> When I first saw the " CXL* Type 3 Memory Device Software Guide", and noticed existing
> CFMWS, I thought that the firmware would create it based on some configuration,
> and OS would read it and create region for each window information.
> Even if user would execute cxl create-region command and configure interleaved region,
> I thought OS would tell it to firmware (or something), and CFMWS would reflect it on the next boot.

Ok this has just made me realize that I really do need to write that
article on the various forms of interleaving in a post-CXL world.

Quoting some of the specification real quick:
CXL 3.1 Section 9.18.1.3: CXL Fixed Memory Window Structure
"""
The CFMWS structure describes zero or more Host Physical Address (HPA)
windows that are associated with each CXL Host Bridge. Each window
represents a contiguous HPA range that may be interleaved across one or
more targets, some of which are CXL Host Bridges. Associated with each
window are a set of restrictions that govern its usage. It is the
OSPM's responsibility to utilize each window for the specified use.

The HPA ranges described by CFMWS may include addresses that are currently
assigned to CXL.mem devices. Before assigning HPAs from a fixed-memory
window, the OSPM must check the current assignments and avoid any
conflicts.

For any given HPA, it shall not be described by more than one CFMWS
entry.
"""

Dan, please correct me if I'm wrong, but I'm fairly certain the
following is accurate.

The CFMWS is the BIOS/EFI's mechanism to report the system configuration
to the Operating System, not the Operating System's mechanism to change
system configurations (such as interleave). What you're talking about
is re-configuring HDM Decoders to interleave devices *presented by* the
CFMWS to the operating system.

Confusing, I know. But stick with me.

The interleave referred to in the CFMWS is the BIOS/EFI telling the system
that memory accesses to this (physical address) region will be interleaved
across the set of devices that are backing that region. The operating system
is responsible for reading these settings and presenting the memory to the
system accordingly.

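The arithmetic behind "accesses will be interleaved across the set of devices" is simple in the common case. This is a sketch assuming standard modulo interleave (the specification also defines an XOR-based variant, not shown), with a hypothetical helper name. The host bridge and decoder hardware do this on every access; the function only illustrates the math.

```python
from typing import Sequence


def interleave_target(hpa: int, base_hpa: int, granularity: int,
                      targets: Sequence[str]) -> str:
    """Return which target in an interleave set backs `hpa` (modulo arithmetic).

    Consecutive `granularity`-sized chunks of the window rotate through
    `targets` in order, so position matters: swapping two devices changes
    which device backs a given address.
    """
    offset = hpa - base_hpa
    if offset < 0:
        raise ValueError("address below the start of the window")
    chunk = offset // granularity
    return targets[chunk % len(targets)]
```

With two devices and 256-byte granularity, a 4 KiB page is split 2 KiB per device, which is where the bandwidth benefit of interleaving comes from.
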
The BIOS, for example, could configure all devices behind a single CFMW as
a "Single Device" that interleaves many physical devices, and the OS should
present it as such. In this scenario, there is no need to configure an
interleave region via cxl-cli - the BIOS already did that for you and
presented all these devices as a single device. All you need to do is
online the memory.

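On Linux, onlining hotplugged memory is done per memory block through sysfs (`/sys/devices/system/memory/memoryN/state`, with the block size in `block_size_bytes`). Below is a minimal sketch, assuming you already know the region's HPA range; the helper names are hypothetical, and depending on kernel configuration the memory may already have been onlined automatically.

```python
from pathlib import Path
from typing import Iterable, List

MEMORY_ROOT = Path("/sys/devices/system/memory")


def read_block_size(root: Path = MEMORY_ROOT) -> int:
    # block_size_bytes is reported as a hexadecimal number.
    return int((root / "block_size_bytes").read_text().strip(), 16)


def blocks_for_range(start: int, size: int, block_size: int) -> List[int]:
    """Memory block IDs (memoryN) covering the HPA range [start, start + size)."""
    first = start // block_size
    last = (start + size - 1) // block_size
    return list(range(first, last + 1))


def online_blocks(block_ids: Iterable[int], policy: str = "online_movable",
                  root: Path = MEMORY_ROOT) -> List[int]:
    """Write `policy` to each offline block's state file; return the blocks changed.

    "online_movable" places the memory in ZONE_MOVABLE, which makes it more
    likely it can be offlined again later (e.g. before tearing the region
    down). Writing to the real sysfs files requires root.
    """
    changed = []
    for n in block_ids:
        state = root / f"memory{n}" / "state"
        if state.read_text().strip() == "offline":
            state.write_text(policy)
            changed.append(n)
    return changed
```

Run as root, `online_blocks(blocks_for_range(start, size, read_block_size()))` would online every block of the region; the `root` parameter exists only so the logic can be exercised against a scratch directory.
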
Configuring the CFMWS *should* (but may not) manifest as a set of BIOS/EFI
options that say how to configure a set of CXL devices behind one or more
host bridges prior to OS boot. This has its limitations. For example, you'd
need to reboot the system to make changes, and hotplugging a memory device
becomes impossible. The BIOS/EFI would also need to understand when the
prior configuration is no longer valid - complicated and problematic.

Additionally, for more dynamic environments (devices behind a switch,
or a DCD), this more "static" configuration may (read: does) reduce your
management flexibility, i.e. hotplug may not be possible.

Alternatively, the BIOS may configure each device separately, and the
OS may create a region that interleaves those devices explicitly by
programming an HDM decoder.

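When the OS builds the interleave itself, the ways and granularity it programs must be values the decoders can encode. The sketch below mirrors the encodings the CXL specification defines for the HDM decoder control register (ways 1/2/4/8/16 and 3/6/12; granularity 256 B to 16 KiB in powers of two); treat it as an illustration to check against the spec revision in use. It only computes field values and does not touch hardware; in practice `cxl create-region` from cxl-cli asks the kernel to do the programming. The function names are hypothetical.

```python
def encode_ways(ways: int) -> int:
    """Encoded interleave ways (eIW): 1,2,4,8,16 -> 0..4 and 3,6,12 -> 8,9,10."""
    if ways in (1, 2, 4, 8, 16):
        return ways.bit_length() - 1                 # log2(ways)
    if ways in (3, 6, 12):
        return 8 + (ways // 3).bit_length() - 1      # 8 + log2(ways / 3)
    raise ValueError(f"unsupported interleave ways: {ways}")


def encode_granularity(granularity: int) -> int:
    """Encoded interleave granularity (eIG): 256 B -> 0 ... 16 KiB -> 6."""
    if not 256 <= granularity <= 16384 or granularity & (granularity - 1):
        raise ValueError(f"unsupported interleave granularity: {granularity}")
    return granularity.bit_length() - 1 - 8          # log2(granularity) - 8


def check_interleave_set(memdevs, ways: int, granularity: int):
    """Validate a proposed interleave set and return its (eIW, eIG) encoding.

    `memdevs` is an ordered list of endpoint names (hypothetical); the list
    order is each device's interleave position.
    """
    if len(memdevs) != ways:
        raise ValueError("number of devices must equal interleave ways")
    if len(set(memdevs)) != len(memdevs):
        raise ValueError("a device may appear only once in an interleave set")
    return encode_ways(ways), encode_granularity(granularity)
```

For example, a 2-way set at 4 KiB granularity encodes as `(1, 4)`.
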
In this scenario, the OS could tear down the region, hotplug that device,
and recreate the region with new settings accordingly. Greater
management flexibility, but more software/management complexity.

This requires the OS to recreate the region/interleave set on each
reboot, and is probably the preferred mechanism for configuring the
system (if only because hotplug and device failure are not uncommon).

In this scenario, re-configuration looks a lot like storage mounting.
The device is either there or it isn't, and the configuration file
either works or it doesn't. Alternatively, the daemon setting this all
up is free to try to make auto-configuration decisions.

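A minimal sketch of the storage-mounting model: a hypothetical saved configuration lists each region by the devices it needs, and at boot a daemon restores only the regions whose devices are all present, reporting the rest instead of guessing. The names, the config shape, and keying devices by serial number (on the assumption that memN names may not be stable across boots) are all illustrative assumptions, not an existing tool.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class SavedRegion:
    """One region from a hypothetical saved config (think of an fstab line)."""
    name: str
    serials: Tuple[int, ...]   # device serials, in interleave position order
    granularity: int

    @property
    def ways(self) -> int:
        return len(self.serials)


def plan_restore(saved: Iterable[SavedRegion],
                 present: Iterable[int]) -> Tuple[List[SavedRegion], List[SavedRegion]]:
    """Split saved regions into (restorable, stale).

    A region is restorable only if every device it names is present and not
    already claimed by an earlier region; otherwise the config "doesn't work"
    for this hardware and is reported rather than silently altered.
    """
    available = set(present)
    restorable, stale = [], []
    for region in saved:
        needed = set(region.serials)
        if len(needed) == region.ways and needed <= available:
            restorable.append(region)
            # Simplifying assumption: each device backs at most one region.
            available -= needed
        else:
            stale.append(region)
    return restorable, stale
```

The restorable entries would then be handed to whatever creates regions (for example cxl-cli), and the stale ones logged for an administrator to revisit.
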
(Final note about interleave for completeness' sake, but not really
relevant to this discussion.)
Alternatively, you could just online each device as a separate region
and simply use something like set_mempolicy/numactl to implement
interleave on a per-task basis.

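For the per-task alternative, the interleave set is just a set of NUMA node IDs. The sketch below, with hypothetical helper names and node IDs, builds the node bitmask that set_mempolicy(2) takes together with MPOL_INTERLEAVE, and the equivalent numactl command line. Which node IDs the CXL memory actually shows up as is system specific.

```python
from typing import Iterable, List


def interleave_nodemask(nodes: Iterable[int]) -> int:
    """Bitmask with bit N set for each NUMA node N (the layout set_mempolicy(2) expects)."""
    mask = 0
    for node in nodes:
        if node < 0:
            raise ValueError(f"invalid node id: {node}")
        mask |= 1 << node
    return mask


def numactl_interleave_argv(nodes: Iterable[int], command: List[str]) -> List[str]:
    """argv for running `command` with its memory interleaved across `nodes`."""
    node_list = ",".join(str(n) for n in sorted(set(nodes)))
    return ["numactl", f"--interleave={node_list}", *command]
```

If nodes 1 and 2 were the CXL nodes, `subprocess.run(numactl_interleave_argv([1, 2], ["./app"]))` would launch `./app` with its allocations interleaved across them, with no region reconfiguration involved.
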
>
> But, really is that the above scenario is only for persistent memory with LSA.
> Even if a user configures a new region for volatile memory, and I could not find any specification to
> tell the new configuration to the Firmware.
>
> Could you tell me why such interface is not defined in the CXL specification?
> Is it just because there is no place to store region information for volatile memory?
>
>
> IMHO, users want to keep previous configuration after reboot even if it is volatile memory.
> Though users don't concern about contents of volatile memory, they want to keep region/interleave
> configuration after reboot. Especially, if previous configuration is some years ago, I'll bet
> users will forget how they configured regions against cxl volatile memory.
>

Probably we want some daemon that reconfigures this, similar to how we're
doing it with storage. You register a preferred configuration for a given
hardware environment, and it remains valid until the hardware changes.

The OS shouldn't really be telling the firmware to configure itself, if
only because: what happens if you unplug a device?

~Gregory