From: Dan Williams <dan.j.williams@intel.com>
To: "Yasunori Gotou (Fujitsu)" <y-goto@fujitsu.com>,
'Dan Williams' <dan.j.williams@intel.com>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>
Subject: RE: Questions about CXL device (type 3 memory) hotplug
Date: Tue, 23 May 2023 10:36:06 -0700
Message-ID: <646cf986dd030_afb7729452@dwillia2-xfh.jf.intel.com.notmuch>
In-Reply-To: <TYWPR01MB10082C8CB891827C6438D311D90409@TYWPR01MB10082.jpnprd01.prod.outlook.com>
Yasunori Gotou (Fujitsu) wrote:
>
> Thank you for your answer!
> Its progress seems to be better than I thought.
>
> I would like to ask more questions.
>
> > Yasunori Gotou (Fujitsu) wrote:
[..]
> > Correct, after the device is added and the driver attaches there is still a step
> > needed to configure a CXL region.
> >
> > For now that step is to manually run:
> >
> > cxl create-region
> >
> > ...later we might consider some udev rules to automatically assemble regions
> > from discovered capacity.
>
> Hmm, I suppose 2 types of udev rules may be necessary.
> The first one is for notify new CXL device is detected, and cxl-command assemble
> a region automatically.
Yes, I suspect this ends up being similar to the mdadm monitor policy
where the device arrival events trigger notification to a daemon that
can apply an assembly policy.
> The second one is for notify region is configured, online is execute for each
> memory block on the region by the notification, and rollback when one of the block
> fails hotadd If necessary.
This policy needs to coordinate with the
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE policy and the memhp_default_state
setting. I.e. the kernel may do this automatically depending on those
settings.
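
As a rough illustration of that coordination, a udev helper or daemon could check the kernel's auto-online policy before onlining anything itself. This is only a sketch: it assumes the policy is exposed at /sys/devices/system/memory/auto_online_blocks as described in Documentation/admin-guide/mm/memory-hotplug.rst, and the function names are made up for the example.

```python
from pathlib import Path

# Policy values documented in Documentation/admin-guide/mm/memory-hotplug.rst.
AUTO_ONLINE_VALUES = ("online", "online_kernel", "online_movable")


def read_auto_online_policy(sysfs_root="/sys/devices/system/memory"):
    """Return the auto-online policy string, or None if it cannot be read."""
    knob = Path(sysfs_root) / "auto_online_blocks"
    try:
        return knob.read_text().strip()
    except OSError:
        # Knob missing or unreadable, e.g. a kernel without memory hotplug.
        return None


def kernel_onlines_automatically(policy):
    """True when the kernel onlines new memory blocks without userspace help."""
    return policy in AUTO_ONLINE_VALUES
```

If kernel_onlines_automatically() returns True, a userspace rule that also tries to online the blocks would be redundant at best, so the helper would presumably skip that step.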
> > > Q4) Current CXL drivers/tools support Hot-removal request from PCIe?
> > >
> > > CXL specification says "In a managed Hot-Remove flow, software is
> > > notified of a hot removal request."
> >
> > Currently there is a requirement that:
> >
> > cxl disable-memdev
> >
> > ...is run before the device can be removed. There is no warning from the PCI
> > hotplug driver. Which means that if end user does the wrong sequence they
> > can crash the kernel / remove memory that may still be in active use.
>
> Ok.
> Though "Surprising remove" is not guaranteed by specification, I think
> "managed hot-removed flow" should be realized.
> I'll chase more what should we do about it.
The nuance here is that even though the PCI hotplug driver supports an
attention button and pauses to let the OS acknowledge the removal, that
acknowledgement is not coordinated with the associated drivers. Instead,
those drivers just receive a ->remove() notification that cannot be
failed.
So, this means that the CXL device must be shut down manually with:

    daxctl offline-memory
    cxl disable-region
    cxl disable-memdev

...*before* the hotplug attention button is pressed. If any of those
commands fail the device is in active use by the kernel and the hotplug
attempt needs to be cancelled. My expectation is that CXL memory device
removal is not possible in the majority of cases. This is why the
Dynamic Capacity Device definition in CXL 3.0 allows for the flexibility
of partial removal.
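
The ordering above can be sketched as a small wrapper that runs the three commands in sequence and stops at the first failure, which is the signal to cancel the hotplug attempt. This is a minimal sketch, not an existing tool: the device names (dax0.0, region0, mem0) and the function names are hypothetical, and it assumes the daxctl and cxl binaries from ndctl are installed.

```python
import subprocess


def shutdown_steps(dax_dev, region, memdev):
    """Return the shutdown commands in the order they have to run."""
    return [
        ["daxctl", "offline-memory", dax_dev],
        ["cxl", "disable-region", region],
        ["cxl", "disable-memdev", memdev],
    ]


def prepare_for_removal(dax_dev, region, memdev, run=subprocess.run):
    """Run each step in order and stop at the first failure.

    Returns (True, None) if every step succeeded, otherwise
    (False, failed_command). A failure means the device is still in
    use, so the attention button must not be pressed.
    """
    for cmd in shutdown_steps(dax_dev, region, memdev):
        try:
            result = run(cmd)
        except OSError:
            # The tool is not installed or could not be executed.
            return False, cmd
        if result.returncode != 0:
            return False, cmd
    return True, None
```

For example, prepare_for_removal("dax0.0", "region0", "mem0") would return (False, [...]) with the failing command if the memory could not be offlined, and the later steps would not be attempted.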
> > > I think that CXL drivers/tools need to find which sections belongs to the
> > > requested device, and execute offline them at least. In addition,
> > > Fabric Manager may need to prepare removing the device due to
> > configuration
> > > change.
> > >
> > > Does current CXL drivers/tools can execute them?
> > > Otherwise, does it need to be implemented yet?
> >
> > Currently the 'cxl disable-memdev' command is not smart about determining
> > when the device is in active use it just claims that it is always in use. That is in
> > progress to be improved.
>
> Ok. I see.
>
> >
> > > Q5) How CXL driver treat region/namespace size against section size?
> > > Current x86-64 section size can be 2Gbyte, but CXL region size may be
> > > able to smaller than it.
> >
> > The section size is still 128MB, the hotplug memory block size is what expands
> > to 2GB. That size limits what can be onlined via the dax_kmem driver.
>
> Oops.
> OK, I understand I should change my word "section" to "hotplug memory block".
>
> One of the background of this question is "rollback".
> "If memory hotadd or hotremove for a memory block fails, is rollback available?".
>
> If a block hotadd sequence fails in the device for some reasons, its user wants to remove
> the device for the moment, and may want to retry hotadd again or try other device.
> To achieve it, already onlined blocks before failed block should be offlined again.
>
> If a block hotremove sequence fails in the device, its user would like to keep the device
> online to postpone replacing it or select other device for device pooling. (vice vesa).
> I don't find which component handle this situation.
It depends on how the memory is onlined and whether it gets pinned by
the kernel. As long as all of the memory is onlined to ZONE_MOVABLE then
there is a good chance to be able to get it back. However, ZONE_MOVABLE
is not a guarantee that memory can be removed later, and ZONE_MOVABLE
requires some ratio of ZONE_NORMAL memory to be present to make it
usable. See "Zone Imbalances" in
Documentation/admin-guide/mm/memory-hotplug.rst.
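
To make the rollback idea concrete, here is a sketch of onlining a set of memory blocks to ZONE_MOVABLE and offlining the already-onlined ones if a later block fails. It assumes the per-block sysfs interface (/sys/devices/system/memory/memoryN/state accepting "online_movable" and "offline"); the function names are hypothetical, and as noted above the offline step itself is not guaranteed to succeed.

```python
from pathlib import Path


def _write_state(block_dir, state):
    """Write a state string to a memory block's sysfs 'state' file."""
    (Path(block_dir) / "state").write_text(state)


def online_movable_with_rollback(block_dirs, write_state=_write_state):
    """Online each block as movable; undo earlier blocks if one fails.

    Returns (True, onlined_blocks) on success. On failure returns
    (False, still_online), where still_online lists the blocks that
    could not be offlined again during the rollback.
    """
    onlined = []
    for block in block_dirs:
        try:
            write_state(block, "online_movable")
        except OSError:
            still_online = []
            # Roll back in reverse order of onlining.
            for done in reversed(onlined):
                try:
                    write_state(done, "offline")
                except OSError:
                    # Offlining can fail, e.g. if pages got pinned.
                    still_online.append(done)
            return False, still_online
        onlined.append(block)
    return True, onlined
```

A caller would pass paths such as /sys/devices/system/memory/memory32 (an illustrative block number) and treat a non-empty still_online list as a device that cannot be removed for now.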
> I noticed that current users prefer online after device detection immediately, and kernel
> supports it. Though it is natural for some use-case, I feel it may be obstacle for rollback of
> CXL device hotplug failure.
Yes, this is a platform owner policy tradeoff: either maximize hotplug
capability by limiting how the memory is used, or maximize the
utilization of the memory by limiting hotplug flexibility. The kernel
defaults to maximizing the utilization of the memory, but administrator
policy can go as far as only allowing memory access through the
dedicated device-dax interface.