qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Adam Manzanares <a.manzanares@samsung.com>
To: Jonathan Cameron <Jonathan.Cameron@huawei.com>
Cc: Gregory Price <gregory.price@memverge.com>,
	Gregory Price <gourry.memverge@gmail.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
	"mst@redhat.com" <mst@redhat.com>,
	"marcel.apfelbaum@gmail.com" <marcel.apfelbaum@gmail.com>,
	"imammedo@redhat.com" <imammedo@redhat.com>,
	"ani@anisinha.ca" <ani@anisinha.ca>,
	"alison.schofield@intel.com" <alison.schofield@intel.com>,
	"dave@stgolabs.net" <dave@stgolabs.net>,
	"bwidawsk@kernel.org" <bwidawsk@kernel.org>,
	"hchkuo@avery-design.com.tw" <hchkuo@avery-design.com.tw>,
	"cbrowy@avery-design.com" <cbrowy@avery-design.com>,
	"ira.weiny@intel.com" <ira.weiny@intel.com>,
	"Dan Williams" <dan.j.williams@intel.com>
Subject: Re: [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices
Date: Thu, 27 Oct 2022 18:10:23 +0000	[thread overview]
Message-ID: <20221027181014.GA317461@bgt-140510-bm01> (raw)
In-Reply-To: <20221027115854.00001f76@huawei.com>

On Thu, Oct 27, 2022 at 11:58:54AM +0100, Jonathan Cameron wrote:
> On Wed, 26 Oct 2022 16:47:18 -0400
> Gregory Price <gregory.price@memverge.com> wrote:
> 
> > On Wed, Oct 26, 2022 at 08:13:24PM +0000, Adam Manzanares wrote:
> > > On Tue, Oct 25, 2022 at 08:47:33PM -0400, Gregory Price wrote:  
> > > > Submitted as an extention to the multi-feature branch maintained
> > > > by Jonathan Cameron at:
> > > > https://urldefense.com/v3/__https://gitlab.com/jic23/qemu/-/tree/cxl-2022-10-24__;!!EwVzqGoTKBqv-0DWAJBm!RyiGL5B1XmQnVFwgxikKJeosPMKtoO1cTr61gIq8fwqfju8l4cbGZGwAEkKXIJB-Dbkfi_LNN2rGCbzMISz65cTxpAxI9pQ$   
> > > > 
> > > > 
> > > > Summary of Changes:
> > > > 1) E820 CFMW Bug fix.  
> > > > 2) Add CXL_CAPACITY_MULTIPLIER definition to replace magic numbers
> > > > 3) Multi-Region and Volatile Memory support for CXL Type-3 Devices
> > > > 4) CXL Type-3 SRAT Generation when NUMA node is attached to memdev
> 
> +CC Dan for a question on status of Generic Ports ACPI code first ECN.
> Given that was done on the mailing list I can find any public tracking
> of whether it was accepted or not - hence whether we can get on with
> implementation.  There hasn't been a release ACPI spec since before
> that was proposed so we need that confirmation of the code first proposal
> being accepted to get things moving.
> 
> > > > 
> > > > 
> > > > Regarding the E820 fix
> > > >   * This bugfix is required for memory regions to work on x86
> > > >   * input from Dan Williams and others suggest that E820 entry for
> > > >     the CFMW should not exist, as it is expected to be dynamically
> > > >     assigned at runtime.  If this entry exists, it instead blocks
> > > >     region creation by nature of the memory region being marked as
> > > >     reserved.  
> > > 
> > > For CXL 2.0 it is my understanding that volatile capacity present at boot will
> > > be advertised by the firmware. In the absence of EFI I would assume this would
> > > be provided in the e820 map.   
> 
> Whilst this is one option, it is certainly not the case that all CXL 2.0
> platforms will decide to do any setup of CXL memory (volatile or not) in the
> firmware.  They can leave it entirely to the OS, so a cold plug flow.
> Early platforms will do the setup in BIOS to support unware OSes, once
> that's not a problem any more the only reason you'd want to do this is if
> you don't have other RAM to boot the system, or for some reason you want
> your host kernel etc in the CXL attached memory.
> 
> I'd expect to see BIOS having OS managed configuration as an option in the
> intermediate period where some OSes are aware, others not.
> OS knows more about usecase / policy so can make better choices on interleaving
> etc of volatile CXL type 3 memory (let alone the fun corner of devices
> where you can dynamically change the mix of volatile and non volatile
> memory).
> 
> 
> > 
> > The issue in this case is very explicitly that a double-mapping occurs
> > for the same region.  An E820 mapping for RESERVED is set *and* the
> > region driver allocates a CXL CFMW mapping.  As a result the region
> > drive straight up fails to allocate regions.
> > 
> > So in either case - at least from my view - the entry added as RESERVED
> > is just wrong.
> > 
> > This is separate from type-3 device SRAT entries and default mappings
> > for volatile regions.  For this situation, if you explicitly assign the
> > memdev backing a type-3 device to a numa node, then an SRAT area is
> > generated and an explicit e820 entry is generated and marked usable -
> > though I think there are likely issues with this kind of
> > double-referencing.
> 
> SRAT setup for CXL type 3 devices is to my mind is a job for a full BIOS,
> not QEMU level of faking a few things. That BIOS should also
> be doing the full configuration (HDM Decoders and all the rest).  ARM has
> a prototype for one of the fixed virtual platforms that does this (talk
> at Plumbers Uconf), we should look to do the same if we want to test
> those flows using QEMU via appropriate changes in EDK2 to walk topology
> and configure everything.  Until the devices are configured there is no
> way to configure the SLIT, HMAT entries that align with the SRAT ones
> (in theory those can be updated at runtime via _SLI, _HMA but in 
> practice, I'm fairly sure Linux doesn't support doing that.)
> 
> 
> > 
> > > 
> > > Is the region driver meant to cover volatile capacity present at boot? I was
> > > under the impression that it would be used for hot added volatile memory. It
> > > would be good to cover all of these assumptions for the e820 fix.  
> > 
> > This region appears to cover hotplug memory behind the CFMW.  The
> > problem is that this e820 RESERVED mapping blocks the CFMW region from
> > being used at all.
> > 
> > Without this, you can't use a type-3 persistent region, even with
> > support, let alone a volatile region.  In attempting to use a persistent
> > region as volatile via ndctl and friends, I'm seeing further issues (it
> > cannot be assigned to a numa node successfully), but that's a separate
> > issue.
> For the Numa node bit...
> 
> So far, the CDAT table isn't used in Linux (read out for debug purposes
> is supported only).  That all needs to be added yet to get the NUMA node
> allocations to work correctly.  It shouldn't have anything to do with SRAT
> unless the BIOS has done the full full configuration (same way CXL will work
> with a legacy OS).  Come to think of it, that should definitely be on the
> priority list for kernel support (don't think it was on the list on Tuesday?)
> 
> > 
> > > 
> > > Lastly it is my understanding that the region driver does not have support for
> > > volatile memory. It would be great to add that functionality if we make this
> > > change in QEMU.
> > >   
> > 
> > Right now this is true, but it seems a bit of a chicken/egg scenario.
> > Nothing to test against vs no support.  Nudging this along such that we
> > can at least report an (unusable) hot-add volatile memory region would
> > provide someone working with the region driver something to poke and
> > prod at.
> 
> Agreed. This is same place as Ben's original CXL QEMU work on non volatile.
> Need something to develop against so we'll at least have QEMU patches
> available whilst the kernel side is in development (hopefully this cycle)
> > 
> > > > Regarding SRAT Generation for Type-3 Devices
> > > >   * Co-Developed by Davidlohr Bueso.  Built from his base patch and
> > > >     extended to work with both volatile and persistent regions.
> > > >   * This can be used to demonstrate static type-3 device mapping and
> > > >     testing numa-access to type-3 device memory regions.  
> > 
> > Regarding "volatile memory present at boot" - there is still two ways
> > for that memory to be onlined: Statically (entered as an explicit e820
> > region after reading the SRAT), or Dynamically (hot-add by the region
> > driver).
> > 
> > This patch would at least allow an SRAT to be generated if you
> > explicitly add a NUMA node mapping to it.  Although I concede that I'm
> > not entirely sure what is "correct" here.
> 
> For hotplug, needs to be the region driver, not here (or BIOS if you
> are doing a BIOS driven flow - in which case the whole topology is
> discovered and mostly configured by the BIOS before the OS reaches it
> - including filling in SRAT, SLIT, HMAT etCc).
> > 
> > What this ends up looking like is mapping a memdev to both a numa node
> > and to a type-3 device.  Though that seems wrong.
> > 
> > After further testing it seems like creating a CPU-less, Memory-less
> > NUMA node with the intent of mapping volatile memory regions to it is
> > not supported (yet?).
> 
> Yup, and I doubt it ever will be for reasons above. 
> 
> BIOS does it all, or OS does it all.  Mixing and matching is a mess
> (there is an exception - Generic Port entries in SRAT - those we do
> need for the OS driven flow and possibly also the BIOS flow
> - patches welcome! I'd forgotten that on my list of QEMU stuff that
> needs doing.)

Based on the discussions is it safe to say we have settled on an OS driven flow
for the current QEMU CXL emulation. 

> 
> https://urldefense.com/v3/__https://lore.kernel.org/all/e1a52da9aec90766da5de51b1b839fd95d63a5af.camel@intel.com/__;!!EwVzqGoTKBqv-0DWAJBm!XLqjPd1aXt3i5NXrZhakQlGdgJ5o4tcfV_5iUEp8vwBLv8T1Ft1OVHPI0p7KpYSFDYzhAGj_ulMz1LfZVmJgrOvrO-_v7udl$  
> 
> If anyone is implementing that, also good to do Generic Initiators
> as they are very similar.
> 
> Jonathan
>  
> 

  parent reply	other threads:[~2022-10-27 18:11 UTC|newest]

Thread overview: 19+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <CGME20221026004835uscas1p1d37255ba8daaba8fc7e16e5129cb94b5@uscas1p1.samsung.com>
2022-10-26  0:47 ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Gregory Price
2022-10-26  0:47   ` [PATCH 1/4] hw/i386/pc.c: CXL Fixed Memory Window should not reserve e820 in bios Gregory Price
2022-10-26 20:33     ` Michael S. Tsirkin
2022-10-26  0:47   ` [PATCH 2/4] hw/cxl: Add CXL_CAPACITY_MULTIPLIER definition Gregory Price
2022-11-14 17:55     ` Jonathan Cameron via
2022-10-26  0:47   ` [PATCH 3/4] hw/cxl: Multi-Region CXL Type-3 Devices (Volatile and Persistent) Gregory Price
2022-11-14 17:53     ` Jonathan Cameron via
2022-11-14 23:00       ` Gregory Price
2022-11-17 13:53         ` Jonathan Cameron via
2022-11-23 17:42       ` Gregory Price
2022-10-26  0:47   ` [PATCH 4/4] hw/acpi/cxl.c: Fill in SRAT for vmem/pmem if NUMA node is assigned Gregory Price
2022-10-26 20:13   ` [PATCH 0/4 v3] Multi-Region and Volatile Memory support for CXL Type-3 Devices Adam Manzanares
2022-10-26 20:47     ` Gregory Price
2022-10-27 10:58       ` Jonathan Cameron via
2022-10-27 14:29         ` Gregory Price
2022-10-27 18:10         ` Adam Manzanares [this message]
2022-10-26 20:15   ` Michael S. Tsirkin
2022-10-26 20:20   ` Michael S. Tsirkin
2022-10-26 20:48     ` Gregory Price

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20221027181014.GA317461@bgt-140510-bm01 \
    --to=a.manzanares@samsung.com \
    --cc=Jonathan.Cameron@huawei.com \
    --cc=alison.schofield@intel.com \
    --cc=ani@anisinha.ca \
    --cc=bwidawsk@kernel.org \
    --cc=cbrowy@avery-design.com \
    --cc=dan.j.williams@intel.com \
    --cc=dave@stgolabs.net \
    --cc=gourry.memverge@gmail.com \
    --cc=gregory.price@memverge.com \
    --cc=hchkuo@avery-design.com.tw \
    --cc=imammedo@redhat.com \
    --cc=ira.weiny@intel.com \
    --cc=linux-cxl@vger.kernel.org \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).