From: Jonathan Cameron <Jonathan.Cameron@Huawei.com>
To: Zhi Wang <zhiw@nvidia.com>
Cc: "Tian, Kevin" <kevin.tian@intel.com>,
"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
"linux-cxl@vger.kernel.org" <linux-cxl@vger.kernel.org>,
"alex.williamson@redhat.com" <alex.williamson@redhat.com>,
Jason Gunthorpe <jgg@nvidia.com>,
"Schofield, Alison" <alison.schofield@intel.com>,
"Williams, Dan J" <dan.j.williams@intel.com>,
"Jiang, Dave" <dave.jiang@intel.com>,
"dave@stgolabs.net" <dave@stgolabs.net>,
"Weiny, Ira" <ira.weiny@intel.com>,
"Verma, Vishal L" <vishal.l.verma@intel.com>,
"alucerop@amd.com" <alucerop@amd.com>,
Andy Currid <ACurrid@nvidia.com>, Neo Jia <cjia@nvidia.com>,
Surath Mitra <smitra@nvidia.com>,
"Ankit Agrawal" <ankita@nvidia.com>,
Aniket Agashe <aniketa@nvidia.com>,
"Kirti Wankhede" <kwankhede@nvidia.com>,
"Tarun Gupta (SW-GPU)" <targupta@nvidia.com>,
"zhiwang@kernel.org" <zhiwang@kernel.org>
Subject: Re: [RFC 00/13] vfio: introduce vfio-cxl to support CXL type-2 accelerator passthrough
Date: Wed, 25 Sep 2024 14:05:15 +0100 [thread overview]
Message-ID: <20240925140515.000077f5@Huawei.com> (raw)
In-Reply-To: <75c0c6f1-07e4-43c1-819c-2182bdd0b47c@nvidia.com>
On Tue, 24 Sep 2024 08:30:17 +0000
Zhi Wang <zhiw@nvidia.com> wrote:
> On 23/09/2024 11.00, Tian, Kevin wrote:
> > External email: Use caution opening links or attachments
> >
> >
> >> From: Zhi Wang <zhiw@nvidia.com>
> >> Sent: Saturday, September 21, 2024 6:35 AM
> >>
> > [...]
> >> - Create a CXL region and map it to the VM. A mapping between HPA and DPA
> >> (Device PA) needs to be created to access the device memory directly. HDM
> >> decoders in the CXL topology need to be configured level by level to
> >> manage the mapping. After the region is created, it needs to be mapped to
> >> GPA in the virtual HDM decoders configured by the VM.
> >
> > Any time a new address space is introduced, it's worth giving more
> > context to help people who have no CXL background better understand
> > the mechanism and think through any potential holes.
> >
> > At a glance looks we are talking about a mapping tier:
> >
> > GPA->HPA->DPA
> >
> > The location/size of HPA/DPA for a cxl region are decided and mapped
> > at @open_device and the HPA range is mapped to GPA at @mmap.
> >
> > In addition the guest also manages a virtual HDM decoder:
> >
> > GPA->vDPA
> >
> > Ideally the vDPA range selected by the guest is a subset of the physical
> > cxl region, so based on the offset and the vHDM the VMM may figure out
> > which offset in the cxl region to mmap for the corresponding
> > GPA (which in the end maps to the desired DPA).
> >
> > Is this understanding correct?
> >
>
> Yes. Many thanks for summarizing this. It is a design decision from a
> discussion in the CXL Discord channel.
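
For readers without a CXL background, the offset arithmetic implied by
this mapping tier can be sketched as follows. This is an illustrative
sketch, not code from the patchset; all base addresses, sizes, and names
are hypothetical:

```python
# Sketch of the mapping tier discussed above: GPA -> offset within the
# CXL region -> HPA (which the physical HDM decoders map on to DPA).
# All values here are made up for illustration.

VHDM_BASE = 0x4_0000_0000         # GPA base the guest programs in its vHDM decoder
VHDM_SIZE = 256 << 20             # 256 MiB decoder window
REGION_HPA_BASE = 0x20_0000_0000  # HPA where the host mapped the CXL region

def gpa_to_hpa(gpa: int) -> int:
    """Translate a guest physical address inside the vHDM window to the
    host physical address the VMM would mmap for it."""
    if not (VHDM_BASE <= gpa < VHDM_BASE + VHDM_SIZE):
        raise ValueError("GPA outside the virtual HDM decoder window")
    offset = gpa - VHDM_BASE          # offset into the (virtual) CXL region
    return REGION_HPA_BASE + offset   # same offset into the host mapping

print(hex(gpa_to_hpa(VHDM_BASE + 0x1000)))
```

The key point is that only the offset survives each hop: the guest's vHDM
decoder and the host's CXL region can sit at unrelated bases, and the VMM
bridges them at mmap time.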
>
> > btw is one cxl device only allowed to create one region? If multiple
> > regions are possible how will they be exposed to the guest?
> >
>
> It is not (and shouldn't be) a requirement enforced by the VFIO CXL core.
> It is really requirement-driven. I am waiting to see which real-world use
> cases actually need multiple CXL regions in the host before passing
> multiple regions through to the guest.
Maybe a mix of back-invalidate and non-back-invalidate device memory?
A bounce region for p2p traffic would be the obvious reason to do
this without paying the cost of large snoop filters. If anyone puts PMEM
on the device, then maybe a mix of that and volatile memory. In theory you
might do separate regions for QoS reasons, but that seems unlikely to me...
Anyhow, not an immediate problem, as I don't know of any
BI-capable hosts yet and doubt anyone (other than Dan) cares about PMEM :)
>
> Presumably, the host creates one large CXL region that covers the entire
> DPA range, while QEMU can virtually partition it into different regions and
> map them to different virtual CXL regions if QEMU presents multiple HDM
> decoders to the guest.
I'm not sure why it would do that. Can't think why you'd break up
a host region - maybe I'm missing something.
...
> >> In the L2 guest, a dummy CXL device driver is provided to attach to the
> >> virtual pass-thru device.
> >>
> >> The dummy CXL type-2 device driver can be loaded successfully with the
> >> kernel CXL core type-2 support, creating a CXL region by requesting the
> >> CXL core to allocate HPA and DPA and configure the HDM decoders.
> >
> > It'd be good to see a real cxl device working to add confidence in
> > the core design.
>
> To leverage the opportunity of F2F discussion at LPC, I proposed this
> patchset to start the discussion and meanwhile offered an environment
> for people to try and hack around. Also, the patches are a good base for
> discussion. We'll see what we get. :)
>
> There are devices already there and on-going. AMD's SFC (patches are
> under review), and I think it is going to be the first variant driver
> that uses the core. NVIDIA's device is also coming, and NVIDIA's variant
> driver is going upstream for sure. Plus this emulated device, I assume
> we will have three in-tree variant drivers that talk to the CXL core.
Nice.
>
> Thanks,
> Zhi.
Thread overview: 38+ messages
2024-09-20 22:34 [RFC 00/13] vfio: introduce vfio-cxl to support CXL type-2 accelerator passthrough Zhi Wang
2024-09-20 22:34 ` [RFC 01/13] cxl: allow a type-2 device not to have memory device registers Zhi Wang
2024-09-23 8:01 ` Tian, Kevin
2024-09-23 15:38 ` Dave Jiang
2024-09-24 8:03 ` Zhi Wang
2024-09-20 22:34 ` [RFC 02/13] cxl: introduce cxl_get_hdm_info() Zhi Wang
2024-10-17 15:44 ` Jonathan Cameron
2024-10-19 5:38 ` Zhi Wang
2024-09-20 22:34 ` [RFC 03/13] cxl: introduce cxl_find_comp_reglock_offset() Zhi Wang
2024-09-20 22:34 ` [RFC 04/13] vfio: introduce vfio-cxl core preludes Zhi Wang
2024-10-11 18:33 ` Alex Williamson
2024-09-20 22:34 ` [RFC 05/13] vfio/cxl: expose CXL region to the usersapce via a new VFIO device region Zhi Wang
2024-10-11 19:12 ` Alex Williamson
2024-09-20 22:34 ` [RFC 06/13] vfio/pci: expose vfio_pci_rw() Zhi Wang
2024-09-20 22:34 ` [RFC 07/13] vfio/cxl: introduce vfio_cxl_core_{read, write}() Zhi Wang
2024-09-20 22:34 ` [RFC 08/13] vfio/cxl: emulate HDM decoder registers Zhi Wang
2024-09-20 22:34 ` [RFC 09/13] vfio/pci: introduce CXL device awareness Zhi Wang
2024-10-11 20:37 ` Alex Williamson
2024-09-20 22:34 ` [RFC 10/13] vfio/pci: emulate CXL DVSEC registers in the configuration space Zhi Wang
2024-10-11 21:02 ` Alex Williamson
2024-09-20 22:34 ` [RFC 11/13] vfio/cxl: introduce VFIO CXL device cap Zhi Wang
2024-10-11 21:14 ` Alex Williamson
2024-09-20 22:34 ` [RFC 12/13] vfio/cxl: VFIO variant driver for QEMU CXL accel device Zhi Wang
2024-09-20 22:34 ` [RFC 13/13] vfio/cxl: workaround: don't take resource region when cxl is enabled Zhi Wang
2024-09-23 8:00 ` [RFC 00/13] vfio: introduce vfio-cxl to support CXL type-2 accelerator passthrough Tian, Kevin
2024-09-24 8:30 ` Zhi Wang
2024-09-25 13:05 ` Jonathan Cameron [this message]
2024-09-27 7:18 ` Zhi Wang
2024-10-04 11:40 ` Jonathan Cameron
2024-10-19 5:30 ` Zhi Wang
2024-10-21 11:07 ` Alejandro Lucero Palau
2024-09-26 6:55 ` Tian, Kevin
2024-09-25 10:11 ` Alejandro Lucero Palau
2024-09-27 7:38 ` Zhi Wang
2024-09-27 7:38 ` Zhi Wang
2024-10-21 10:49 ` Zhi Wang
2024-10-21 13:10 ` Alejandro Lucero Palau
2024-10-30 11:56 ` Zhi Wang