Re: [PATCH v1 1/1] vfio/nvgrace-gpu: Convey kvm that the device is wc safe

From: Jason Gunthorpe <jgg@nvidia.com>
To: Alex Williamson <alex.williamson@redhat.com>
Cc: ankita@nvidia.com, yishaih@nvidia.com,
	shameerali.kolothum.thodi@huawei.com, kevin.tian@intel.com,
	aniketa@nvidia.com, cjia@nvidia.com, kwankhede@nvidia.com,
	targupta@nvidia.com, vsethi@nvidia.com, acurrid@nvidia.com,
	apopple@nvidia.com, jhubbard@nvidia.com, danw@nvidia.com,
	rrameshbabu@nvidia.com, zhiw@nvidia.com, anuaggarwal@nvidia.com,
	mochs@nvidia.com, kvm@vger.kernel.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH v1 1/1] vfio/nvgrace-gpu: Convey kvm that the device is wc safe
Date: Thu, 29 Feb 2024 12:06:00 -0400
Message-ID: <20240229160600.GJ9179@nvidia.com>
In-Reply-To: <20240229085639.484b920c.alex.williamson@redhat.com>

On Thu, Feb 29, 2024 at 08:56:39AM -0700, Alex Williamson wrote:
> On Wed, 28 Feb 2024 19:48:01 +0000
> <ankita@nvidia.com> wrote:
> 
> > From: Ankit Agrawal <ankita@nvidia.com>
> > 
> > The NVIDIA Grace Hopper GPUs have device memory that is supposed to be
> > used as a regular RAM. It is accessible through CPU-GPU chip-to-chip
> > cache coherent interconnect and is present in the system physical
> > address space. The device memory is split into two regions - termed
> > as usemem and resmem - in the system physical address space,
> > with each region mapped and exposed to the VM as a separate fake
> > device BAR [1].
> > 
> > Owing to a hardware defect for Multi-Instance GPU (MIG) feature [2],
> > there is a requirement - as a workaround - for the resmem BAR to
> > display uncached memory characteristics. Based on [3], on system with
> > FWB enabled such as Grace Hopper, the requisite properties
> > (uncached, unaligned access) can be achieved through a VM mapping (S1)
> > of NORMAL_NC and host mapping (S2) of MT_S2_FWB_NORMAL_NC.
> > 
> > KVM currently maps the MMIO region in S2 as MT_S2_FWB_DEVICE_nGnRE by
> > default. The fake device BARs thus display DEVICE_nGnRE behavior in the
> > VM.
> > 
> > The following table summarizes the behavior for the various S1 and S2
> > mapping combinations for systems with FWB enabled [3].
> > S1           |  S2           | Result
> > NORMAL_WB    |  NORMAL_NC    | NORMAL_NC
> > NORMAL_WT    |  NORMAL_NC    | NORMAL_NC
> > NORMAL_NC    |  NORMAL_NC    | NORMAL_NC
> > NORMAL_WB    |  DEVICE_nGnRE | DEVICE_nGnRE
> > NORMAL_WT    |  DEVICE_nGnRE | DEVICE_nGnRE
> > NORMAL_NC    |  DEVICE_nGnRE | DEVICE_nGnRE
> > 
> > Recently a change was added that modifies this default behavior and
> > makes KVM map MMIO as MT_S2_FWB_NORMAL_NC when the VMA flag
> > VM_ALLOW_ANY_UNCACHED is set. Setting S2 as MT_S2_FWB_NORMAL_NC
> > provides the desired behavior (uncached, unaligned access) for resmem.
> > 
> > Such setting is extended to the usemem as a middle-of-the-road
> > setting to take it closer to the desired final system memory
> > characteristics (cached, unaligned). This will eventually be
> > fixed with the ongoing proposal [4].
> > 
> > To use VM_ALLOW_ANY_UNCACHED flag, the platform must guarantee that
> > no action taken on the MMIO mapping can trigger an uncontained
> > failure. The Grace Hopper satisfies this requirement. So set
> > the VM_ALLOW_ANY_UNCACHED flag in the VMA.
> > 
> > Applied over next-20240227.
> > base-commit: 22ba90670a51
> > 
> > Link: https://lore.kernel.org/all/20240220115055.23546-4-ankita@nvidia.com/ [1]
> > Link: https://www.nvidia.com/en-in/technologies/multi-instance-gpu/ [2]
> > Link: https://developer.arm.com/documentation/ddi0487/latest/ section D8.5.5 [3]
> > Link: https://lore.kernel.org/all/20230907181459.18145-2-ankita@nvidia.com/ [4]
> > 
> > Cc: Alex Williamson <alex.williamson@redhat.com>
> > Cc: Kevin Tian <kevin.tian@intel.com>
> > Cc: Jason Gunthorpe <jgg@nvidia.com>
> > Cc: Vikram Sethi <vsethi@nvidia.com>
> > Cc: Zhi Wang <zhiw@nvidia.com>
> > Signed-off-by: Ankit Agrawal <ankita@nvidia.com>
> > ---
> >  drivers/vfio/pci/nvgrace-gpu/main.c | 18 ++++++++++++++++++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/drivers/vfio/pci/nvgrace-gpu/main.c b/drivers/vfio/pci/nvgrace-gpu/main.c
> > index 25814006352d..5539c9057212 100644
> > --- a/drivers/vfio/pci/nvgrace-gpu/main.c
> > +++ b/drivers/vfio/pci/nvgrace-gpu/main.c
> > @@ -181,6 +181,24 @@ static int nvgrace_gpu_mmap(struct vfio_device *core_vdev,
> >  
> >  	vma->vm_pgoff = start_pfn;
> >  
> > +	/*
> > +	 * The VM_ALLOW_ANY_UNCACHED VMA flag is implemented for ARM64,
> > +	 * allowing KVM stage 2 device mapping attributes to use Normal-NC
> > +	 * rather than DEVICE_nGnRE, which allows guest mappings
> > +	 * supporting write-combining attributes (WC). This also
> > +	 * unlocks memory-like operations such as unaligned accesses.
> > +	 * This setting suits the fake BARs as they are expected to
> > +	 * demonstrate such properties within the guest.
> > +	 *
> > +	 * ARM does not architecturally guarantee this is safe, and indeed
> > +	 * some MMIO regions like the GICv2 VCPU interface can trigger
> > +	 * uncontained faults if Normal-NC is used. The nvgrace-gpu
> > +	 * however is safe in that the platform guarantees that no
> > +	 * action taken on the MMIO mapping can trigger an uncontained
> > +	 * failure. Hence VM_ALLOW_ANY_UNCACHED is set in the VMA flags.
> > +	 */
> > +	vm_flags_set(vma, VM_ALLOW_ANY_UNCACHED);
> > +
> >  	return 0;
> >  }
> >  
> 
> The commit log sort of covers it, but this comment doesn't seem to
> cover why we're setting an uncached attribute to the usemem region
> which we're specifically mapping as coherent... did we end up giving
> this flag a really poor name if it's being used here to allow unaligned
> access?  Thanks,

Yeah, I suggested folding that hunk into this:

        if (index == RESMEM_REGION_INDEX)
                vma->vm_page_prot = pgprot_writecombine(vma->vm_page_prot);

So it makes more sense. VM_ALLOW_ANY_UNCACHED shouldn't be used on the
cacheable mapping. The comment should be more specific to this driver
and not so generic:

/*
 * nvgrace has no issue with uncontained failures on NORMAL_NC
 * access. Tell KVM to open up guest usage of NORMAL_NC for this mapping.
 */

Jason
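
For illustration only, here is one possible reading of the suggestion
above, written as a userspace sketch so it compiles outside the kernel.
Only RESMEM_REGION_INDEX, VM_ALLOW_ANY_UNCACHED and the
pgprot_writecombine() behavior come from the thread; the struct, the
helper name, the USEMEM_REGION_INDEX name and all numeric values are
hypothetical stand-ins, not the driver's real definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Placeholder bit value; the real flag is defined by the kernel. */
#define VM_ALLOW_ANY_UNCACHED (1UL << 0)

/* Hypothetical region indices; the real values live in the driver. */
enum { USEMEM_REGION_INDEX = 0, RESMEM_REGION_INDEX = 1 };

/* Minimal stand-in for the parts of a VMA this sketch touches. */
struct vma_sketch {
	unsigned long vm_flags;
	bool write_combine; /* models vm_page_prot after pgprot_writecombine() */
};

/*
 * Fold the flag into the resmem-only branch: the uncached region gets
 * both the write-combine protection and VM_ALLOW_ANY_UNCACHED, while
 * the cacheable usemem mapping is left untouched.
 */
static void sketch_mmap_attrs(struct vma_sketch *vma, int index)
{
	if (index == RESMEM_REGION_INDEX) {
		vma->write_combine = true;
		/*
		 * nvgrace has no issue with uncontained failures on NORMAL_NC
		 * access. Tell KVM to open up guest usage of NORMAL_NC for
		 * this mapping.
		 */
		vma->vm_flags |= VM_ALLOW_ANY_UNCACHED;
	}
}

/* Small self-check showing the intended split between the two regions. */
static void sketch_selfcheck(void)
{
	struct vma_sketch res = { 0 }, use = { 0 };

	sketch_mmap_attrs(&res, RESMEM_REGION_INDEX);
	sketch_mmap_attrs(&use, USEMEM_REGION_INDEX);
	assert(res.vm_flags & VM_ALLOW_ANY_UNCACHED);
	assert(!(use.vm_flags & VM_ALLOW_ANY_UNCACHED));
}
```

In the real driver the branch would presumably use vm_flags_set() and
pgprot_writecombine() on the actual vm_area_struct, as in the quoted
hunks; the sketch only models which region ends up with the flag.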

