From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([209.51.188.92]:57461) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1guaZL-0001Gg-Vr for qemu-devel@nongnu.org; Fri, 15 Feb 2019 05:18:11 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1guaZI-00042h-8I for qemu-devel@nongnu.org; Fri, 15 Feb 2019 05:18:07 -0500 Date: Fri, 15 Feb 2019 16:30:20 +1100 From: David Gibson Message-ID: <20190215053020.GM4573@umbus.fritz.box> References: <20190214052144.59541-1-aik@ozlabs.ru> <20190214052144.59541-5-aik@ozlabs.ru> <20190215032203.GE4573@umbus.fritz.box> MIME-Version: 1.0 Content-Type: multipart/signed; micalg=pgp-sha256; protocol="application/pgp-signature"; boundary="OGW1Z2JKiS9bXo17" Content-Disposition: inline In-Reply-To: Subject: Re: [Qemu-devel] [PATCH qemu v2 4/4] spapr: Support NVIDIA V100 GPU with NVLink2 List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Alexey Kardashevskiy Cc: qemu-devel@nongnu.org, qemu-ppc@nongnu.org, Alex Williamson , Reza Arbab , Piotr Jaroszynski , Jose Ricardo Ziviani , Daniel Henrique Barboza --OGW1Z2JKiS9bXo17 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline Content-Transfer-Encoding: quoted-printable On Fri, Feb 15, 2019 at 03:42:56PM +1100, Alexey Kardashevskiy wrote: >=20 >=20 > On 15/02/2019 14:22, David Gibson wrote: > > On Thu, Feb 14, 2019 at 04:21:44PM +1100, Alexey Kardashevskiy wrote: > >> NVIDIA V100 GPUs have on-board RAM which is mapped into the host memory > >> space and accessible as normal RAM via an NVLink bus. The VFIO-PCI dri= ver > >> implements special regions for such GPUs and emulates an NVLink bridge. > >> NVLink2-enabled POWER9 CPUs also provide address translation services > >> which includes an ATS shootdown (ATSD) register exported via the NVLink > >> bridge device. > >> > >> This adds a quirk to VFIO to map the GPU memory and create an MR; > >> the new MR is stored in a PCI device as a QOM link. The sPAPR PCI uses > >> this to get the MR and map it to the system address space. > >> Another quirk does the same for ATSD. > >> > >> This adds 4 additional steps to the FDT builder in spapr-pci: > >> > >> 1. Search for specific GPUs and NPUs and collect findings in > >> sPAPRPHBState::nvgpus; > >> > >> 2. Add properties in the device tree such as "ibm,npu", "ibm,gpu", > >> "memory-block" and others to advertise the NVLink2 function to the gue= st; > >> > >> 3. Add new memory blocks with an extra "linux,memory-usable" to prevent > >> the guest OS from accessing the new memory until it is online by the G= PU > >> driver in the guest; > >> > >> 4. Add a npuphb# node representing an NPU unit for every vPHB as > >> the GPU driver uses it to detect NPU2 hardware and discover links; this > >> is not backed by any QEMU device as it does need to. > >> > >> This allocates space for GPU RAM and ATSD like we do for MMIOs by > >> adding 2 new parameters to the phb_placement() hook. Older machine typ= es > >> set these to zero. > >> > >> This puts new memory nodes in a separate NUMA node to replicate the ho= st > >> system setup as the GPU driver relies on this. > >> > >> This adds requirement similar to EEH - one IOMMU group per vPHB. > >> The reason for this is that ATSD registers belong to a physical NPU > >> so they cannot invalidate translations on GPUs attached to another NPU. > >> It is guaranteed by the host platform as it does not mix NVLink bridges > >> or GPUs from different NPU in the same IOMMU group. 
If more than one > >> IOMMU group is detected on a vPHB, this disables ATSD support for that > >> vPHB and prints a warning. > >> > >> Signed-off-by: Alexey Kardashevskiy > >> --- > >> > >> The example command line for redbud system: > >> > >> pbuild/qemu-aiku1804le-ppc64/ppc64-softmmu/qemu-system-ppc64 \ > >> -nodefaults \ > >> -chardev stdio,id=3DSTDIO0,signal=3Doff,mux=3Don \ > >> -device spapr-vty,id=3Dsvty0,reg=3D0x71000110,chardev=3DSTDIO0 \ > >> -mon id=3DMON0,chardev=3DSTDIO0,mode=3Dreadline -nographic -vga none \ > >> -enable-kvm -m 384G \ > >> -chardev socket,id=3DSOCKET0,server,nowait,host=3Dlocalhost,port=3D400= 00 \ > >> -mon chardev=3DSOCKET0,mode=3Dcontrol \ > >> -smp 80,sockets=3D1,threads=3D4 \ > >> -netdev "tap,id=3DTAP0,helper=3D/home/aik/qemu-bridge-helper --br=3Dbr= 0" \ > >> -device "virtio-net-pci,id=3Dvnet0,mac=3D52:54:00:12:34:56,netdev=3DTA= P0" \ > >> img/vdisk0.img \ > >> -device "vfio-pci,id=3Dvfio0004_04_00_0,host=3D0004:04:00.0" \ > >> -device "vfio-pci,id=3Dvfio0006_00_00_0,host=3D0006:00:00.0" \ > >> -device "vfio-pci,id=3Dvfio0006_00_00_1,host=3D0006:00:00.1" \ > >> -device "vfio-pci,id=3Dvfio0006_00_00_2,host=3D0006:00:00.2" \ > >> -device "vfio-pci,id=3Dvfio0004_05_00_0,host=3D0004:05:00.0" \ > >> -device "vfio-pci,id=3Dvfio0006_00_01_0,host=3D0006:00:01.0" \ > >> -device "vfio-pci,id=3Dvfio0006_00_01_1,host=3D0006:00:01.1" \ > >> -device "vfio-pci,id=3Dvfio0006_00_01_2,host=3D0006:00:01.2" \ > >> -device spapr-pci-host-bridge,id=3Dphb1,index=3D1 \ > >> -device "vfio-pci,id=3Dvfio0035_03_00_0,host=3D0035:03:00.0" \ > >> -device "vfio-pci,id=3Dvfio0007_00_00_0,host=3D0007:00:00.0" \ > >> -device "vfio-pci,id=3Dvfio0007_00_00_1,host=3D0007:00:00.1" \ > >> -device "vfio-pci,id=3Dvfio0007_00_00_2,host=3D0007:00:00.2" \ > >> -device "vfio-pci,id=3Dvfio0035_04_00_0,host=3D0035:04:00.0" \ > >> -device "vfio-pci,id=3Dvfio0007_00_01_0,host=3D0007:00:01.0" \ > >> -device "vfio-pci,id=3Dvfio0007_00_01_1,host=3D0007:00:01.1" \ > >> -device "vfio-pci,id=3Dvfio0007_00_01_2,host=3D0007:00:01.2" -snapshot= \ > >> -machine pseries \ > >> -L /home/aik/t/qemu-ppc64-bios/ -d guest_errors > >> > >> Note that QEMU attaches PCI devices to the last added vPHB so first > >> 8 devices - 4:04:00.0 till 6:00:01.2 - go to the default vPHB, and > >> 35:03:00.0..7:00:01.2 to the vPHB with id=3Dphb1. 
> >> --- > >> hw/vfio/pci.h | 2 + > >> include/hw/pci-host/spapr.h | 9 + > >> include/hw/ppc/spapr.h | 3 +- > >> hw/ppc/spapr.c | 25 ++- > >> hw/ppc/spapr_pci.c | 333 +++++++++++++++++++++++++++++++++++- > >> hw/vfio/pci-quirks.c | 120 +++++++++++++ > >> hw/vfio/pci.c | 14 ++ > >> hw/vfio/trace-events | 4 + > >> 8 files changed, 506 insertions(+), 4 deletions(-) > >> > >> diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h > >> index b1ae4c0..706c304 100644 > >> --- a/hw/vfio/pci.h > >> +++ b/hw/vfio/pci.h > >> @@ -194,6 +194,8 @@ int vfio_populate_vga(VFIOPCIDevice *vdev, Error *= *errp); > >> int vfio_pci_igd_opregion_init(VFIOPCIDevice *vdev, > >> struct vfio_region_info *info, > >> Error **errp); > >> +int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp); > >> +int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp); > >> =20 > >> void vfio_display_reset(VFIOPCIDevice *vdev); > >> int vfio_display_probe(VFIOPCIDevice *vdev, Error **errp); > >> diff --git a/include/hw/pci-host/spapr.h b/include/hw/pci-host/spapr.h > >> index 51d81c4..4a665cb 100644 > >> --- a/include/hw/pci-host/spapr.h > >> +++ b/include/hw/pci-host/spapr.h > >> @@ -87,6 +87,9 @@ struct sPAPRPHBState { > >> uint32_t mig_liobn; > >> hwaddr mig_mem_win_addr, mig_mem_win_size; > >> hwaddr mig_io_win_addr, mig_io_win_size; > >> + hwaddr nv2_gpa_win_addr; > >> + hwaddr nv2_atsd_win_addr; > >> + struct spapr_phb_pci_nvgpu_config *nvgpus; > >> }; > >> =20 > >> #define SPAPR_PCI_MEM_WIN_BUS_OFFSET 0x80000000ULL > >> @@ -105,6 +108,12 @@ struct sPAPRPHBState { > >> =20 > >> #define SPAPR_PCI_MSI_WINDOW 0x40000000000ULL > >> =20 > >> +#define SPAPR_PCI_NV2RAM64_WIN_BASE 0x10000000000ULL /* 1 TiB */ > >=20 > > This is not a good choice - we have POWER machines with well over 1 > > TiB RAM, and so it's also plausible we can get guests with over 1 TiB > > RAM which will collide with this. Especially likely with the sort of > > giant HPC guests that are most likely to have the GPUs passed through > > to them. >=20 > >=20 > > This is why we already moved the main PCI MMIO windows up to 32 TiB > > from the lower address they used to have. >=20 >=20 > How much space do you want to leave for the guest RAM ideally? Or it is > that 32TiB and I better move GPU RAM above that? Yeah, I wouldn't reduce it below the 32TiB limit we have already because of the PCI MMIO spaces. I'd put the NVLink RAM up higher. 
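Just for illustration (the numbers below are placeholders, not a request for these exact values), something along these lines is what I have in mind; both macro names already exist in your patch, only the values are made up here:

/*
 * Illustrative values only.  The point is to start the NVLink2 GPU RAM
 * windows above the 32 TiB PCI MMIO area so they can never collide with
 * guest RAM however large the guest is, with a per-vPHB stride big
 * enough for several GPUs worth of RAM.
 */
#define SPAPR_PCI_NV2RAM64_WIN_BASE  0x400000000000ULL /* 64 TiB */
#define SPAPR_PCI_NV2RAM64_WIN_SIZE  0x20000000000ULL  /* 2 TiB per vPHB */

Anything that scales with the PHB index and stays clear of the existing MMIO windows would be fine by me.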
> >> + > >> +#define SPAPR_PCI_NV2ATSD_WIN_BASE SPAPR_PCI_LIMIT > >> +#define SPAPR_PCI_NV2ATSD_WIN_SIZE (16 * 0x10000) > >> + > >> static inline qemu_irq spapr_phb_lsi_qirq(struct sPAPRPHBState *phb, = int pin) > >> { > >> sPAPRMachineState *spapr =3D SPAPR_MACHINE(qdev_get_machine()); > >> diff --git a/include/hw/ppc/spapr.h b/include/hw/ppc/spapr.h > >> index cbd276e..d9edf23 100644 > >> --- a/include/hw/ppc/spapr.h > >> +++ b/include/hw/ppc/spapr.h > >> @@ -112,7 +112,8 @@ struct sPAPRMachineClass { > >> void (*phb_placement)(sPAPRMachineState *spapr, uint32_t index, > >> uint64_t *buid, hwaddr *pio,=20 > >> hwaddr *mmio32, hwaddr *mmio64, > >> - unsigned n_dma, uint32_t *liobns, Error **e= rrp); > >> + unsigned n_dma, uint32_t *liobns, hwaddr *n= v2gpa, > >> + hwaddr *nv2atsd, Error **errp); > >> sPAPRResizeHPT resize_hpt_default; > >> sPAPRCapabilities default_caps; > >> sPAPRIrq *irq; > >> diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c > >> index b967024..d18834c 100644 > >> --- a/hw/ppc/spapr.c > >> +++ b/hw/ppc/spapr.c > >> @@ -3844,7 +3844,8 @@ static const CPUArchIdList *spapr_possible_cpu_a= rch_ids(MachineState *machine) > >> static void spapr_phb_placement(sPAPRMachineState *spapr, uint32_t in= dex, > >> uint64_t *buid, hwaddr *pio, > >> hwaddr *mmio32, hwaddr *mmio64, > >> - unsigned n_dma, uint32_t *liobns, Err= or **errp) > >> + unsigned n_dma, uint32_t *liobns, > >> + hwaddr *nv2gpa, hwaddr *nv2atsd, Erro= r **errp) > >> { > >> /* > >> * New-style PHB window placement. > >> @@ -3889,6 +3890,9 @@ static void spapr_phb_placement(sPAPRMachineStat= e *spapr, uint32_t index, > >> *pio =3D SPAPR_PCI_BASE + index * SPAPR_PCI_IO_WIN_SIZE; > >> *mmio32 =3D SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM32_WIN_SI= ZE; > >> *mmio64 =3D SPAPR_PCI_BASE + (index + 1) * SPAPR_PCI_MEM64_WIN_SI= ZE; > >> + > >> + *nv2gpa =3D SPAPR_PCI_NV2RAM64_WIN_BASE + index * SPAPR_PCI_NV2RA= M64_WIN_SIZE; > >> + *nv2atsd =3D SPAPR_PCI_NV2ATSD_WIN_BASE + index * SPAPR_PCI_NV2AT= SD_WIN_SIZE; > >> } > >> =20 > >> static ICSState *spapr_ics_get(XICSFabric *dev, int irq) > >> @@ -4090,6 +4094,18 @@ DEFINE_SPAPR_MACHINE(4_0, "4.0", true); > >> /* > >> * pseries-3.1 > >> */ > >> +static void phb_placement_3_1(sPAPRMachineState *spapr, uint32_t inde= x, > >> + uint64_t *buid, hwaddr *pio, > >> + hwaddr *mmio32, hwaddr *mmio64, > >> + unsigned n_dma, uint32_t *liobns, > >> + hwaddr *nv2gpa, hwaddr *nv2atsd, Error = **errp) > >> +{ > >> + spapr_phb_placement(spapr, index, buid, pio, mmio32, mmio64, n_dm= a, liobns, > >> + nv2gpa, nv2atsd, errp); > >> + *nv2gpa =3D 0; > >> + *nv2atsd =3D 0; > >> +} > >> + > >> static void spapr_machine_3_1_class_options(MachineClass *mc) > >> { > >> sPAPRMachineClass *smc =3D SPAPR_MACHINE_CLASS(mc); > >> @@ -4098,6 +4114,7 @@ static void spapr_machine_3_1_class_options(Mach= ineClass *mc) > >> compat_props_add(mc->compat_props, hw_compat_3_1, hw_compat_3_1_l= en); > >> mc->default_cpu_type =3D POWERPC_CPU_TYPE_NAME("power8_v2.0"); > >> smc->update_dt_enabled =3D false; > >> + smc->phb_placement =3D phb_placement_3_1; > >> } > >> =20 > >> DEFINE_SPAPR_MACHINE(3_1, "3.1", false); > >> @@ -4229,7 +4246,8 @@ DEFINE_SPAPR_MACHINE(2_8, "2.8", false); > >> static void phb_placement_2_7(sPAPRMachineState *spapr, uint32_t inde= x, > >> uint64_t *buid, hwaddr *pio, > >> hwaddr *mmio32, hwaddr *mmio64, > >> - unsigned n_dma, uint32_t *liobns, Error= **errp) > >> + unsigned n_dma, uint32_t *liobns, > >> + hwaddr *nv2gpa, hwaddr *nv2atsd, Error = **errp) > >> { > >> /* Legacy PHB placement for 
pseries-2.7 and earlier machine types= */ > >> const uint64_t base_buid =3D 0x800000020000000ULL; > >> @@ -4273,6 +4291,9 @@ static void phb_placement_2_7(sPAPRMachineState = *spapr, uint32_t index, > >> * fallback behaviour of automatically splitting a large "32-bit" > >> * window into contiguous 32-bit and 64-bit windows > >> */ > >> + > >> + *nv2gpa =3D 0; > >> + *nv2atsd =3D 0; > >> } > >> =20 > >> static void spapr_machine_2_7_class_options(MachineClass *mc) > >> diff --git a/hw/ppc/spapr_pci.c b/hw/ppc/spapr_pci.c > >> index c3fb0ac..04a679e 100644 > >> --- a/hw/ppc/spapr_pci.c > >> +++ b/hw/ppc/spapr_pci.c > >> @@ -50,6 +50,46 @@ > >> #include "sysemu/hostmem.h" > >> #include "sysemu/numa.h" > >> =20 > >> +#define PHANDLE_PCIDEV(phb, pdev) (0x12000000 | \ > >> + (((phb)->index) << 16) | ((pdev)= ->devfn)) > >> +#define PHANDLE_GPURAM(phb, n) (0x110000FF | ((n) << 8) | \ > >> + (((phb)->index) << 16)) > >> +/* NVLink2 wants a separate NUMA node for its RAM */ > >> +#define GPURAM_ASSOCIATIVITY(phb, n) (255 - ((phb)->index * 3 + (n))) > >=20 > > IIUC this gives you non zero-based indices for the associtivity > > property. That's going to make a real mess of > > ibm,max-associativity-domains. >=20 >=20 > ? It does not on the host which I mimic here. And a bunch of CUDA tests > actually rely on node numbers (which is not a huge deal but it will be > annoying for the users). Does the host even use ibm,max-associativity-domains? I'm pretty sure at least parts of the PAPR code assume that nodes (at a given level of the heirarchy) will be encoded as 0..(max) as given in max-associativity domains. > >> +#define PHANDLE_NVLINK(phb, gn, nn) (0x00130000 | (((phb)->index) <<= 8) | \ > >> + ((gn) << 4) | (nn)) > >> + > >> +/* Max number of these GPUs per a physical box */ > >> +#define NVGPU_MAX_NUM 6 > >> + > >> +/* Max number of NVLinks per GPU in any physical box */ > >> +#define NVGPU_MAX_LINKS 3 > >> + > >> +/* > >> + * One NVLink bridge provides one ATSD register so it should be 18. > >> + * In practice though since we allow only one group per vPHB > >=20 > > Uh.. we do? Do you mean iommu group here? We used to enforce one per > > vPHB, but that's no longer the case. Or have you reintroduced that > > restriction? >=20 >=20 > Sort of. This is for ATSD (invalidation registers), these are advertised > to the guest via a PHB's property: > /sys/firmware/devicetree/base/pciex@6230200000000/ibm\,mmio-atsd >=20 > where /sys/firmware/devicetree/base/pciex@6230200000000 is a node for > that skiboot emulated "IBM Device 04ea" bridge which represents nvlinks. >=20 > We have 2 of these bridges per a machine. ATSD registers from one NPU > cannot work on another NPU. Hence the limitation. >=20 > I mean it is still possible to put GPUs in whatever combination but if > this happens, ATSD will be disabled (i.e. not advertised). Which is not > extremely useful though. >=20 >=20 >=20 > >> which equals > >> + * to an NPU2 which has maximum 6 NVLink bridges. 
> >> + */ > >> +#define NVGPU_MAX_ATSD 6 > >> + > >> +struct spapr_phb_pci_nvgpu_config { > >> + uint64_t nv2_ram_current; > >> + uint64_t nv2_atsd_current; > >> + int num; /* number of valid entries in gpus[] */ > >> + int gpunum; /* number of entries in gpus[] with gpdev!=3DNULL */ > >> + struct { > >> + int links; > >> + uint64_t tgt; > >> + uint64_t gpa; > >> + PCIDevice *gpdev; > >> + uint64_t atsd_gpa[NVGPU_MAX_LINKS]; > >> + PCIDevice *npdev[NVGPU_MAX_LINKS]; > >> + uint32_t link_speed[NVGPU_MAX_LINKS]; > >> + } gpus[NVGPU_MAX_NUM]; > >> + uint64_t atsd_prop[NVGPU_MAX_ATSD]; /* Big Endian for DT */ > >> + int atsd_num; > >> +}; > >> + > >> /* Copied from the kernel arch/powerpc/platforms/pseries/msi.c */ > >> #define RTAS_QUERY_FN 0 > >> #define RTAS_CHANGE_FN 1 > >> @@ -1255,6 +1295,7 @@ static uint32_t spapr_phb_get_pci_drc_index(sPAP= RPHBState *phb, > >> static void spapr_populate_pci_child_dt(PCIDevice *dev, void *fdt, in= t offset, > >> sPAPRPHBState *sphb) > >> { > >> + int i, j; > >> ResourceProps rp; > >> bool is_bridge =3D false; > >> int pci_status; > >> @@ -1355,6 +1396,60 @@ static void spapr_populate_pci_child_dt(PCIDevi= ce *dev, void *fdt, int offset, > >> if (sphb->pcie_ecs && pci_is_express(dev)) { > >> _FDT(fdt_setprop_cell(fdt, offset, "ibm,pci-config-space-type= ", 0x1)); > >> } > >> + > >> + if (sphb->nvgpus) { > >> + for (i =3D 0; i < sphb->nvgpus->num; ++i) { > >=20 > > So, if nvgpus->num is the number of valid entries in the table.. > >=20 > >> + PCIDevice *gpdev =3D sphb->nvgpus->gpus[i].gpdev; > >> + > >> + if (!gpdev) { > >=20 > > ..but you can also have gaps, couldn't you miss some if the table is sp= arse? >=20 >=20 > Well, nvgpus->num is the number of partially populated entries. Some > might have only GPUs, some only NPUs - whatever a creative user passed, > I just advertise it all to the guest. Oh, ok. So the gpus array only contains some of the relevant devices, but the number space is global? That is, a given "slot" (value of i, here) can contain either a GPU, an NPU or both? > Perhaps some sanity checking won't hurt but I cannot do much more here > than just making sure that we only properly advertise NPU bridges which > have a GPU attached as any other config is quite valid (like a GPU with > no nvlink or with just one nvlink). 
>=20 >=20 > >=20 > >> + continue; > >> + } > >> + if (dev =3D=3D gpdev) { > >> + uint32_t npus[sphb->nvgpus->gpus[i].links]; > >> + > >> + for (j =3D 0; j < sphb->nvgpus->gpus[i].links; ++j) { > >> + PCIDevice *npdev =3D sphb->nvgpus->gpus[i].npdev[= j]; > >> + > >> + npus[j] =3D cpu_to_be32(PHANDLE_PCIDEV(sphb, npde= v)); > >> + } > >> + _FDT(fdt_setprop(fdt, offset, "ibm,npu", npus, > >> + j * sizeof(npus[0]))); > >> + _FDT((fdt_setprop_cell(fdt, offset, "phandle", > >> + PHANDLE_PCIDEV(sphb, dev)))); > >> + continue; > >> + } > >> + for (j =3D 0; j < sphb->nvgpus->gpus[i].links; ++j) { > >> + if (dev !=3D sphb->nvgpus->gpus[i].npdev[j]) { > >> + continue; > >> + } > >> + > >> + _FDT((fdt_setprop_cell(fdt, offset, "phandle", > >> + PHANDLE_PCIDEV(sphb, dev)))); > >> + > >> + _FDT(fdt_setprop_cell(fdt, offset, "ibm,gpu", > >> + PHANDLE_PCIDEV(sphb, gpdev))); > >> + > >> + _FDT((fdt_setprop_cell(fdt, offset, "ibm,nvlink", > >> + PHANDLE_NVLINK(sphb, i, j)))); > >> + > >> + /* > >> + * If we ever want to emulate GPU RAM at the same loc= ation as on > >> + * the host - here is the encoding GPA->TGT: > >> + * > >> + * gta =3D ((sphb->nv2_gpa >> 42) & 0x1) << 42; > >> + * gta |=3D ((sphb->nv2_gpa >> 45) & 0x3) << 43; > >> + * gta |=3D ((sphb->nv2_gpa >> 49) & 0x3) << 45; > >> + * gta |=3D sphb->nv2_gpa & ((1UL << 43) - 1); > >> + */ > >> + _FDT(fdt_setprop_cell(fdt, offset, "memory-region", > >> + PHANDLE_GPURAM(sphb, i))); > >> + _FDT(fdt_setprop_u64(fdt, offset, "ibm,device-tgt-add= r", > >> + sphb->nvgpus->gpus[i].tgt)); > >> + _FDT(fdt_setprop_cell(fdt, offset, "ibm,nvlink-speed", > >> + sphb->nvgpus->gpus[i].link_spee= d[j])); > >> + } > >> + } > >> + } > >> } > >> =20 > >> /* create OF node for pci device and required OF DT properties */ > >> @@ -1596,7 +1691,9 @@ static void spapr_phb_realize(DeviceState *dev, = Error **errp) > >> smc->phb_placement(spapr, sphb->index, > >> &sphb->buid, &sphb->io_win_addr, > >> &sphb->mem_win_addr, &sphb->mem64_win_addr, > >> - windows_supported, sphb->dma_liobn, &local= _err); > >> + windows_supported, sphb->dma_liobn, > >> + &sphb->nv2_gpa_win_addr, > >> + &sphb->nv2_atsd_win_addr, &local_err); > >> if (local_err) { > >> error_propagate(errp, local_err); > >> return; > >> @@ -1843,6 +1940,8 @@ static Property spapr_phb_properties[] =3D { > >> pre_2_8_migration, false), > >> DEFINE_PROP_BOOL("pcie-extended-configuration-space", sPAPRPHBSta= te, > >> pcie_ecs, true), > >> + DEFINE_PROP_UINT64("gpa", sPAPRPHBState, nv2_gpa_win_addr, 0), > >> + DEFINE_PROP_UINT64("atsd", sPAPRPHBState, nv2_atsd_win_addr, 0), > >=20 > > Now that index is mandatory it's probably not essential to have these > > properties, although I guess they don't hurt. > >=20 > >> DEFINE_PROP_END_OF_LIST(), > >> }; > >> =20 > >> @@ -2069,6 +2168,80 @@ static void spapr_phb_pci_enumerate(sPAPRPHBSta= te *phb) > >> =20 > >> } > >> =20 > >> +static void spapr_phb_pci_find_nvgpu(PCIBus *bus, PCIDevice *pdev, vo= id *opaque) > >> +{ > >> + struct spapr_phb_pci_nvgpu_config *nvgpus =3D opaque; > >> + PCIBus *sec_bus; > >> + Object *mr_gpu, *mr_npu; > >> + uint64_t tgt =3D 0, gpa, atsd =3D 0; > >> + int i; > >> + > >> + mr_gpu =3D object_property_get_link(OBJECT(pdev), "nvlink2-mr[0]"= , NULL); > >> + mr_npu =3D object_property_get_link(OBJECT(pdev), "nvlink2-atsd-m= r[0]", NULL); > >> + if (mr_gpu) { > >> + gpa =3D nvgpus->nv2_ram_current; > >> + nvgpus->nv2_ram_current +=3D memory_region_size(MEMORY_REGION= (mr_gpu)); > >=20 > > Hrm. 
Changing persistent state in a function called "find" makes me > > nervous. >=20 >=20 > "Gather"? "Collect"? "Scan"? Collect or scan would both be fine. >=20 >=20 > >> + } else if (mr_npu) { > >> + if (nvgpus->atsd_num =3D=3D ARRAY_SIZE(nvgpus->atsd_prop)) { > >> + warn_report("Found too many ATSD registers per vPHB"); > >> + } else { > >> + atsd =3D nvgpus->nv2_atsd_current; > >> + nvgpus->atsd_prop[nvgpus->atsd_num] =3D cpu_to_be64(atsd); > >> + ++nvgpus->atsd_num; > >> + nvgpus->nv2_atsd_current +=3D > >> + memory_region_size(MEMORY_REGION(mr_npu)); > >> + } > >> + } > >> + > >> + tgt =3D object_property_get_uint(OBJECT(pdev), "tgt", NULL); > >=20 > > It seems like you ought to have an object_dynamic_cast() up the top of > > here to see if is a device you care about. I guess you'd get away > > with it in practice because other devices won't have the properties > > you test for, but I'd prefer not to rely on that (especially since > > "tgt" is such a brief, generic name, which come to think of it might > > be a good idea to change anyway). > > This would probably also be clearer if you have the recursing over > > bridges in one function and the handling of a single specific device > > you're interested in delegated to a another function. > >=20 > >> + if (tgt) { > >> + for (i =3D 0; i < nvgpus->num; ++i) { > >> + if (nvgpus->gpus[i].tgt =3D=3D tgt) { > >> + break; > >> + } > >> + } > >> + > >> + if (i =3D=3D nvgpus->num) { > >> + if (nvgpus->num =3D=3D ARRAY_SIZE(nvgpus->gpus)) { > >> + warn_report("Found too many NVLink bridges per GPU"); > >> + return; > >> + } > >> + ++nvgpus->num; > >> + } > >> + > >> + nvgpus->gpus[i].tgt =3D tgt; > >> + if (mr_gpu) { > >> + g_assert(!nvgpus->gpus[i].gpdev); > >> + nvgpus->gpus[i].gpdev =3D pdev; > >> + nvgpus->gpus[i].gpa =3D gpa; > >> + ++nvgpus->gpunum; > >> + } else { > >> + int j =3D nvgpus->gpus[i].links; > >> + > >> + ++nvgpus->gpus[i].links; > >> + > >> + g_assert(!nvgpus->gpus[i].npdev[j]); > >> + nvgpus->gpus[i].npdev[j] =3D pdev; > >> + nvgpus->gpus[i].atsd_gpa[j] =3D atsd; > >> + nvgpus->gpus[i].link_speed[j] =3D > >> + object_property_get_uint(OBJECT(pdev), "link_spee= d", NULL); > >> + } > >> + } > >> + > >> + if ((pci_default_read_config(pdev, PCI_HEADER_TYPE, 1) !=3D > >> + PCI_HEADER_TYPE_BRIDGE)) { > >> + return; > >> + } > >> + > >> + sec_bus =3D pci_bridge_get_sec_bus(PCI_BRIDGE(pdev)); > >> + if (!sec_bus) { > >> + return; > >> + } > >> + > >> + pci_for_each_device(sec_bus, pci_bus_num(sec_bus), > >> + spapr_phb_pci_find_nvgpu, opaque); > >> +} > >> + > >> int spapr_populate_pci_dt(sPAPRPHBState *phb, uint32_t intc_phandle, = void *fdt, > >> uint32_t nr_msis) > >> { > >> @@ -2187,6 +2360,69 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, u= int32_t intc_phandle, void *fdt, > >> spapr_phb_pci_enumerate(phb); > >> _FDT(fdt_setprop_cell(fdt, bus_off, "qemu,phb-enumerated", 0x1)); > >> =20 > >> + /* If there are existing NVLink2 MRs, unmap those before recreati= ng */ > >=20 > > Unmapping regions shouldn't be in a function that's about populating > > the dt, it probably wants to be part of the reset path. 
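Roughly what I have in mind (an untested sketch, the helper name is made up; it just lifts the unmap loop below out of the FDT builder so the reset path can run it before the tree is regenerated):

/*
 * Untested sketch: undo whatever a previous scan mapped and free the
 * scan results, so spapr_populate_pci_dt() only has to create things.
 * This mirrors the unmap loop currently embedded in the FDT builder.
 */
static void spapr_phb_nvgpu_free(sPAPRPHBState *sphb)
{
    int i, j;

    if (!sphb->nvgpus) {
        return;
    }

    for (i = 0; i < sphb->nvgpus->num; ++i) {
        PCIDevice *gpdev = sphb->nvgpus->gpus[i].gpdev;
        Object *mr = gpdev ? object_property_get_link(OBJECT(gpdev),
                                                      "nvlink2-mr[0]", NULL)
                           : NULL;

        if (mr) {
            memory_region_del_subregion(get_system_memory(),
                                        MEMORY_REGION(mr));
        }
        for (j = 0; j < sphb->nvgpus->gpus[i].links; ++j) {
            PCIDevice *npdev = sphb->nvgpus->gpus[i].npdev[j];
            Object *atsd = object_property_get_link(OBJECT(npdev),
                                                    "nvlink2-atsd-mr[0]",
                                                    NULL);

            if (atsd) {
                memory_region_del_subregion(get_system_memory(),
                                            MEMORY_REGION(atsd));
            }
        }
    }
    g_free(sphb->nvgpus);
    sphb->nvgpus = NULL;
}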
> >=20 > >> + if (phb->nvgpus) { > >> + for (i =3D 0; i < phb->nvgpus->num; ++i) { > >> + PCIDevice *gpdev =3D phb->nvgpus->gpus[i].gpdev; > >> + Object *nv_mrobj =3D object_property_get_link(OBJECT(gpde= v), > >> + "nvlink2-mr[0= ]", NULL); > >> + > >> + if (nv_mrobj) { > >> + memory_region_del_subregion(get_system_memory(), > >> + MEMORY_REGION(nv_mrobj)); > >> + } > >> + for (j =3D 0; j < phb->nvgpus->gpus[i].links; ++j) { > >> + PCIDevice *npdev =3D phb->nvgpus->gpus[i].npdev[j]; > >> + Object *atsd_mrobj; > >> + atsd_mrobj =3D object_property_get_link(OBJECT(npdev), > >> + "nvlink2-ats= d-mr[0]", > >> + NULL); > >> + if (atsd_mrobj) { > >> + memory_region_del_subregion(get_system_memory(), > >> + MEMORY_REGION(atsd_mr= obj)); > >> + } > >> + } > >> + } > >> + g_free(phb->nvgpus); > >> + phb->nvgpus =3D NULL; > >> + } > >> + > >> + /* Search for GPUs and NPUs */ > >> + if (phb->nv2_gpa_win_addr && phb->nv2_atsd_win_addr) { > >> + phb->nvgpus =3D g_new0(struct spapr_phb_pci_nvgpu_config, 1); > >=20 > > Likewise creating this internal state shouldn't be in a dt creation > > function. > >=20 > >> + phb->nvgpus->nv2_ram_current =3D phb->nv2_gpa_win_addr; > >> + phb->nvgpus->nv2_atsd_current =3D phb->nv2_atsd_win_addr; > >> + > >> + pci_for_each_device(bus, pci_bus_num(bus), > >> + spapr_phb_pci_find_nvgpu, phb->nvgpus); > >> + > >> + if (!phb->nvgpus->gpunum) { > >> + /* We did not find any interesting GPU */ > >> + g_free(phb->nvgpus); > >> + phb->nvgpus =3D NULL; > >> + } else { > >> + /* > >> + * ibm,mmio-atsd contains ATSD registers; these belong to= an NPU PHB > >> + * which we do not emulate as a separate device. Instead = we put > >> + * ibm,mmio-atsd to the vPHB with GPU and make sure that = we do not > >> + * put GPUs from different IOMMU groups to the same vPHB = to ensure > >> + * that the guest will use ATSDs from the corresponding N= PU. > >> + */ > >> + if (!spapr_phb_eeh_available(phb)) { > >=20 > > That's making assumptions about what spapr_phb_eeh_available() tests > > which might not always be warranted. >=20 >=20 > Correct. I could globally rename these helpers to "is_single_iommugroup" > to test if there is just a single IOMMU group per vPHB unless there is a > better idea may be? So what's the actual barrier to supporting multiple IOMMU groups here? 
> >> + warn_report("ATSD requires a separate vPHB per GPU IO= MMU group"); > >> + } else if (!phb->nvgpus->atsd_num) { > >> + warn_report("No ATSD registers found"); > >> + } else if (phb->nvgpus->atsd_num > 8) { > >> + warn_report("Bogus ATSD configuration is found"); > >> + } else { > >> + _FDT((fdt_setprop(fdt, bus_off, "ibm,mmio-atsd", > >> + phb->nvgpus->atsd_prop, > >> + phb->nvgpus->atsd_num * > >> + sizeof(phb->nvgpus->atsd_prop[0])))= ); > >> + } > >> + } > >> + } > >> + > >> /* Populate tree nodes with PCI devices attached */ > >> s_fdt.fdt =3D fdt; > >> s_fdt.node_off =3D bus_off; > >> @@ -2201,6 +2437,101 @@ int spapr_populate_pci_dt(sPAPRPHBState *phb, = uint32_t intc_phandle, void *fdt, > >> return ret; > >> } > >> =20 > >> + if (phb->nvgpus) { > >> + /* Add memory nodes for GPU RAM and mark them unusable */ > >> + for (i =3D 0; i < phb->nvgpus->num; ++i) { > >> + PCIDevice *gpdev =3D phb->nvgpus->gpus[i].gpdev; > >> + Object *nv_mrobj =3D object_property_get_link(OBJECT(gpde= v), > >> + "nvlink2-mr[0= ]", NULL); > >> + char *mem_name; > >> + int off; > >> + uint32_t at =3D cpu_to_be32(GPURAM_ASSOCIATIVITY(phb, i)); > >> + uint32_t associativity[] =3D { cpu_to_be32(0x4), at, at, = at, at }; > >=20 > > Putting at at every level of the associativity property seems bogus. > >=20 > > I mean it's kind of bogus we have 4 or 5 levels of associativity > > anyway (I think that was just copied from a PowerVM setup). At > > present level 4 is the qemu node, and the ones above it are always 0. > >=20 > > Possibly it would make sense for one of the higher levels to indicate > > system vs. NVLink RAM (0 or 1), then a single value for which NVLink > > node at level 4. >=20 >=20 > The only reason I did this in this way was the existing powernv config. > Perhaps the nvidia driver can cope with what you proposed, I'll give it > a try. Yeah, raw copying a powernv associativity structure to PAPR doesn't really make sense - it has to fit in with the associativity node structure we're already advertising. >=20 >=20 > >=20 > >> + uint64_t nv2_size =3D object_property_get_uint(nv_mrobj, > >> + "size", NULL= ); > >> + uint64_t mem_reg_property[2] =3D { > >=20 > > If there's a newline here.. 
> >=20 > >> + cpu_to_be64(phb->nvgpus->gpus[i].gpa), cpu_to_be64(nv= 2_size) }; > >=20 > > ..there should be one here ............................................= =2E......^ > >=20 > >> + > >> + mem_name =3D g_strdup_printf("memory@%lx", phb->nvgpus->g= pus[i].gpa); > >> + off =3D fdt_add_subnode(fdt, 0, mem_name); > >> + _FDT(off); > >> + _FDT((fdt_setprop_string(fdt, off, "device_type", "memory= "))); > >> + _FDT((fdt_setprop(fdt, off, "reg", mem_reg_property, > >> + sizeof(mem_reg_property)))); > >> + _FDT((fdt_setprop(fdt, off, "ibm,associativity", associat= ivity, > >> + sizeof(associativity)))); > >> + > >> + _FDT((fdt_setprop_string(fdt, off, "compatible", > >> + "ibm,coherent-device-memory"))); > >> + mem_reg_property[1] =3D 0; > >> + _FDT((fdt_setprop(fdt, off, "linux,usable-memory", mem_re= g_property, > >> + sizeof(mem_reg_property)))); > >> + _FDT((fdt_setprop_cell(fdt, off, "phandle", > >> + PHANDLE_GPURAM(phb, i)))); > >> + > >> + g_free(mem_name); > >> + } > >> + > >> + /* Add NPUs with links underneath */ > >> + if (phb->nvgpus->num) { > >> + char *npuname =3D g_strdup_printf("npuphb%d", phb->index); > >> + int npuoff =3D fdt_add_subnode(fdt, 0, npuname); > >> + int linkidx =3D 0; > >> + > >> + _FDT(npuoff); > >> + _FDT(fdt_setprop_cell(fdt, npuoff, "#address-cells", 1)); > >> + _FDT(fdt_setprop_cell(fdt, npuoff, "#size-cells", 0)); > >> + /* Advertise NPU as POWER9 so the guest can enable NPU2 c= ontexts */ > >> + _FDT((fdt_setprop_string(fdt, npuoff, "compatible", > >> + "ibm,power9-npu"))); > >> + g_free(npuname); > >> + > >> + for (i =3D 0; i < phb->nvgpus->num; ++i) { > >> + for (j =3D 0; j < phb->nvgpus->gpus[i].links; ++j) { > >=20 > > Here you assume that every entry up to nvgpus->num is populated, which > > you didn't elsewhere in the code. >=20 > Populated or half populated. Here I expose one NVLink per one passed > through IBM bridge, this ignores whether or not there is a GPU. There > should be a GPU and if there is none, then skiboot won't create the > bridges so I can have additional tests here but they won't fail and not > having them won't break things apart. Ah, ok, I think I misunderstood this. > >=20 > >> + char *linkname =3D g_strdup_printf("link@%d", lin= kidx); > >> + int off =3D fdt_add_subnode(fdt, npuoff, linkname= ); > >> + > >> + _FDT(off); > >> + _FDT((fdt_setprop_cell(fdt, off, "reg", linkidx))= ); > >=20 > > Generally it's better to base 'reg' off some absolute property rather > > than arbitrary index. Could you derive linkidx from i & j instead? >=20 > I can probably just ditch this one. >=20 > >=20 > >> + _FDT((fdt_setprop_string(fdt, off, "compatible", > >> + "ibm,npu-link"))); > >> + _FDT((fdt_setprop_cell(fdt, off, "phandle", > >> + PHANDLE_NVLINK(phb, i, j))= )); > >> + _FDT((fdt_setprop_cell(fdt, off, "ibm,npu-link-in= dex", > >> + linkidx))); > >> + g_free(linkname); > >> + ++linkidx; > >> + } > >> + } > >> + } > >> + > >> + /* Finally, when we finished with the device tree, map subreg= ions */ > >=20 > > Shouldn't be in a device tree function. 
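And the mapping side below could then become the matching helper (again untested, name made up), called from the reset path after the bus scan, leaving spapr_populate_pci_dt() to only emit nodes and properties:

/*
 * Untested counterpart to the teardown sketch above: map the GPU RAM
 * and ATSD MRs once the scan has filled in sphb->nvgpus, independent
 * of FDT generation.
 */
static void spapr_phb_nvgpu_map(sPAPRPHBState *sphb)
{
    int i, j;

    if (!sphb->nvgpus) {
        return;
    }

    for (i = 0; i < sphb->nvgpus->num; ++i) {
        PCIDevice *gpdev = sphb->nvgpus->gpus[i].gpdev;
        Object *mr = gpdev ? object_property_get_link(OBJECT(gpdev),
                                                      "nvlink2-mr[0]", NULL)
                           : NULL;

        if (mr) {
            memory_region_add_subregion(get_system_memory(),
                                        sphb->nvgpus->gpus[i].gpa,
                                        MEMORY_REGION(mr));
        }
        for (j = 0; j < sphb->nvgpus->gpus[i].links; ++j) {
            Object *atsd = object_property_get_link(
                OBJECT(sphb->nvgpus->gpus[i].npdev[j]),
                "nvlink2-atsd-mr[0]", NULL);

            if (atsd) {
                memory_region_add_subregion(get_system_memory(),
                                            sphb->nvgpus->gpus[i].atsd_gpa[j],
                                            MEMORY_REGION(atsd));
            }
        }
    }
}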
> >=20 > >> + for (i =3D 0; i < phb->nvgpus->num; ++i) { > >> + PCIDevice *gpdev =3D phb->nvgpus->gpus[i].gpdev; > >> + Object *nv_mrobj =3D object_property_get_link(OBJECT(gpde= v), > >> + "nvlink2-mr[0= ]", NULL); > >> + > >> + if (nv_mrobj) { > >> + memory_region_add_subregion(get_system_memory(), > >> + phb->nvgpus->gpus[i].gpa, > >> + MEMORY_REGION(nv_mrobj)); > >> + } > >> + for (j =3D 0; j < phb->nvgpus->gpus[i].links; ++j) { > >> + PCIDevice *npdev =3D phb->nvgpus->gpus[i].npdev[j]; > >> + Object *atsd_mrobj; > >> + atsd_mrobj =3D object_property_get_link(OBJECT(npdev), > >> + "nvlink2-atsd-mr[= 0]", > >> + NULL); > >> + if (atsd_mrobj) { > >> + hwaddr atsd_gpa =3D phb->nvgpus->gpus[i].atsd_gpa= [j]; > >> + > >> + memory_region_add_subregion(get_system_memory(), = atsd_gpa, > >> + MEMORY_REGION(atsd_mr= obj)); > >> + } > >> + } > >> + } > >> + } /* if (phb->nvgpus) */ > >> + > >> return 0; > >> } > >> =20 > >> diff --git a/hw/vfio/pci-quirks.c b/hw/vfio/pci-quirks.c > >> index 40a1200..faeb9c5 100644 > >> --- a/hw/vfio/pci-quirks.c > >> +++ b/hw/vfio/pci-quirks.c > >> @@ -2180,3 +2180,123 @@ int vfio_add_virt_caps(VFIOPCIDevice *vdev, Er= ror **errp) > >> =20 > >> return 0; > >> } > >> + > >> +static void vfio_pci_nvlink2_get_tgt(Object *obj, Visitor *v, > >> + const char *name, > >> + void *opaque, Error **errp) > >> +{ > >> + uint64_t tgt =3D (uint64_t) opaque; > >> + visit_type_uint64(v, name, &tgt, errp); > >> +} > >> + > >> +static void vfio_pci_nvlink2_get_link_speed(Object *obj, Visitor *v, > >> + const char *name, > >> + void *opaque, Error = **errp) > >> +{ > >> + uint32_t link_speed =3D (uint32_t)(uint64_t) opaque; > >> + visit_type_uint32(v, name, &link_speed, errp); > >> +} > >> + > >> +int vfio_pci_nvidia_v100_ram_init(VFIOPCIDevice *vdev, Error **errp) > >> +{ > >> + int ret; > >> + void *p; > >> + struct vfio_region_info *nv2region =3D NULL; > >> + struct vfio_info_cap_header *hdr; > >> + MemoryRegion *nv2mr =3D g_malloc0(sizeof(*nv2mr)); > >> + > >> + ret =3D vfio_get_dev_region_info(&vdev->vbasedev, > >> + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | > >> + PCI_VENDOR_ID_NVIDIA, > >> + VFIO_REGION_SUBTYPE_NVIDIA_NVLINK2= _RAM, > >> + &nv2region); > >> + if (ret) { > >> + return ret; > >> + } > >> + > >> + p =3D mmap(NULL, nv2region->size, PROT_READ | PROT_WRITE | PROT_E= XEC, > >> + MAP_SHARED, vdev->vbasedev.fd, nv2region->offset); > >> + > >> + if (!p) { > >> + return -errno; > >> + } > >> + > >> + memory_region_init_ram_ptr(nv2mr, OBJECT(vdev), "nvlink2-mr", > >> + nv2region->size, p); > >> + > >> + hdr =3D vfio_get_region_info_cap(nv2region, > >> + VFIO_REGION_INFO_CAP_NVLINK2_SSATG= T); > >> + if (hdr) { > >> + struct vfio_region_info_cap_nvlink2_ssatgt *cap =3D (void *) = hdr; > >> + > >> + object_property_add(OBJECT(vdev), "tgt", "uint64", > >> + vfio_pci_nvlink2_get_tgt, NULL, NULL, > >> + (void *) cap->tgt, NULL); > >> + trace_vfio_pci_nvidia_gpu_setup_quirk(vdev->vbasedev.name, ca= p->tgt, > >> + nv2region->size); > >> + } > >> + g_free(nv2region); > >> + > >> + return 0; > >> +} > >> + > >> +int vfio_pci_nvlink2_init(VFIOPCIDevice *vdev, Error **errp) > >> +{ > >> + int ret; > >> + void *p; > >> + struct vfio_region_info *atsd_region =3D NULL; > >> + struct vfio_info_cap_header *hdr; > >> + > >> + ret =3D vfio_get_dev_region_info(&vdev->vbasedev, > >> + VFIO_REGION_TYPE_PCI_VENDOR_TYPE | > >> + PCI_VENDOR_ID_IBM, > >> + VFIO_REGION_SUBTYPE_IBM_NVLINK2_AT= SD, > >> + &atsd_region); > >> + if (ret) { > >> + return ret; > >> + } > >> + > >> + /* Some NVLink bridges 
come without assigned ATSD, skip MR part */ > >> + if (atsd_region->size) { > >> + MemoryRegion *atsd_mr =3D g_malloc0(sizeof(*atsd_mr)); > >> + > >> + p =3D mmap(NULL, atsd_region->size, PROT_READ | PROT_WRITE | = PROT_EXEC, > >=20 > > PROT_EXEC ? >=20 > Yes, why? We can execute from that memory (nothing does now or probably > ever will but still can). Hm, ok. > >> + MAP_SHARED, vdev->vbasedev.fd, atsd_region->offset); > >> + > >> + if (!p) { > >> + return -errno; > >> + } > >> + > >> + memory_region_init_ram_device_ptr(atsd_mr, OBJECT(vdev), > >> + "nvlink2-atsd-mr", > >> + atsd_region->size, > >> + p); > >> + } > >> + > >> + hdr =3D vfio_get_region_info_cap(atsd_region, > >> + VFIO_REGION_INFO_CAP_NVLINK2_SSATG= T); > >> + if (hdr) { > >> + struct vfio_region_info_cap_nvlink2_ssatgt *cap =3D (void *) = hdr; > >> + > >> + object_property_add(OBJECT(vdev), "tgt", "uint64", > >> + vfio_pci_nvlink2_get_tgt, NULL, NULL, > >> + (void *) cap->tgt, NULL); > >> + trace_vfio_pci_nvlink2_setup_quirk_ssatgt(vdev->vbasedev.name= , cap->tgt, > >> + atsd_region->size); > >> + } > >> + > >> + hdr =3D vfio_get_region_info_cap(atsd_region, > >> + VFIO_REGION_INFO_CAP_NVLINK2_LNKSP= D); > >> + if (hdr) { > >> + struct vfio_region_info_cap_nvlink2_lnkspd *cap =3D (void *) = hdr; > >> + > >> + object_property_add(OBJECT(vdev), "link_speed", "uint32", > >> + vfio_pci_nvlink2_get_link_speed, NULL, NU= LL, > >> + (void *) (uint64_t) cap->link_speed, NULL= ); > >> + trace_vfio_pci_nvlink2_setup_quirk_lnkspd(vdev->vbasedev.name, > >> + cap->link_speed); > >> + } > >> + g_free(atsd_region); > >> + > >> + return 0; > >> +} > >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c > >> index dd12f36..07aa141 100644 > >> --- a/hw/vfio/pci.c > >> +++ b/hw/vfio/pci.c > >> @@ -3069,6 +3069,20 @@ static void vfio_realize(PCIDevice *pdev, Error= **errp) > >> goto out_teardown; > >> } > >> =20 > >> + if (vdev->vendor_id =3D=3D PCI_VENDOR_ID_NVIDIA) { > >=20 > > Surely we should be checking device id as well as vendor. nVidia > > makes a lot of cards and most of them don't do nvlink2. >=20 > If they do not - the host driver will return an error, end of story. Why > is this approach exactly bad? I guess. In that case why not just call the probe unconditionally? > >> + ret =3D vfio_pci_nvidia_v100_ram_init(vdev, errp); > >> + if (ret && ret !=3D -ENODEV) { > >> + error_report("Failed to setup NVIDIA V100 GPU RAM"); > >> + } > >> + } > >> + > >> + if (vdev->vendor_id =3D=3D PCI_VENDOR_ID_IBM) { > >=20 > > And IBM makes even more. >=20 > Same here - the driver just errors out. >=20 >=20 > I'll fix the rest of the comments. 
Thanks,
> 
> 
> > 
> >> +        ret = vfio_pci_nvlink2_init(vdev, errp);
> >> +        if (ret && ret != -ENODEV) {
> >> +            error_report("Failed to setup NVlink2 bridge");
> >> +        }
> >> +    }
> >> +
> >>      vfio_register_err_notifier(vdev);
> >>      vfio_register_req_notifier(vdev);
> >>      vfio_setup_resetfn_quirk(vdev);
> >> diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
> >> index 579be19..6866884 100644
> >> --- a/hw/vfio/trace-events
> >> +++ b/hw/vfio/trace-events
> >> @@ -87,6 +87,10 @@ vfio_pci_igd_opregion_enabled(const char *name) "%s"
> >>  vfio_pci_igd_host_bridge_enabled(const char *name) "%s"
> >>  vfio_pci_igd_lpc_bridge_enabled(const char *name) "%s"
> >>
> >> +vfio_pci_nvidia_gpu_setup_quirk(const char *name, uint64_t tgt, uint64_t size) "%s tgt=0x%"PRIx64" size=0x%"PRIx64
> >> +vfio_pci_nvlink2_setup_quirk_ssatgt(const char *name, uint64_t tgt, uint64_t size) "%s tgt=0x%"PRIx64" size=0x%"PRIx64
> >> +vfio_pci_nvlink2_setup_quirk_lnkspd(const char *name, uint32_t link_speed) "%s link_speed=0x%x"
> >> +
> >>  # hw/vfio/common.c
> >>  vfio_region_write(const char *name, int index, uint64_t addr, uint64_t data, unsigned size) " (%s:region%d+0x%"PRIx64", 0x%"PRIx64 ", %d)"
> >>  vfio_region_read(char *name, int index, uint64_t addr, unsigned size, uint64_t data) " (%s:region%d+0x%"PRIx64", %d) = 0x%"PRIx64
> > 
> 

-- 
David Gibson                    | I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
                                | _way_ _around_!
http://www.ozlabs.org/~dgibson