All of lore.kernel.org
 help / color / mirror / Atom feed
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	kvm@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	kvm-ppc@vger.kernel.org,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 05:45:28 +0000	[thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>

Alexey,

> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> +	u32 mask, val;
> >>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> +	struct pci_dev *pdev;
> >>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> +	if (!bridge->subordinate)
> >>> +		return;
> >>> +
> >>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> +			struct pci_dev, bus_list);
> >>> +	if (!pdev)
> >>> +		return;
> >>> +
> >>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)

Don't you also need to check the PCIe devid to match only [PV]100 devices as 
well? I doubt there's any guarantee these registers will remain the same for 
all future (or older) NVIDIA devices.

IMHO this should really be done in the device driver in the guest. A malcious 
guest could load a modified driver that doesn't do this, but that should not 
compromise other guests which presumably load a non-compromised driver that 
disables the links on that guests GPU. However I guess in practice what you 
have here should work equally well.

- Alistair

> >>> +		return;
> >>> +
> >>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> +	if (!mask)
> >>> +		return;
> >>> +
> >>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> +	if (!bar0_0) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> +		return;
> >>> +	}
> >>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> +	if (!bar0_120000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> +		goto bar0_0_unmap;
> >>> +	}
> >>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> +	if (!bar0_a00000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> +		goto bar0_120000_unmap;
> >>> +	}
> >> 
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> > 
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16\x161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> > 
> > Repost?
> 
> Ping?
> 
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
> 
> >> Alex
> >> 
> >>> +
> >>> +	pci_restore_state(pdev);
> >>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> +	if ((cmd & cmdmask) != cmdmask)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> +	/*
> >>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> +	 * Multi-Tenant Systems".
> >>> +	 * The register names are not provided there either, hence raw values.
> >>> +	 */
> >>> +	iowrite32(0x4, bar0_120000 + 0x4C);
> >>> +	iowrite32(0x2, bar0_120000 + 0x2204);
> >>> +	val = ioread32(bar0_0 + 0x200);
> >>> +	val |= 0x02000000;
> >>> +	iowrite32(val, bar0_0 + 0x200);
> >>> +	val = ioread32(bar0_a00000 + 0x148);
> >>> +	val |= mask;
> >>> +	iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> +	if ((cmd | cmdmask) != cmd)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> +	pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> +	pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> +	pci_iounmap(pdev, bar0_0);
> >>> +}


WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	kvm@vger.kernel.org, "Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	kvm-ppc@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 15:45:28 +1000	[thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>

Alexey,

> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> +	u32 mask, val;
> >>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> +	struct pci_dev *pdev;
> >>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> +	if (!bridge->subordinate)
> >>> +		return;
> >>> +
> >>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> +			struct pci_dev, bus_list);
> >>> +	if (!pdev)
> >>> +		return;
> >>> +
> >>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)

Don't you also need to check the PCIe devid to match only [PV]100 devices as 
well? I doubt there's any guarantee these registers will remain the same for 
all future (or older) NVIDIA devices.

IMHO this should really be done in the device driver in the guest. A malcious 
guest could load a modified driver that doesn't do this, but that should not 
compromise other guests which presumably load a non-compromised driver that 
disables the links on that guests GPU. However I guess in practice what you 
have here should work equally well.

- Alistair

> >>> +		return;
> >>> +
> >>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> +	if (!mask)
> >>> +		return;
> >>> +
> >>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> +	if (!bar0_0) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> +		return;
> >>> +	}
> >>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> +	if (!bar0_120000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> +		goto bar0_0_unmap;
> >>> +	}
> >>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> +	if (!bar0_a00000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> +		goto bar0_120000_unmap;
> >>> +	}
> >> 
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> > 
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16=161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> > 
> > Repost?
> 
> Ping?
> 
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
> 
> >> Alex
> >> 
> >>> +
> >>> +	pci_restore_state(pdev);
> >>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> +	if ((cmd & cmdmask) != cmdmask)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> +	/*
> >>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> +	 * Multi-Tenant Systems".
> >>> +	 * The register names are not provided there either, hence raw values.
> >>> +	 */
> >>> +	iowrite32(0x4, bar0_120000 + 0x4C);
> >>> +	iowrite32(0x2, bar0_120000 + 0x2204);
> >>> +	val = ioread32(bar0_0 + 0x200);
> >>> +	val |= 0x02000000;
> >>> +	iowrite32(val, bar0_0 + 0x200);
> >>> +	val = ioread32(bar0_a00000 + 0x148);
> >>> +	val |= mask;
> >>> +	iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> +	if ((cmd | cmdmask) != cmd)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> +	pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> +	pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> +	pci_iounmap(pdev, bar0_0);
> >>> +}



WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Alexey Kardashevskiy" <aik@ozlabs.ru>,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	kvm@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	kvm-ppc@vger.kernel.org,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 15:45:28 +1000	[thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>

Alexey,

> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> +	u32 mask, val;
> >>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> +	struct pci_dev *pdev;
> >>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> +	if (!bridge->subordinate)
> >>> +		return;
> >>> +
> >>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> +			struct pci_dev, bus_list);
> >>> +	if (!pdev)
> >>> +		return;
> >>> +
> >>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)

Don't you also need to check the PCIe devid to match only [PV]100 devices as 
well? I doubt there's any guarantee these registers will remain the same for 
all future (or older) NVIDIA devices.

IMHO this should really be done in the device driver in the guest. A malcious 
guest could load a modified driver that doesn't do this, but that should not 
compromise other guests which presumably load a non-compromised driver that 
disables the links on that guests GPU. However I guess in practice what you 
have here should work equally well.

- Alistair

> >>> +		return;
> >>> +
> >>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> +	if (!mask)
> >>> +		return;
> >>> +
> >>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> +	if (!bar0_0) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> +		return;
> >>> +	}
> >>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> +	if (!bar0_120000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> +		goto bar0_0_unmap;
> >>> +	}
> >>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> +	if (!bar0_a00000) {
> >>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> +		goto bar0_120000_unmap;
> >>> +	}
> >> 
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> > 
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16=161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> > 
> > Repost?
> 
> Ping?
> 
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
> 
> >> Alex
> >> 
> >>> +
> >>> +	pci_restore_state(pdev);
> >>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> +	if ((cmd & cmdmask) != cmdmask)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> +	/*
> >>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> +	 * Multi-Tenant Systems".
> >>> +	 * The register names are not provided there either, hence raw values.
> >>> +	 */
> >>> +	iowrite32(0x4, bar0_120000 + 0x4C);
> >>> +	iowrite32(0x2, bar0_120000 + 0x2204);
> >>> +	val = ioread32(bar0_0 + 0x200);
> >>> +	val |= 0x02000000;
> >>> +	iowrite32(val, bar0_0 + 0x200);
> >>> +	val = ioread32(bar0_a00000 + 0x148);
> >>> +	val |= mask;
> >>> +	iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> +	if ((cmd | cmdmask) != cmd)
> >>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> +	pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> +	pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> +	pci_iounmap(pdev, bar0_0);
> >>> +}



  reply	other threads:[~2019-04-30  5:45 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2019-04-11  6:48 [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon Alexey Kardashevskiy
2019-04-11  6:48 ` Alexey Kardashevskiy
2019-04-11  6:48 ` Alexey Kardashevskiy
2019-04-11 16:52 ` Alex Williamson
2019-04-11 16:52   ` Alex Williamson
2019-04-11 16:52   ` Alex Williamson
2019-04-12  3:48   ` Alexey Kardashevskiy
2019-04-12  3:48     ` Alexey Kardashevskiy
2019-04-12  3:48     ` Alexey Kardashevskiy
2019-04-29  6:50     ` Alexey Kardashevskiy
2019-04-29  6:50       ` Alexey Kardashevskiy
2019-04-29  6:50       ` Alexey Kardashevskiy
2019-04-30  5:45       ` Alistair Popple [this message]
2019-04-30  5:45         ` Alistair Popple
2019-04-30  5:45         ` Alistair Popple
2019-04-30  6:14         ` Alexey Kardashevskiy
2019-04-30  6:14           ` Alexey Kardashevskiy
2019-04-30  6:14           ` Alexey Kardashevskiy
2019-04-30 13:20           ` Alex Williamson
2019-04-30 13:20             ` Alex Williamson
2019-04-30 13:20             ` Alex Williamson

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=5149814.2BROG1NTNO@townsend \
    --to=alistair@popple.id.au \
    --cc=aik@ozlabs.ru \
    --cc=alex.williamson@redhat.com \
    --cc=arbab@linux.ibm.com \
    --cc=danielhb413@gmail.com \
    --cc=david@gibson.dropbear.id.au \
    --cc=joserz@linux.ibm.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=kvm@vger.kernel.org \
    --cc=lagarcia@br.ibm.com \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=pjaroszynski@nvidia.com \
    --cc=sbobroff@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.