From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Alistair Popple <alistair@popple.id.au>, linuxppc-dev@lists.ozlabs.org
Cc: "Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
	kvm@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
	kvm-ppc@vger.kernel.org,
	"Alex Williamson" <alex.williamson@redhat.com>,
	"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
	"Reza Arbab" <arbab@linux.ibm.com>,
	"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 16:14:35 +1000	[thread overview]
Message-ID: <cf35d5be-d0a3-aef3-d5eb-c2318ef7d92b@ozlabs.ru> (raw)
In-Reply-To: <5149814.2BROG1NTNO@townsend>



On 30/04/2019 15:45, Alistair Popple wrote:
> Alexey,
> 
>>>>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
>>>>> +{
>>>>> +	u32 mask, val;
>>>>> +	void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
>>>>> +	struct pci_dev *pdev;
>>>>> +	u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
>>>>> +
>>>>> +	if (!bridge->subordinate)
>>>>> +		return;
>>>>> +
>>>>> +	pdev = list_first_entry_or_null(&bridge->subordinate->devices,
>>>>> +			struct pci_dev, bus_list);
>>>>> +	if (!pdev)
>>>>> +		return;
>>>>> +
>>>>> +	if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
> 
> Don't you also need to check the PCIe device ID to match only [PV]100
> devices? I doubt there's any guarantee these registers will remain the same
> for all future (or older) NVIDIA devices.


I do not have the complete list of device IDs (I have already seen 3
different ones), and this only works on machines with the
ibm,npu/gpu/nvlink device tree properties, so for now it works. For the
future we are hoping to either have an open source nvidia driver or some
small minidriver (also from nvidia, or maybe a spec allowing us to write
one) to allow topology discovery on the host, so we would not depend on
the skiboot's powernv DT.

> IMHO this should really be done in the device driver in the guest. A malicious 
> guest could load a modified driver that doesn't do this, but that should not 
> compromise other guests which presumably load a non-compromised driver that 
> disables the links on that guest's GPU. However I guess in practice what you 
> have here should work equally well.

Doing it in the guest means a good guest needs to have an updated
driver, and we do not really want to depend on that. The idea of IOMMU
groups is that the hypervisor provides isolation irrespective of what
the guest does.

Also, vfio+qemu+slof would need to convey the NVLink topology to the
guest, which seems like an unnecessary complication.



> - Alistair
> 
>>>>> +		return;
>>>>> +
>>>>> +	mask = nvlinkgpu_get_disable_mask(&pdev->dev);
>>>>> +	if (!mask)
>>>>> +		return;
>>>>> +
>>>>> +	bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
>>>>> +	if (!bar0_0) {
>>>>> +		pci_err(pdev, "Error mapping BAR0 @0\n");
>>>>> +		return;
>>>>> +	}
>>>>> +	bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
>>>>> +	if (!bar0_120000) {
>>>>> +		pci_err(pdev, "Error mapping BAR0 @120000\n");
>>>>> +		goto bar0_0_unmap;
>>>>> +	}
>>>>> +	bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
>>>>> +	if (!bar0_a00000) {
>>>>> +		pci_err(pdev, "Error mapping BAR0 @A00000\n");
>>>>> +		goto bar0_120000_unmap;
>>>>> +	}
>>>>
>>>> Is it really necessary to do three separate ioremaps vs one that would
>>>> cover them all here?  I suspect you're just sneaking in PAGE_SIZE with
>>>> the 0x10000 size mappings anyway.  Seems like it would simplify setup,
>>>> error reporting, and cleanup to ioremap the PAGE_ALIGN'd range
>>>> of the highest register accessed. Thanks,
>>>
>>> Sure I can map it once, I just do not see the point in mapping/unmapping
>>> all 0xa10000>>16=161 system pages for a very short period of time while
>>> we know precisely that we need just 3 pages.
>>>
>>> Repost?
>>
>> Ping?
>>
>> Can this go in as it is (i.e. should I ping Michael) or this needs
>> another round? It would be nice to get some formal acks. Thanks,
>>
>>>> Alex
>>>>
>>>>> +
>>>>> +	pci_restore_state(pdev);
>>>>> +	pci_read_config_word(pdev, PCI_COMMAND, &cmd);
>>>>> +	if ((cmd & cmdmask) != cmdmask)
>>>>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
>>>>> +
>>>>> +	/*
>>>>> +	 * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
>>>>> +	 * Multi-Tenant Systems".
>>>>> +	 * The register names are not provided there either, hence raw values.
>>>>> +	 */
>>>>> +	iowrite32(0x4, bar0_120000 + 0x4C);
>>>>> +	iowrite32(0x2, bar0_120000 + 0x2204);
>>>>> +	val = ioread32(bar0_0 + 0x200);
>>>>> +	val |= 0x02000000;
>>>>> +	iowrite32(val, bar0_0 + 0x200);
>>>>> +	val = ioread32(bar0_a00000 + 0x148);
>>>>> +	val |= mask;
>>>>> +	iowrite32(val, bar0_a00000 + 0x148);
>>>>> +
>>>>> +	if ((cmd | cmdmask) != cmd)
>>>>> +		pci_write_config_word(pdev, PCI_COMMAND, cmd);
>>>>> +
>>>>> +	pci_iounmap(pdev, bar0_a00000);
>>>>> +bar0_120000_unmap:
>>>>> +	pci_iounmap(pdev, bar0_120000);
>>>>> +bar0_0_unmap:
>>>>> +	pci_iounmap(pdev, bar0_0);
>>>>> +}
> 
> 

-- 
Alexey

Thread overview: 7+ messages
2019-04-11  6:48 [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon Alexey Kardashevskiy
2019-04-11 16:52 ` Alex Williamson
2019-04-12  3:48   ` Alexey Kardashevskiy
2019-04-29  6:50     ` Alexey Kardashevskiy
2019-04-30  5:45       ` Alistair Popple
2019-04-30  6:14         ` Alexey Kardashevskiy [this message]
2019-04-30 13:20           ` Alex Williamson
