From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Alexey Kardashevskiy" <aik@ozlabs.ru>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
kvm@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
"Daniel Henrique Barboza" <danielhb413@gmail.com>,
kvm-ppc@vger.kernel.org,
"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
"Reza Arbab" <arbab@linux.ibm.com>,
"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 05:45:28 +0000 [thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>
Alexey,
> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> + u32 mask, val;
> >>> + void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> + struct pci_dev *pdev;
> >>> + u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> + if (!bridge->subordinate)
> >>> + return;
> >>> +
> >>> + pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> + struct pci_dev, bus_list);
> >>> + if (!pdev)
> >>> + return;
> >>> +
> >>> + if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
Don't you also need to check the PCIe devid to match only [PV]100 devices as
well? I doubt there's any guarantee these registers will remain the same for
all future (or older) NVIDIA devices.
IMHO this should really be done in the device driver in the guest. A malcious
guest could load a modified driver that doesn't do this, but that should not
compromise other guests which presumably load a non-compromised driver that
disables the links on that guests GPU. However I guess in practice what you
have here should work equally well.
- Alistair
> >>> + return;
> >>> +
> >>> + mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> + if (!mask)
> >>> + return;
> >>> +
> >>> + bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> + if (!bar0_0) {
> >>> + pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> + return;
> >>> + }
> >>> + bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> + if (!bar0_120000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> + goto bar0_0_unmap;
> >>> + }
> >>> + bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> + if (!bar0_a00000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> + goto bar0_120000_unmap;
> >>> + }
> >>
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here? I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway. Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> >
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16\x161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> >
> > Repost?
>
> Ping?
>
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
>
> >> Alex
> >>
> >>> +
> >>> + pci_restore_state(pdev);
> >>> + pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> + if ((cmd & cmdmask) != cmdmask)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> + /*
> >>> + * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> + * Multi-Tenant Systems".
> >>> + * The register names are not provided there either, hence raw values.
> >>> + */
> >>> + iowrite32(0x4, bar0_120000 + 0x4C);
> >>> + iowrite32(0x2, bar0_120000 + 0x2204);
> >>> + val = ioread32(bar0_0 + 0x200);
> >>> + val |= 0x02000000;
> >>> + iowrite32(val, bar0_0 + 0x200);
> >>> + val = ioread32(bar0_a00000 + 0x148);
> >>> + val |= mask;
> >>> + iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> + if ((cmd | cmdmask) != cmd)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> + pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> + pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> + pci_iounmap(pdev, bar0_0);
> >>> +}
WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
kvm@vger.kernel.org, "Alexey Kardashevskiy" <aik@ozlabs.ru>,
"Daniel Henrique Barboza" <danielhb413@gmail.com>,
"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
kvm-ppc@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
"Reza Arbab" <arbab@linux.ibm.com>,
"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 15:45:28 +1000 [thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>
Alexey,
> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> + u32 mask, val;
> >>> + void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> + struct pci_dev *pdev;
> >>> + u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> + if (!bridge->subordinate)
> >>> + return;
> >>> +
> >>> + pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> + struct pci_dev, bus_list);
> >>> + if (!pdev)
> >>> + return;
> >>> +
> >>> + if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
Don't you also need to check the PCIe devid to match only [PV]100 devices as
well? I doubt there's any guarantee these registers will remain the same for
all future (or older) NVIDIA devices.
IMHO this should really be done in the device driver in the guest. A malcious
guest could load a modified driver that doesn't do this, but that should not
compromise other guests which presumably load a non-compromised driver that
disables the links on that guests GPU. However I guess in practice what you
have here should work equally well.
- Alistair
> >>> + return;
> >>> +
> >>> + mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> + if (!mask)
> >>> + return;
> >>> +
> >>> + bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> + if (!bar0_0) {
> >>> + pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> + return;
> >>> + }
> >>> + bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> + if (!bar0_120000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> + goto bar0_0_unmap;
> >>> + }
> >>> + bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> + if (!bar0_a00000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> + goto bar0_120000_unmap;
> >>> + }
> >>
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here? I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway. Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> >
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16=161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> >
> > Repost?
>
> Ping?
>
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
>
> >> Alex
> >>
> >>> +
> >>> + pci_restore_state(pdev);
> >>> + pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> + if ((cmd & cmdmask) != cmdmask)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> + /*
> >>> + * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> + * Multi-Tenant Systems".
> >>> + * The register names are not provided there either, hence raw values.
> >>> + */
> >>> + iowrite32(0x4, bar0_120000 + 0x4C);
> >>> + iowrite32(0x2, bar0_120000 + 0x2204);
> >>> + val = ioread32(bar0_0 + 0x200);
> >>> + val |= 0x02000000;
> >>> + iowrite32(val, bar0_0 + 0x200);
> >>> + val = ioread32(bar0_a00000 + 0x148);
> >>> + val |= mask;
> >>> + iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> + if ((cmd | cmdmask) != cmd)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> + pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> + pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> + pci_iounmap(pdev, bar0_0);
> >>> +}
WARNING: multiple messages have this Message-ID (diff)
From: Alistair Popple <alistair@popple.id.au>
To: linuxppc-dev@lists.ozlabs.org
Cc: "Alexey Kardashevskiy" <aik@ozlabs.ru>,
"Alex Williamson" <alex.williamson@redhat.com>,
"Jose Ricardo Ziviani" <joserz@linux.ibm.com>,
kvm@vger.kernel.org, "Sam Bobroff" <sbobroff@linux.ibm.com>,
"Daniel Henrique Barboza" <danielhb413@gmail.com>,
kvm-ppc@vger.kernel.org,
"Piotr Jaroszynski" <pjaroszynski@nvidia.com>,
"Leonardo Augusto Guimarães Garcia" <lagarcia@br.ibm.com>,
"Reza Arbab" <arbab@linux.ibm.com>,
"David Gibson" <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon
Date: Tue, 30 Apr 2019 15:45:28 +1000 [thread overview]
Message-ID: <5149814.2BROG1NTNO@townsend> (raw)
In-Reply-To: <da41cd35-32f6-043e-13ab-9a225c4e910a@ozlabs.ru>
Alexey,
> >>> +void pnv_try_isolate_nvidia_v100(struct pci_dev *bridge)
> >>> +{
> >>> + u32 mask, val;
> >>> + void __iomem *bar0_0, *bar0_120000, *bar0_a00000;
> >>> + struct pci_dev *pdev;
> >>> + u16 cmd = 0, cmdmask = PCI_COMMAND_MEMORY;
> >>> +
> >>> + if (!bridge->subordinate)
> >>> + return;
> >>> +
> >>> + pdev = list_first_entry_or_null(&bridge->subordinate->devices,
> >>> + struct pci_dev, bus_list);
> >>> + if (!pdev)
> >>> + return;
> >>> +
> >>> + if (pdev->vendor != PCI_VENDOR_ID_NVIDIA)
Don't you also need to check the PCIe devid to match only [PV]100 devices as
well? I doubt there's any guarantee these registers will remain the same for
all future (or older) NVIDIA devices.
IMHO this should really be done in the device driver in the guest. A malcious
guest could load a modified driver that doesn't do this, but that should not
compromise other guests which presumably load a non-compromised driver that
disables the links on that guests GPU. However I guess in practice what you
have here should work equally well.
- Alistair
> >>> + return;
> >>> +
> >>> + mask = nvlinkgpu_get_disable_mask(&pdev->dev);
> >>> + if (!mask)
> >>> + return;
> >>> +
> >>> + bar0_0 = pci_iomap_range(pdev, 0, 0, 0x10000);
> >>> + if (!bar0_0) {
> >>> + pci_err(pdev, "Error mapping BAR0 @0\n");
> >>> + return;
> >>> + }
> >>> + bar0_120000 = pci_iomap_range(pdev, 0, 0x120000, 0x10000);
> >>> + if (!bar0_120000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @120000\n");
> >>> + goto bar0_0_unmap;
> >>> + }
> >>> + bar0_a00000 = pci_iomap_range(pdev, 0, 0xA00000, 0x10000);
> >>> + if (!bar0_a00000) {
> >>> + pci_err(pdev, "Error mapping BAR0 @A00000\n");
> >>> + goto bar0_120000_unmap;
> >>> + }
> >>
> >> Is it really necessary to do three separate ioremaps vs one that would
> >> cover them all here? I suspect you're just sneaking in PAGE_SIZE with
> >> the 0x10000 size mappings anyway. Seems like it would simplify setup,
> >> error reporting, and cleanup to to ioremap to the PAGE_ALIGN'd range
> >> of the highest register accessed. Thanks,
> >
> > Sure I can map it once, I just do not see the point in mapping/unmapping
> > all 0xa10000>>16=161 system pages for a very short period of time while
> > we know precisely that we need just 3 pages.
> >
> > Repost?
>
> Ping?
>
> Can this go in as it is (i.e. should I ping Michael) or this needs
> another round? It would be nice to get some formal acks. Thanks,
>
> >> Alex
> >>
> >>> +
> >>> + pci_restore_state(pdev);
> >>> + pci_read_config_word(pdev, PCI_COMMAND, &cmd);
> >>> + if ((cmd & cmdmask) != cmdmask)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd | cmdmask);
> >>> +
> >>> + /*
> >>> + * The sequence is from "Tesla P100 and V100 SXM2 NVLink Isolation on
> >>> + * Multi-Tenant Systems".
> >>> + * The register names are not provided there either, hence raw values.
> >>> + */
> >>> + iowrite32(0x4, bar0_120000 + 0x4C);
> >>> + iowrite32(0x2, bar0_120000 + 0x2204);
> >>> + val = ioread32(bar0_0 + 0x200);
> >>> + val |= 0x02000000;
> >>> + iowrite32(val, bar0_0 + 0x200);
> >>> + val = ioread32(bar0_a00000 + 0x148);
> >>> + val |= mask;
> >>> + iowrite32(val, bar0_a00000 + 0x148);
> >>> +
> >>> + if ((cmd | cmdmask) != cmd)
> >>> + pci_write_config_word(pdev, PCI_COMMAND, cmd);
> >>> +
> >>> + pci_iounmap(pdev, bar0_a00000);
> >>> +bar0_120000_unmap:
> >>> + pci_iounmap(pdev, bar0_120000);
> >>> +bar0_0_unmap:
> >>> + pci_iounmap(pdev, bar0_0);
> >>> +}
next prev parent reply other threads:[~2019-04-30 5:45 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2019-04-11 6:48 [PATCH kernel v3] powerpc/powernv: Isolate NVLinks between GV100GL on Witherspoon Alexey Kardashevskiy
2019-04-11 6:48 ` Alexey Kardashevskiy
2019-04-11 6:48 ` Alexey Kardashevskiy
2019-04-11 16:52 ` Alex Williamson
2019-04-11 16:52 ` Alex Williamson
2019-04-11 16:52 ` Alex Williamson
2019-04-12 3:48 ` Alexey Kardashevskiy
2019-04-12 3:48 ` Alexey Kardashevskiy
2019-04-12 3:48 ` Alexey Kardashevskiy
2019-04-29 6:50 ` Alexey Kardashevskiy
2019-04-29 6:50 ` Alexey Kardashevskiy
2019-04-29 6:50 ` Alexey Kardashevskiy
2019-04-30 5:45 ` Alistair Popple [this message]
2019-04-30 5:45 ` Alistair Popple
2019-04-30 5:45 ` Alistair Popple
2019-04-30 6:14 ` Alexey Kardashevskiy
2019-04-30 6:14 ` Alexey Kardashevskiy
2019-04-30 6:14 ` Alexey Kardashevskiy
2019-04-30 13:20 ` Alex Williamson
2019-04-30 13:20 ` Alex Williamson
2019-04-30 13:20 ` Alex Williamson
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=5149814.2BROG1NTNO@townsend \
--to=alistair@popple.id.au \
--cc=aik@ozlabs.ru \
--cc=alex.williamson@redhat.com \
--cc=arbab@linux.ibm.com \
--cc=danielhb413@gmail.com \
--cc=david@gibson.dropbear.id.au \
--cc=joserz@linux.ibm.com \
--cc=kvm-ppc@vger.kernel.org \
--cc=kvm@vger.kernel.org \
--cc=lagarcia@br.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
--cc=pjaroszynski@nvidia.com \
--cc=sbobroff@linux.ibm.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.