From: Alistair Popple <alistair@popple.id.au>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: Reza Arbab <arbab@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org,
	Sam Bobroff <sam.bobroff@au1.ibm.com>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2
Date: Fri, 19 Oct 2018 12:47:05 +1100
Message-ID: <513336724.BjiWrP5dVC@townsend>
In-Reply-To: <87caa06c-0f67-f2f1-6bc0-215f5c672bd8@ozlabs.ru>

> >>> wouldn't you also need to do that somewhere? Unless the driver
> >>> does it at startup?
> >> 
> >> VFIO performs a GPU reset, so I'd expect the GPUs to flush their caches
> >> without any software interaction. Am I hoping for too much here?
> > 
> > Sadly you are. It's not the GPU caches that need flushing, it's the CPU
> > caches. This needs to happen as part of the reset sequence, so I guess
> > you would need to add it to the VFIO driver.
> 
> Well, OK. The caches need flushing and I will look into this, but this
> fencing is still needed, isn't it?

Yes, although without the flushing I think you may get HMIs on any subsequent
driver loads.

From the Skiboot/HW side this looks OK, as long as all links on an NPU are
assigned to the same guest (this call resets every link on a given NPU).

Acked-by: Alistair Popple <alistair@popple.id.au>
 
> >>>>>>>>> On Monday, 15 October 2018 6:17:51 PM AEDT Alexey Kardashevskiy wrote:
> >>>>>>>>>> Ping?
> >>>>>>>>>> 
> >>>>>>>>>> On 02/10/2018 13:20, Alexey Kardashevskiy wrote:
> >>>>>>>>>>> The skiboot firmware has a hot reset handler which fences the
> >>>>>>>>>>> NVIDIA V100 GPU RAM on Witherspoons and turns accesses into
> >>>>>>>>>>> no-ops instead of throwing HMIs:
> >>>>>>>>>>> https://github.com/open-power/skiboot/commit/fca2b2b839a67
> >>>>>>>>>>>
> >>>>>>>>>>> Now we are going to pass the V100 through via VFIO, which most
> >>>>>>>>>>> certainly involves KVM guests. These are often terminated
> >>>>>>>>>>> without getting a chance to offline the GPU RAM, so we end up
> >>>>>>>>>>> with a running machine with misconfigured memory. Accessing
> >>>>>>>>>>> this memory produces hardware management interrupts (HMIs),
> >>>>>>>>>>> which bring the host down.
> >>>>>>>>>>>
> >>>>>>>>>>> To suppress HMIs, this wires up the hot reset hook to
> >>>>>>>>>>> vfio_pci_disable() via pci_disable_device(); the hook switches
> >>>>>>>>>>> NPU2 to a safe mode and prevents HMIs.
> >>>>>>>>>>> 
> >>>>>>>>>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
> >>>>>>>>>>> ---
> >>>>>>>>>>> Changes:
> >>>>>>>>>>> v2:
> >>>>>>>>>>> * updated the commit log
> >>>>>>>>>>> ---
> >>>>>>>>>>> 
> >>>>>>>>>>>  arch/powerpc/platforms/powernv/pci-ioda.c | 10 ++++++++++
> >>>>>>>>>>>  1 file changed, 10 insertions(+)
> >>>>>>>>>>> 
> >>>>>>>>>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> index cde7102..e37b9cc 100644
> >>>>>>>>>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
> >>>>>>>>>>> @@ -3688,6 +3688,15 @@ static void pnv_pci_release_device(struct pci_dev *pdev)
> >>>>>>>>>>>  		pnv_ioda_release_pe(pe);
> >>>>>>>>>>>  }
> >>>>>>>>>>>  
> >>>>>>>>>>> +static void pnv_npu_disable_device(struct pci_dev *pdev)
> >>>>>>>>>>> +{
> >>>>>>>>>>> +	struct eeh_dev *edev = pci_dev_to_eeh_dev(pdev);
> >>>>>>>>>>> +	struct eeh_pe *eehpe = edev ? edev->pe : NULL;
> >>>>>>>>>>> +
> >>>>>>>>>>> +	if (eehpe && eeh_ops && eeh_ops->reset)
> >>>>>>>>>>> +		eeh_ops->reset(eehpe, EEH_RESET_HOT);
> >>>>>>>>>>> +}
> >>>>>>>>>>> +
> >>>>>>>>>>>  static void pnv_pci_ioda_shutdown(struct pci_controller *hose)
> >>>>>>>>>>>  {
> >>>>>>>>>>>  	struct pnv_phb *phb = hose->private_data;
> >>>>>>>>>>> @@ -3732,6 +3741,7 @@ static const struct pci_controller_ops pnv_npu_ioda_controller_ops = {
> >>>>>>>>>>>  	.reset_secondary_bus	= pnv_pci_reset_secondary_bus,
> >>>>>>>>>>>  	.dma_set_mask		= pnv_npu_dma_set_mask,
> >>>>>>>>>>>  	.shutdown		= pnv_pci_ioda_shutdown,
> >>>>>>>>>>> +	.disable_device		= pnv_npu_disable_device,
> >>>>>>>>>>>  };
> >>>>>>>>>>>  
> >>>>>>>>>>>  static const struct pci_controller_ops pnv_npu_ocapi_ioda_controller_ops = {


Thread overview: 13+ messages
2018-10-02  3:20 [PATCH kernel v2] powerpc/ioda/npu: Call skiboot's hot reset hook when disabling NPU2 Alexey Kardashevskiy
2018-10-15  7:17 ` Alexey Kardashevskiy
2018-10-16  0:38   ` Alistair Popple
2018-10-16  1:37     ` Alexey Kardashevskiy
2018-10-16  1:44       ` Alistair Popple
2018-10-16  2:02         ` Alexey Kardashevskiy
2018-10-16  2:19           ` Alistair Popple
2018-10-16  2:22             ` Alexey Kardashevskiy
2018-10-16  7:32               ` Alistair Popple
2018-10-16  7:55                 ` Alexey Kardashevskiy
2018-10-18  1:05                   ` Alistair Popple
2018-10-19  1:20                     ` Alexey Kardashevskiy
2018-10-19  1:47                       ` Alistair Popple [this message]