From: Alexey Kardashevskiy <aik@ozlabs.ru>
To: Gavin Shan <gwshan@linux.vnet.ibm.com>
Cc: linuxppc-dev@lists.ozlabs.org,
	Benjamin Herrenschmidt <benh@kernel.crashing.org>,
	Daniel Axtens <dja@axtens.net>,
	David Gibson <david@gibson.dropbear.id.au>
Subject: Re: [PATCH kernel 2/2] powerpc/powernv/ioda2: Delay PE disposal
Date: Tue, 26 Apr 2016 12:29:21 +1000
Message-ID: <571ED281.7070607@ozlabs.ru>
In-Reply-To: <571846E9.80800@ozlabs.ru>

On 04/21/2016 01:20 PM, Alexey Kardashevskiy wrote:
> On 04/21/2016 10:21 AM, Gavin Shan wrote:
>> On Fri, Apr 08, 2016 at 04:36:44PM +1000, Alexey Kardashevskiy wrote:
>>> When SRIOV is disabled, the existing code presumes there is no
>>> virtual function (VF) in use and destroys all associated PEs.
>>> However, it is possible to get into a situation where the user
>>> disables SRIOV while a VF is still in use via VFIO.
>>> For example, unbinding a physical function (PF) while there is a guest
>>> running with a VF passed through via VFIO will trigger the bug.
>>>
>>> This defines an IODA2-specific IOMMU group release() callback.
>>> This moves all the disposal code from pnv_ioda_release_vf_PE() to this
>>> new callback so the cleanup happens when the last user of an IOMMU
>>> group releases its reference.
>>>
>>> As pnv_pci_ioda2_release_dma_pe() was reduced to just calling
>>> iommu_group_put(), this merges pnv_pci_ioda2_release_dma_pe()
>>> into pnv_ioda_release_vf_PE().
>>>
>>
>> Sorry, I don't understand how it works. When PF's driver disables
>> IOV capability, the VF cannot work. The guest is unlikely to know
>> that and will still continue accessing the VF's resources (e.g. config
>> space and MMIO registers). It would cause EEH errors.
>
> The host disables IOV which removes VF devices which unbinds vfio_pci
> driver and does all the cleanup, eventually we get to QEMU's
> vfio_req_notifier_handler() and PCI hot unplug is initiated and the device
> disappears from the guest.
>
> If the guest cannot do PCI hot unplug, then EEH will make the host stop it
> anyway.
>
> Here we do not really care what happens to the guest (it can detect EEH or
> hotunplug or simply crash), we need to make sure that the _host_ does not
> crash in any case because the root user did something weird.


Ping?

>
>
>>
>>> Signed-off-by: Alexey Kardashevskiy <aik@ozlabs.ru>
>>> ---
>>> arch/powerpc/platforms/powernv/pci-ioda.c | 33 +++++++++++++------------------
>>> 1 file changed, 14 insertions(+), 19 deletions(-)
>>>
>>> diff --git a/arch/powerpc/platforms/powernv/pci-ioda.c b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> index ce9f2bf..8108c54 100644
>>> --- a/arch/powerpc/platforms/powernv/pci-ioda.c
>>> +++ b/arch/powerpc/platforms/powernv/pci-ioda.c
>>> @@ -1333,27 +1333,25 @@ static void pnv_pci_ioda2_set_bypass(struct pnv_ioda_pe *pe, bool enable);
>>> static void pnv_pci_ioda2_group_release(void *iommu_data)
>>> {
>>>     struct iommu_table_group *table_group = iommu_data;
>>> +    struct pnv_ioda_pe *pe = container_of(table_group,
>>> +            struct pnv_ioda_pe, table_group);
>>> +    struct pci_controller *hose = pci_bus_to_host(pe->parent_dev->bus);
>>
>> pe->parent_dev would be NULL for non-VF PEs, and it's protected by
>> CONFIG_PCI_IOV in pci.h.
>
>
> Yeah, I'll fix it.
>
>>
>>> +    struct pnv_phb *phb = hose->private_data;
>>> +    struct iommu_table *tbl = pe->table_group.tables[0];
>>> +    int64_t rc;
>>>
>>> -    table_group->group = NULL;
>>> -}
>>> -
>>> -static void pnv_pci_ioda2_release_dma_pe(struct pci_dev *dev, struct pnv_ioda_pe *pe)
>>> -{
>>> -    struct iommu_table    *tbl;
>>> -    int64_t               rc;
>>> -
>>> -    tbl = pe->table_group.tables[0];
>>>     rc = pnv_pci_ioda2_unset_window(&pe->table_group, 0);
>>>     if (rc)
>>>         pe_warn(pe, "OPAL error %ld release DMA window\n", rc);
>>>
>>>     pnv_pci_ioda2_set_bypass(pe, false);
>>> -    if (pe->table_group.group) {
>>> -        iommu_group_put(pe->table_group.group);
>>> -        BUG_ON(pe->table_group.group);
>>> -    }
>>> +
>>> +    BUG_ON(!tbl);
>>>     pnv_pci_ioda2_table_free_pages(tbl);
>>> -    iommu_free_table(tbl, of_node_full_name(dev->dev.of_node));
>>> +    iommu_free_table(tbl, of_node_full_name(pe->parent_dev->dev.of_node));
>>> +
>>> +    pnv_ioda_deconfigure_pe(phb, pe);
>>> +    pnv_ioda_free_pe(phb, pe->pe_number);
>>> }
>>
>> It's not correct enough. One PE comprises DMA, MMIO, mapping info etc.
>> This function disposes of all of them when DMA finishes its job. I can't
>> figure out a better way to represent all of them and their relationship.
>> I guess it's worth having something in the long term, though it's not
>> trivial work.
>
>
> Sorry, I am missing your point here. I am not changing the resource
> deallocation here, I am just doing it slightly later and all I wonder at
> the moment is if there are races - like having 2 scripts - one doing unbind
> PF and another doing bind PF - will this crash the host in theory?
>
>
>>
>>>
>>> static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>> @@ -1376,16 +1374,13 @@ static void pnv_ioda_release_vf_PE(struct pci_dev *pdev)
>>>         if (pe->parent_dev != pdev)
>>>             continue;
>>>
>>> -        pnv_pci_ioda2_release_dma_pe(pdev, pe);
>>> -
>>>         /* Remove from list */
>>>         mutex_lock(&phb->ioda.pe_list_mutex);
>>>         list_del(&pe->list);
>>>         mutex_unlock(&phb->ioda.pe_list_mutex);
>>>
>>> -        pnv_ioda_deconfigure_pe(phb, pe);
>>> -
>>> -        pnv_ioda_free_pe(phb, pe->pe_number);
>>> +        if (pe->table_group.group)
>>> +            iommu_group_put(pe->table_group.group);
>>>     }
>>> }
>>>
>>> --
>>> 2.5.0.rc3
>>>
>>
>
>


-- 
Alexey
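
The delayed-disposal idea in the quoted patch can be illustrated with a
small user-space model. This is only a sketch under stated assumptions:
toy_pe, toy_group and every toy_* function are hypothetical stand-ins
invented for illustration, not the kernel's pnv_ioda_pe, iommu_group or
their APIs. It shows the PF-side teardown dropping only its own reference,
with the actual PE cleanup running from a release() callback once the last
user (for example a VFIO holder) drops its reference.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a PE; NOT the kernel's struct pnv_ioda_pe. */
struct toy_pe {
	bool dma_window_set;	/* models the DMA window being programmed */
	bool configured;	/* models the PE being configured */
	bool allocated;		/* models the PE number being in use */
};

/* Hypothetical stand-in for a refcounted group with a release() hook. */
struct toy_group {
	int refcount;
	void *data;
	void (*release)(void *data);
};

static void toy_group_get(struct toy_group *g)
{
	g->refcount++;
}

/* Drop one reference; run release() only when the last one goes away. */
static void toy_group_put(struct toy_group *g)
{
	assert(g->refcount > 0);
	if (--g->refcount == 0 && g->release)
		g->release(g->data);
}

/* All disposal lives here, mirroring the idea of the new release callback. */
static void toy_pe_release(void *data)
{
	struct toy_pe *pe = data;

	pe->dma_window_set = false;
	pe->configured = false;
	pe->allocated = false;
}

/* PF-side teardown: only drops the owner's reference, frees nothing itself. */
static void toy_release_vf_pe(struct toy_group *g)
{
	toy_group_put(g);
}

/* Scenario: a second user still holds the group when SRIOV is disabled. */
static bool toy_demo_pe_survives_until_last_put(void)
{
	struct toy_pe pe = { true, true, true };
	struct toy_group g = { 1, &pe, toy_pe_release };
	bool alive_while_in_use;

	toy_group_get(&g);	/* a VFIO-like user takes a reference */
	toy_release_vf_pe(&g);	/* owner drops its reference */
	alive_while_in_use = pe.allocated && pe.configured &&
			     pe.dma_window_set;

	toy_group_put(&g);	/* last user goes away, release() runs */
	return alive_while_in_use && !pe.allocated && !pe.configured &&
	       !pe.dma_window_set;
}
```

The model deliberately ignores locking, so it says nothing about the
bind/unbind race raised above; it only shows the ordering of the cleanup.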
