From: Wei Yang <weiyang@linux.vnet.ibm.com>
To: Alexey Kardashevskiy <aik@ozlabs.ru>
Cc: linuxppc-dev@lists.ozlabs.org,
Wei Yang <weiyang@linux.vnet.ibm.com>,
Alexey Kardashevskiy <aik@au1.ibm.com>,
gwshan@linux.vnet.ibm.com
Subject: Re: [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device()
Date: Wed, 30 Apr 2014 09:31:04 +0800 [thread overview]
Message-ID: <20140430013104.GA4755@richard> (raw)
In-Reply-To: <535FA4E9.9050403@ozlabs.ru>
On Tue, Apr 29, 2014 at 11:11:05PM +1000, Alexey Kardashevskiy wrote:
>On 04/29/2014 07:37 PM, Wei Yang wrote:
>> On Tue, Apr 29, 2014 at 05:55:48PM +1000, Alexey Kardashevskiy wrote:
>>> On 04/29/2014 04:49 PM, Wei Yang wrote:
>>>> On Mon, Apr 28, 2014 at 11:35:32PM +1000, Alexey Kardashevskiy wrote:
>>>>> On 04/23/2014 12:26 PM, Wei Yang wrote:
>>>>>> During the EEH hotplug event, iommu_add_device() will be invoked three times
>>>>>> and two of them will trigger warning or error.
>>>>>>
>>>>>> The three times to invoke the iommu_add_device() are:
>>>>>>
>>>>>> pci_device_add
>>>>>> ...
>>>>>> set_iommu_table_base_and_group <- 1st time, fail
>>>>>> device_add
>>>>>> ...
>>>>>> tce_iommu_bus_notifier <- 2nd time, succees
>>>>>> pcibios_add_pci_devices
>>>>>> ...
>>>>>> pcibios_setup_bus_devices <- 3rd time, re-attach
>>>>>>
>>>>>> The first time fails, since the dev->kobj->sd is not initialized. The
>>>>>> dev->kobj->sd is initialized in device_add().
>>>>>> The third time's warning is triggered by the re-attach of the iommu_group.
>>>>>>
>>>>>> After applying this patch, the error
>>>>>
>>>>> Nack.
>>>>>
>>>>> The patch still seems incorrect and we actually need to remove
>>>>> tce_iommu_bus_notifier completely as pcibios_setup_bus_devices is called
>>>> >from another notifier anyway. Could you please test it?
>>>>>
>>>>>
>>>>
>>>> Hi, Alexey,
>>>>
>>>> Nice to see your comment. Let me show what I got fist.
>>>>
>>>> Generally, when kernel enumerate on the pci device, following functions will
>>>> be invoked.
>>>>
>>>> pci_device_add
>>>> pcibios_setup_bus_device
>>>> ...
>>>> set_iommu_table_base_and_group
>>>> device_add
>>>> ...
>>>> tce_iommu_bus_notifier
>>>> pcibios_fixp_bus
>>>> pcibios_add_pci_devices
>>>> ...
>>>> pcibios_setup_bus_devices
>>>>
>>>> >From the call flow, we see for a normall pci device, the
>>>> pcibios_setup_bus_device() will be invoked twice.
>>>
>>>
>>> tce_iommu_bus_notifier should not be called for devices during boot-time
>>> PCI discovery as the discovery is done from subsys_initcall handler and
>>> tce_iommu_bus_notifier is registered as subsys_initcall_sync. Are you sure
>>> this is happening? You should see warnings as for PF's EEH but you do not
>>> say that.
>>>
>>
>> Let me make it clearer. I list the call flow for general purpose to show the
>> sequence of the call flow.
>>
>> The tce_iommu_bus_notifier is not invoked at the boot time. And none of them
>> do real thing at boot time.
>>
>> I don't understand your last sentence. I see warning and error during EEH
>> hotplug. If necessary, I will add this in the commit log.
>>
>>>
>>>> At the bootup time, none of them succeed to setup the dma, since the PE is not
>>>> assigned or the tbl is not set. The iommu tbl and group is setup in
>>>> pnv_pci_ioda_setup_DMA().
>>>>
>>>> This call flow maintains the same when EEH error happens on Bus PE, while the
>>>> behavior is a little different.
>>>>
>>>> pci_device_add
>>>> pcibios_setup_bus_device
>>>> ...
>>>> set_iommu_table_base_and_group <- fail, kobj->sd is not initialized
>>>
>>>
>>> Sorry, what is uninitialized? The only kobj here is the one in iommu_group
>>> and it must have been taken care of before adding a device.
>>
>> pci_dev->dev->kobj->sd.
>>
>> I have explained this in previous discussion. This kobj->sd is initialized in
>> device_add().
>>
>>>
>>>
>>>> device_add
>>>> ...
>>>> tce_iommu_bus_notifier <- succeed
>>>> pcibios_fixp_bus
>>>> pcibios_add_pci_devices
>>>> ...
>>>> pcibios_setup_bus_devices <- warning, re-attach
>>>
>>>
>>> This is why I am suggesting getting rid of tce_iommu_bus_notifier - we will
>>> avoid the warning.
>>
>> Then how about the first time's error?
>
>
>What do you call "first time error" here?
>
Would you please take a look at my commit log?
Currently, the iommu_group will be added three times. The error happens at the
first time we try to attatch the iommu_group in pci_device_add(). And yes,
this happens at the EEH recovery on PF for a hotplug case.
The error is:
iommu_tce: 0003:05:00.0 has not been added, ret=-14
And the reason is:
pci_dev->dev->kobj->sd is not initialized.
>
>>>> While this call flow will change a little on a VF. For a VF,
>>>> pcibios_fixp_bus() will not be invoked. Current behavior is this.
>>>
>>>
>>> You meant pcibios_fixup_bus? I would expect it to get called (from
>>> pci_rescan_bus() or something like that?) and this seems to be the problem
>>> here.
>>
>> When EEH happens and hotplug the pci bus, we need to do the pci_scan_bridge()
>> which will do the pcibios_fixup_bus().
>
>Are you talking now about EEH on PF (physical function) or EEH on VF
>(virtual function)?
>
>Are you calling pci_scan_bridge()
>
Yes, it is pcibios_add_pci_devices() do the hotplug and invoke the
pci_scan_bridge().
>
>> So you suggest to remove it? This is the generic code.
>
>
>I suggest removing tce_iommu_bus_notifier only. It must go. It was my
>mistake at the first place.
>
>
>>> How are VFs added in terms of code flow? Is there any git tree to look at?
>>>
>>
>> VF add code flow is a generic code in drivers/pci/iov.c, virtfn_add().
>
>virtfn_add -> pci_device_add -> pcibios_add_device -> pcibios_setup_device
>-> ppc_md.pci_dma_dev_setup -> pnv_pci_dma_dev_setup ->
>pnv_pci_ioda_dma_dev_setup ->
>set_iommu_table_base.
>
>
>What part of this is missing?
>
Currently, you will do set_iommu_table_base_and_group() in
pnv_pci_ioda_dma_dev_setup(). And this will fail and return an error.
The error reason is the kobj->sd is not initialized yet.
I hope it is clear this time.
>
>>
>>>
>>>
>>>> pci_device_add
>>>> pcibios_setup_bus_device
>>>> ...
>>>> set_iommu_table_base_and_group <- fail, kobj->sd is not initialized
>>>> device_add
>>>> ...
>>>> tce_iommu_bus_notifier <- succeed
>>>>
>>>> And if an EEH error happens just on a VF, I believe the same call flow should
>>>> go for recovery. (This is not set down, still under discussion with Gavin.)
>>>>
>>>> My conclusion is:
>>>> 1. remove the tce_iommu_bus_notifier completely will make the VF not work.
>>>> So I choose to revert the code and attach make the iommu group attachment
>>>> just happens in one place.
>>>>
>>>> BTW, I know my patch is not a perfect one. For a PF, the tbl will still be set
>>>> twice. I am not sure why we need to invoke pcibios_setup_device() twice on a
>>>> PF, maybe some platform need to fixup some thing after the pci_bus is added.
>>>> So I don't remove one of them to solve the problem.
>>>>
>>>> If you have a better idea, I am glad to take it.
>
>
>
>--
>Alexey
--
Richard Yang
Help you, Help me
next prev parent reply other threads:[~2014-04-30 1:31 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2014-04-23 2:26 [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device() Wei Yang
2014-04-23 2:26 ` [PATCH 2/2] powerpc/powernv: release the refcount for pci_dev Wei Yang
2014-04-28 13:35 ` [PATCH 1/2] powerpc/powernv: reduce multi-hit of iommu_add_device() Alexey Kardashevskiy
2014-04-29 6:49 ` Wei Yang
2014-04-29 7:55 ` Alexey Kardashevskiy
2014-04-29 9:37 ` Wei Yang
2014-04-29 13:11 ` Alexey Kardashevskiy
2014-04-30 1:31 ` Wei Yang [this message]
2014-04-30 0:28 ` Gavin Shan
2014-04-30 3:28 ` Wei Yang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20140430013104.GA4755@richard \
--to=weiyang@linux.vnet.ibm.com \
--cc=aik@au1.ibm.com \
--cc=aik@ozlabs.ru \
--cc=gwshan@linux.vnet.ibm.com \
--cc=linuxppc-dev@lists.ozlabs.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.