linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Jiang Liu <jiang.liu@huawei.com>
To: Don Dutile <ddutile@redhat.com>
Cc: Bjorn Helgaas <bhelgaas@google.com>, Jiang Liu <liuj97@gmail.com>,
	Kenji Kaneshige <kaneshige.kenji@jp.fujitsu.com>,
	Yinghai Lu <yinghai@kernel.org>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>,
	Keping Chen <chenkeping@huawei.com>,
	linux-pci@vger.kernel.org, "Pearson, Greg" <greg.pearson@hp.com>
Subject: Re: [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges
Date: Thu, 26 Apr 2012 12:02:38 +0800	[thread overview]
Message-ID: <4F98C8DE.8030906@huawei.com> (raw)
In-Reply-To: <4F98C298.1010504@redhat.com>

On 2012-4-26 11:35, Don Dutile wrote:
> On 04/25/2012 12:50 PM, Bjorn Helgaas wrote:
>> On Mon, Apr 23, 2012 at 12:50 PM, Don Dutile<ddutile@redhat.com> wrote:
>>> On 04/23/2012 01:41 PM, Bjorn Helgaas wrote:
>>>>
>>>> On Mon, Apr 16, 2012 at 11:54 AM, Don Dutile<ddutile@redhat.com> wrote:
>>>>>
>>>>> On 04/16/2012 12:09 PM, Jiang Liu wrote:
>>>>>>
>>>>>>
>>>>>> Hi Don,
>>>>>> Thanks for your comments and please refer to inline comments
>>>>>> below.
>>>>>
>>>>>
>>>>>
>>>>> Thanks for the info below; couple quick replies below.. - Don
>>>>>
>>>>>
>>>>>> On 04/16/2012 11:30 PM, Don Dutile wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 04/13/2012 10:33 AM, Jiang Liu wrote:
>>>>>>>>
>>>>>>>>
>>>>>>>> On 04/13/2012 06:48 PM, Kenji Kaneshige wrote:
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> (2012/04/12 9:06), Bjorn Helgaas wrote:
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On Wed, Apr 11, 2012 at 9:34 AM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>> wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> On 04/11/2012 08:05 PM, Kenji Kaneshige wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> (2012/04/11 13:02), Bjorn Helgaas wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Tue, Apr 10, 2012 at 6:10 PM, Jiang Liu<liuj97@gmail.com>
>>>>>>>>>>>>> wrote:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> This patchset enhance pci_root driver to update MMCFG
>>>>>>>>>>>>>> information
>>>>>>>>>>>>>> when
>>>>>>>>>>>>>> hot-plugging PCI root bridges. It applies to Yinghai's
>>>>>>>>>>>>>> tree at
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/yinghai/linux-yinghai.git
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> for-pci-root-bus-hotplug
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> The second patch is based on Taku Izumi work with some
>>>>>>>>>>>>>> enhancements to
>>>>>>>>>>>>>> correctly handle PCI host bridges without _CBA method.
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> I'm sorry I won't have time to really review these for a
>>>>>>>>>>>>> couple
>>>>>>>>>>>>> weeks.
>>>>>>>>>>>>>
>>>>>>>>>>>>> It always seemed wrong to me that we parse MCFG and set
>>>>>>>>>>>>> things up
>>>>>>>>>>>>> before we even look at PNP0A03/PNP0A08 devices. It would make
>>>>>>>>>>>>> more
>>>>>>>>>>>>> sense to me to have something in acpi_pci_root_add() to set up
>>>>>>>>>>>>> MMCONFIG using _CBA if available, and falling back to parsing
>>>>>>>>>>>>> MCFG
>>>>>>>>>>>>> if
>>>>>>>>>>>>> it's not.
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> I think your idea could make the code (design) much cleaner.
>>>>>>>>>>>> Do you have any other reason why you think "It always seemed
>>>>>>>>>>>> wrong..."?
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> The current scheme is just an ugly design. Does I need more
>>>>>>>>>> reasons?
>>>>>>>>>> :)
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Ok, I just wanted to know if I'm missing anything we need to
>>>>>>>>> take into account when re-factoring the code.
>>>>>>>>>
>>>>>>>>> By the way, the following code makes me think there could be
>>>>>>>>> some hardwares that need a fixup using mmconfig access before
>>>>>>>>> scanning the PCI tree. If this is the case, we would need
>>>>>>>>> something to enable early mmconfig initialization for those
>>>>>>>>> hardwares.
>>>>>>>>>
>>>>>>>>> static __init int pci_arch_init(void)
>>>>>>>>> {
>>>>>>>>> ...
>>>>>>>>> if (!(pci_probe& PCI_PROBE_NOEARLY))
>>>>>>>>> pci_mmcfg_early_init();
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Regards,
>>>>>>>>> Kenji Kaneshige
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> If MMCFG could be treated as an optional configuration space access
>>>>>>>> method,
>>>>>>>> we can refine the MMCFG code according to Bjorn's suggestion.
>>>>>>>> And as
>>>>>>>> Kenji
>>>>>>>> has mentioned, there may be some risks ahead. So need more
>>>>>>>> confirmation
>>>>>>>> from other PCI experts here.
>>>>>>>>
>>>>>>> I looked at the thread, but didn't know which suggestion of
>>>>>>> Bjorn's you
>>>>>>> were referring to.
>>>>>>> But, mmcfg access to PCI config space is need for any cap structure
>>>>>>> greater than 256 byte offset. A number of devices have cap
>>>>>>> structures
>>>>>>> in this upper PCI config space, esp. SRIOV devices.
>>>>>>> So, if 'optional MMCFG' only means at the beginning of kernel
>>>>>>> scanning
>>>>>>> of
>>>>>>> PCI (pass-0 scanning), that should be ok, but in-depth, pass-1
>>>>>>> scanning
>>>>>>> of PCIe devices may require MMCFG for full functional support.
>>>>>>
>>>>>>
>>>>>> For mainstream systems with support of ACPI and MMCFG, the booting
>>>>>> sequences are about:
>>>>>> 1) Probe for legacy PCI configuration access mechanism, such as
>>>>>> CONF1,
>>>>>> CONF2, BIOS
>>>>>> 2) Start ACPICA/ACPI subsystem with the legacy PCI configuration
>>>>>> access
>>>>>> mechanism
>>>>>> 3) Enumerate PCI root bridges (PNP0A03/PNP0A08) in ACPI namespace and
>>>>>> bind
>>>>>> pci_root
>>>>>> driver to them
>>>>>> 4) pci_root driver calls into arch code to add MMCFG information
>>>>>> for the
>>>>>> host bridge
>>>>>> 5) pci_root driver calls PCI core to enumerate all PCI devices
>>>>>> under the
>>>>>> host bridge
>>>>>>
>>>>>> The above flow should work for SRIOV case. But still need to check
>>>>>> following cases:
>>>>>> 1) ACPICA/ACPI subsystem has no dependency on MMCFG
>>>>>> 2) Systems implementing SFI instead of ACPI work as expected
>>>>>> 3) ACPI has been disabled by user (Bjorn points out we could
>>>>>> ignore this
>>>>>> case)
>>>>>
>>>>>
>>>>> Agreed. My least favorite bz: "I set boot param to noacpi and can't
>>>>> scan
>>>>> entire PCI space.... duh!
>>>>>
>>>>>
>>>>>> 4) Some host bridges are not reported by ACPI (Bjorn points out we
>>>>>> should
>>>>>> eventually
>>>>>> get rid of the blind probing logic)
>>>>>
>>>>>
>>>>> And depend on BIOS-ACPI to be correct all the time?
>>>>> ....hahahahahaha ...
>>>>> sorry.... you hit my funny bone! ;-)
>>>>> Is blind probing problematic ?
>>>>> Seems like a pci-fixup/quirk can be implemented under arch/<>/pci to
>>>>> handle
>>>>> these cases, and thus, depend on ACPI for host-bridge info... wait!
>>>>> did I
>>>>> just
>>>>> say depend on ACPI?!?! :)
>>>>
>>>>
>>>> Hope your funny bone has stopped tingling by now :)
>>>>
>>> Not when ACPI's always there to bang it again! ;-)
>>>
>>>> When we probe blindly, we don't know what resources are available on
>>>> the bus (except for AMD systems). Therefore, we can't do reliable
>>>> assignment, and we have to rely on whatever the BIOS did.
>>>>
>>>> Blind probing finds devices not exposed by the BIOS. This might be a
>>>> BIOS bug, or it might be a conscious decision to hide the devices from
>>>> the OS. Some OEMs hide devices to reduce the likelihood of users
>>>> messing things up with setpci.
>>>>
>>>> It would be interesting and relatively easy to figure out whether
>>>> Windows ever discovers a device behind an unreported host bridge. My
>>>> guess is "no," but I haven't had time to verify this.
>>>>
>>> Well, call me crazy(again!), but why put a host-bridge device on a
>>> system
>>> and then (try to) hide it with software(BIOS/ACPI) ?
>>> Sounds like a recipe for disaster if the OS tries to
>>> reconfigure address space or bus-number space.....
>>
>> Some OEMs do leave host bridges unreported. Here's an example, from
>> an HP DL380 G7:
>>
> I didn't doubt they tried to.... again, why!?!
>
>> ACPI: PCI Root Bridge [PCI0] (domain 0000 [bus 00-1a])
>> pci 0000:00:00.0: [8086:3406] type 0 class 0x000600
>> pci 0000:00:01.0: [8086:3408] type 1 class 0x000604
>> pci 0000:00:02.0: [8086:3409] type 1 class 0x000604
>> ...
>> PCI: Discovered peer bus 3e
>> pci 0000:3e:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3e:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3e:02.0: [8086:2d90] type 0 class 0x000600
>> ...
>> PCI: Discovered peer bus 3f
>> pci 0000:3f:00.0: [8086:2c70] type 0 class 0x000600
>> pci 0000:3f:00.1: [8086:2d81] type 0 class 0x000600
>> pci 0000:3f:02.0: [8086:2d90] type 0 class 0x000600
>>
>> ACPI did not report host bridges leading to buses 3e and 3f; we found
>> those devices by probing blindly.
>>
>> In this case, these are Intel CPU uncore devices, and they don't
>> consume MMIO or IO port resources. But you're absolutely right that
>> we can't safely reconfigure anything we find this way.
>>
>>> Hiding a PCI device is as simple as not loading its driver... :)
>>
>> Sure, if it's the OS that wants to hide it. In this case, it's the
>> OEM that wants to hide it, and the only mechanism for doing that is to
>> refrain from telling the OS how to find it. I'm pretty confident that
>> the DL380 omission of those host bridges is an intentional choice by
>> the BIOS writers.
>>
>> Bjorn
>
> but this kind of hiding will break code paths/logic like pci=assign-busses,
> b/c if the hidden bus nums aren't registered, and they are used by
> the assign-bus code, then duplicate busses will have overlapping
> sec-num/sub-num
> ranges and then there are two bridges that will race (or as I
> experienced, hang)
> to reply with config cycles in the overlapping range.
> so, scanning & registering (but not configuring) 'hidden' (non-ACPI-id'd)
> bridges/busses sounds like something the pci code has to do.

This is really a issue. If we could distinguish segment 0 and other
segments, things may become much more simple. Segment 0 may be treated
specially for compatibility issues.



      parent reply	other threads:[~2012-04-26  4:04 UTC|newest]

Thread overview: 26+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-04-11  0:10 [PATCH V4 0/6] PCI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
2012-04-11  0:10 ` [PATCH V4 1/6] PCI, x86: split out pci_mmcfg_check_reserved() for code reuse Jiang Liu
2012-04-11  0:10 ` [PATCH V4 2/6] PCI, x86: split out pci_mmconfig_alloc() " Jiang Liu
2012-04-11  0:11 ` [PATCH V4 3/6] PCI, x86: use RCU list to protect mmconfig list Jiang Liu
2012-04-11  0:11 ` [PATCH V4 4/6] PCI, x86: introduce pci_mmcfg_arch_map()/pci_mmcfg_arch_unmap() Jiang Liu
2012-04-11  0:11 ` [PATCH V4 5/6] PCI, x86: introduce pci_mmconfig_insert()/delete() for PCI root bridge hotplug Jiang Liu
2012-04-11  0:11 ` [PATCH V4 6/6] PCI, ACPI, x86: update MMCFG information when hot-plugging PCI host bridges Jiang Liu
2012-04-18  6:47   ` Taku Izumi
2012-04-18  7:49     ` Jiang Liu
2012-04-19  6:49       ` Taku Izumi
2012-04-19  7:04         ` Jiang Liu
2012-04-11  4:02 ` [PATCH V4 0/6] PCI, " Bjorn Helgaas
2012-04-11 12:05   ` Kenji Kaneshige
2012-04-11 15:34     ` Jiang Liu
2012-04-12  0:06       ` Bjorn Helgaas
2012-04-13 10:48         ` Kenji Kaneshige
2012-04-13 14:33           ` Jiang Liu
2012-04-16 15:30             ` Don Dutile
2012-04-16 16:09               ` Jiang Liu
2012-04-16 17:54                 ` Don Dutile
2012-04-23 17:41                   ` Bjorn Helgaas
2012-04-23 18:50                     ` Don Dutile
2012-04-25 16:50                       ` Bjorn Helgaas
2012-04-26  3:35                         ` Don Dutile
2012-04-26  3:53                           ` Jiang Liu
2012-04-26  4:02                           ` Jiang Liu [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=4F98C8DE.8030906@huawei.com \
    --to=jiang.liu@huawei.com \
    --cc=bhelgaas@google.com \
    --cc=chenkeping@huawei.com \
    --cc=ddutile@redhat.com \
    --cc=greg.pearson@hp.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=kaneshige.kenji@jp.fujitsu.com \
    --cc=linux-pci@vger.kernel.org \
    --cc=liuj97@gmail.com \
    --cc=yinghai@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).