From: poza@codeaurora.org
To: okaya@codeaurora.org
Cc: Lukas Wunner <lukas@wunner.de>,
linux-pci@vger.kernel.org, linux-arm-msm@vger.kernel.org,
linux-arm-kernel@lists.infradead.org,
Bjorn Helgaas <bhelgaas@google.com>,
Keith Busch <keith.busch@intel.com>,
open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH V5 3/3] PCI: Mask and unmask hotplug interrupts during reset
Date: Tue, 03 Jul 2018 18:41:33 +0530
Message-ID: <9e871cc3978fbdca12ccf8a91f34ad07@codeaurora.org>
In-Reply-To: <8b6ce0f415858463d1c0588c29e30415@codeaurora.org>
On 2018-07-03 17:00, okaya@codeaurora.org wrote:
> On 2018-07-03 04:34, Lukas Wunner wrote:
>> On Mon, Jul 02, 2018 at 06:52:47PM -0400, Sinan Kaya wrote:
>>> If a bridge supports hotplug and observes a PCIe fatal error, the
>>> following events happen:
>>>
>>> 1. AER driver removes the devices from PCI tree on fatal error
>>> 2. AER driver brings down the link by issuing a secondary bus reset
>>>    and waits for the link to come up.
>>> 3. Hotplug driver observes a link down interrupt
>>> 4. Hotplug driver tries to remove the devices waiting for the rescan
>>>    lock but devices are already removed by the AER driver and AER
>>>    driver is waiting for the link to come back up.
>>> 5. AER driver tries to re-enumerate devices after polling for the
>>>    link state to go up.
>>> 6. Hotplug driver obtains the lock and tries to remove the devices
>>>    again.
>>>
>>> If a bridge is a hotplug capable bridge, mask hotplug interrupts
>>> before the reset and unmask afterwards.
>>
>> Would it work for you if you just amended the AER driver to skip
>> removal and re-enumeration of devices if the port is a hotplug bridge?
>> Just check for is_hotplug_bridge in struct pci_dev.
>
> The reason why we want to remove devices before secondary bus reset is
> to quiesce pcie bus traffic before issuing a reset.
>
> Skipping this step might cause transactions to be lost in the middle
> of the reset as there will be active traffic flowing and drivers will
> suddenly start reading ffs.
>
> I don't think we can skip this step.
>
What if we only make the enumeration conditional (leaving the device
removal followed by SBR as is)?

The following code does a little more work than our normal ERR_FATAL
path: pciehp_unconfigure_device() does a little more than enumeration
to quiesce the bus.

	/*
	 * Ensure that no new Requests will be generated from
	 * the device.
	 */
	if (presence) {
		pci_read_config_word(dev, PCI_COMMAND, &command);
		command &= ~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
		command |= PCI_COMMAND_INTX_DISABLE;
		pci_write_config_word(dev, PCI_COMMAND, command);
	}
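
To make the effect on the Command register concrete, here is a minimal
user-space sketch of the same bit manipulation. It is only an
illustration and not kernel code: quiesce_command() is a hypothetical
helper name, and the bit values are copied from
include/uapi/linux/pci_regs.h so that the snippet builds on its own.

```c
#include <stdint.h>

/* Bit values as defined in include/uapi/linux/pci_regs.h */
#define PCI_COMMAND_MASTER       0x004 /* Enable bus mastering */
#define PCI_COMMAND_SERR         0x100 /* Enable SERR */
#define PCI_COMMAND_INTX_DISABLE 0x400 /* INTx Emulation Disable */

/*
 * Hypothetical helper (not a kernel function): given the current value
 * of the Command register, return the value that the snippet above
 * would write back. Bus mastering and SERR# reporting are turned off
 * and INTx is disabled; all other bits are left untouched.
 */
static uint16_t quiesce_command(uint16_t command)
{
	command &= (uint16_t)~(PCI_COMMAND_MASTER | PCI_COMMAND_SERR);
	command |= PCI_COMMAND_INTX_DISABLE;
	return command;
}
```

For example, a device with I/O, memory, bus mastering and SERR enabled
(0x0107) would end up with 0x0403: it still decodes I/O and memory
accesses, but it can no longer initiate requests or raise INTx.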
>
>>
>> That would seem like a much simpler solution, given that it is known
>> that the link will flap on reset, causing the hotplug driver to remove
>> and re-enumerate devices. That would also cover cases where hotplug
>> is handled by a different driver than pciehp, or by the platform
>> firmware.
>>
>> Thanks,
>>
>> Lukas