* [PATCH] PCI: check bridge->bus in pci_host_common_remove
@ 2024-10-28 8:46 Peng Fan (OSS)
2024-11-15 6:20 ` Manivannan Sadhasivam
2024-11-19 17:49 ` Bjorn Helgaas
0 siblings, 2 replies; 11+ messages in thread
From: Peng Fan (OSS) @ 2024-10-28 8:46 UTC (permalink / raw)
To: Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Pali Rohár, open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
Cc: Peng Fan
From: Peng Fan <peng.fan@nxp.com>
When PCI node was created using an overlay and the overlay is
reverted/destroyed, the "linux,pci-domain" property no longer exists,
so of_get_pci_domain_nr will return failure. Then
of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
if the IDA was allocated in static IDA. So the flow is as below:
A: of_changeset_revert
pci_host_common_remove
pci_bus_release_domain_nr
of_pci_bus_release_domain_nr
of_get_pci_domain_nr # fails because overlay is gone
ida_free(&pci_domain_nr_dynamic_ida)
With driver calls pci_host_common_remove explicity, the flow becomes:
B pci_host_common_remove
pci_bus_release_domain_nr
of_pci_bus_release_domain_nr
of_get_pci_domain_nr # succeeds in this order
ida_free(&pci_domain_nr_static_ida)
A of_changeset_revert
pci_host_common_remove
With updated flow, the pci_host_common_remove will be called twice,
so need to check 'bridge->bus' to avoid accessing invalid pointer.
Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
Signed-off-by: Peng Fan <peng.fan@nxp.com>
---
V1:
Not sure to keep the fixes here. I could drop the Fixes tag if it is
improper.
This is to revisit the patch [1] which was rejected last year. This
new flow is using the suggest flow following Bjorn's suggestion.
But of_changeset_revert will still invoke plaform_remove and then
pci_host_common_remove. So worked out this patch together with a patch
to jailhouse driver as below:
static void destroy_vpci_of_overlay(void)
{
+ struct device_node *vpci_node = NULL;
+
if (overlay_applied) {
+#if LINUX_VERSION_CODE >= KERNEL_VERSION(6,6,0)
+ vpci_node = of_find_node_by_path("/pci@0");
+ if (vpci_node) {
+ struct platform_device *pdev = of_find_device_by_node(vpci_node);
+ if (!pdev)
+ printk("Not found device for /pci@0\n");
+ else {
+ pci_host_common_remove(pdev);
+ platform_device_put(pdev);
+ }
+ }
+ of_node_put(vpci_node);
+#endif
+
of_changeset_revert(&overlay_changeset);
[1] https://lore.kernel.org/lkml/20230908224858.GA306960@bhelgaas/T/#md12e6097d91a012ede78c997fc5abf460029a569
drivers/pci/controller/pci-host-common.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
index cf5f59a745b3..5a9c29fc57cd 100644
--- a/drivers/pci/controller/pci-host-common.c
+++ b/drivers/pci/controller/pci-host-common.c
@@ -86,8 +86,10 @@ void pci_host_common_remove(struct platform_device *pdev)
struct pci_host_bridge *bridge = platform_get_drvdata(pdev);
pci_lock_rescan_remove();
- pci_stop_root_bus(bridge->bus);
- pci_remove_root_bus(bridge->bus);
+ if (bridge->bus) {
+ pci_stop_root_bus(bridge->bus);
+ pci_remove_root_bus(bridge->bus);
+ }
pci_unlock_rescan_remove();
}
EXPORT_SYMBOL_GPL(pci_host_common_remove);
--
2.37.1
^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-10-28 8:46 [PATCH] PCI: check bridge->bus in pci_host_common_remove Peng Fan (OSS)
@ 2024-11-15 6:20 ` Manivannan Sadhasivam
2024-11-15 10:14 ` Peng Fan
2024-11-19 17:49 ` Bjorn Helgaas
1 sibling, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-11-15 6:20 UTC (permalink / raw)
To: Peng Fan (OSS)
Cc: Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list,
Peng Fan
On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <peng.fan@nxp.com>
>
> When PCI node was created using an overlay and the overlay is
> reverted/destroyed, the "linux,pci-domain" property no longer exists,
> so of_get_pci_domain_nr will return failure. Then
> of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
> if the IDA was allocated in static IDA. So the flow is as below:
> A: of_changeset_revert
> pci_host_common_remove
> pci_bus_release_domain_nr
> of_pci_bus_release_domain_nr
> of_get_pci_domain_nr # fails because overlay is gone
> ida_free(&pci_domain_nr_dynamic_ida)
>
> With driver calls pci_host_common_remove explicity, the flow becomes:
> B pci_host_common_remove
> pci_bus_release_domain_nr
> of_pci_bus_release_domain_nr
> of_get_pci_domain_nr # succeeds in this order
> ida_free(&pci_domain_nr_static_ida)
> A of_changeset_revert
> pci_host_common_remove
>
> With updated flow, the pci_host_common_remove will be called twice,
> so need to check 'bridge->bus' to avoid accessing invalid pointer.
>
> Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> Signed-off-by: Peng Fan <peng.fan@nxp.com>
I went through the previous discussion [1] and I couldn't see an agreement on
the point raised by Bjorn on 'removing the host bridge before the overlay'.
I do think this is a valid point and if you do not think so, please state the
reason.
- Mani
[1] https://lore.kernel.org/lkml/20230913115737.GA426735@bhelgaas/
> ---
>
> V1:
> Not sure to keep the fixes here. I could drop the Fixes tag if it is
> improper.
> This is to revisit the patch [1] which was rejected last year. This
> new flow is using the suggest flow following Bjorn's suggestion.
> But of_changeset_revert will still invoke plaform_remove and then
> pci_host_common_remove. So worked out this patch together with a patch
> to jailhouse driver as below:
> static void destroy_vpci_of_overlay(void)
> {
> + struct device_node *vpci_node = NULL;
> +
> if (overlay_applied) {
> +#if LINUX_VERSION_CODE >= KERNEL_VERSION(6,6,0)
> + vpci_node = of_find_node_by_path("/pci@0");
> + if (vpci_node) {
> + struct platform_device *pdev = of_find_device_by_node(vpci_node);
> + if (!pdev)
> + printk("Not found device for /pci@0\n");
> + else {
> + pci_host_common_remove(pdev);
> + platform_device_put(pdev);
> + }
> + }
> + of_node_put(vpci_node);
> +#endif
> +
> of_changeset_revert(&overlay_changeset);
>
> [1] https://lore.kernel.org/lkml/20230908224858.GA306960@bhelgaas/T/#md12e6097d91a012ede78c997fc5abf460029a569
>
> drivers/pci/controller/pci-host-common.c | 6 ++++--
> 1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/pci/controller/pci-host-common.c b/drivers/pci/controller/pci-host-common.c
> index cf5f59a745b3..5a9c29fc57cd 100644
> --- a/drivers/pci/controller/pci-host-common.c
> +++ b/drivers/pci/controller/pci-host-common.c
> @@ -86,8 +86,10 @@ void pci_host_common_remove(struct platform_device *pdev)
> struct pci_host_bridge *bridge = platform_get_drvdata(pdev);
>
> pci_lock_rescan_remove();
> - pci_stop_root_bus(bridge->bus);
> - pci_remove_root_bus(bridge->bus);
> + if (bridge->bus) {
> + pci_stop_root_bus(bridge->bus);
> + pci_remove_root_bus(bridge->bus);
> + }
> pci_unlock_rescan_remove();
> }
> EXPORT_SYMBOL_GPL(pci_host_common_remove);
> --
> 2.37.1
>
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-11-15 6:20 ` Manivannan Sadhasivam
@ 2024-11-15 10:14 ` Peng Fan
2024-11-15 14:47 ` Manivannan Sadhasivam
0 siblings, 1 reply; 11+ messages in thread
From: Peng Fan @ 2024-11-15 10:14 UTC (permalink / raw)
To: Manivannan Sadhasivam, Peng Fan (OSS)
Cc: Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
Rob Herring, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
Hi Manivannan,
> Subject: Re: [PATCH] PCI: check bridge->bus in
> pci_host_common_remove
>
> On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > From: Peng Fan <peng.fan@nxp.com>
> >
> > When PCI node was created using an overlay and the overlay is
> > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > so of_get_pci_domain_nr will return failure. Then
> > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> even
> > if the IDA was allocated in static IDA. So the flow is as below:
> > A: of_changeset_revert
> > pci_host_common_remove
> > pci_bus_release_domain_nr
> > of_pci_bus_release_domain_nr
> > of_get_pci_domain_nr # fails because overlay is gone
> > ida_free(&pci_domain_nr_dynamic_ida)
> >
> > With driver calls pci_host_common_remove explicity, the flow
> becomes:
> > B pci_host_common_remove
> > pci_bus_release_domain_nr
> > of_pci_bus_release_domain_nr
> > of_get_pci_domain_nr # succeeds in this order
> > ida_free(&pci_domain_nr_static_ida)
> > A of_changeset_revert
> > pci_host_common_remove
> >
> > With updated flow, the pci_host_common_remove will be called
> twice, so
> > need to check 'bridge->bus' to avoid accessing invalid pointer.
> >
> > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > Signed-off-by: Peng Fan <peng.fan@nxp.com>
>
> I went through the previous discussion [1] and I couldn't see an
> agreement on the point raised by Bjorn on 'removing the host bridge
> before the overlay'.
This patch is an agreement to Bjorn's idea.
I have added pci_host_common_remove to remove host bridge
before removing overlay as I wrote in commit log.
But of_changeset_revert will still runs into pci_host_
common_remove to remove the host bridge again. Per
my view, the design of of_changeset_revert to remove
the device tree node will trigger device remove, so even
pci_host_common_remove was explicitly used before
of_changeset_revert. The following call to of_changeset_revert
will still call pci_host_common_remove.
So I did this patch to add a check of 'bus' to avoid remove again.
>
> I do think this is a valid point and if you do not think so, please state
> the reason.
I agree Bjorn's view, this patch is output of agreement as I write above.
Thanks,
Peng.
>
> - Mani
>
> [1]
> https://lore.kernel.org/lkml/20230913115737.GA426735@bhelgaas/
>
> > ---
> >
> > V1:
> > Not sure to keep the fixes here. I could drop the Fixes tag if it is
> > improper.
> > This is to revisit the patch [1] which was rejected last year. This
> > new flow is using the suggest flow following Bjorn's suggestion.
> > But of_changeset_revert will still invoke plaform_remove and then
> > pci_host_common_remove. So worked out this patch together with a
> patch
> > to jailhouse driver as below:
> > static void destroy_vpci_of_overlay(void) {
> > + struct device_node *vpci_node = NULL;
> > +
> > if (overlay_applied) {
> > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(6,6,0)
> > + vpci_node = of_find_node_by_path("/pci@0");
> > + if (vpci_node) {
> > + struct platform_device *pdev =
> of_find_device_by_node(vpci_node);
> > + if (!pdev)
> > + printk("Not found device for /pci@0\n");
> > + else {
> > + pci_host_common_remove(pdev);
> > + platform_device_put(pdev);
> > + }
> > + }
> > + of_node_put(vpci_node); #endif
> > +
> > of_changeset_revert(&overlay_changeset);
> >
> > [1]
> >
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> lore
> > .kernel.org%2Flkml%2F20230908224858.GA306960%40bhelgaas%2
> FT%2F%23md12e
> >
> 6097d91a012ede78c997fc5abf460029a569&data=05%7C02%7Cpeng.
> fan%40nxp.com
> > %7C025e209cf9264c4240fa08dd053d9211%7C686ea1d3bc2b4c6fa
> 92cd99c5c301635
> > %7C0%7C0%7C638672484189046564%7CUnknown%7CTWFpbGZsb
> 3d8eyJFbXB0eU1hcGki
> >
> OnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIl
> dUIjoyfQ
> > %3D%3D%7C0%7C%7C%7C&sdata=uCo5MO5fEqCjBzwZ8hdnsf3ORh
> SedhrJWvNOCCMNvG0%
> > 3D&reserved=0
> >
> > drivers/pci/controller/pci-host-common.c | 6 ++++--
> > 1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/drivers/pci/controller/pci-host-common.c
> > b/drivers/pci/controller/pci-host-common.c
> > index cf5f59a745b3..5a9c29fc57cd 100644
> > --- a/drivers/pci/controller/pci-host-common.c
> > +++ b/drivers/pci/controller/pci-host-common.c
> > @@ -86,8 +86,10 @@ void pci_host_common_remove(struct
> platform_device *pdev)
> > struct pci_host_bridge *bridge = platform_get_drvdata(pdev);
> >
> > pci_lock_rescan_remove();
> > - pci_stop_root_bus(bridge->bus);
> > - pci_remove_root_bus(bridge->bus);
> > + if (bridge->bus) {
> > + pci_stop_root_bus(bridge->bus);
> > + pci_remove_root_bus(bridge->bus);
> > + }
> > pci_unlock_rescan_remove();
> > }
> > EXPORT_SYMBOL_GPL(pci_host_common_remove);
> > --
> > 2.37.1
> >
>
> --
> மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-11-15 10:14 ` Peng Fan
@ 2024-11-15 14:47 ` Manivannan Sadhasivam
2024-11-19 7:30 ` Peng Fan
2024-11-27 19:56 ` Rob Herring
0 siblings, 2 replies; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-11-15 14:47 UTC (permalink / raw)
To: Peng Fan, Rob Herring
Cc: Peng Fan (OSS), Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> Hi Manivannan,
>
> > Subject: Re: [PATCH] PCI: check bridge->bus in
> > pci_host_common_remove
> >
> > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > From: Peng Fan <peng.fan@nxp.com>
> > >
> > > When PCI node was created using an overlay and the overlay is
> > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > so of_get_pci_domain_nr will return failure. Then
> > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > even
> > > if the IDA was allocated in static IDA. So the flow is as below:
> > > A: of_changeset_revert
> > > pci_host_common_remove
> > > pci_bus_release_domain_nr
> > > of_pci_bus_release_domain_nr
> > > of_get_pci_domain_nr # fails because overlay is gone
> > > ida_free(&pci_domain_nr_dynamic_ida)
> > >
> > > With driver calls pci_host_common_remove explicity, the flow
> > becomes:
> > > B pci_host_common_remove
> > > pci_bus_release_domain_nr
> > > of_pci_bus_release_domain_nr
> > > of_get_pci_domain_nr # succeeds in this order
> > > ida_free(&pci_domain_nr_static_ida)
> > > A of_changeset_revert
> > > pci_host_common_remove
> > >
> > > With updated flow, the pci_host_common_remove will be called
> > twice, so
> > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> > >
> > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >
> > I went through the previous discussion [1] and I couldn't see an
> > agreement on the point raised by Bjorn on 'removing the host bridge
> > before the overlay'.
>
> This patch is an agreement to Bjorn's idea.
>
> I have added pci_host_common_remove to remove host bridge
> before removing overlay as I wrote in commit log.
>
> But of_changeset_revert will still runs into pci_host_
> common_remove to remove the host bridge again. Per
> my view, the design of of_changeset_revert to remove
> the device tree node will trigger device remove, so even
> pci_host_common_remove was explicitly used before
> of_changeset_revert. The following call to of_changeset_revert
> will still call pci_host_common_remove.
>
> So I did this patch to add a check of 'bus' to avoid remove again.
>
Ok. I think there was a misunderstanding. Bjorn's example driver,
'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
own. And in remove(), it does the reverse.
But in your case, the issue is with the host bridge driver that gets probed
because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
applies the changeset. So we cannot compare both drivers. I believe in your
case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
So in your case, changeset is applied by jailhouse and that causes the
platform device to be created for the host bridge and then the host bridge
driver gets probed. So during destroy(), you call of_changeset_revert() that
removes the platform device and during that process it removes the host bridge
driver. The issue happens because during host bridge remove, it calls
pci_remove_root_bus() and that tries to remove the domain_nr using
pci_bus_release_domain_nr().
But pci_bus_release_domain_nr() uses DT node to check whether to free the
domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
at this time (it was already removed by of_changeset_revert()), it forces
pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
allocated from static IDA.
I think a neat way to solve this issue would be by removing the OF node only
after removing all platform devices/drivers associated with that node. But I
honestly do not know whether that is possible or not. Otherwise, any other
driver that relies on the OF node in its remove() callback, could suffer from
the same issue. And whatever fix we may come up with in PCI core, it will be a
band-aid only.
I'd like to check with Rob first about his opinion.
- Mani
> >
> > I do think this is a valid point and if you do not think so, please state
> > the reason.
>
> I agree Bjorn's view, this patch is output of agreement as I write above.
>
> Thanks,
> Peng.
>
> >
> > - Mani
> >
> > [1]
> > https://lore.kernel.org/lkml/20230913115737.GA426735@bhelgaas/
> >
> > > ---
> > >
> > > V1:
> > > Not sure to keep the fixes here. I could drop the Fixes tag if it is
> > > improper.
> > > This is to revisit the patch [1] which was rejected last year. This
> > > new flow is using the suggest flow following Bjorn's suggestion.
> > > But of_changeset_revert will still invoke plaform_remove and then
> > > pci_host_common_remove. So worked out this patch together with a
> > patch
> > > to jailhouse driver as below:
> > > static void destroy_vpci_of_overlay(void) {
> > > + struct device_node *vpci_node = NULL;
> > > +
> > > if (overlay_applied) {
> > > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(6,6,0)
> > > + vpci_node = of_find_node_by_path("/pci@0");
> > > + if (vpci_node) {
> > > + struct platform_device *pdev =
> > of_find_device_by_node(vpci_node);
> > > + if (!pdev)
> > > + printk("Not found device for /pci@0\n");
> > > + else {
> > > + pci_host_common_remove(pdev);
> > > + platform_device_put(pdev);
> > > + }
> > > + }
> > > + of_node_put(vpci_node); #endif
> > > +
> > > of_changeset_revert(&overlay_changeset);
> > >
> > > [1]
> > >
> > https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> > lore
> > > .kernel.org%2Flkml%2F20230908224858.GA306960%40bhelgaas%2
> > FT%2F%23md12e
> > >
> > 6097d91a012ede78c997fc5abf460029a569&data=05%7C02%7Cpeng.
> > fan%40nxp.com
> > > %7C025e209cf9264c4240fa08dd053d9211%7C686ea1d3bc2b4c6fa
> > 92cd99c5c301635
> > > %7C0%7C0%7C638672484189046564%7CUnknown%7CTWFpbGZsb
> > 3d8eyJFbXB0eU1hcGki
> > >
> > OnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIl
> > dUIjoyfQ
> > > %3D%3D%7C0%7C%7C%7C&sdata=uCo5MO5fEqCjBzwZ8hdnsf3ORh
> > SedhrJWvNOCCMNvG0%
> > > 3D&reserved=0
> > >
> > > drivers/pci/controller/pci-host-common.c | 6 ++++--
> > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/drivers/pci/controller/pci-host-common.c
> > > b/drivers/pci/controller/pci-host-common.c
> > > index cf5f59a745b3..5a9c29fc57cd 100644
> > > --- a/drivers/pci/controller/pci-host-common.c
> > > +++ b/drivers/pci/controller/pci-host-common.c
> > > @@ -86,8 +86,10 @@ void pci_host_common_remove(struct
> > platform_device *pdev)
> > > struct pci_host_bridge *bridge = platform_get_drvdata(pdev);
> > >
> > > pci_lock_rescan_remove();
> > > - pci_stop_root_bus(bridge->bus);
> > > - pci_remove_root_bus(bridge->bus);
> > > + if (bridge->bus) {
> > > + pci_stop_root_bus(bridge->bus);
> > > + pci_remove_root_bus(bridge->bus);
> > > + }
> > > pci_unlock_rescan_remove();
> > > }
> > > EXPORT_SYMBOL_GPL(pci_host_common_remove);
> > > --
> > > 2.37.1
> > >
> >
> > --
> > மணிவண்ணன் சதாசிவம்
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
* RE: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-11-15 14:47 ` Manivannan Sadhasivam
@ 2024-11-19 7:30 ` Peng Fan
2024-11-27 19:56 ` Rob Herring
1 sibling, 0 replies; 11+ messages in thread
From: Peng Fan @ 2024-11-19 7:30 UTC (permalink / raw)
To: Manivannan Sadhasivam, Rob Herring
Cc: Peng Fan (OSS), Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
> Subject: Re: [PATCH] PCI: check bridge->bus in
> pci_host_common_remove
>
> On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> > Hi Manivannan,
> >
> > > Subject: Re: [PATCH] PCI: check bridge->bus in
> > > pci_host_common_remove
> > >
> > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > > From: Peng Fan <peng.fan@nxp.com>
> > > >
> > > > When PCI node was created using an overlay and the overlay is
> > > > reverted/destroyed, the "linux,pci-domain" property no longer
> > > > exists, so of_get_pci_domain_nr will return failure. Then
> > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > even
> > > > if the IDA was allocated in static IDA. So the flow is as below:
> > > > A: of_changeset_revert
> > > > pci_host_common_remove
> > > > pci_bus_release_domain_nr
> > > > of_pci_bus_release_domain_nr
> > > > of_get_pci_domain_nr # fails because overlay is gone
> > > > ida_free(&pci_domain_nr_dynamic_ida)
> > > >
> > > > With driver calls pci_host_common_remove explicity, the flow
> > > becomes:
> > > > B pci_host_common_remove
> > > > pci_bus_release_domain_nr
> > > > of_pci_bus_release_domain_nr
> > > > of_get_pci_domain_nr # succeeds in this order
> > > > ida_free(&pci_domain_nr_static_ida)
> > > > A of_changeset_revert
> > > > pci_host_common_remove
> > > >
> > > > With updated flow, the pci_host_common_remove will be called
> > > twice, so
> > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> > > >
> > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > >
> > > I went through the previous discussion [1] and I couldn't see an
> > > agreement on the point raised by Bjorn on 'removing the host
> bridge
> > > before the overlay'.
> >
> > This patch is an agreement to Bjorn's idea.
> >
> > I have added pci_host_common_remove to remove host bridge
> before
> > removing overlay as I wrote in commit log.
> >
> > But of_changeset_revert will still runs into pci_host_
> common_remove
> > to remove the host bridge again. Per my view, the design of
> > of_changeset_revert to remove the device tree node will trigger
> device
> > remove, so even pci_host_common_remove was explicitly used
> before
> > of_changeset_revert. The following call to of_changeset_revert will
> > still call pci_host_common_remove.
> >
> > So I did this patch to add a check of 'bus' to avoid remove again.
> >
>
> Ok. I think there was a misunderstanding. Bjorn's example driver, 'i2c-
> demux-pinctrl' applies the changeset, then adds the i2c adapter for its
> own. And in remove(), it does the reverse.
>
> But in your case, the issue is with the host bridge driver that gets
> probed because of the changeset. While with 'i2c-demux-pinctrl' driver,
> it only applies the changeset. So we cannot compare both drivers. I
> believe in your case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
Correct.
>
> So in your case, changeset is applied by jailhouse and that causes the
> platform device to be created for the host bridge and then the host
> bridge driver gets probed. So during destroy(), you call
> of_changeset_revert() that removes the platform device and during
> that process it removes the host bridge driver. The issue happens
> because during host bridge remove, it calls
> pci_remove_root_bus() and that tries to remove the domain_nr using
> pci_bus_release_domain_nr().
>
> But pci_bus_release_domain_nr() uses DT node to check whether to
> free the domain_nr from static IDA or dynamic IDA. And because there
> is no DT node exist at this time (it was already removed by
> of_changeset_revert()), it forces
> pci_bus_release_domain_nr() to use dynamic IDA even though the IDA
> was initially allocated from static IDA.
Yeah. This is the issue.
>
> I think a neat way to solve this issue would be by removing the OF
> node only after removing all platform devices/drivers associated with
> that node. But I honestly do not know whether that is possible or not.
> Otherwise, any other driver that relies on the OF node in its remove()
> callback, could suffer from the same issue. And whatever fix we may
> come up with in PCI core, it will be a band-aid only.
>
> I'd like to check with Rob first about his opinion.
Rob,
Do you have any comments?
Thanks,
Peng.
>
> - Mani
>
> > >
> > > I do think this is a valid point and if you do not think so, please
> > > state the reason.
> >
> > I agree Bjorn's view, this patch is output of agreement as I write
> above.
> >
> > Thanks,
> > Peng.
> >
> > >
> > > - Mani
> > >
> > > [1]
> > >
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> lo
> > >
> re.kernel.org%2Flkml%2F20230913115737.GA426735%40bhelgaas%2
> F&data=05
> > > %7C02%7Cpeng.fan%40nxp.com%7Ca4506485e5234a78dbe308dd
> 05846f0a%7C686e
> > >
> a1d3bc2b4c6fa92cd99c5c301635%7C0%7C0%7C6386727885612559
> 43%7CUnknown%
> > >
> 7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMC
> IsIlAiOiJXaW
> > >
> 4zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=
> QdV1D23C%2
> > > BiuMoipt%2BXkINp77bAjFmmMIBuA5kuAR9uM%3D&reserved=0
> > >
> > > > ---
> > > >
> > > > V1:
> > > > Not sure to keep the fixes here. I could drop the Fixes tag if it
> > > > is improper.
> > > > This is to revisit the patch [1] which was rejected last year.
> > > > This new flow is using the suggest flow following Bjorn's
> suggestion.
> > > > But of_changeset_revert will still invoke plaform_remove and
> then
> > > > pci_host_common_remove. So worked out this patch together
> with a
> > > patch
> > > > to jailhouse driver as below:
> > > > static void destroy_vpci_of_overlay(void) {
> > > > + struct device_node *vpci_node = NULL;
> > > > +
> > > > if (overlay_applied) {
> > > > +#if LINUX_VERSION_CODE >= KERNEL_VERSION(6,6,0)
> > > > + vpci_node = of_find_node_by_path("/pci@0");
> > > > + if (vpci_node) {
> > > > + struct platform_device *pdev =
> > > of_find_device_by_node(vpci_node);
> > > > + if (!pdev)
> > > > + printk("Not found device for /pci@0\n");
> > > > + else {
> > > > + pci_host_common_remove(pdev);
> > > > + platform_device_put(pdev);
> > > > + }
> > > > + }
> > > > + of_node_put(vpci_node); #endif
> > > > +
> > > > of_changeset_revert(&overlay_changeset);
> > > >
> > > > [1]
> > > >
> > >
> https://eur01.safelinks.protection.outlook.com/?url=https%3A%2F%2F
> > > lore
> > > > .kernel.org%2Flkml%2F20230908224858.GA306960%40bhelgaas
> %2
> > > FT%2F%23md12e
> > > >
> > >
> 6097d91a012ede78c997fc5abf460029a569&data=05%7C02%7Cpeng.
> > > fan%40nxp.com
> > > > %7C025e209cf9264c4240fa08dd053d9211%7C686ea1d3bc2b4c
> 6fa
> > > 92cd99c5c301635
> > > > %7C0%7C0%7C638672484189046564%7CUnknown%7CTWFpbG
> Zsb
> > > 3d8eyJFbXB0eU1hcGki
> > > >
> > >
> OnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4zMiIsIkFOIjoiTWFpbCIsIl
> > > dUIjoyfQ
> > > > %3D%3D%7C0%7C%7C%7C&sdata=uCo5MO5fEqCjBzwZ8hdnsf3O
> Rh
> > > SedhrJWvNOCCMNvG0%
> > > > 3D&reserved=0
> > > >
> > > > drivers/pci/controller/pci-host-common.c | 6 ++++--
> > > > 1 file changed, 4 insertions(+), 2 deletions(-)
> > > >
> > > > diff --git a/drivers/pci/controller/pci-host-common.c
> > > > b/drivers/pci/controller/pci-host-common.c
> > > > index cf5f59a745b3..5a9c29fc57cd 100644
> > > > --- a/drivers/pci/controller/pci-host-common.c
> > > > +++ b/drivers/pci/controller/pci-host-common.c
> > > > @@ -86,8 +86,10 @@ void pci_host_common_remove(struct
> > > platform_device *pdev)
> > > > struct pci_host_bridge *bridge = platform_get_drvdata(pdev);
> > > >
> > > > pci_lock_rescan_remove();
> > > > - pci_stop_root_bus(bridge->bus);
> > > > - pci_remove_root_bus(bridge->bus);
> > > > + if (bridge->bus) {
> > > > + pci_stop_root_bus(bridge->bus);
> > > > + pci_remove_root_bus(bridge->bus);
> > > > + }
> > > > pci_unlock_rescan_remove();
> > > > }
> > > > EXPORT_SYMBOL_GPL(pci_host_common_remove);
> > > > --
> > > > 2.37.1
> > > >
> > >
> > > --
> > > மணிவண்ணன் சதாசிவம்
>
> --
> மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-10-28 8:46 [PATCH] PCI: check bridge->bus in pci_host_common_remove Peng Fan (OSS)
2024-11-15 6:20 ` Manivannan Sadhasivam
@ 2024-11-19 17:49 ` Bjorn Helgaas
1 sibling, 0 replies; 11+ messages in thread
From: Bjorn Helgaas @ 2024-11-19 17:49 UTC (permalink / raw)
To: Peng Fan (OSS)
Cc: Will Deacon, Lorenzo Pieralisi, Krzysztof Wilczyński,
Manivannan Sadhasivam, Rob Herring, Bjorn Helgaas,
Pali Rohár, open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list,
Peng Fan
On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> From: Peng Fan <peng.fan@nxp.com>
>
> When PCI node was created using an overlay and the overlay is
> reverted/destroyed, the "linux,pci-domain" property no longer exists,
> so of_get_pci_domain_nr will return failure. Then
> of_pci_bus_release_domain_nr will actually use the dynamic IDA, even
> if the IDA was allocated in static IDA. So the flow is as below:
> A: of_changeset_revert
> pci_host_common_remove
> pci_bus_release_domain_nr
> of_pci_bus_release_domain_nr
> of_get_pci_domain_nr # fails because overlay is gone
> ida_free(&pci_domain_nr_dynamic_ida)
>
> With driver calls pci_host_common_remove explicity, the flow becomes:
> B pci_host_common_remove
> pci_bus_release_domain_nr
> of_pci_bus_release_domain_nr
> of_get_pci_domain_nr # succeeds in this order
> ida_free(&pci_domain_nr_static_ida)
> A of_changeset_revert
> pci_host_common_remove
>
> With updated flow, the pci_host_common_remove will be called twice,
> so need to check 'bridge->bus' to avoid accessing invalid pointer.
If/when you post a v2 of this, please:
- Update the subject to say *why* this change is desirable.
- Follow the capitalization convention (use "git log --oneline" to
discover it).
- Add "()" after function names in the text (no need in the call
tree because that's obviously all functions).
- Mention the user-visible problem this fixes, e.g., do you see an
oops because of a NULL pointer dereference?
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-11-15 14:47 ` Manivannan Sadhasivam
2024-11-19 7:30 ` Peng Fan
@ 2024-11-27 19:56 ` Rob Herring
2024-12-02 9:29 ` Manivannan Sadhasivam
1 sibling, 1 reply; 11+ messages in thread
From: Rob Herring @ 2024-11-27 19:56 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Peng Fan, Peng Fan (OSS), Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
> On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> > Hi Manivannan,
> >
> > > Subject: Re: [PATCH] PCI: check bridge->bus in
> > > pci_host_common_remove
> > >
> > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > > From: Peng Fan <peng.fan@nxp.com>
> > > >
> > > > When PCI node was created using an overlay and the overlay is
> > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > > so of_get_pci_domain_nr will return failure. Then
> > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > even
> > > > if the IDA was allocated in static IDA. So the flow is as below:
> > > > A: of_changeset_revert
> > > > pci_host_common_remove
> > > > pci_bus_release_domain_nr
> > > > of_pci_bus_release_domain_nr
> > > > of_get_pci_domain_nr # fails because overlay is gone
> > > > ida_free(&pci_domain_nr_dynamic_ida)
> > > >
> > > > With driver calls pci_host_common_remove explicity, the flow
> > > becomes:
> > > > B pci_host_common_remove
> > > > pci_bus_release_domain_nr
> > > > of_pci_bus_release_domain_nr
> > > > of_get_pci_domain_nr # succeeds in this order
> > > > ida_free(&pci_domain_nr_static_ida)
> > > > A of_changeset_revert
> > > > pci_host_common_remove
> > > >
> > > > With updated flow, the pci_host_common_remove will be called
> > > twice, so
> > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> > > >
> > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > >
> > > I went through the previous discussion [1] and I couldn't see an
> > > agreement on the point raised by Bjorn on 'removing the host bridge
> > > before the overlay'.
> >
> > This patch is an agreement to Bjorn's idea.
> >
> > I have added pci_host_common_remove to remove host bridge
> > before removing overlay as I wrote in commit log.
> >
> > But of_changeset_revert will still runs into pci_host_
> > common_remove to remove the host bridge again. Per
> > my view, the design of of_changeset_revert to remove
> > the device tree node will trigger device remove, so even
> > pci_host_common_remove was explicitly used before
> > of_changeset_revert. The following call to of_changeset_revert
> > will still call pci_host_common_remove.
> >
> > So I did this patch to add a check of 'bus' to avoid remove again.
> >
>
> Ok. I think there was a misunderstanding. Bjorn's example driver,
> 'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
> own. And in remove(), it does the reverse.
>
> But in your case, the issue is with the host bridge driver that gets probed
> because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
> applies the changeset. So we cannot compare both drivers. I believe in your
> case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
>
> So in your case, changeset is applied by jailhouse and that causes the
> platform device to be created for the host bridge and then the host bridge
> driver gets probed. So during destroy(), you call of_changeset_revert() that
> removes the platform device and during that process it removes the host bridge
> driver. The issue happens because during host bridge remove, it calls
> pci_remove_root_bus() and that tries to remove the domain_nr using
> pci_bus_release_domain_nr().
>
> But pci_bus_release_domain_nr() uses DT node to check whether to free the
> domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
> at this time (it was already removed by of_changeset_revert()), it forces
> pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
> allocated from static IDA.
Putting linux,pci-domain in an overlay is the same problem as aliases in
overlays[1]. It's not going to work well.
IMO, you can have overlays, or you can have static domains. You can't
have both.
> I think a neat way to solve this issue would be by removing the OF node only
> after removing all platform devices/drivers associated with that node. But I
> honestly do not know whether that is possible or not. Otherwise, any other
> driver that relies on the OF node in its remove() callback, could suffer from
> the same issue. And whatever fix we may come up with in PCI core, it will be a
> band-aid only.
>
> I'd like to check with Rob first about his opinion.
If the struct device has an of_node set, there should be a reference
count on that node. But I think that only prevents the node from being
freed. It does not prevent the overlay from being detached. This is one
of many of the issues with overlays Frank painstakingly documented[2].
Perhaps it is just a matter of iterating thru all the nodes in an
overlay, getting their driver/device, and forcing them to unbind.
Though that has to be done per bus type.
Rob
[1] https://lore.kernel.org/all/CAL_Jsq+72Q6LyOj1va_qcyCVkSRwqGNvBFfB9NNOgYXasAFYJQ@mail.gmail.com/
[2] https://elinux.org/Frank%27s_Evolving_Overlay_Thoughts
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-11-27 19:56 ` Rob Herring
@ 2024-12-02 9:29 ` Manivannan Sadhasivam
2024-12-02 13:55 ` Rob Herring
0 siblings, 1 reply; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-12-02 9:29 UTC (permalink / raw)
To: Rob Herring
Cc: Peng Fan, Peng Fan (OSS), Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
On Wed, Nov 27, 2024 at 01:56:50PM -0600, Rob Herring wrote:
> On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
> > On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> > > Hi Manivannan,
> > >
> > > > Subject: Re: [PATCH] PCI: check bridge->bus in
> > > > pci_host_common_remove
> > > >
> > > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > > > From: Peng Fan <peng.fan@nxp.com>
> > > > >
> > > > > When PCI node was created using an overlay and the overlay is
> > > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > > > so of_get_pci_domain_nr will return failure. Then
> > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > > even
> > > > > if the IDA was allocated in static IDA. So the flow is as below:
> > > > > A: of_changeset_revert
> > > > > pci_host_common_remove
> > > > > pci_bus_release_domain_nr
> > > > > of_pci_bus_release_domain_nr
> > > > > of_get_pci_domain_nr # fails because overlay is gone
> > > > > ida_free(&pci_domain_nr_dynamic_ida)
> > > > >
> > > > > With driver calls pci_host_common_remove explicity, the flow
> > > > becomes:
> > > > > B pci_host_common_remove
> > > > > pci_bus_release_domain_nr
> > > > > of_pci_bus_release_domain_nr
> > > > > of_get_pci_domain_nr # succeeds in this order
> > > > > ida_free(&pci_domain_nr_static_ida)
> > > > > A of_changeset_revert
> > > > > pci_host_common_remove
> > > > >
> > > > > With updated flow, the pci_host_common_remove will be called
> > > > twice, so
> > > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> > > > >
> > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > > >
> > > > I went through the previous discussion [1] and I couldn't see an
> > > > agreement on the point raised by Bjorn on 'removing the host bridge
> > > > before the overlay'.
> > >
> > > This patch is an agreement to Bjorn's idea.
> > >
> > > I have added pci_host_common_remove to remove host bridge
> > > before removing overlay as I wrote in commit log.
> > >
> > > But of_changeset_revert will still runs into pci_host_
> > > common_remove to remove the host bridge again. Per
> > > my view, the design of of_changeset_revert to remove
> > > the device tree node will trigger device remove, so even
> > > pci_host_common_remove was explicitly used before
> > > of_changeset_revert. The following call to of_changeset_revert
> > > will still call pci_host_common_remove.
> > >
> > > So I did this patch to add a check of 'bus' to avoid remove again.
> > >
> >
> > Ok. I think there was a misunderstanding. Bjorn's example driver,
> > 'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
> > own. And in remove(), it does the reverse.
> >
> > But in your case, the issue is with the host bridge driver that gets probed
> > because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
> > applies the changeset. So we cannot compare both drivers. I believe in your
> > case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
> >
> > So in your case, changeset is applied by jailhouse and that causes the
> > platform device to be created for the host bridge and then the host bridge
> > driver gets probed. So during destroy(), you call of_changeset_revert() that
> > removes the platform device and during that process it removes the host bridge
> > driver. The issue happens because during host bridge remove, it calls
> > pci_remove_root_bus() and that tries to remove the domain_nr using
> > pci_bus_release_domain_nr().
> >
> > But pci_bus_release_domain_nr() uses DT node to check whether to free the
> > domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
> > at this time (it was already removed by of_changeset_revert()), it forces
> > pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
> > allocated from static IDA.
>
> Putting linux,pci-domain in an overlay is the same problem as aliases in
> overlays[1]. It's not going to work well.
>
> IMO, you can have overlays, or you can have static domains. You can't
> have both.
>
Okay.
> > I think a neat way to solve this issue would be by removing the OF node only
> > after removing all platform devices/drivers associated with that node. But I
> > honestly do not know whether that is possible or not. Otherwise, any other
> > driver that relies on the OF node in its remove() callback, could suffer from
> > the same issue. And whatever fix we may come up with in PCI core, it will be a
> > band-aid only.
> >
> > I'd like to check with Rob first about his opinion.
>
> If the struct device has an of_node set, there should be a reference
> count on that node. But I think that only prevents the node from being
> freed. It does not prevent the overlay from being detached. This is one
> of many of the issues with overlays Frank painstakingly documented[2].
>
Ah, I do remember this page as Frank ended up creating it based on my
continuous nudge to add CONFIG_FS interface for applying overlays.
So why are we applying overlays in kernel now?
> Perhaps it is just a matter of iterating thru all the nodes in an
> overlay, getting their driver/device, and forcing them to unbind.
> Though that has to be done per bus type.
>
Sounds like the correct approach.
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-12-02 9:29 ` Manivannan Sadhasivam
@ 2024-12-02 13:55 ` Rob Herring
2024-12-15 13:26 ` Peng Fan
0 siblings, 1 reply; 11+ messages in thread
From: Rob Herring @ 2024-12-02 13:55 UTC (permalink / raw)
To: Manivannan Sadhasivam
Cc: Peng Fan, Peng Fan (OSS), Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
On Mon, Dec 2, 2024 at 3:29 AM Manivannan Sadhasivam
<manivannan.sadhasivam@linaro.org> wrote:
>
> On Wed, Nov 27, 2024 at 01:56:50PM -0600, Rob Herring wrote:
> > On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
> > > On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> > > > Hi Manivannan,
> > > >
> > > > > Subject: Re: [PATCH] PCI: check bridge->bus in
> > > > > pci_host_common_remove
> > > > >
> > > > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> > > > > > From: Peng Fan <peng.fan@nxp.com>
> > > > > >
> > > > > > When PCI node was created using an overlay and the overlay is
> > > > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> > > > > > so of_get_pci_domain_nr will return failure. Then
> > > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> > > > > even
> > > > > > if the IDA was allocated in static IDA. So the flow is as below:
> > > > > > A: of_changeset_revert
> > > > > > pci_host_common_remove
> > > > > > pci_bus_release_domain_nr
> > > > > > of_pci_bus_release_domain_nr
> > > > > > of_get_pci_domain_nr # fails because overlay is gone
> > > > > > ida_free(&pci_domain_nr_dynamic_ida)
> > > > > >
> > > > > > With driver calls pci_host_common_remove explicity, the flow
> > > > > becomes:
> > > > > > B pci_host_common_remove
> > > > > > pci_bus_release_domain_nr
> > > > > > of_pci_bus_release_domain_nr
> > > > > > of_get_pci_domain_nr # succeeds in this order
> > > > > > ida_free(&pci_domain_nr_static_ida)
> > > > > > A of_changeset_revert
> > > > > > pci_host_common_remove
> > > > > >
> > > > > > With updated flow, the pci_host_common_remove will be called
> > > > > twice, so
> > > > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> > > > > >
> > > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> > > > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> > > > >
> > > > > I went through the previous discussion [1] and I couldn't see an
> > > > > agreement on the point raised by Bjorn on 'removing the host bridge
> > > > > before the overlay'.
> > > >
> > > > This patch is an agreement to Bjorn's idea.
> > > >
> > > > I have added pci_host_common_remove to remove host bridge
> > > > before removing overlay as I wrote in commit log.
> > > >
> > > > But of_changeset_revert will still runs into pci_host_
> > > > common_remove to remove the host bridge again. Per
> > > > my view, the design of of_changeset_revert to remove
> > > > the device tree node will trigger device remove, so even
> > > > pci_host_common_remove was explicitly used before
> > > > of_changeset_revert. The following call to of_changeset_revert
> > > > will still call pci_host_common_remove.
> > > >
> > > > So I did this patch to add a check of 'bus' to avoid remove again.
> > > >
> > >
> > > Ok. I think there was a misunderstanding. Bjorn's example driver,
> > > 'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
> > > own. And in remove(), it does the reverse.
> > >
> > > But in your case, the issue is with the host bridge driver that gets probed
> > > because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
> > > applies the changeset. So we cannot compare both drivers. I believe in your
> > > case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
> > >
> > > So in your case, changeset is applied by jailhouse and that causes the
> > > platform device to be created for the host bridge and then the host bridge
> > > driver gets probed. So during destroy(), you call of_changeset_revert() that
> > > removes the platform device and during that process it removes the host bridge
> > > driver. The issue happens because during host bridge remove, it calls
> > > pci_remove_root_bus() and that tries to remove the domain_nr using
> > > pci_bus_release_domain_nr().
> > >
> > > But pci_bus_release_domain_nr() uses DT node to check whether to free the
> > > domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
> > > at this time (it was already removed by of_changeset_revert()), it forces
> > > pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
> > > allocated from static IDA.
> >
> > Putting linux,pci-domain in an overlay is the same problem as aliases in
> > overlays[1]. It's not going to work well.
> >
> > IMO, you can have overlays, or you can have static domains. You can't
> > have both.
> >
>
> Okay.
>
> > > I think a neat way to solve this issue would be by removing the OF node only
> > > after removing all platform devices/drivers associated with that node. But I
> > > honestly do not know whether that is possible or not. Otherwise, any other
> > > driver that relies on the OF node in its remove() callback, could suffer from
> > > the same issue. And whatever fix we may come up with in PCI core, it will be a
> > > band-aid only.
> > >
> > > I'd like to check with Rob first about his opinion.
> >
> > If the struct device has an of_node set, there should be a reference
> > count on that node. But I think that only prevents the node from being
> > freed. It does not prevent the overlay from being detached. This is one
> > of many of the issues with overlays Frank painstakingly documented[2].
> >
>
> Ah, I do remember this page as Frank ended up creating it based on my
> continuous nudge to add CONFIG_FS interface for applying overlays.
>
> So why are we applying overlays in kernel now?
That's been the case for some time. Mostly it's been for fixups of old
to new bindings, but those all got dropped at some point. The in
kernel users are very specific use cases where we know something about
what's in the overlay. In contrast, configfs interface allows for any
change to any node or property with no control over it by the kernel.
Never say never, but I just don't see that ever happening upstream.
Rob
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-12-02 13:55 ` Rob Herring
@ 2024-12-15 13:26 ` Peng Fan
2024-12-16 5:08 ` Manivannan Sadhasivam
0 siblings, 1 reply; 11+ messages in thread
From: Peng Fan @ 2024-12-15 13:26 UTC (permalink / raw)
To: Rob Herring
Cc: Manivannan Sadhasivam, Peng Fan, Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
Hi Rob,
On Mon, Dec 02, 2024 at 07:55:27AM -0600, Rob Herring wrote:
>On Mon, Dec 2, 2024 at 3:29 AM Manivannan Sadhasivam
><manivannan.sadhasivam@linaro.org> wrote:
>>
>> On Wed, Nov 27, 2024 at 01:56:50PM -0600, Rob Herring wrote:
>> > On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
>> > > On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
>> > > > Hi Manivannan,
>> > > >
>> > > > > Subject: Re: [PATCH] PCI: check bridge->bus in
>> > > > > pci_host_common_remove
>> > > > >
>> > > > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
>> > > > > > From: Peng Fan <peng.fan@nxp.com>
>> > > > > >
>> > > > > > When PCI node was created using an overlay and the overlay is
>> > > > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
>> > > > > > so of_get_pci_domain_nr will return failure. Then
>> > > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
>> > > > > even
>> > > > > > if the IDA was allocated in static IDA. So the flow is as below:
>> > > > > > A: of_changeset_revert
>> > > > > > pci_host_common_remove
>> > > > > > pci_bus_release_domain_nr
>> > > > > > of_pci_bus_release_domain_nr
>> > > > > > of_get_pci_domain_nr # fails because overlay is gone
>> > > > > > ida_free(&pci_domain_nr_dynamic_ida)
>> > > > > >
>> > > > > > With driver calls pci_host_common_remove explicity, the flow
>> > > > > becomes:
>> > > > > > B pci_host_common_remove
>> > > > > > pci_bus_release_domain_nr
>> > > > > > of_pci_bus_release_domain_nr
>> > > > > > of_get_pci_domain_nr # succeeds in this order
>> > > > > > ida_free(&pci_domain_nr_static_ida)
>> > > > > > A of_changeset_revert
>> > > > > > pci_host_common_remove
>> > > > > >
>> > > > > > With updated flow, the pci_host_common_remove will be called
>> > > > > twice, so
>> > > > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
>> > > > > >
>> > > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
>> > > > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
>> > > > >
>> > > > > I went through the previous discussion [1] and I couldn't see an
>> > > > > agreement on the point raised by Bjorn on 'removing the host bridge
>> > > > > before the overlay'.
>> > > >
>> > > > This patch is an agreement to Bjorn's idea.
>> > > >
>> > > > I have added pci_host_common_remove to remove host bridge
>> > > > before removing overlay as I wrote in commit log.
>> > > >
>> > > > But of_changeset_revert will still runs into pci_host_
>> > > > common_remove to remove the host bridge again. Per
>> > > > my view, the design of of_changeset_revert to remove
>> > > > the device tree node will trigger device remove, so even
>> > > > pci_host_common_remove was explicitly used before
>> > > > of_changeset_revert. The following call to of_changeset_revert
>> > > > will still call pci_host_common_remove.
>> > > >
>> > > > So I did this patch to add a check of 'bus' to avoid remove again.
>> > > >
>> > >
>> > > Ok. I think there was a misunderstanding. Bjorn's example driver,
>> > > 'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
>> > > own. And in remove(), it does the reverse.
>> > >
>> > > But in your case, the issue is with the host bridge driver that gets probed
>> > > because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
>> > > applies the changeset. So we cannot compare both drivers. I believe in your
>> > > case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
>> > >
>> > > So in your case, changeset is applied by jailhouse and that causes the
>> > > platform device to be created for the host bridge and then the host bridge
>> > > driver gets probed. So during destroy(), you call of_changeset_revert() that
>> > > removes the platform device and during that process it removes the host bridge
>> > > driver. The issue happens because during host bridge remove, it calls
>> > > pci_remove_root_bus() and that tries to remove the domain_nr using
>> > > pci_bus_release_domain_nr().
>> > >
>> > > But pci_bus_release_domain_nr() uses DT node to check whether to free the
>> > > domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
>> > > at this time (it was already removed by of_changeset_revert()), it forces
>> > > pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
>> > > allocated from static IDA.
>> >
>> > Putting linux,pci-domain in an overlay is the same problem as aliases in
>> > overlays[1]. It's not going to work well.
>> >
>> > IMO, you can have overlays, or you can have static domains. You can't
>> > have both.
>> >
>>
>> Okay.
>>
>> > > I think a neat way to solve this issue would be by removing the OF node only
>> > > after removing all platform devices/drivers associated with that node. But I
>> > > honestly do not know whether that is possible or not. Otherwise, any other
>> > > driver that relies on the OF node in its remove() callback, could suffer from
>> > > the same issue. And whatever fix we may come up with in PCI core, it will be a
>> > > band-aid only.
>> > >
>> > > I'd like to check with Rob first about his opinion.
>> >
>> > If the struct device has an of_node set, there should be a reference
>> > count on that node. But I think that only prevents the node from being
>> > freed. It does not prevent the overlay from being detached. This is one
>> > of many of the issues with overlays Frank painstakingly documented[2].
>> >
>>
>> Ah, I do remember this page as Frank ended up creating it based on my
>> continuous nudge to add CONFIG_FS interface for applying overlays.
>>
>> So why are we applying overlays in kernel now?
>
>That's been the case for some time. Mostly it's been for fixups of old
>to new bindings, but those all got dropped at some point. The in
>kernel users are very specific use cases where we know something about
>what's in the overlay. In contrast, configfs interface allows for any
>change to any node or property with no control over it by the kernel.
>Never say never, but I just don't see that ever happening upstream.
So should I switch to use configfs for jailhouse case? Currently
we use overlay to add a virtual pci node to kernel device tree.
Thanks
Peng
>
>Rob
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH] PCI: check bridge->bus in pci_host_common_remove
2024-12-15 13:26 ` Peng Fan
@ 2024-12-16 5:08 ` Manivannan Sadhasivam
0 siblings, 0 replies; 11+ messages in thread
From: Manivannan Sadhasivam @ 2024-12-16 5:08 UTC (permalink / raw)
To: Peng Fan
Cc: Rob Herring, Peng Fan, Will Deacon, Lorenzo Pieralisi,
Krzysztof Wilczyński, Bjorn Helgaas, Pali Rohár,
open list:PCI DRIVER FOR GENERIC OF HOSTS,
moderated list:PCI DRIVER FOR GENERIC OF HOSTS, open list
On Sun, Dec 15, 2024 at 09:26:40PM +0800, Peng Fan wrote:
> Hi Rob,
>
> On Mon, Dec 02, 2024 at 07:55:27AM -0600, Rob Herring wrote:
> >On Mon, Dec 2, 2024 at 3:29 AM Manivannan Sadhasivam
> ><manivannan.sadhasivam@linaro.org> wrote:
> >>
> >> On Wed, Nov 27, 2024 at 01:56:50PM -0600, Rob Herring wrote:
> >> > On Fri, Nov 15, 2024 at 08:17:20PM +0530, Manivannan Sadhasivam wrote:
> >> > > On Fri, Nov 15, 2024 at 10:14:10AM +0000, Peng Fan wrote:
> >> > > > Hi Manivannan,
> >> > > >
> >> > > > > Subject: Re: [PATCH] PCI: check bridge->bus in
> >> > > > > pci_host_common_remove
> >> > > > >
> >> > > > > On Mon, Oct 28, 2024 at 04:46:43PM +0800, Peng Fan (OSS) wrote:
> >> > > > > > From: Peng Fan <peng.fan@nxp.com>
> >> > > > > >
> >> > > > > > When PCI node was created using an overlay and the overlay is
> >> > > > > > reverted/destroyed, the "linux,pci-domain" property no longer exists,
> >> > > > > > so of_get_pci_domain_nr will return failure. Then
> >> > > > > > of_pci_bus_release_domain_nr will actually use the dynamic IDA,
> >> > > > > even
> >> > > > > > if the IDA was allocated in static IDA. So the flow is as below:
> >> > > > > > A: of_changeset_revert
> >> > > > > > pci_host_common_remove
> >> > > > > > pci_bus_release_domain_nr
> >> > > > > > of_pci_bus_release_domain_nr
> >> > > > > > of_get_pci_domain_nr # fails because overlay is gone
> >> > > > > > ida_free(&pci_domain_nr_dynamic_ida)
> >> > > > > >
> >> > > > > > With driver calls pci_host_common_remove explicity, the flow
> >> > > > > becomes:
> >> > > > > > B pci_host_common_remove
> >> > > > > > pci_bus_release_domain_nr
> >> > > > > > of_pci_bus_release_domain_nr
> >> > > > > > of_get_pci_domain_nr # succeeds in this order
> >> > > > > > ida_free(&pci_domain_nr_static_ida)
> >> > > > > > A of_changeset_revert
> >> > > > > > pci_host_common_remove
> >> > > > > >
> >> > > > > > With updated flow, the pci_host_common_remove will be called
> >> > > > > twice, so
> >> > > > > > need to check 'bridge->bus' to avoid accessing invalid pointer.
> >> > > > > >
> >> > > > > > Fixes: c14f7ccc9f5d ("PCI: Assign PCI domain IDs by ida_alloc()")
> >> > > > > > Signed-off-by: Peng Fan <peng.fan@nxp.com>
> >> > > > >
> >> > > > > I went through the previous discussion [1] and I couldn't see an
> >> > > > > agreement on the point raised by Bjorn on 'removing the host bridge
> >> > > > > before the overlay'.
> >> > > >
> >> > > > This patch is an agreement to Bjorn's idea.
> >> > > >
> >> > > > I have added pci_host_common_remove to remove host bridge
> >> > > > before removing overlay as I wrote in commit log.
> >> > > >
> >> > > > But of_changeset_revert will still runs into pci_host_
> >> > > > common_remove to remove the host bridge again. Per
> >> > > > my view, the design of of_changeset_revert to remove
> >> > > > the device tree node will trigger device remove, so even
> >> > > > pci_host_common_remove was explicitly used before
> >> > > > of_changeset_revert. The following call to of_changeset_revert
> >> > > > will still call pci_host_common_remove.
> >> > > >
> >> > > > So I did this patch to add a check of 'bus' to avoid remove again.
> >> > > >
> >> > >
> >> > > Ok. I think there was a misunderstanding. Bjorn's example driver,
> >> > > 'i2c-demux-pinctrl' applies the changeset, then adds the i2c adapter for its
> >> > > own. And in remove(), it does the reverse.
> >> > >
> >> > > But in your case, the issue is with the host bridge driver that gets probed
> >> > > because of the changeset. While with 'i2c-demux-pinctrl' driver, it only
> >> > > applies the changeset. So we cannot compare both drivers. I believe in your
> >> > > case, 'i2c-demux-pinctrl' becomes 'jailhouse', isn't it?
> >> > >
> >> > > So in your case, changeset is applied by jailhouse and that causes the
> >> > > platform device to be created for the host bridge and then the host bridge
> >> > > driver gets probed. So during destroy(), you call of_changeset_revert() that
> >> > > removes the platform device and during that process it removes the host bridge
> >> > > driver. The issue happens because during host bridge remove, it calls
> >> > > pci_remove_root_bus() and that tries to remove the domain_nr using
> >> > > pci_bus_release_domain_nr().
> >> > >
> >> > > But pci_bus_release_domain_nr() uses DT node to check whether to free the
> >> > > domain_nr from static IDA or dynamic IDA. And because there is no DT node exist
> >> > > at this time (it was already removed by of_changeset_revert()), it forces
> >> > > pci_bus_release_domain_nr() to use dynamic IDA even though the IDA was initially
> >> > > allocated from static IDA.
> >> >
> >> > Putting linux,pci-domain in an overlay is the same problem as aliases in
> >> > overlays[1]. It's not going to work well.
> >> >
> >> > IMO, you can have overlays, or you can have static domains. You can't
> >> > have both.
> >> >
> >>
> >> Okay.
> >>
> >> > > I think a neat way to solve this issue would be by removing the OF node only
> >> > > after removing all platform devices/drivers associated with that node. But I
> >> > > honestly do not know whether that is possible or not. Otherwise, any other
> >> > > driver that relies on the OF node in its remove() callback, could suffer from
> >> > > the same issue. And whatever fix we may come up with in PCI core, it will be a
> >> > > band-aid only.
> >> > >
> >> > > I'd like to check with Rob first about his opinion.
> >> >
> >> > If the struct device has an of_node set, there should be a reference
> >> > count on that node. But I think that only prevents the node from being
> >> > freed. It does not prevent the overlay from being detached. This is one
> >> > of many of the issues with overlays Frank painstakingly documented[2].
> >> >
> >>
> >> Ah, I do remember this page as Frank ended up creating it based on my
> >> continuous nudge to add CONFIG_FS interface for applying overlays.
> >>
> >> So why are we applying overlays in kernel now?
> >
> >That's been the case for some time. Mostly it's been for fixups of old
> >to new bindings, but those all got dropped at some point. The in
> >kernel users are very specific use cases where we know something about
> >what's in the overlay. In contrast, configfs interface allows for any
> >change to any node or property with no control over it by the kernel.
> >Never say never, but I just don't see that ever happening upstream.
>
> So should I switch to use configfs for jailhouse case? Currently
> we use overlay to add a virtual pci node to kernel device tree.
>
Not at all. I think you have 2 options:
1. Get rid of 'linux,pci-domain' from overlay and rely on static id allocation.
2. Make sure the driver is unbind for each device before removing the overlays.
Options 1 is to avoid your issue and option 2 is to fix your issue. You can
decide which one to opt for :)
- Mani
--
மணிவண்ணன் சதாசிவம்
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-12-16 5:10 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-10-28 8:46 [PATCH] PCI: check bridge->bus in pci_host_common_remove Peng Fan (OSS)
2024-11-15 6:20 ` Manivannan Sadhasivam
2024-11-15 10:14 ` Peng Fan
2024-11-15 14:47 ` Manivannan Sadhasivam
2024-11-19 7:30 ` Peng Fan
2024-11-27 19:56 ` Rob Herring
2024-12-02 9:29 ` Manivannan Sadhasivam
2024-12-02 13:55 ` Rob Herring
2024-12-15 13:26 ` Peng Fan
2024-12-16 5:08 ` Manivannan Sadhasivam
2024-11-19 17:49 ` Bjorn Helgaas
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).