* [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
@ 2024-07-03 14:16 Amit Machhiwal
2024-07-05 19:20 ` Bjorn Helgaas
2024-07-11 12:20 ` Rob Herring
0 siblings, 2 replies; 9+ messages in thread
From: Amit Machhiwal @ 2024-07-03 14:16 UTC (permalink / raw)
To: linux-pci, linux-kernel, linuxppc-dev
Cc: Rob Herring, Kowshik Jois B S, Nicholas Piggin,
Vaidyanathan Srinivasan, Lizhi Hou, Amit Machhiwal, kvm-ppc,
Bjorn Helgaas, Vaibhav Jain
With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
of a PCI device attached to a PCI-bridge causes following kernel Oops on
a pseries KVM guest:
RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
BUG: Unable to handle kernel data access on read at 0x10ec00000048
Faulting instruction address: 0xc0000000012d8728
Oops: Kernel access of bad area, sig: 11 [#1]
LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
<snip>
NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
Call Trace:
[c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
[c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
[c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
[c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
[c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
[c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
[c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
[c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
[c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
[c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
[c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
[c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
<snip>
A git bisect pointed this regression to be introduced via [1] that added
a mechanism to create device tree nodes for parent PCI bridges when a
PCI device is hot-plugged.
The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
device-tree node associated with the pci_dev that was earlier
hot-plugged and was attached under a pci-bridge. The PCI dev header
`dev->hdr_type` being 0, results a conditional check done with
`pci_is_bridge()` into false. Consequently, a call to
`of_pci_make_dev_node()` to create a device node is never made. When at
a later point in time, in the device node removal path, a memcpy is
attempted in `__of_changeset_entry_invert()`; since the device node was
never created, results in an Oops due to kernel read access to a bad
address.
To fix this issue the patch updates `pci_stop_dev()` to ensure that a
call to `of_pci_remove_node()` is only made for pci-bridge devices.
[1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
---
drivers/pci/remove.c | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d749ea8250d6..4e51c64af416 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
device_release_driver(&dev->dev);
pci_proc_detach_device(dev);
pci_remove_sysfs_dev_files(dev);
- of_pci_remove_node(dev);
+ if (pci_is_bridge(dev))
+ of_pci_remove_node(dev);
pci_dev_assign_added(dev, false);
}
base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
--
2.45.2
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-03 14:16 [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest Amit Machhiwal
@ 2024-07-05 19:20 ` Bjorn Helgaas
2024-07-11 4:48 ` Lizhi Hou
2024-07-11 12:20 ` Rob Herring
1 sibling, 1 reply; 9+ messages in thread
From: Bjorn Helgaas @ 2024-07-05 19:20 UTC (permalink / raw)
To: Amit Machhiwal
Cc: Rob Herring, Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lizhi Hou, Lukas Wunner, Nicholas Piggin,
Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
[+cc Lukas, FYI]
On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> of a PCI device attached to a PCI-bridge causes following kernel Oops on
> a pseries KVM guest:
>
> RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
> BUG: Unable to handle kernel data access on read at 0x10ec00000048
> Faulting instruction address: 0xc0000000012d8728
> Oops: Kernel access of bad area, sig: 11 [#1]
> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> <snip>
> NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
> LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
> Call Trace:
> [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
> [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
> [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
> [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
> [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
> [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
> [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
> [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
> [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
> [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> <snip>
>
> A git bisect pointed this regression to be introduced via [1] that added
> a mechanism to create device tree nodes for parent PCI bridges when a
> PCI device is hot-plugged.
>
> The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
> device-tree node associated with the pci_dev that was earlier
> hot-plugged and was attached under a pci-bridge. The PCI dev header
> `dev->hdr_type` being 0, results a conditional check done with
> `pci_is_bridge()` into false. Consequently, a call to
> `of_pci_make_dev_node()` to create a device node is never made. When at
> a later point in time, in the device node removal path, a memcpy is
> attempted in `__of_changeset_entry_invert()`; since the device node was
> never created, results in an Oops due to kernel read access to a bad
> address.
>
> To fix this issue the patch updates `pci_stop_dev()` to ensure that a
> call to `of_pci_remove_node()` is only made for pci-bridge devices.
>
> [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
>
> Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
Thanks for the patch and testing! Would like a reviewed-by from
Lizhi.
> ---
> drivers/pci/remove.c | 3 ++-
> 1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index d749ea8250d6..4e51c64af416 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
> device_release_driver(&dev->dev);
> pci_proc_detach_device(dev);
> pci_remove_sysfs_dev_files(dev);
> - of_pci_remove_node(dev);
> + if (pci_is_bridge(dev))
> + of_pci_remove_node(dev);
IIUC, this basically undoes the work that was done by
of_pci_make_dev_node().
The call of of_pci_make_dev_node() from pci_bus_add_device() was added
by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
sense to me that the remove needs a similar condition.
> pci_dev_assign_added(dev, false);
> }
>
> base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
> --
> 2.45.2
>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-05 19:20 ` Bjorn Helgaas
@ 2024-07-11 4:48 ` Lizhi Hou
2024-07-11 18:48 ` Amit Machhiwal
0 siblings, 1 reply; 9+ messages in thread
From: Lizhi Hou @ 2024-07-11 4:48 UTC (permalink / raw)
To: Bjorn Helgaas, Amit Machhiwal
Cc: Rob Herring, Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lukas Wunner, Nicholas Piggin,
Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
On 7/5/24 12:20, Bjorn Helgaas wrote:
> [+cc Lukas, FYI]
>
> On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
>> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
>> of a PCI device attached to a PCI-bridge causes following kernel Oops on
>> a pseries KVM guest:
>>
>> RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
>> Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
>> BUG: Unable to handle kernel data access on read at 0x10ec00000048
>> Faulting instruction address: 0xc0000000012d8728
>> Oops: Kernel access of bad area, sig: 11 [#1]
>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>> <snip>
>> NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
>> LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
>> Call Trace:
>> [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
>> [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
>> [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
>> [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
>> [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
>> [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
>> [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
>> [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
>> [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
>> [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
>> [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
>> [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>> <snip>
>>
>> A git bisect pointed this regression to be introduced via [1] that added
>> a mechanism to create device tree nodes for parent PCI bridges when a
>> PCI device is hot-plugged.
>>
>> The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
>> device-tree node associated with the pci_dev that was earlier
>> hot-plugged and was attached under a pci-bridge. The PCI dev header
>> `dev->hdr_type` being 0, results a conditional check done with
>> `pci_is_bridge()` into false. Consequently, a call to
>> `of_pci_make_dev_node()` to create a device node is never made. When at
>> a later point in time, in the device node removal path, a memcpy is
>> attempted in `__of_changeset_entry_invert()`; since the device node was
>> never created, results in an Oops due to kernel read access to a bad
>> address.
>>
>> To fix this issue the patch updates `pci_stop_dev()` to ensure that a
>> call to `of_pci_remove_node()` is only made for pci-bridge devices.
>>
>> [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
>>
>> Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
>> Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
>> Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
>> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> Thanks for the patch and testing! Would like a reviewed-by from
> Lizhi.
of_pci_make_dev_node() will create of nodes for some endpoint devices
(not a bridge) as well. And actually this is the main purpose.
Maybe the patch as below would resolve the Oops?
diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
index dda6092e6d3a..3c693b091ecf 100644
--- a/drivers/of/dynamic.c
+++ b/drivers/of/dynamic.c
@@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct
device_node *np,
* a given changeset.
*
* @ocs: Pointer to changeset
+ * @np: Pointer to device node. If it is not null, init it directly
instead of
+ * allocate a new node.
* @parent: Pointer to parent device node
* @full_name: Node full name
*
* Return: Pointer to the created device node or NULL in case of an error.
*/
struct device_node *of_changeset_create_node(struct of_changeset *ocs,
+ struct device_node *np,
struct device_node *parent,
const char *full_name)
{
- struct device_node *np;
int ret;
- np = __of_node_dup(NULL, full_name);
- if (!np)
- return NULL;
+ if (!np) {
+ np = __of_node_dup(NULL, full_name);
+ if (!np)
+ return NULL;
+ } else {
+ of_node_set_flag(np, OF_DYNAMIC);
+ of_node_set_flag(np, OF_DETACHED);
+ }
+
np->parent = parent;
ret = of_changeset_attach_node(ocs, np);
diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
index 445ad13dab98..087de26852cc 100644
--- a/drivers/of/unittest.c
+++ b/drivers/of/unittest.c
@@ -871,7 +871,7 @@ static void __init of_unittest_changeset(void)
unittest(!of_changeset_add_property(&chgset, parent, ppadd),
"fail add prop prop-add\n");
unittest(!of_changeset_update_property(&chgset, parent,
ppupdate), "fail update prop\n");
unittest(!of_changeset_remove_property(&chgset, parent,
ppremove), "fail remove prop\n");
- n22 = of_changeset_create_node(&chgset, n2, "n22");
+ n22 = of_changeset_create_node(&chgset, NULL, n2, "n22");
unittest(n22, "fail create n22\n");
unittest(!of_changeset_add_prop_string(&chgset, n22,
"prop-str", "abcd"),
"fail add prop prop-str");
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 51e3dd0ea5ab..92c079b2e570 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -608,18 +608,28 @@ int devm_of_pci_bridge_init(struct device *dev,
struct pci_host_bridge *bridge)
#ifdef CONFIG_PCI_DYNAMIC_OF_NODES
+void of_pci_free_node(struct device_node *np)
+{
+ struct of_changeset *cset;
+
+ cset = (struct of_changeset *)(np + 1);
+
+ np->data = NULL;
+ of_changeset_revert(cset);
+ of_changeset_destroy(cset);
+ of_node_put(np);
+}
+
void of_pci_remove_node(struct pci_dev *pdev)
{
struct device_node *np;
np = pci_device_to_OF_node(pdev);
- if (!np || !of_node_check_flag(np, OF_DYNAMIC))
+ if (!np || np->data != of_pci_free_node)
return;
pdev->dev.of_node = NULL;
- of_changeset_revert(np->data);
- of_changeset_destroy(np->data);
- of_node_put(np);
+ of_pci_free_node(np);
}
void of_pci_make_dev_node(struct pci_dev *pdev)
@@ -655,14 +665,18 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
if (!name)
return;
- cset = kmalloc(sizeof(*cset), GFP_KERNEL);
- if (!cset)
+ np = kzalloc(sizeof(*np) + sizeof(*cset), GFP_KERNEL);
+ if (!np)
goto out_free_name;
+ np->full_name = name;
+ of_node_init(np);
+
+ cset = (struct of_changeset *)(np + 1);
of_changeset_init(cset);
- np = of_changeset_create_node(cset, ppnode, name);
+ np = of_changeset_create_node(cset, np, ppnode, NULL);
if (!np)
- goto out_destroy_cset;
+ goto out_free_node;
ret = of_pci_add_properties(pdev, cset, np);
if (ret)
@@ -672,9 +686,8 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
if (ret)
goto out_free_node;
- np->data = cset;
+ np->data = of_pci_free_node;
pdev->dev.of_node = np;
- kfree(name);
return;
diff --git a/include/linux/of.h b/include/linux/of.h
index a0bedd038a05..f774459d0d84 100644
--- a/include/linux/of.h
+++ b/include/linux/of.h
@@ -1631,6 +1631,7 @@ static inline int
of_changeset_update_property(struct of_changeset *ocs,
}
struct device_node *of_changeset_create_node(struct of_changeset *ocs,
+ struct device_node *np,
struct device_node *parent,
const char *full_name);
int of_changeset_add_prop_string(struct of_changeset *ocs,
Thanks,
Lizhi
>
>> ---
>> drivers/pci/remove.c | 3 ++-
>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
>> index d749ea8250d6..4e51c64af416 100644
>> --- a/drivers/pci/remove.c
>> +++ b/drivers/pci/remove.c
>> @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
>> device_release_driver(&dev->dev);
>> pci_proc_detach_device(dev);
>> pci_remove_sysfs_dev_files(dev);
>> - of_pci_remove_node(dev);
>> + if (pci_is_bridge(dev))
>> + of_pci_remove_node(dev);
> IIUC, this basically undoes the work that was done by
> of_pci_make_dev_node().
>
> The call of of_pci_make_dev_node() from pci_bus_add_device() was added
> by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
> sense to me that the remove needs a similar condition.
>
>> pci_dev_assign_added(dev, false);
>> }
>>
>> base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
>> --
>> 2.45.2
>>
^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-03 14:16 [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest Amit Machhiwal
2024-07-05 19:20 ` Bjorn Helgaas
@ 2024-07-11 12:20 ` Rob Herring
2024-07-11 14:21 ` Amit Machhiwal
1 sibling, 1 reply; 9+ messages in thread
From: Rob Herring @ 2024-07-11 12:20 UTC (permalink / raw)
To: Amit Machhiwal
Cc: Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lizhi Hou, Nicholas Piggin,
Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
On Wed, Jul 3, 2024 at 8:17 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
>
> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> of a PCI device attached to a PCI-bridge causes following kernel Oops on
> a pseries KVM guest:
Can I ask why you have this option on in the first place? Do you have
a use for it or it's just a case of distros turn on every kconfig
option.
Rob
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-11 12:20 ` Rob Herring
@ 2024-07-11 14:21 ` Amit Machhiwal
2024-07-11 19:34 ` Rob Herring
0 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2024-07-11 14:21 UTC (permalink / raw)
To: Rob Herring
Cc: Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lizhi Hou, Nicholas Piggin,
Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
Hi Rob,
On 2024/07/11 06:20 AM, Rob Herring wrote:
> On Wed, Jul 3, 2024 at 8:17 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
> >
> > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > a pseries KVM guest:
>
> Can I ask why you have this option on in the first place? Do you have
> a use for it or it's just a case of distros turn on every kconfig
> option.
Yes, this option is turned on in Ubuntu's distro kernel config where the issue
was originally reported, while Fedora is keeping this turned off.
root@ubuntu:~# cat /boot/config-6.8.0-38-generic | grep PCI_DYN
CONFIG_PCI_DYNAMIC_OF_NODES=y
root@fedora:~# cat /boot/config-6.9.7-200.fc40.ppc64le | grep PCI_DYN
# CONFIG_PCI_DYNAMIC_OF_NODES is not set
Thanks,
Amit
>
> Rob
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-11 4:48 ` Lizhi Hou
@ 2024-07-11 18:48 ` Amit Machhiwal
2024-07-11 21:18 ` Lizhi Hou
0 siblings, 1 reply; 9+ messages in thread
From: Amit Machhiwal @ 2024-07-11 18:48 UTC (permalink / raw)
To: Lizhi Hou
Cc: Rob Herring, Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lukas Wunner, Bjorn Helgaas,
Nicholas Piggin, Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
On 2024/07/10 09:48 PM, Lizhi Hou wrote:
>
> On 7/5/24 12:20, Bjorn Helgaas wrote:
> > [+cc Lukas, FYI]
> >
> > On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
> > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > > a pseries KVM guest:
> > >
> > > RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > > Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
> > > BUG: Unable to handle kernel data access on read at 0x10ec00000048
> > > Faulting instruction address: 0xc0000000012d8728
> > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > <snip>
> > > NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
> > > LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
> > > Call Trace:
> > > [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
> > > [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
> > > [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
> > > [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> > > [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
> > > [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
> > > [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
> > > [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> > > [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
> > > [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
> > > [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
> > > [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> > > <snip>
> > >
> > > A git bisect pointed this regression to be introduced via [1] that added
> > > a mechanism to create device tree nodes for parent PCI bridges when a
> > > PCI device is hot-plugged.
> > >
> > > The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
> > > device-tree node associated with the pci_dev that was earlier
> > > hot-plugged and was attached under a pci-bridge. The PCI dev header
> > > `dev->hdr_type` being 0, results a conditional check done with
> > > `pci_is_bridge()` into false. Consequently, a call to
> > > `of_pci_make_dev_node()` to create a device node is never made. When at
> > > a later point in time, in the device node removal path, a memcpy is
> > > attempted in `__of_changeset_entry_invert()`; since the device node was
> > > never created, results in an Oops due to kernel read access to a bad
> > > address.
> > >
> > > To fix this issue the patch updates `pci_stop_dev()` to ensure that a
> > > call to `of_pci_remove_node()` is only made for pci-bridge devices.
> > >
> > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > >
> > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > > Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > Thanks for the patch and testing! Would like a reviewed-by from
> > Lizhi.
>
> of_pci_make_dev_node() will create of nodes for some endpoint devices (not a
> bridge) as well. And actually this is the main purpose.
>
> Maybe the patch as below would resolve the Oops?
Thanks for the patch, Lizhi! I tried out this patch and don't see the issue with
the same. The hot-plug and hot-unplug of PCI device seem to work fine as
expected.
~ Amit
>
> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> index dda6092e6d3a..3c693b091ecf 100644
> --- a/drivers/of/dynamic.c
> +++ b/drivers/of/dynamic.c
> @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct
> device_node *np,
> * a given changeset.
> *
> * @ocs: Pointer to changeset
> + * @np: Pointer to device node. If it is not null, init it directly instead
> of
> + * allocate a new node.
> * @parent: Pointer to parent device node
> * @full_name: Node full name
> *
> * Return: Pointer to the created device node or NULL in case of an error.
> */
> struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> + struct device_node *np,
> struct device_node *parent,
> const char *full_name)
> {
> - struct device_node *np;
> int ret;
>
> - np = __of_node_dup(NULL, full_name);
> - if (!np)
> - return NULL;
> + if (!np) {
> + np = __of_node_dup(NULL, full_name);
> + if (!np)
> + return NULL;
> + } else {
> + of_node_set_flag(np, OF_DYNAMIC);
> + of_node_set_flag(np, OF_DETACHED);
> + }
> +
> np->parent = parent;
>
> ret = of_changeset_attach_node(ocs, np);
> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> index 445ad13dab98..087de26852cc 100644
> --- a/drivers/of/unittest.c
> +++ b/drivers/of/unittest.c
> @@ -871,7 +871,7 @@ static void __init of_unittest_changeset(void)
> unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail
> add prop prop-add\n");
> unittest(!of_changeset_update_property(&chgset, parent, ppupdate),
> "fail update prop\n");
> unittest(!of_changeset_remove_property(&chgset, parent, ppremove),
> "fail remove prop\n");
> - n22 = of_changeset_create_node(&chgset, n2, "n22");
> + n22 = of_changeset_create_node(&chgset, NULL, n2, "n22");
> unittest(n22, "fail create n22\n");
> unittest(!of_changeset_add_prop_string(&chgset, n22, "prop-str",
> "abcd"),
> "fail add prop prop-str");
> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> index 51e3dd0ea5ab..92c079b2e570 100644
> --- a/drivers/pci/of.c
> +++ b/drivers/pci/of.c
> @@ -608,18 +608,28 @@ int devm_of_pci_bridge_init(struct device *dev, struct
> pci_host_bridge *bridge)
>
> #ifdef CONFIG_PCI_DYNAMIC_OF_NODES
>
> +void of_pci_free_node(struct device_node *np)
> +{
> + struct of_changeset *cset;
> +
> + cset = (struct of_changeset *)(np + 1);
> +
> + np->data = NULL;
> + of_changeset_revert(cset);
> + of_changeset_destroy(cset);
> + of_node_put(np);
> +}
> +
> void of_pci_remove_node(struct pci_dev *pdev)
> {
> struct device_node *np;
>
> np = pci_device_to_OF_node(pdev);
> - if (!np || !of_node_check_flag(np, OF_DYNAMIC))
> + if (!np || np->data != of_pci_free_node)
> return;
> pdev->dev.of_node = NULL;
>
> - of_changeset_revert(np->data);
> - of_changeset_destroy(np->data);
> - of_node_put(np);
> + of_pci_free_node(np);
> }
>
> void of_pci_make_dev_node(struct pci_dev *pdev)
> @@ -655,14 +665,18 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
> if (!name)
> return;
>
> - cset = kmalloc(sizeof(*cset), GFP_KERNEL);
> - if (!cset)
> + np = kzalloc(sizeof(*np) + sizeof(*cset), GFP_KERNEL);
> + if (!np)
> goto out_free_name;
> + np->full_name = name;
> + of_node_init(np);
> +
> + cset = (struct of_changeset *)(np + 1);
> of_changeset_init(cset);
>
> - np = of_changeset_create_node(cset, ppnode, name);
> + np = of_changeset_create_node(cset, np, ppnode, NULL);
> if (!np)
> - goto out_destroy_cset;
> + goto out_free_node;
>
> ret = of_pci_add_properties(pdev, cset, np);
> if (ret)
> @@ -672,9 +686,8 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
> if (ret)
> goto out_free_node;
>
> - np->data = cset;
> + np->data = of_pci_free_node;
> pdev->dev.of_node = np;
> - kfree(name);
>
> return;
>
> diff --git a/include/linux/of.h b/include/linux/of.h
> index a0bedd038a05..f774459d0d84 100644
> --- a/include/linux/of.h
> +++ b/include/linux/of.h
> @@ -1631,6 +1631,7 @@ static inline int of_changeset_update_property(struct
> of_changeset *ocs,
> }
>
> struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> + struct device_node *np,
> struct device_node *parent,
> const char *full_name);
> int of_changeset_add_prop_string(struct of_changeset *ocs,
>
> Thanks,
>
> Lizhi
>
> >
> > > ---
> > > drivers/pci/remove.c | 3 ++-
> > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> > > index d749ea8250d6..4e51c64af416 100644
> > > --- a/drivers/pci/remove.c
> > > +++ b/drivers/pci/remove.c
> > > @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
> > > device_release_driver(&dev->dev);
> > > pci_proc_detach_device(dev);
> > > pci_remove_sysfs_dev_files(dev);
> > > - of_pci_remove_node(dev);
> > > + if (pci_is_bridge(dev))
> > > + of_pci_remove_node(dev);
> > IIUC, this basically undoes the work that was done by
> > of_pci_make_dev_node().
> >
> > The call of of_pci_make_dev_node() from pci_bus_add_device() was added
> > by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
> > sense to me that the remove needs a similar condition.
> >
> > > pci_dev_assign_added(dev, false);
> > > }
> > >
> > > base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
> > > --
> > > 2.45.2
> > >
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-11 14:21 ` Amit Machhiwal
@ 2024-07-11 19:34 ` Rob Herring
0 siblings, 0 replies; 9+ messages in thread
From: Rob Herring @ 2024-07-11 19:34 UTC (permalink / raw)
To: Amit Machhiwal, kernel-team
Cc: Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lizhi Hou, Nicholas Piggin,
Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
+Ubuntu kernel team
On Thu, Jul 11, 2024 at 8:21 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
>
> Hi Rob,
>
> On 2024/07/11 06:20 AM, Rob Herring wrote:
> > On Wed, Jul 3, 2024 at 8:17 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
> > >
> > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > > a pseries KVM guest:
> >
> > Can I ask why you have this option on in the first place? Do you have
> > a use for it or it's just a case of distros turn on every kconfig
> > option.
>
> Yes, this option is turned on in Ubuntu's distro kernel config where the issue
> was originally reported, while Fedora is keeping this turned off.
>
> root@ubuntu:~# cat /boot/config-6.8.0-38-generic | grep PCI_DYN
> CONFIG_PCI_DYNAMIC_OF_NODES=y
Ubuntu should turn off this option. For starters, it is not complete
to be usable. Eventually, it should get removed in favor of some TBD
runtime option.
(And we should fix the crash too)
Rob
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-11 18:48 ` Amit Machhiwal
@ 2024-07-11 21:18 ` Lizhi Hou
2024-07-12 17:19 ` Amit Machhiwal
0 siblings, 1 reply; 9+ messages in thread
From: Lizhi Hou @ 2024-07-11 21:18 UTC (permalink / raw)
To: Bjorn Helgaas, linux-pci, linux-kernel, linuxppc-dev, kvm-ppc,
Bjorn Helgaas, Rob Herring, Vaibhav Jain, Nicholas Piggin,
Michael Ellerman, Vaidyanathan Srinivasan, Kowshik Jois B S,
Lukas Wunner
On 7/11/24 11:48, Amit Machhiwal wrote:
> On 2024/07/10 09:48 PM, Lizhi Hou wrote:
>> On 7/5/24 12:20, Bjorn Helgaas wrote:
>>> [+cc Lukas, FYI]
>>>
>>> On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
>>>> With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
>>>> of a PCI device attached to a PCI-bridge causes following kernel Oops on
>>>> a pseries KVM guest:
>>>>
>>>> RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
>>>> Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
>>>> BUG: Unable to handle kernel data access on read at 0x10ec00000048
>>>> Faulting instruction address: 0xc0000000012d8728
>>>> Oops: Kernel access of bad area, sig: 11 [#1]
>>>> LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
>>>> <snip>
>>>> NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
>>>> LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
>>>> Call Trace:
>>>> [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
>>>> [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
>>>> [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
>>>> [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
>>>> [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
>>>> [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
>>>> [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
>>>> [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
>>>> [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
>>>> [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
>>>> [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
>>>> [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
>>>> <snip>
>>>>
>>>> A git bisect pointed this regression to be introduced via [1] that added
>>>> a mechanism to create device tree nodes for parent PCI bridges when a
>>>> PCI device is hot-plugged.
>>>>
>>>> The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
>>>> device-tree node associated with the pci_dev that was earlier
>>>> hot-plugged and was attached under a pci-bridge. The PCI dev header
>>>> `dev->hdr_type` being 0, results a conditional check done with
>>>> `pci_is_bridge()` into false. Consequently, a call to
>>>> `of_pci_make_dev_node()` to create a device node is never made. When at
>>>> a later point in time, in the device node removal path, a memcpy is
>>>> attempted in `__of_changeset_entry_invert()`; since the device node was
>>>> never created, results in an Oops due to kernel read access to a bad
>>>> address.
>>>>
>>>> To fix this issue the patch updates `pci_stop_dev()` to ensure that a
>>>> call to `of_pci_remove_node()` is only made for pci-bridge devices.
>>>>
>>>> [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
>>>>
>>>> Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
>>>> Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
>>>> Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
>>>> Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
>>> Thanks for the patch and testing! Would like a reviewed-by from
>>> Lizhi.
>> of_pci_make_dev_node() will create of nodes for some endpoint devices (not a
>> bridge) as well. And actually this is the main purpose.
>>
>> Maybe the patch as below would resolve the Oops?
> Thanks for the patch, Lizhi! I tried out this patch and don't see the issue with
> the same. The hot-plug and hot-unplug of PCI device seem to work fine as
> expected.
Cool! Thanks for trying it. Will you re-spin the patch or you would like
me to create a patch?
Lizhi
>
> ~ Amit
>
>> diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
>> index dda6092e6d3a..3c693b091ecf 100644
>> --- a/drivers/of/dynamic.c
>> +++ b/drivers/of/dynamic.c
>> @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct
>> device_node *np,
>> * a given changeset.
>> *
>> * @ocs: Pointer to changeset
>> + * @np: Pointer to device node. If it is not null, init it directly instead
>> of
>> + * allocate a new node.
>> * @parent: Pointer to parent device node
>> * @full_name: Node full name
>> *
>> * Return: Pointer to the created device node or NULL in case of an error.
>> */
>> struct device_node *of_changeset_create_node(struct of_changeset *ocs,
>> + struct device_node *np,
>> struct device_node *parent,
>> const char *full_name)
>> {
>> - struct device_node *np;
>> int ret;
>>
>> - np = __of_node_dup(NULL, full_name);
>> - if (!np)
>> - return NULL;
>> + if (!np) {
>> + np = __of_node_dup(NULL, full_name);
>> + if (!np)
>> + return NULL;
>> + } else {
>> + of_node_set_flag(np, OF_DYNAMIC);
>> + of_node_set_flag(np, OF_DETACHED);
>> + }
>> +
>> np->parent = parent;
>>
>> ret = of_changeset_attach_node(ocs, np);
>> diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
>> index 445ad13dab98..087de26852cc 100644
>> --- a/drivers/of/unittest.c
>> +++ b/drivers/of/unittest.c
>> @@ -871,7 +871,7 @@ static void __init of_unittest_changeset(void)
>> unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail
>> add prop prop-add\n");
>> unittest(!of_changeset_update_property(&chgset, parent, ppupdate),
>> "fail update prop\n");
>> unittest(!of_changeset_remove_property(&chgset, parent, ppremove),
>> "fail remove prop\n");
>> - n22 = of_changeset_create_node(&chgset, n2, "n22");
>> + n22 = of_changeset_create_node(&chgset, NULL, n2, "n22");
>> unittest(n22, "fail create n22\n");
>> unittest(!of_changeset_add_prop_string(&chgset, n22, "prop-str",
>> "abcd"),
>> "fail add prop prop-str");
>> diff --git a/drivers/pci/of.c b/drivers/pci/of.c
>> index 51e3dd0ea5ab..92c079b2e570 100644
>> --- a/drivers/pci/of.c
>> +++ b/drivers/pci/of.c
>> @@ -608,18 +608,28 @@ int devm_of_pci_bridge_init(struct device *dev, struct
>> pci_host_bridge *bridge)
>>
>> #ifdef CONFIG_PCI_DYNAMIC_OF_NODES
>>
>> +void of_pci_free_node(struct device_node *np)
>> +{
>> + struct of_changeset *cset;
>> +
>> + cset = (struct of_changeset *)(np + 1);
>> +
>> + np->data = NULL;
>> + of_changeset_revert(cset);
>> + of_changeset_destroy(cset);
>> + of_node_put(np);
>> +}
>> +
>> void of_pci_remove_node(struct pci_dev *pdev)
>> {
>> struct device_node *np;
>>
>> np = pci_device_to_OF_node(pdev);
>> - if (!np || !of_node_check_flag(np, OF_DYNAMIC))
>> + if (!np || np->data != of_pci_free_node)
>> return;
>> pdev->dev.of_node = NULL;
>>
>> - of_changeset_revert(np->data);
>> - of_changeset_destroy(np->data);
>> - of_node_put(np);
>> + of_pci_free_node(np);
>> }
>>
>> void of_pci_make_dev_node(struct pci_dev *pdev)
>> @@ -655,14 +665,18 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
>> if (!name)
>> return;
>>
>> - cset = kmalloc(sizeof(*cset), GFP_KERNEL);
>> - if (!cset)
>> + np = kzalloc(sizeof(*np) + sizeof(*cset), GFP_KERNEL);
>> + if (!np)
>> goto out_free_name;
>> + np->full_name = name;
>> + of_node_init(np);
>> +
>> + cset = (struct of_changeset *)(np + 1);
>> of_changeset_init(cset);
>>
>> - np = of_changeset_create_node(cset, ppnode, name);
>> + np = of_changeset_create_node(cset, np, ppnode, NULL);
>> if (!np)
>> - goto out_destroy_cset;
>> + goto out_free_node;
>>
>> ret = of_pci_add_properties(pdev, cset, np);
>> if (ret)
>> @@ -672,9 +686,8 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
>> if (ret)
>> goto out_free_node;
>>
>> - np->data = cset;
>> + np->data = of_pci_free_node;
>> pdev->dev.of_node = np;
>> - kfree(name);
>>
>> return;
>>
>> diff --git a/include/linux/of.h b/include/linux/of.h
>> index a0bedd038a05..f774459d0d84 100644
>> --- a/include/linux/of.h
>> +++ b/include/linux/of.h
>> @@ -1631,6 +1631,7 @@ static inline int of_changeset_update_property(struct
>> of_changeset *ocs,
>> }
>>
>> struct device_node *of_changeset_create_node(struct of_changeset *ocs,
>> + struct device_node *np,
>> struct device_node *parent,
>> const char *full_name);
>> int of_changeset_add_prop_string(struct of_changeset *ocs,
>>
>> Thanks,
>>
>> Lizhi
>>
>>>> ---
>>>> drivers/pci/remove.c | 3 ++-
>>>> 1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
>>>> index d749ea8250d6..4e51c64af416 100644
>>>> --- a/drivers/pci/remove.c
>>>> +++ b/drivers/pci/remove.c
>>>> @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
>>>> device_release_driver(&dev->dev);
>>>> pci_proc_detach_device(dev);
>>>> pci_remove_sysfs_dev_files(dev);
>>>> - of_pci_remove_node(dev);
>>>> + if (pci_is_bridge(dev))
>>>> + of_pci_remove_node(dev);
>>> IIUC, this basically undoes the work that was done by
>>> of_pci_make_dev_node().
>>>
>>> The call of of_pci_make_dev_node() from pci_bus_add_device() was added
>>> by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
>>> sense to me that the remove needs a similar condition.
>>>
>>>> pci_dev_assign_added(dev, false);
>>>> }
>>>>
>>>> base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
>>>> --
>>>> 2.45.2
>>>>
^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
2024-07-11 21:18 ` Lizhi Hou
@ 2024-07-12 17:19 ` Amit Machhiwal
0 siblings, 0 replies; 9+ messages in thread
From: Amit Machhiwal @ 2024-07-12 17:19 UTC (permalink / raw)
To: Lizhi Hou
Cc: Rob Herring, Kowshik Jois B S, linux-pci, linux-kernel, kvm-ppc,
Vaidyanathan Srinivasan, Lukas Wunner, Bjorn Helgaas,
Nicholas Piggin, Bjorn Helgaas, Vaibhav Jain, linuxppc-dev
Hi Lizhi,
On 2024/07/11 02:18 PM, Lizhi Hou wrote:
>
> On 7/11/24 11:48, Amit Machhiwal wrote:
> > On 2024/07/10 09:48 PM, Lizhi Hou wrote:
> > > On 7/5/24 12:20, Bjorn Helgaas wrote:
> > > > [+cc Lukas, FYI]
> > > >
> > > > On Wed, Jul 03, 2024 at 07:46:34PM +0530, Amit Machhiwal wrote:
> > > > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > > > > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > > > > a pseries KVM guest:
> > > > >
> > > > > RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > > > > Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
> > > > > BUG: Unable to handle kernel data access on read at 0x10ec00000048
> > > > > Faulting instruction address: 0xc0000000012d8728
> > > > > Oops: Kernel access of bad area, sig: 11 [#1]
> > > > > LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > > > <snip>
> > > > > NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
> > > > > LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
> > > > > Call Trace:
> > > > > [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
> > > > > [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
> > > > > [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
> > > > > [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> > > > > [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
> > > > > [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
> > > > > [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
> > > > > [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> > > > > [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
> > > > > [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
> > > > > [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
> > > > > [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> > > > > <snip>
> > > > >
> > > > > A git bisect pointed this regression to be introduced via [1] that added
> > > > > a mechanism to create device tree nodes for parent PCI bridges when a
> > > > > PCI device is hot-plugged.
> > > > >
> > > > > The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
> > > > > device-tree node associated with the pci_dev that was earlier
> > > > > hot-plugged and was attached under a pci-bridge. The PCI dev header
> > > > > `dev->hdr_type` being 0, results a conditional check done with
> > > > > `pci_is_bridge()` into false. Consequently, a call to
> > > > > `of_pci_make_dev_node()` to create a device node is never made. When at
> > > > > a later point in time, in the device node removal path, a memcpy is
> > > > > attempted in `__of_changeset_entry_invert()`; since the device node was
> > > > > never created, results in an Oops due to kernel read access to a bad
> > > > > address.
> > > > >
> > > > > To fix this issue the patch updates `pci_stop_dev()` to ensure that a
> > > > > call to `of_pci_remove_node()` is only made for pci-bridge devices.
> > > > >
> > > > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > > > >
> > > > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > > > > Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > > > Tested-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > > > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > > > Thanks for the patch and testing! Would like a reviewed-by from
> > > > Lizhi.
> > > of_pci_make_dev_node() will create of nodes for some endpoint devices (not a
> > > bridge) as well. And actually this is the main purpose.
> > >
> > > Maybe the patch as below would resolve the Oops?
> > Thanks for the patch, Lizhi! I tried out this patch and don't see the issue with
> > the same. The hot-plug and hot-unplug of PCI device seem to work fine as
> > expected.
>
> Cool! Thanks for trying it. Will you re-spin the patch or you would like me
> to create a patch?
Sure, I'll re-spin and send the V2 of the patch.
~ Amit
>
> Lizhi
>
> >
> > ~ Amit
> >
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index dda6092e6d3a..3c693b091ecf 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct
> > > device_node *np,
> > > * a given changeset.
> > > *
> > > * @ocs: Pointer to changeset
> > > + * @np: Pointer to device node. If it is not null, init it directly instead
> > > of
> > > + * allocate a new node.
> > > * @parent: Pointer to parent device node
> > > * @full_name: Node full name
> > > *
> > > * Return: Pointer to the created device node or NULL in case of an error.
> > > */
> > > struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> > > + struct device_node *np,
> > > struct device_node *parent,
> > > const char *full_name)
> > > {
> > > - struct device_node *np;
> > > int ret;
> > >
> > > - np = __of_node_dup(NULL, full_name);
> > > - if (!np)
> > > - return NULL;
> > > + if (!np) {
> > > + np = __of_node_dup(NULL, full_name);
> > > + if (!np)
> > > + return NULL;
> > > + } else {
> > > + of_node_set_flag(np, OF_DYNAMIC);
> > > + of_node_set_flag(np, OF_DETACHED);
> > > + }
> > > +
> > > np->parent = parent;
> > >
> > > ret = of_changeset_attach_node(ocs, np);
> > > diff --git a/drivers/of/unittest.c b/drivers/of/unittest.c
> > > index 445ad13dab98..087de26852cc 100644
> > > --- a/drivers/of/unittest.c
> > > +++ b/drivers/of/unittest.c
> > > @@ -871,7 +871,7 @@ static void __init of_unittest_changeset(void)
> > > unittest(!of_changeset_add_property(&chgset, parent, ppadd), "fail
> > > add prop prop-add\n");
> > > unittest(!of_changeset_update_property(&chgset, parent, ppupdate),
> > > "fail update prop\n");
> > > unittest(!of_changeset_remove_property(&chgset, parent, ppremove),
> > > "fail remove prop\n");
> > > - n22 = of_changeset_create_node(&chgset, n2, "n22");
> > > + n22 = of_changeset_create_node(&chgset, NULL, n2, "n22");
> > > unittest(n22, "fail create n22\n");
> > > unittest(!of_changeset_add_prop_string(&chgset, n22, "prop-str",
> > > "abcd"),
> > > "fail add prop prop-str");
> > > diff --git a/drivers/pci/of.c b/drivers/pci/of.c
> > > index 51e3dd0ea5ab..92c079b2e570 100644
> > > --- a/drivers/pci/of.c
> > > +++ b/drivers/pci/of.c
> > > @@ -608,18 +608,28 @@ int devm_of_pci_bridge_init(struct device *dev, struct
> > > pci_host_bridge *bridge)
> > >
> > > #ifdef CONFIG_PCI_DYNAMIC_OF_NODES
> > >
> > > +void of_pci_free_node(struct device_node *np)
> > > +{
> > > + struct of_changeset *cset;
> > > +
> > > + cset = (struct of_changeset *)(np + 1);
> > > +
> > > + np->data = NULL;
> > > + of_changeset_revert(cset);
> > > + of_changeset_destroy(cset);
> > > + of_node_put(np);
> > > +}
> > > +
> > > void of_pci_remove_node(struct pci_dev *pdev)
> > > {
> > > struct device_node *np;
> > >
> > > np = pci_device_to_OF_node(pdev);
> > > - if (!np || !of_node_check_flag(np, OF_DYNAMIC))
> > > + if (!np || np->data != of_pci_free_node)
> > > return;
> > > pdev->dev.of_node = NULL;
> > >
> > > - of_changeset_revert(np->data);
> > > - of_changeset_destroy(np->data);
> > > - of_node_put(np);
> > > + of_pci_free_node(np);
> > > }
> > >
> > > void of_pci_make_dev_node(struct pci_dev *pdev)
> > > @@ -655,14 +665,18 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
> > > if (!name)
> > > return;
> > >
> > > - cset = kmalloc(sizeof(*cset), GFP_KERNEL);
> > > - if (!cset)
> > > + np = kzalloc(sizeof(*np) + sizeof(*cset), GFP_KERNEL);
> > > + if (!np)
> > > goto out_free_name;
> > > + np->full_name = name;
> > > + of_node_init(np);
> > > +
> > > + cset = (struct of_changeset *)(np + 1);
> > > of_changeset_init(cset);
> > >
> > > - np = of_changeset_create_node(cset, ppnode, name);
> > > + np = of_changeset_create_node(cset, np, ppnode, NULL);
> > > if (!np)
> > > - goto out_destroy_cset;
> > > + goto out_free_node;
> > >
> > > ret = of_pci_add_properties(pdev, cset, np);
> > > if (ret)
> > > @@ -672,9 +686,8 @@ void of_pci_make_dev_node(struct pci_dev *pdev)
> > > if (ret)
> > > goto out_free_node;
> > >
> > > - np->data = cset;
> > > + np->data = of_pci_free_node;
> > > pdev->dev.of_node = np;
> > > - kfree(name);
> > >
> > > return;
> > >
> > > diff --git a/include/linux/of.h b/include/linux/of.h
> > > index a0bedd038a05..f774459d0d84 100644
> > > --- a/include/linux/of.h
> > > +++ b/include/linux/of.h
> > > @@ -1631,6 +1631,7 @@ static inline int of_changeset_update_property(struct
> > > of_changeset *ocs,
> > > }
> > >
> > > struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> > > + struct device_node *np,
> > > struct device_node *parent,
> > > const char *full_name);
> > > int of_changeset_add_prop_string(struct of_changeset *ocs,
> > >
> > > Thanks,
> > >
> > > Lizhi
> > >
> > > > > ---
> > > > > drivers/pci/remove.c | 3 ++-
> > > > > 1 file changed, 2 insertions(+), 1 deletion(-)
> > > > >
> > > > > diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> > > > > index d749ea8250d6..4e51c64af416 100644
> > > > > --- a/drivers/pci/remove.c
> > > > > +++ b/drivers/pci/remove.c
> > > > > @@ -22,7 +22,8 @@ static void pci_stop_dev(struct pci_dev *dev)
> > > > > device_release_driver(&dev->dev);
> > > > > pci_proc_detach_device(dev);
> > > > > pci_remove_sysfs_dev_files(dev);
> > > > > - of_pci_remove_node(dev);
> > > > > + if (pci_is_bridge(dev))
> > > > > + of_pci_remove_node(dev);
> > > > IIUC, this basically undoes the work that was done by
> > > > of_pci_make_dev_node().
> > > >
> > > > The call of of_pci_make_dev_node() from pci_bus_add_device() was added
> > > > by 407d1a51921e and is conditional on pci_is_bridge(), so it makes
> > > > sense to me that the remove needs a similar condition.
> > > >
> > > > > pci_dev_assign_added(dev, false);
> > > > > }
> > > > >
> > > > > base-commit: e9d22f7a6655941fc8b2b942ed354ec780936b3e
> > > > > --
> > > > > 2.45.2
> > > > >
^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2024-07-12 17:20 UTC | newest]
Thread overview: 9+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-07-03 14:16 [PATCH] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest Amit Machhiwal
2024-07-05 19:20 ` Bjorn Helgaas
2024-07-11 4:48 ` Lizhi Hou
2024-07-11 18:48 ` Amit Machhiwal
2024-07-11 21:18 ` Lizhi Hou
2024-07-12 17:19 ` Amit Machhiwal
2024-07-11 12:20 ` Rob Herring
2024-07-11 14:21 ` Amit Machhiwal
2024-07-11 19:34 ` Rob Herring
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).