All of lore.kernel.org
 help / color / mirror / Atom feed
From: Rob Herring <robh@kernel.org>
To: Lizhi Hou <lizhi.hou@amd.com>
Cc: Amit Machhiwal <amachhiw@linux.ibm.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	devicetree@vger.kernel.org, linuxppc-dev@lists.ozlabs.org,
	kvm-ppc@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Saravana Kannan <saravanak@google.com>,
	Vaibhav Jain <vaibhav@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Michael Ellerman <mpe@ellerman.id.au>,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>,
	Kowshik Jois B S <kowsjois@linux.ibm.com>,
	Lukas Wunner <lukas@wunner.de>
Subject: Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Date: Tue, 23 Jul 2024 10:21:07 -0600	[thread overview]
Message-ID: <20240723162107.GA501469-robh@kernel.org> (raw)
In-Reply-To: <a6c92c73-13fb-8e9c-29de-1437654c3880@amd.com>

On Mon, Jul 15, 2024 at 01:52:30PM -0700, Lizhi Hou wrote:
> 
> On 7/15/24 11:55, Rob Herring wrote:
> > On Mon, Jul 15, 2024 at 2:08 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
> > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > > a pseries KVM guest:
> > > 
> > >   RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > >   Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
> > >   BUG: Unable to handle kernel data access on read at 0x10ec00000048
> > >   Faulting instruction address: 0xc0000000012d8728
> > >   Oops: Kernel access of bad area, sig: 11 [#1]
> > >   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > <snip>
> > >   NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
> > >   LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
> > >   Call Trace:
> > >   [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
> > >   [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
> > >   [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
> > >   [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> > >   [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
> > >   [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
> > >   [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
> > >   [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> > >   [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
> > >   [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
> > >   [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
> > >   [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> > > <snip>
> > > 
> > > A git bisect pointed this regression to be introduced via [1] that added
> > > a mechanism to create device tree nodes for parent PCI bridges when a
> > > PCI device is hot-plugged.
> > > 
> > > The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
> > > device-tree node associated with the pci_dev that was earlier
> > > hot-plugged and was attached under a pci-bridge. The PCI dev header
> > > `dev->hdr_type` being 0, results a conditional check done with
> > > `pci_is_bridge()` into false. Consequently, a call to
> > > `of_pci_make_dev_node()` to create a device node is never made. When at
> > > a later point in time, in the device node removal path, a memcpy is
> > > attempted in `__of_changeset_entry_invert()`; since the device node was
> > > never created, results in an Oops due to kernel read access to a bad
> > > address.
> > > 
> > > To fix this issue, the patch updates `of_changeset_create_node()` to
> > > allocate a new node only when the device node doesn't exist and init it
> > > in case it does already. Also, introduce `of_pci_free_node()` to be
> > > called to only revert and destroy the changeset device node that was
> > > created via a call to `of_changeset_create_node()`.
> > > 
> > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > > 
> > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > > Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
> > > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > > ---
> > > Changes since v1:
> > >      * Included Lizhi's suggested changes on V1
> > >      * Fixed below two warnings from Lizhi's changes and rearranged the cleanup
> > >        part a bit in `of_pci_make_dev_node`
> > >          drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes]
> > >            611 | void of_pci_free_node(struct device_node *np)
> > >                |      ^~~~~~~~~~~~~~~~
> > >          drivers/pci/of.c: In function ‘of_pci_make_dev_node’:
> > >          drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label]
> > >            696 | out_destroy_cset:
> > >                | ^~~~~~~~~~~~~~~~
> > >      * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amachhiw@linux.ibm.com/
> > > 
> > >   drivers/of/dynamic.c  | 16 ++++++++++++----
> > >   drivers/of/unittest.c |  2 +-
> > >   drivers/pci/bus.c     |  3 +--
> > >   drivers/pci/of.c      | 39 ++++++++++++++++++++++++++-------------
> > >   drivers/pci/pci.h     |  2 ++
> > >   include/linux/of.h    |  1 +
> > >   6 files changed, 43 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index dda6092e6d3a..9bba5e82a384 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct device_node *np,
> > >    * a given changeset.
> > >    *
> > >    * @ocs: Pointer to changeset
> > > + * @np: Pointer to device node. If null, allocate a new node. If not, init an
> > > + *     existing one.
> > >    * @parent: Pointer to parent device node
> > >    * @full_name: Node full name
> > >    *
> > >    * Return: Pointer to the created device node or NULL in case of an error.
> > >    */
> > >   struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> > > +                                            struct device_node *np,
> > >                                               struct device_node *parent,
> > >                                               const char *full_name)
> > >   {
> > > -       struct device_node *np;
> > >          int ret;
> > > 
> > > -       np = __of_node_dup(NULL, full_name);
> > > -       if (!np)
> > > -               return NULL;
> > > +       if (!np) {
> > > +               np = __of_node_dup(NULL, full_name);
> > > +               if (!np)
> > > +                       return NULL;
> > > +       } else {
> > > +               of_node_set_flag(np, OF_DYNAMIC);
> > > +               of_node_set_flag(np, OF_DETACHED);
> > Are we going to rename the function to
> > of_changeset_create_or_maybe_modify_node()? No. The functions here are
> > very clear in that they allocate new objects and don't reuse what's
> > passed in.
> 
> Ok. How about keeping of_changeset_create_node unchanged.
> 
> Instead, call kzalloc(), of_node_init() and of_changeset_attach_node()
> 
> in of_pci_make_dev_node() directly.
> 
> A similar example is dlpar_parse_cc_node().
> 
> 
> Does this sound better?

No, because really that code should be re-written using of_changeset
API.

My suggestion is add a data pointer to struct of_changeset and then set 
that to something to know the data ptr is a changeset and is your 
changeset.

Rob

WARNING: multiple messages have this Message-ID (diff)
From: Rob Herring <robh@kernel.org>
To: Lizhi Hou <lizhi.hou@amd.com>
Cc: devicetree@vger.kernel.org,
	Saravana Kannan <saravanak@google.com>,
	Lukas Wunner <lukas@wunner.de>,
	Kowshik Jois B S <kowsjois@linux.ibm.com>,
	linux-pci@vger.kernel.org, linux-kernel@vger.kernel.org,
	kvm-ppc@vger.kernel.org,
	Vaidyanathan Srinivasan <svaidy@linux.ibm.com>,
	Amit Machhiwal <amachhiw@linux.ibm.com>,
	Nicholas Piggin <npiggin@gmail.com>,
	Bjorn Helgaas <bhelgaas@google.com>,
	Vaibhav Jain <vaibhav@linux.ibm.com>,
	linuxppc-dev@lists.ozlabs.org
Subject: Re: [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest
Date: Tue, 23 Jul 2024 10:21:07 -0600	[thread overview]
Message-ID: <20240723162107.GA501469-robh@kernel.org> (raw)
In-Reply-To: <a6c92c73-13fb-8e9c-29de-1437654c3880@amd.com>

On Mon, Jul 15, 2024 at 01:52:30PM -0700, Lizhi Hou wrote:
> 
> On 7/15/24 11:55, Rob Herring wrote:
> > On Mon, Jul 15, 2024 at 2:08 AM Amit Machhiwal <amachhiw@linux.ibm.com> wrote:
> > > With CONFIG_PCI_DYNAMIC_OF_NODES [1], a hot-plug and hot-unplug sequence
> > > of a PCI device attached to a PCI-bridge causes following kernel Oops on
> > > a pseries KVM guest:
> > > 
> > >   RTAS: event: 2, Type: Hotplug Event (229), Severity: 1
> > >   Kernel attempted to read user page (10ec00000048) - exploit attempt? (uid: 0)
> > >   BUG: Unable to handle kernel data access on read at 0x10ec00000048
> > >   Faulting instruction address: 0xc0000000012d8728
> > >   Oops: Kernel access of bad area, sig: 11 [#1]
> > >   LE PAGE_SIZE=64K MMU=Radix SMP NR_CPUS=2048 NUMA pSeries
> > > <snip>
> > >   NIP [c0000000012d8728] __of_changeset_entry_invert+0x10/0x1ac
> > >   LR [c0000000012da7f0] __of_changeset_revert_entries+0x98/0x180
> > >   Call Trace:
> > >   [c00000000bcc3970] [c0000000012daa60] of_changeset_revert+0x58/0xd8
> > >   [c00000000bcc39c0] [c000000000d0ed78] of_pci_remove_node+0x74/0xb0
> > >   [c00000000bcc39f0] [c000000000cdcfe0] pci_stop_bus_device+0xf4/0x138
> > >   [c00000000bcc3a30] [c000000000cdd140] pci_stop_and_remove_bus_device_locked+0x34/0x64
> > >   [c00000000bcc3a60] [c000000000cf3780] remove_store+0xf0/0x108
> > >   [c00000000bcc3ab0] [c000000000e89e04] dev_attr_store+0x34/0x78
> > >   [c00000000bcc3ad0] [c0000000007f8dd4] sysfs_kf_write+0x70/0xa4
> > >   [c00000000bcc3af0] [c0000000007f7248] kernfs_fop_write_iter+0x1d0/0x2e0
> > >   [c00000000bcc3b40] [c0000000006c9b08] vfs_write+0x27c/0x558
> > >   [c00000000bcc3bf0] [c0000000006ca168] ksys_write+0x90/0x170
> > >   [c00000000bcc3c40] [c000000000033248] system_call_exception+0xf8/0x290
> > >   [c00000000bcc3e50] [c00000000000d05c] system_call_vectored_common+0x15c/0x2ec
> > > <snip>
> > > 
> > > A git bisect pointed this regression to be introduced via [1] that added
> > > a mechanism to create device tree nodes for parent PCI bridges when a
> > > PCI device is hot-plugged.
> > > 
> > > The Oops is caused when `pci_stop_dev()` tries to remove a non-existing
> > > device-tree node associated with the pci_dev that was earlier
> > > hot-plugged and was attached under a pci-bridge. The PCI dev header
> > > `dev->hdr_type` being 0, results a conditional check done with
> > > `pci_is_bridge()` into false. Consequently, a call to
> > > `of_pci_make_dev_node()` to create a device node is never made. When at
> > > a later point in time, in the device node removal path, a memcpy is
> > > attempted in `__of_changeset_entry_invert()`; since the device node was
> > > never created, results in an Oops due to kernel read access to a bad
> > > address.
> > > 
> > > To fix this issue, the patch updates `of_changeset_create_node()` to
> > > allocate a new node only when the device node doesn't exist and init it
> > > in case it does already. Also, introduce `of_pci_free_node()` to be
> > > called to only revert and destroy the changeset device node that was
> > > created via a call to `of_changeset_create_node()`.
> > > 
> > > [1] commit 407d1a51921e ("PCI: Create device tree node for bridge")
> > > 
> > > Fixes: 407d1a51921e ("PCI: Create device tree node for bridge")
> > > Reported-by: Kowshik Jois B S <kowsjois@linux.ibm.com>
> > > Signed-off-by: Lizhi Hou <lizhi.hou@amd.com>
> > > Signed-off-by: Amit Machhiwal <amachhiw@linux.ibm.com>
> > > ---
> > > Changes since v1:
> > >      * Included Lizhi's suggested changes on V1
> > >      * Fixed below two warnings from Lizhi's changes and rearranged the cleanup
> > >        part a bit in `of_pci_make_dev_node`
> > >          drivers/pci/of.c:611:6: warning: no previous prototype for ‘of_pci_free_node’ [-Wmissing-prototypes]
> > >            611 | void of_pci_free_node(struct device_node *np)
> > >                |      ^~~~~~~~~~~~~~~~
> > >          drivers/pci/of.c: In function ‘of_pci_make_dev_node’:
> > >          drivers/pci/of.c:696:1: warning: label ‘out_destroy_cset’ defined but not used [-Wunused-label]
> > >            696 | out_destroy_cset:
> > >                | ^~~~~~~~~~~~~~~~
> > >      * V1: https://lore.kernel.org/all/20240703141634.2974589-1-amachhiw@linux.ibm.com/
> > > 
> > >   drivers/of/dynamic.c  | 16 ++++++++++++----
> > >   drivers/of/unittest.c |  2 +-
> > >   drivers/pci/bus.c     |  3 +--
> > >   drivers/pci/of.c      | 39 ++++++++++++++++++++++++++-------------
> > >   drivers/pci/pci.h     |  2 ++
> > >   include/linux/of.h    |  1 +
> > >   6 files changed, 43 insertions(+), 20 deletions(-)
> > > 
> > > diff --git a/drivers/of/dynamic.c b/drivers/of/dynamic.c
> > > index dda6092e6d3a..9bba5e82a384 100644
> > > --- a/drivers/of/dynamic.c
> > > +++ b/drivers/of/dynamic.c
> > > @@ -492,21 +492,29 @@ struct device_node *__of_node_dup(const struct device_node *np,
> > >    * a given changeset.
> > >    *
> > >    * @ocs: Pointer to changeset
> > > + * @np: Pointer to device node. If null, allocate a new node. If not, init an
> > > + *     existing one.
> > >    * @parent: Pointer to parent device node
> > >    * @full_name: Node full name
> > >    *
> > >    * Return: Pointer to the created device node or NULL in case of an error.
> > >    */
> > >   struct device_node *of_changeset_create_node(struct of_changeset *ocs,
> > > +                                            struct device_node *np,
> > >                                               struct device_node *parent,
> > >                                               const char *full_name)
> > >   {
> > > -       struct device_node *np;
> > >          int ret;
> > > 
> > > -       np = __of_node_dup(NULL, full_name);
> > > -       if (!np)
> > > -               return NULL;
> > > +       if (!np) {
> > > +               np = __of_node_dup(NULL, full_name);
> > > +               if (!np)
> > > +                       return NULL;
> > > +       } else {
> > > +               of_node_set_flag(np, OF_DYNAMIC);
> > > +               of_node_set_flag(np, OF_DETACHED);
> > Are we going to rename the function to
> > of_changeset_create_or_maybe_modify_node()? No. The functions here are
> > very clear in that they allocate new objects and don't reuse what's
> > passed in.
> 
> Ok. How about keeping of_changeset_create_node unchanged.
> 
> Instead, call kzalloc(), of_node_init() and of_changeset_attach_node()
> 
> in of_pci_make_dev_node() directly.
> 
> A similar example is dlpar_parse_cc_node().
> 
> 
> Does this sound better?

No, because really that code should be re-written using of_changeset
API.

My suggestion is add a data pointer to struct of_changeset and then set 
that to something to know the data ptr is a changeset and is your 
changeset.

Rob

  reply	other threads:[~2024-07-23 16:21 UTC|newest]

Thread overview: 43+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-07-15  8:07 [PATCH v2] PCI: Fix crash during pci_dev hot-unplug on pseries KVM guest Amit Machhiwal
2024-07-15  8:07 ` Amit Machhiwal
2024-07-15 16:20 ` Lizhi Hou
2024-07-15 16:20   ` Lizhi Hou
2024-07-15 17:23   ` Bjorn Helgaas
2024-07-15 17:23     ` Bjorn Helgaas
2024-07-15 20:35     ` Lizhi Hou
2024-07-15 20:35       ` Lizhi Hou
2024-07-15 18:55 ` Rob Herring
2024-07-15 18:55   ` Rob Herring
2024-07-15 20:52   ` Lizhi Hou
2024-07-15 20:52     ` Lizhi Hou
2024-07-23 16:21     ` Rob Herring [this message]
2024-07-23 16:21       ` Rob Herring
2024-07-23 18:21       ` Lizhi Hou
2024-07-23 18:21         ` Lizhi Hou
2024-07-23 19:54         ` Rob Herring
2024-07-23 19:54           ` Rob Herring
2024-07-23 21:08           ` Lizhi Hou
2024-07-23 21:08             ` Lizhi Hou
2024-07-25 17:45             ` Amit Machhiwal
2024-07-25 17:45               ` Amit Machhiwal
2024-07-25 20:55               ` Bjorn Helgaas
2024-07-26  5:49                 ` Amit Machhiwal
2024-07-26  5:49                   ` Amit Machhiwal
2024-07-26 12:49                   ` Michael Ellerman
2024-07-26 12:49                     ` Michael Ellerman
2024-07-25 23:06               ` Lizhi Hou
2024-07-26 17:52                 ` Rob Herring
2024-07-26 17:52                   ` Rob Herring
2024-07-26 18:45                   ` Lizhi Hou
2024-07-26 18:45                     ` Lizhi Hou
2024-07-29 11:13                     ` Amit Machhiwal
2024-07-29 11:13                       ` Amit Machhiwal
2024-07-29 16:47                       ` Lizhi Hou
2024-07-29 16:55                         ` Amit Machhiwal
2024-07-29 16:55                           ` Amit Machhiwal
2024-07-29 18:19                           ` Lizhi Hou
2024-07-26 11:37               ` Rob Herring
2024-07-29 13:27                 ` Stefan Bader
2024-07-29 13:27                   ` Stefan Bader
2024-08-02 16:55                   ` Amit Machhiwal
2024-08-02 16:55                     ` Amit Machhiwal

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20240723162107.GA501469-robh@kernel.org \
    --to=robh@kernel.org \
    --cc=amachhiw@linux.ibm.com \
    --cc=bhelgaas@google.com \
    --cc=devicetree@vger.kernel.org \
    --cc=kowsjois@linux.ibm.com \
    --cc=kvm-ppc@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=linuxppc-dev@lists.ozlabs.org \
    --cc=lizhi.hou@amd.com \
    --cc=lukas@wunner.de \
    --cc=mpe@ellerman.id.au \
    --cc=npiggin@gmail.com \
    --cc=saravanak@google.com \
    --cc=svaidy@linux.ibm.com \
    --cc=vaibhav@linux.ibm.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.