linuxppc-dev.lists.ozlabs.org archive mirror
 help / color / mirror / Atom feed
* [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()
@ 2016-07-05  1:44 Mauricio Faria de Oliveira
  2016-07-05  2:55 ` Benjamin Herrenschmidt
  0 siblings, 1 reply; 5+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-07-05  1:44 UTC (permalink / raw)
  To: linuxppc-dev

It's possible to hit an oops/crash if pcibios_release_device() accesses the
phb struct and it had been freed earlier -- by pcibios_free_controller() --
as the memory it pointed to can be reused.

If after reuse 'phb->controller_ops.release_device' is non-NULL it will be
called, but it points to an invalid location (that function pointer is not
set anywhere in the code, so if it's non-NULL, that's not correct), and so
it hits an oops and the system crashes.

The solution is to verify whether 'phb' is still in 'hose_list' before any
access to it in pcibios_release_device() -- as it is removed from the list
by pcibios_free_controller() -- and ensure it cannot be used after kfree().

That problem can happen with the pSeries platform's DLPAR remove operation
if references to devices are held until after the pcibios_free_controller()
function runs, and then released - exercising pcibios_release_device() path.

It's been seen in multipath configurations during the removal/add of fibre
channel adapters, which may lead to tasks blocked for IO (thus holding the
references to devices) until adapters reappear later on, then hitting oops.

It can be synthesized simply w/ 'cat >/dev/sdX' for one or two devices, as
demonstrated below (with a quirk to set phb->controller_ops.release_device
to 0x0123456789abcdef before kfree() on pcibios_free_controller() and some
printk() debug calls for clarity):

    # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r    # need to remove and re-add
    # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -a    # so phb->is_dynamic is true.

    # cat >/dev/sdz & pid=$!

    # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
    <...>
    pci_bus 0002:01: busn_res: [bus 01-ff] is released
    pcibios_free_controller() phb = c0000001f72ccc00
    pcibios_free_controller QUIRK! ptr = 0123456789abcdef
    rpadlpar_io: slot PHB 33 removed

    # kill -9 $pid
    pcibios_release_device() phb = c0000001f72ccc00
    Unable to handle kernel paging request for instruction fetch
    Faulting instruction address: 0x123456789abcdef
    Oops: Kernel access of bad area, sig: 11 [#1]
    <...>
    CPU: 23 PID: 14878 Comm: cat Tainted: G        W       4.7.0-rc6 #2
    <...>
    NIP [0123456789abcdef] 0x123456789abcdef
    LR [c00000000003f0a0] pcibios_release_device+0x70/0x90
    Call Trace:
    [c0000001fa45b5a0] [c00000000003f07c] pcibios_release_device+0x4c/0x90 (unreliable)
    [c0000001fa45b610] [c0000000004e66b0] pci_release_dev+0x50/0xa0
    [c0000001fa45b640] [c000000000582d48] device_release+0x58/0xf0
    [c0000001fa45b6c0] [c0000000004aeb58] kobject_cleanup+0x78/0xe0
    [c0000001fa45b700] [c000000000583374] put_device+0x24/0x40
    [c0000001fa45b720] [c0000000005d61c8] scsi_host_dev_release+0x138/0x1c0
    [c0000001fa45b760] [c000000000582d48] device_release+0x58/0xf0
    [c0000001fa45b7e0] [c0000000004aeb58] kobject_cleanup+0x78/0xe0
    [c0000001fa45b820] [c000000000583374] put_device+0x24/0x40
    [c0000001fa45b840] [c0000000005f38d4] fc_rport_dev_release+0x24/0x50
    [c0000001fa45b870] [c000000000582d48] device_release+0x58/0xf0
    [c0000001fa45b8f0] [c0000000004aeb58] kobject_cleanup+0x78/0xe0
    [c0000001fa45b930] [c000000000583374] put_device+0x24/0x40
    [c0000001fa45b950] [c0000000005e0c20] scsi_target_dev_release+0x30/0x50
    [c0000001fa45b980] [c000000000582d48] device_release+0x58/0xf0
    [c0000001fa45ba00] [c0000000004aeb58] kobject_cleanup+0x78/0xe0
    [c0000001fa45ba40] [c000000000583374] put_device+0x24/0x40
    [c0000001fa45ba60] [c0000000005e4cb8] scsi_device_dev_release_usercontext+0x178/0x1b0
    [c0000001fa45bac0] [c0000000000cac34] execute_in_process_context+0x94/0xb0
    [c0000001fa45bae0] [c0000000005e4b24] scsi_device_dev_release+0x24/0x40
    [c0000001fa45bb00] [c000000000582d48] device_release+0x58/0xf0
    [c0000001fa45bb80] [c0000000004aeb58] kobject_cleanup+0x78/0xe0
    [c0000001fa45bbc0] [c000000000583374] put_device+0x24/0x40
    [c0000001fa45bbe0] [c0000000005d3f08] scsi_device_put+0x38/0x50
    [c0000001fa45bc10] [c00000000062dba0] scsi_disk_put+0x50/0x80
    [c0000001fa45bc50] [c0000000002ca378] __blkdev_put+0x2f8/0x340
    [c0000001fa45bd40] [c0000000002ca588] blkdev_close+0x28/0x40
    [c0000001fa45bd60] [c00000000027e9ec] __fput+0xbc/0x260
    [c0000001fa45bdb0] [c0000000000d1280] task_work_run+0xf0/0x130
    [c0000001fa45be00] [c0000000000172a4] do_notify_resume+0x84/0x90
    [c0000001fa45be30] [c000000000009844] ret_from_except_lite+0x70/0x74
    Instruction dump:
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX XXXXXXXX
    ---[ end trace fc3730d9babb2e31 ]---

With the patch applied:

    # drmgr -w 5 -d 1 -c phb -s 'PHB 33' -r
    <...>
    pci_bus 0001:01: busn_res: [bus 01-ff] is released
    pcibios_free_controller() phb = c0000001dea2d400
    pcibios_free_controller QUIRK! ptr = 0123456789abcdef
    rpadlpar_io: slot PHB 33 removed

    # kill -9 $pid
    pcibios_release_device() phb = c0000001dea2d400, ptr = 0123456789abcdef, found 0

Signed-off-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com>

Tested on 4.7.0-rc6.
---
 arch/powerpc/kernel/pci-common.c  |  5 +++--
 arch/powerpc/kernel/pci-hotplug.c | 22 ++++++++++++++++++++--
 2 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 0f7a60f..00a22e7 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -41,7 +41,7 @@
 #include <asm/ppc-pci.h>
 #include <asm/eeh.h>
 
-static DEFINE_SPINLOCK(hose_spinlock);
+DEFINE_SPINLOCK(hose_spinlock);
 LIST_HEAD(hose_list);
 
 /* XXX kill that some day ... */
@@ -95,10 +95,11 @@ void pcibios_free_controller(struct pci_controller *phb)
 {
 	spin_lock(&hose_spinlock);
 	list_del(&phb->list_node);
-	spin_unlock(&hose_spinlock);
 
 	if (phb->is_dynamic)
 		kfree(phb);
+
+	spin_unlock(&hose_spinlock);
 }
 EXPORT_SYMBOL_GPL(pcibios_free_controller);
 
diff --git a/arch/powerpc/kernel/pci-hotplug.c b/arch/powerpc/kernel/pci-hotplug.c
index 2d71269..49b5388 100644
--- a/arch/powerpc/kernel/pci-hotplug.c
+++ b/arch/powerpc/kernel/pci-hotplug.c
@@ -21,6 +21,9 @@
 #include <asm/firmware.h>
 #include <asm/eeh.h>
 
+extern spinlock_t hose_spinlock;
+extern struct list_head hose_list;
+
 static struct pci_bus *find_bus_among_children(struct pci_bus *bus,
 					       struct device_node *dn)
 {
@@ -58,12 +61,27 @@ EXPORT_SYMBOL_GPL(pci_find_bus_by_node);
  */
 void pcibios_release_device(struct pci_dev *dev)
 {
-	struct pci_controller *phb = pci_bus_to_host(dev->bus);
+	struct pci_controller *phb = pci_bus_to_host(dev->bus), *e;
+	int found = 0;
 
 	eeh_remove_device(dev);
 
-	if (phb->controller_ops.release_device)
+	/*
+	 * Only access phb if it's still in hose_list; otherwise
+	 * it's been freed and may contain corrupt data and oops.
+	 */
+	spin_lock(&hose_spinlock);
+	list_for_each_entry(e, &hose_list, list_node) {
+		if (e == phb) {
+			found = 1;
+			break;
+		}
+	}
+
+	if (found && phb->controller_ops.release_device)
 		phb->controller_ops.release_device(dev);
+
+	spin_unlock(&hose_spinlock);
 }
 
 /**
-- 
1.8.3.1

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()
  2016-07-05  1:44 [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller() Mauricio Faria de Oliveira
@ 2016-07-05  2:55 ` Benjamin Herrenschmidt
  2016-07-05 13:34   ` Mauricio Faria de Oliveira
  2016-07-12 23:07   ` Mauricio Faria de Oliveira
  0 siblings, 2 replies; 5+ messages in thread
From: Benjamin Herrenschmidt @ 2016-07-05  2:55 UTC (permalink / raw)
  To: Mauricio Faria de Oliveira, linuxppc-dev

On Mon, 2016-07-04 at 22:44 -0300, Mauricio Faria de Oliveira wrote:
> It's possible to hit an oops/crash if pcibios_release_device() accesses the
> phb struct and it had been freed earlier -- by pcibios_free_controller() --
> as the memory it pointed to can be reused.
> 
> If after reuse 'phb->controller_ops.release_device' is non-NULL it will be
> called, but it points to an invalid location (that function pointer is not
> set anywhere in the code, so if it's non-NULL, that's not correct), and so
> it hits an oops and the system crashes.
> 
> The solution is to verify whether 'phb' is still in 'hose_list' before any
> access to it in pcibios_release_device() -- as it is removed from the list
> by pcibios_free_controller() -- and ensure it cannot be used after kfree().
> 
> That problem can happen with the pSeries platform's DLPAR remove operation
> if references to devices are held until after the pcibios_free_controller()
> function runs, and then released - exercising pcibios_release_device() path.

Have you considered instead adding a kref to the PHB and only freeing
it when all devices have been freed ? Or it's too hard to tract device
creation ?

Ben.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()
  2016-07-05  2:55 ` Benjamin Herrenschmidt
@ 2016-07-05 13:34   ` Mauricio Faria de Oliveira
  2016-07-12 23:07   ` Mauricio Faria de Oliveira
  1 sibling, 0 replies; 5+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-07-05 13:34 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev

On 07/04/2016 11:55 PM, Benjamin Herrenschmidt wrote:
> Have you considered instead adding a kref to the PHB and only freeing
> it when all devices have been freed ? Or it's too hard to tract device
> creation ?

Yes, considered it, but felt leery of possibly leaving the PHB unfreed
(or block until it is freed) in the call that is supposed to remove it:

for example, if it's not freed yet (because of an userspace program
that holds references, maybe misbehaving or in error)...

- Should the call block? -- in the sense of what would it mean to the
   DLPAR tools (say, drmgr and hmc) that might see a timeout as failure,
   and perhaps won't retry it?
   Or if the administrator requested the adapter/PHB to be removed,
   maybe he's aware of the situation and actually wants the removal
   to happen anyway.

- If it should not block, the phb struct would be left there at mercy
   of the reference holders, and perhaps complicate things if the PHB
   is re-added later on? (say, bring the adapter back to the LPAR)
   Could the tools think it's already added/still present, and then
   stop in error?

Now, putting the complicated scenarios aside, implementing krefs would
touch several other parts of code that I'm less comfortable to change
at this moment, which would probably take a lot more time to do.

So, for simplicity and schedule/backport/time constraints, I've shot
for a less intrusive change that still resolves the problem.

Do you think it's an acceptable solution as is, or needs change or to
be implemented in a different way?

Thanks,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()
  2016-07-05  2:55 ` Benjamin Herrenschmidt
  2016-07-05 13:34   ` Mauricio Faria de Oliveira
@ 2016-07-12 23:07   ` Mauricio Faria de Oliveira
  2016-07-13 13:52     ` Mauricio Faria de Oliveira
  1 sibling, 1 reply; 5+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-07-12 23:07 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev

Ben,

On 07/04/2016 11:55 PM, Benjamin Herrenschmidt wrote:
> Have you considered instead adding a kref to the PHB and only freeing
> it when all devices have been freed ? Or it's too hard to tract device
> creation ?

Can you clarify which are the devices that should be tracked w/ krefs to
the PHB?

I've been wondering if it's just the root bus (phb->bus) -- which relays
on it (ie its phb->bus->children and phb->bus->devices) being eventually
freed in order to free the phb, or perhaps track the children & devices
directly.

If that's too far from sensible, can you point some interesting places
to look at? I've read much of arch/powerpc/kernel/pci-{common,hotplug}.c
and arch/powerpc/include/asm/pci-bridge.h, and some more in drivers/pci,
but things weren't as obvious to a newcomer in this area.

Thanks,

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller()
  2016-07-12 23:07   ` Mauricio Faria de Oliveira
@ 2016-07-13 13:52     ` Mauricio Faria de Oliveira
  0 siblings, 0 replies; 5+ messages in thread
From: Mauricio Faria de Oliveira @ 2016-07-13 13:52 UTC (permalink / raw)
  To: Benjamin Herrenschmidt, linuxppc-dev

On 07/12/2016 08:07 PM, Mauricio Faria de Oliveira wrote:
> Can you clarify which are the devices that should be tracked w/ krefs to
> the PHB?

Last night I had forgotten about the fundamental point of krefs - track 
references to pointers - and this answers the question.

I'm looking at the holders of pointers to the phb struct, and it seems
I wasn't too far off w/ the child buses and devices idea -- as the root 
bus (phb->bus) is assigned the phb pointer in the bus->sysdata field,
and it's inherited from parent by child buses and devices.

I'll continue here. Comments always welcome.

-- 
Mauricio Faria de Oliveira
IBM Linux Technology Center

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2016-07-13 13:52 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-07-05  1:44 [PATCH] powerpc: fix oops in pcibios_release_device() after pcibios_free_controller() Mauricio Faria de Oliveira
2016-07-05  2:55 ` Benjamin Herrenschmidt
2016-07-05 13:34   ` Mauricio Faria de Oliveira
2016-07-12 23:07   ` Mauricio Faria de Oliveira
2016-07-13 13:52     ` Mauricio Faria de Oliveira

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).