linux-pci.vger.kernel.org archive mirror
From: Murali Karicheri <m-karicheri2@ti.com>
To: Guenter Roeck <linux@roeck-us.net>
Cc: Bjorn Helgaas <bhelgaas@google.com>,
	Fengguang Wu <fengguang.wu@intel.com>, LKP <lkp@01.org>,
	"linux-pci@vger.kernel.org" <linux-pci@vger.kernel.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>
Subject: Re: [PCI] BUG: unable to handle kernel
Date: Mon, 9 Mar 2015 14:09:48 -0400
Message-ID: <54FDE1EC.9040207@ti.com>
In-Reply-To: <54FDD995.1080000@roeck-us.net>

On 03/09/2015 01:34 PM, Guenter Roeck wrote:
> On 03/09/2015 10:03 AM, Murali Karicheri wrote:
>> On 03/09/2015 12:07 PM, Guenter Roeck wrote:
>>> On 03/09/2015 08:53 AM, Murali Karicheri wrote:
>>>> On 03/09/2015 10:44 AM, Bjorn Helgaas wrote:
>>>>> On Mon, Mar 9, 2015 at 9:17 AM, Murali Karicheri<m-karicheri2@ti.com>
>>>>> wrote:
>>>>>> On 03/06/2015 12:58 PM, Murali Karicheri wrote:
>>>>>>>
>>>>>>> On 03/06/2015 11:55 AM, Guenter Roeck wrote:
>>>>>>>>
>>>>>>>> On Fri, Mar 06, 2015 at 10:48:59AM -0500, Murali Karicheri wrote:
>>>>>>>> [ ... ]
>>>>>>>>
>>>>>>>>>> From 098b4f5e4ab9407fbdbfcca3a91785c17e25cf03 Mon Sep 17
>>>>>>>>>> 00:00:00 2001
>>>>>>>>> From: Murali Karicheri<m-karicheri2@ti.com>
>>>>>>>>> Date: Fri, 6 Mar 2015 10:23:08 -0500
>>>>>>>>> Subject: [PATCH] pci: of : fix kernel crash
>>>>>>>>>
>>>>>>>>> This is a debug patch to root cause the kernel crash
>>>>>>>>>
>>>>>>>>> commit 0b2af171520e5d5e7d5b5f479b90a6a5014d9df6
>>>>>>>>>
>>>>>>>>> PCI: Update DMA configuration from DT
>>>>>>>>>
>>>>>>>>> Signed-off-by: Murali Karicheri<m-karicheri2@ti.com>
>>>>>>>>> ---
>>>>>>>>> drivers/of/of_pci.c | 8 ++++++++
>>>>>>>>> drivers/pci/host-bridge.c | 5 +++++
>>>>>>>>> 2 files changed, 13 insertions(+)
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c
>>>>>>>>> index 86d3c38..5a59fb8 100644
>>>>>>>>> --- a/drivers/of/of_pci.c
>>>>>>>>> +++ b/drivers/of/of_pci.c
>>>>>>>>> @@ -129,6 +129,14 @@ void of_pci_dma_configure(struct pci_dev
>>>>>>>>> *pci_dev)
>>>>>>>>> struct device *dev =&pci_dev->dev;
>>>>>>>>> struct device *bridge = pci_get_host_bridge_device(pci_dev);
>>>>>>>>>
>>>>>>>>> + if (!bridge || !bridge->parent) {
>>>>>>>>> + if (!bridge)
>>>>>>>>> + pr_err("PCI bridge not found\n");
>>>>>>>>> + if (!bridge->parent)
>>>>>>>>> + pr_err("PCI bridge parent not found\n");
>>>>>>>>
>>>>>>>>
>>>>>>>> You'll see a crash here if bridge is NULL. Maybe add an else before
>>>>>>>> the second
>>>>>>>> if statement ? Also, dev_err might be a bit more useful and
>>>>>>>> would be
>>>>>>>> available.
>>>>>>>>
>>>>>>> Fixed and attached.
>>>>>>>
>>>>>>> Murali
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Guenter
>>>>>>>>
>>>>>>>>> + return;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> of_dma_configure(dev, bridge->parent->of_node);
>>>>>>>>> pci_put_host_bridge_device(bridge);
>>>>>>>>> }
>>>>>>>>> diff --git a/drivers/pci/host-bridge.c b/drivers/pci/host-bridge.c
>>>>>>>>> index 3e5bbf9..ef2ab51 100644
>>>>>>>>> --- a/drivers/pci/host-bridge.c
>>>>>>>>> +++ b/drivers/pci/host-bridge.c
>>>>>>>>> @@ -28,6 +28,11 @@ struct device
>>>>>>>>> *pci_get_host_bridge_device(struct
>>>>>>>>> pci_dev *dev)
>>>>>>>>> struct pci_bus *root_bus = find_pci_root_bus(dev->bus);
>>>>>>>>> struct device *bridge = root_bus->bridge;
>>>>>>>>>
>>>>>>>>> + if (!bridge) {
>>>>>>>>> + pr_err("PCI: bridge not found\n");
>>>>>>>>> + return NULL;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> kobject_get(&bridge->kobj);
>>>>>>>>> return bridge;
>>>>>>>>> }
>>>>>>>>> --
>>>>>>>>> 1.7.9.5
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>> Bjorn,
>>>>>>
>>>>>> Any chance of applying the attached debug patch to see if it fixes
>>>>>> this BUG and provides some additional information on it? I am not
>>>>>> sure who will pick this one up and apply it.
>>>>>
>>>>> The change that caused the oops (0b2af171520e ("PCI: Update DMA
>>>>> configuration from DT")) only exists on my pci/iommu branch, so I'm
>>>>> the one to apply it.
>>>>>
>>>>> It's much easier for me to deal with plain text patches (not
>>>>> attachments).
>>>>>
>>>>> I'm hesitating because I don't want to encourage use of the 0-day
>>>>> testing robot as a tool at which we can just throw debug patches. The
>>>>> robot is a service that costs somebody real money, and I want to be a
>>>>> good neighbor when using it.
>>>>
>>>> Thanks for the clarification, as I don't have much information on the
>>>> testing robot. At the same time, the question is how similar incidents
>>>> have been handled in the past. I am a newbie with respect to this. This
>>>> is the first time I have introduced a patch that impacts multiple
>>>> arch/machines.
>>>>
>>>>>
>>>>> Was the information in the robot's report enough to reproduce the
>>>>> oops? If not, is there additional information we could add to the
>>>>> report that would enable you to reproduce it? Even if we can't
>>>>> reproduce the oops, the report seems detailed enough that we should be
>>>>> able to deduce the problem and produce a fix in which we have high
>>>>> confidence.
>>>>
>>>> The BUG report essentially indicates the crash happened in
>>>> of_pci_dma_configure(). The machine-specific log makes sense to a
>>>> person familiar with this arch, and I am not familiar with it. So
>>>> can anyone help narrow down the root cause of this?
>>>>
>>>> Looking at the code, there are two pointer variables that are accessed
>>>> without checking for NULL, as the initial thinking was that these can
>>>> never be NULL. So the debug patch just adds an additional check before
>>>> accessing each pointer. I can send this patch without debug prints if
>>>> that makes sense. I was hoping to get confirmation that this is indeed
>>>> the case before adding the check. What do you think is the right
>>>> approach here? Send a patch to the ML adding the check as a potential
>>>> fix? Or can someone help me investigate the crash dump and root cause
>>>> it? Or, if we can use the test robot to confirm this, I can re-send
>>>> the patch as is to the list. Please clarify.
>>>>
>>> If the assumption is that the pointers can never be NULL,
>>> wouldn't it be important to see a call trace and to find out
>>> if the NULL pointers can actually be seen by design,
>>> or if there is some other bug ?
>>
>> Call trace shows
>>
>> [ 0.576666] [<7976c1ac>] pci_device_add+0xbc/0x820
>> [ 0.576666] [<7976c1ac>] pci_device_add+0xbc/0x820
>>
>>
>> And the BUG seems to be in of_pci_dma_configure(), as shown in the BUG report.
>>
>> of_pci_dma_configure() calls the newly added API
>> pci_get_host_bridge_device(). It seems this call has succeeded, which
>> means bridge is non-NULL IMO. However, of_pci_dma_configure() then passes
>> bridge->parent->of_node to of_dma_configure(). So I suspect
>> bridge->parent is NULL for some reason. Is there a chance of parent
>> being NULL on this or any other platform?
>>
>
> Can bridge be the root bridge ?

Going by the code below, bridge is assigned the pointer to the bridge
device of the root bus (root_bus->bridge).

+struct device *pci_get_host_bridge_device(struct pci_dev *dev)
+{
+	struct pci_bus *root_bus = find_pci_root_bus(dev->bus);
+	struct device *bridge = root_bus->bridge;
+
+	kobject_get(&bridge->kobj);
+	return bridge;
+}
+

So to answer your question, yes, it is the root bridge.
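
To make the suspected failure concrete, here is a minimal user-space
sketch of the guarded lookup discussed in this thread. It is not kernel
code: struct device and struct device_node are cut-down stand-ins with
only the fields used here, and the helper name bridge_parent_of_node()
is made up for illustration. It checks bridge first and only then
bridge->parent, so a NULL bridge is never dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Stand-in types (assumption): only the fields this sketch needs. */
struct device_node {
	const char *name;
};

struct device {
	struct device *parent;
	struct device_node *of_node;
};

/*
 * Hypothetical helper: return the OF node of the bridge's parent,
 * or NULL if either the bridge or its parent is missing.
 * The checks are ordered so bridge->parent is read only when
 * bridge itself is known to be non-NULL.
 */
static struct device_node *bridge_parent_of_node(const struct device *bridge)
{
	if (!bridge) {
		fprintf(stderr, "PCI bridge not found\n");
		return NULL;
	}
	if (!bridge->parent) {
		fprintf(stderr, "PCI bridge parent not found\n");
		return NULL;
	}
	return bridge->parent->of_node;
}
```

A caller would skip the DMA configuration step whenever this returns
NULL. Whether a root bridge can legitimately have a NULL parent on some
platforms is still the open question here; the sketch only shows the
defensive ordering of the checks.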

-- 
Murali Karicheri
Linux Kernel, Texas Instruments
