From: Prarit Bhargava <prarit@redhat.com>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: Leon Romanovsky <leon@kernel.org>,
bhelgaas@google.com, corbet@lwn.net, linux-doc@vger.kernel.org,
linux-pci@vger.kernel.org, mstowe@redhat.com
Subject: Re: [PATCH] pci-driver: Add driver load messages
Date: Thu, 4 Mar 2021 09:42:44 -0500 [thread overview]
Message-ID: <4a584957-24d5-54c8-07f8-36fd7d2e9fce@redhat.com> (raw)
In-Reply-To: <20210218190603.GA993998@bjorn-Precision-5520>
On 2/18/21 2:06 PM, Bjorn Helgaas wrote:
> On Thu, Feb 18, 2021 at 01:36:35PM -0500, Prarit Bhargava wrote:
>> On 1/26/21 10:12 AM, Bjorn Helgaas wrote:
>>> On Tue, Jan 26, 2021 at 09:05:23AM -0500, Prarit Bhargava wrote:
>>>> On 1/26/21 8:53 AM, Leon Romanovsky wrote:
>>>>> On Tue, Jan 26, 2021 at 08:42:12AM -0500, Prarit Bhargava wrote:
>>>>>> On 1/26/21 8:14 AM, Leon Romanovsky wrote:
>>>>>>> On Tue, Jan 26, 2021 at 07:54:46AM -0500, Prarit Bhargava wrote:
>>>>>>>> Leon Romanovsky <leon@kernel.org> wrote:
>>>>>>>>> On Mon, Jan 25, 2021 at 02:41:38PM -0500, Prarit Bhargava wrote:
>>>>>>>>>> There are two situations where driver load messages are helpful.
>>>>>>>>>>
>>>>>>>>>> 1) Some drivers silently load on devices and debugging driver or system
>>>>>>>>>> failures in these cases is difficult. While some drivers (networking
>>>>>>>>>> for example) may not completely initialize when the PCI driver probe() function
>>>>>>>>>> has returned, it is still useful to have some idea of driver completion.
>>>>>>>>>
>>>>>>>>> Sorry, probably it is me, but I don't understand this use case.
>>>>>>>>> Are you adding global to whole kernel command line boot argument to debug
>>>>>>>>> what and when?
>>>>>>>>>
>>>>>>>>> During boot:
>>>>>>>>> If device success, you will see it in /sys/bus/pci/[drivers|devices]/*.
>>>>>>>>> If device fails, you should get an error from that device (fix the
>>>>>>>>> device to return an error), or something immediately won't work and
>>>>>>>>> you won't see it in sysfs.
>>>>>>>>
>>>>>>>> What if there is a panic during boot? There's no way to get to sysfs.
>>>>>>>> That's the case where this is helpful.
>>>>>>>
>>>>>>> How? If you have kernel panic, it means you have much more worse problem
>>>>>>> than not-supported device. If kernel panic was caused by the driver, you
>>>>>>> will see call trace related to it. If kernel panic was caused by
>>>>>>> something else, supported/not supported won't help here.
>>>>>>
>>>>>> I still have no idea *WHICH* device it was that the panic occurred on.
>>>>>
>>>>> The kernel panic is printed from the driver. There is one driver loaded
>>>>> for all same PCI devices which are probed without relation to their
>>>>> number.>
>>>>> If you have host with ten same cards, you will see one driver and this
>>>>> is where the problem and not in supported/not-supported device.
>>>>
>>>> That's true, but you can also have different cards loading the same driver.
>>>> See, for example, any PCI_IDs list in a driver.
>>>>
>>>> For example,
>>>>
>>>> 10:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
>>>> 20:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)
>>>>
>>>> Both load the megaraid driver and have different profiles within the
>>>> driver. I have no idea which one actually panicked until removing
>>>> one card.
>>>>
>>>> It's MUCH worse when debugging new hardware and getting a panic
>>>> from, for example, the uncore code which binds to a PCI mapped
>>>> device. One device might work and the next one doesn't. And
>>>> then you can multiply that by seeing *many* panics at once and
>>>> trying to determine if the problem was on one specific socket,
>>>> die, or core.
>>>
>>> Would a dev_panic() interface that identified the device and
>>> driver help with this?
>>
>> ^^ the more I look at this problem, the more a dev_panic() that
>> would output a device specific message at panic time is what I
>> really need.
Bjorn,
I went down this road a bit and had a realization. The issue isn't with
printing something at panic time, but the *data* that is output. Each PCI
device is associated with a struct device. That device struct's name is output
for dev_dbg(), etc., commands. The PCI subsystem sets the device struct name at
drivers/pci/probe.c: 1799
dev_set_name(&dev->dev, "%04x:%02x:%02x.%d", pci_domain_nr(dev->bus),
dev->bus->number, PCI_SLOT(dev->devfn),
PCI_FUNC(dev->devfn));
My problem really is that the above information is insufficient when I (or a
user) need to debug a system. The complexities of debugging multiple broken
driver loads would be much easier if I didn't have to constantly add this output
manually :).
Would you be okay with adding a *debug* parameter to expand the device name to
include the vendor & device ID pair? FWIW, I'm somewhat against
yet-another-kernel-option but that's really the information I need. I could
then add dev_dbg() statements in the local_pci_probe() function.
Thoughts?
P.
next prev parent reply other threads:[~2021-03-04 14:45 UTC|newest]
Thread overview: 15+ messages / expand[flat|nested] mbox.gz Atom feed top
2021-01-25 19:41 [PATCH] pci-driver: Add driver load messages Prarit Bhargava
2021-01-26 6:39 ` Leon Romanovsky
2021-01-26 12:54 ` Prarit Bhargava
2021-01-26 13:14 ` Leon Romanovsky
2021-01-26 13:42 ` Prarit Bhargava
2021-01-26 13:53 ` Leon Romanovsky
2021-01-26 14:05 ` Prarit Bhargava
2021-01-26 15:12 ` Bjorn Helgaas
2021-01-29 18:38 ` Prarit Bhargava
2021-02-18 18:36 ` Prarit Bhargava
2021-02-18 19:06 ` Bjorn Helgaas
2021-03-04 14:42 ` Prarit Bhargava [this message]
2021-03-04 15:50 ` Bjorn Helgaas
2021-03-05 18:20 ` Prarit Bhargava
-- strict thread matches above, loose matches on Subject: below --
2021-01-25 19:21 Prarit Bhargava
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4a584957-24d5-54c8-07f8-36fd7d2e9fce@redhat.com \
--to=prarit@redhat.com \
--cc=bhelgaas@google.com \
--cc=corbet@lwn.net \
--cc=helgaas@kernel.org \
--cc=leon@kernel.org \
--cc=linux-doc@vger.kernel.org \
--cc=linux-pci@vger.kernel.org \
--cc=mstowe@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).