public inbox for linux-doc@vger.kernel.org
From: Prarit Bhargava <prarit@redhat.com>
To: Leon Romanovsky <leon@kernel.org>
Cc: bhelgaas@google.com, corbet@lwn.net, linux-doc@vger.kernel.org,
	linux-pci@vger.kernel.org, mstowe@redhat.com
Subject: Re: [PATCH] pci-driver: Add driver load messages
Date: Tue, 26 Jan 2021 09:05:23 -0500
Message-ID: <1917ff0c-7d7a-9580-be8a-bb65a970c5bb@redhat.com>
In-Reply-To: <20210126135324.GH1053290@unreal>



On 1/26/21 8:53 AM, Leon Romanovsky wrote:
> On Tue, Jan 26, 2021 at 08:42:12AM -0500, Prarit Bhargava wrote:
>>
>>
>> On 1/26/21 8:14 AM, Leon Romanovsky wrote:
>>> On Tue, Jan 26, 2021 at 07:54:46AM -0500, Prarit Bhargava wrote:
>>>>   Leon Romanovsky <leon@kernel.org> wrote:
>>>>> On Mon, Jan 25, 2021 at 02:41:38PM -0500, Prarit Bhargava wrote:
>>>>>> There are two situations where driver load messages are helpful.
>>>>>>
>>>>>> 1) Some drivers silently load on devices and debugging driver or system
>>>>>> failures in these cases is difficult.  While some drivers (networking
>>>>>> for example) may not completely initialize when the PCI driver probe() function
>>>>>> has returned, it is still useful to have some idea of driver completion.
>>>>>
>>>>> Sorry, probably it is me, but I don't understand this use case.
>>>>> Are you adding global to whole kernel command line boot argument to debug
>>>>> what and when?
>>>>>
>>>>> During boot:
>>>>> If device success, you will see it in /sys/bus/pci/[drivers|devices]/*.
>>>>> If device fails, you should get an error from that device (fix the
>>>>> device to return an error), or something immediately won't work and
>>>>> you won't see it in sysfs.
>>>>>
>>>>
>>>> What if there is a panic during boot?  There's no way to get to sysfs.
>>>> That's the case where this is helpful.
>>>
>>> How? If you have kernel panic, it means you have much more worse problem
>>> than not-supported device. If kernel panic was caused by the driver, you
>>> will see call trace related to it. If kernel panic was caused by
>>> something else, supported/not supported won't help here.
>>
>> I still have no idea *WHICH* device it was that the panic occurred on.
> 
> The kernel panic is printed from the driver. There is one driver loaded
> for all same PCI devices which are probed without relation to their
> number.
>
> If you have host with ten same cards, you will see one driver and this
> is where the problem and not in supported/not-supported device.

That's true, but you can also have different cards loading the same driver.
See, for example, any PCI_IDs list in a driver.

For example,

10:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3008 [Fury] (rev 02)
20:00.0 RAID bus controller: Broadcom / LSI MegaRAID SAS-3 3108 [Invader] (rev 02)

Both load the megaraid driver and have different profiles within the driver.  I
have no idea which one actually panicked until removing one card.
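
To make that concrete, here is a rough userspace sketch (not kernel code, and
not the real megaraid ID table).  It models one driver claiming two different
device IDs with different profiles, plus a per-device load message of the kind
this patch is after.  The struct, the function names, the device IDs and the
message format are all made up for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Simplified, hypothetical stand-in for one entry of a driver's PCI ID table. */
struct id_entry {
	uint16_t vendor;
	uint16_t device;
	const char *profile;	/* per-device behaviour chosen inside the driver */
};

/* Placeholder IDs for illustration only; not copied from any real driver. */
static const struct id_entry example_ids[] = {
	{ 0x1000, 0x0001, "profile-A" },
	{ 0x1000, 0x0002, "profile-B" },
	{ 0, 0, NULL }		/* terminator */
};

/* Return the matching entry, or NULL if the driver does not claim the device. */
static const struct id_entry *match_id(const struct id_entry *tbl,
				       uint16_t vendor, uint16_t device)
{
	for (; tbl->vendor != 0; tbl++)
		if (tbl->vendor == vendor && tbl->device == device)
			return tbl;
	return NULL;
}

/*
 * Format a hypothetical load message naming the driver, the slot and the
 * device.  Returns whatever snprintf() returns.
 */
static int format_load_msg(char *buf, size_t len, const char *driver,
			   const char *slot, uint16_t vendor, uint16_t device)
{
	return snprintf(buf, len, "%s: loaded on %s [%04x:%04x]",
			driver, slot, (unsigned)vendor, (unsigned)device);
}
```

Both IDs match the same table, so a call trace that only names the driver does
not say which card was being probed.  A message per slot, printed before
probe() runs, would.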

It's MUCH worse when debugging new hardware and getting a panic from, for
example, the uncore code which binds to a PCI mapped device.  One device might
work and the next one doesn't.  And then you can multiply that by seeing *many*
panics at once and trying to determine if the problem was on one specific
socket, die, or core.

> 
>>>
>>>>
>>>>> During run:
>>>>> We have many other solutions to get debug prints during run, for example
>>>>> tracing, which is possible to toggle dynamically.
>>>>>
>>>>> Right now, my laptop will print 34 prints on boot and endless amount during
>>>>> day-to-day usage.
>>>>>
>>>>> ➜  kernel git:(rdma-next) ✗ lspci |wc -l
>>>>> 34
>>>>>
>>>>>>
>>>>>> 2) Storage and Network device vendors have relatively short lives for
>>>>>> some of their hardware.  Some devices may continue to function but are
>>>>>> problematic due to out-of-date firmware or other issues.  Maintaining
>>>>>> a database of the hardware is out-of-the-question in the kernel as it would
>>>>>> require constant updating.  Outputting a message in the log would allow
>>>>>> different OSes to determine if the problem hardware was truly supported or not.
>>>>>
>>>>> And rely on some dmesg output as a true source of supported/not supported and
>>>>> making this ABI which needs knob in command line. ?
>>>>
>>>> Yes.  The console log being saved would work as a true source of load
>>>> messages to be interpreted by an OS tool.  But I see your point about the
>>>> knob below...
>>>
>>> You will need much more stronger claim than the above if you want to proceed
>>> ABI path through dmesg prints.
>>>
>>
>> See my answer below.  I agree with you on the ABI statement.
>>
>>>>
>>>>>
>>>>>>
>>>>>> Add optional driver load messages from the PCI core that indicates which
>>>>>> driver was loaded, on which slot, and on which device.
>>>>>
>>>>> Why don't you add simple pr_debug(..) without any knob? You will be able
>>>>> to enable/disable it through dynamic prints facility.
>>>>
>>>> Good point.  I'll wait for more feedback and submit a v2 with pr_debug.
>>>
>>> Just to be clear, none of this can be ABI and any kernel print can
>>> be changed or removed any minute without any announcement.
>>
>> Yes, that's absolutely the case and I agree with you that nothing can guarantee
>> ABI of those pr_debug() statements.  They are *debug* after all.
> 
> You missed the point. ALL pr*() prints are not ABI, without relation to their level.
> 

Yes, I understood that.  I'm just emphasizing your ABI concern.

P.

> Thanks
> 
>>
>> P.
>>
>>>
>>> Thanks
>>>
>>>>
>>>> P.
>>>>
>>>>>
>>>>> Thanks
>>>>
>>>
>>
> 

