linux-pci.vger.kernel.org archive mirror
From: Prarit Bhargava <prarit@redhat.com>
To: Greg KH <gregkh@linuxfoundation.org>
Cc: linux-pci@vger.kernel.org, Bjorn Helgaas <bhelgaas@google.com>,
	Shyam Iyer <Shyam_Iyer@Dell.com>,
	ddutile@redhat.com
Subject: Re: [PATCH] pci, Add AER_panic sysfs file
Date: Fri, 18 May 2012 15:28:36 -0400
Message-ID: <4FB6A2E4.2080203@redhat.com>
In-Reply-To: <20120518181305.GB27021@kroah.com>



On 05/18/2012 02:13 PM, Greg KH wrote:
> On Fri, May 18, 2012 at 01:17:54PM -0400, Prarit Bhargava wrote:
>>
>>>
>>> Please define "unhardened".  Why aren't all drivers "hardened"?
>>
>> Most drivers _currently_ do not handle reading all f's (or -1) from hardware.
>> Some drivers do handle some situations but definitely not all of them.
>> Hardening a driver involves making the driver "-1" safe.
> 
> That's not "hardening", it should be written as, "fixing broken
> drivers".  It's a bug if a PCI driver can not handle this as that is
> exactly what happens when a PCI device is removed from the system
> without the driver knowing about it.

Sorry Greg, I should have stated this at the beginning.  Without question
patches should be _and will be_ sent to drivers as problems are found.
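
To make the "-1 safe" idea concrete, here is a minimal userspace sketch of the
kind of check being discussed.  It is only an illustration: struct fake_dev,
fake_readl() and safe_read_status() are hypothetical names, not taken from any
driver or from the patch, and it assumes all 1s is never a legitimate value for
the register being read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical device model; 'present' stands in for the real hardware. */
struct fake_dev {
	bool present;     /* false once the device has been surprise-removed */
	uint32_t status;  /* contents of a (hypothetical) status register */
};

/* Simulated MMIO read: a removed PCI device reads back as all 1s. */
static uint32_t fake_readl(const struct fake_dev *dev)
{
	assert(dev != NULL);
	return dev->present ? dev->status : UINT32_MAX;
}

/*
 * "-1 safe" read: returns 0 and stores the value on success, or -1 if
 * the read came back as all 1s and the device should be treated as gone.
 * Assumption: all 1s is never a valid value for this register.
 */
static int safe_read_status(const struct fake_dev *dev, uint32_t *val)
{
	uint32_t v = fake_readl(dev);

	if (v == UINT32_MAX)
		return -1;  /* do not act on garbage status bits */

	*val = v;
	return 0;
}
```

A caller would check the return value before trusting *val, and stop touching
the device once -1 is returned.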

> 
>> Some companies do ship hardened drivers, but the ones in the tree are not hardened.
> 
> Why are there out-of-tree drivers that are so-called "hardened" and why
> are those bug fixes not merged into the kernel tree?

See comment below.
> 
>> [The above comment is in no way an approval of shipping drivers outside of the
>> kernel.  I'm just stating a fact.]

I'm just stating a fact.  I have no idea why patches are not on the list.  Nor
am I condoning the activity of keeping fixes outside of the tree.  I've _just
started_ working with a group who has patches and am going to do some work with
them on getting patches out to the tree.

> 
> Any specific drivers you are referring to so that I can go and kick
> someone to get their act together?

I'll get a list together and hopefully we can get some patches out.

> 
>> The effort involved in hardening these drivers is significant.
> 
> It shouldn't be, this has been well known for what, 13+ years now?  This
> is nothing new at all, and again, is a bug if the driver can't handle
> this.

Most drivers cannot handle surprise removals, and in this case that's what we're
effectively talking about.  There may be a few that can, and even those might
only be able to handle a few cases of surprise removal or reset events.

>>>> In these cases, the system should not do a bus reset, but rather the
>>>> system should panic to avoid any further possible data corruption.
>>>
>>> Really?  You really want to panic the whole system and shut down and
>>> potentially lose everything?  That does not sound like a good idea at
>>> all to me, is there really no way to recover from this?
>>
>> Yes, that's _exactly_ what I want to do.  Having a driver that is capable of
>> writing corrupted data to a disk or corrupting memory is much worse than
>> panicking and stopping the system for a short period of time.
> 
> But by panicking, you just lost data and have potentially corrupt data
> written to the disk in a half-finished manner, plus you now have a
> broken system that is stuck and needs to be rebooted :)

Fair enough -- I suppose this comes down to:  Which is worse?  Coming back to a
system with corrupt data or memory, or rebooting and losing data (and waiting
for a reboot)? :)

In my view, if a user *KNOWS* that a driver isn't going to play well with a
reset then let them make the call -- it shouldn't be up to us to decide what is
best for them.  If they want a panic, let them have it.  As time goes on the
drivers will get better but that isn't going to happen overnight.
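
As a rough sketch of that user-based policy (not the code in the patch): the
names below, aer_policy, panic_on_error and aer_choose_action(), are
hypothetical and only show the shape of the decision, with the bus reset
remaining the default unless the user has opted in to a panic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical outcomes for an AER event; not the kernel's own names. */
enum aer_action {
	AER_ACTION_BUS_RESET,  /* default recovery path */
	AER_ACTION_PANIC,      /* only when the user has asked for it */
};

/* Hypothetical per-device policy, e.g. backed by a sysfs file. */
struct aer_policy {
	bool panic_on_error;   /* false unless the user sets it */
};

/* Pick the action for an AER event according to the user's policy. */
static enum aer_action aer_choose_action(const struct aer_policy *p)
{
	assert(p != NULL);
	return p->panic_on_error ? AER_ACTION_PANIC : AER_ACTION_BUS_RESET;
}
```

A zero-initialized aer_policy yields the bus reset, so nothing changes for
users who never touch the knob.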

> 
>> The default is to handle an AER through a bus reset so a user must actively
>> request the panic.
> 
> Fair enough, I can understand why some people might want this type of
> control over a system, and if they reboot-on-panic, they can recover
> quickly and get back up and running.
> 
> But again, this needs to be fixed in the drivers themselves, otherwise
> they are broken on systems that, again, have been shipping for 13+ years
> now.  It's unacceptable for the driver authors to be that sloppy.

I agree with you Greg.  I 100% agree with you.  But getting fixes into the
drivers will take some time.  Getting them to a state where
commercial/enterprise customers consider them reliable for surprise events is
going to take some time... so I'm arguing that we should go with a simple
user-based policy and fix the drivers as we move along.

P.
