linux-pci.vger.kernel.org archive mirror
* Interrupt remapping quirk tainting the kernel
@ 2014-03-31  8:17 Jean Delvare
  2014-03-31 10:56 ` Neil Horman
  0 siblings, 1 reply; 5+ messages in thread
From: Jean Delvare @ 2014-03-31  8:17 UTC
  To: Neil Horman; +Cc: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas, linux-pci

Hi Neil and all,

I have (once again) a question about this commit:

From: Neil Horman <nhorman@tuxdriver.com>
Date: Tue, 16 Apr 2013 20:38:32 +0000
Subject: iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
Git-commit: 03bbcb2e7e292838bb0244f5a7816d194c911d62

When interrupt remapping is disabled by this quirk, the kernel gets
tainted. What is the rationale for doing that?

The user can boot with intremap=off. That will also disable interrupt
remapping, as the quirk does, but not taint the kernel. If this is
considered OK then I fail to see why the quirk should behave differently
and taint the kernel.

Thanks,
-- 
Jean Delvare
SUSE L3 Support



* Re: Interrupt remapping quirk tainting the kernel
  2014-03-31  8:17 Interrupt remapping quirk tainting the kernel Jean Delvare
@ 2014-03-31 10:56 ` Neil Horman
  2014-03-31 14:18   ` Jean Delvare
  0 siblings, 1 reply; 5+ messages in thread
From: Neil Horman @ 2014-03-31 10:56 UTC
  To: Jean Delvare; +Cc: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas, linux-pci

On Mon, Mar 31, 2014 at 10:17:48AM +0200, Jean Delvare wrote:
> Hi Neil and all,
> 
> I have (once again) a question about this commit:
> 
> From: Neil Horman <nhorman@tuxdriver.com>
> Date: Tue, 16 Apr 2013 20:38:32 +0000
> Subject: iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
> Git-commit: 03bbcb2e7e292838bb0244f5a7816d194c911d62
> 
> When interrupt remapping is disabled by this quirk, the kernel gets
> tainted. What is the rationale for doing that?
> 
> The user can boot with intremap=off. That will also disable interrupt
> remapping, as the quirk does, but not taint the kernel. If this is
> considered OK then I fail to see why the quirk should behave differently
> and taint the kernel.
> 
> Thanks,
The quirk is intended to flag to the user the fact that the BIOS has not followed
the recommended procedure that was laid out in the Intel-published errata
sheet.  Arguably you could say that we should still taint the kernel in the
event that intremap=off is specified, but it seems pragmatic not to do so,
as the use of that option suggests the administrator has asserted a workaround to
the problem that is identical to the fix (in the event that the BIOS vendor has
not released an update).
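
The behaviour described above can be sketched as a small decision model.
This is only an illustration under stated assumptions, not the kernel
code: the function and field names are hypothetical, and the use of the
'I' (firmware workaround) taint letter for the quirk is an assumption
rather than something stated in this thread.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class BootResult:
    """Outcome of the (hypothetical) interrupt remapping decision."""
    remapping_enabled: bool
    taint_flags: Set[str] = field(default_factory=set)


def decide_interrupt_remapping(bios_enabled_remapping: bool,
                               chipset_has_erratum: bool,
                               intremap_off: bool) -> BootResult:
    # The administrator booted with intremap=off: remapping is disabled
    # and the kernel is not tainted.
    if intremap_off:
        return BootResult(remapping_enabled=False)

    # The BIOS followed the erratum guidance and left the feature off:
    # there is nothing to work around, so there is no taint either.
    if not bios_enabled_remapping:
        return BootResult(remapping_enabled=False)

    # The BIOS enabled remapping on an affected chipset: the quirk
    # disables it and taints the kernel to record that this happened.
    # Assumption: the taint letter is 'I' (firmware workaround).
    if chipset_has_erratum:
        return BootResult(remapping_enabled=False, taint_flags={"I"})

    # Healthy chipset with remapping enabled by the BIOS.
    return BootResult(remapping_enabled=True)
```

In this model the quirk and intremap=off leave interrupt remapping in
the same state; only the recorded taint differs.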

Thanks
Neil

> -- 
> Jean Delvare
> SUSE L3 Support
> 
> 


* Re: Interrupt remapping quirk tainting the kernel
  2014-03-31 10:56 ` Neil Horman
@ 2014-03-31 14:18   ` Jean Delvare
  2014-03-31 15:07     ` Prarit Bhargava
  2014-03-31 15:28     ` Neil Horman
  0 siblings, 2 replies; 5+ messages in thread
From: Jean Delvare @ 2014-03-31 14:18 UTC
  To: Neil Horman; +Cc: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas, linux-pci

Hi Neil,

On Monday 31 March 2014 at 06:56 -0400, Neil Horman wrote:
> On Mon, Mar 31, 2014 at 10:17:48AM +0200, Jean Delvare wrote:
> > Hi Neil and all,
> > 
> > I have (once again) a question about this commit:
> > 
> > From: Neil Horman <nhorman@tuxdriver.com>
> > Date: Tue, 16 Apr 2013 20:38:32 +0000
> > Subject: iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
> > Git-commit: 03bbcb2e7e292838bb0244f5a7816d194c911d62
> > 
> > When interrupt remapping is disabled by this quirk, the kernel gets
> > tainted. What is the rationale for doing that?
> > 
> > The user can boot with intremap=off. That will also disable interrupt
> > remapping, as the quirk does, but not taint the kernel. If this is
> > considered OK then I fail to see why the quirk should behave differently
> > and taint the kernel.
> > 
> > Thanks,
> The quirk is intended to flag to the user the fact that the BIOS has not followed
> the recommended procedure that was laid out in the Intel-published errata
> sheet.  Arguably you could say that we should still taint the kernel in the
> event that intremap=off is specified, but it seems pragmatic not to do so,
> as the use of that option suggests the administrator has asserted a workaround to
> the problem that is identical to the fix (in the event that the BIOS vendor has
> not released an update).

That doesn't really answer my question. While I understand that the
preferred fix is that the BIOS disables the feature, how bad are we if
it does not and the kernel has to do it?

We normally taint the kernel when the situation is such that debugging
the kernel would be a waste of time. For example, because a binary
driver was loaded, or a module was forcibly unloaded, etc. How does that
apply here? If the quirk kicks in, aren't we just as safe as if the BIOS
had disabled the feature? If not, then I would like to understand why,
and document it properly.

Thanks,
-- 
Jean Delvare
SUSE L3 Support



* Re: Interrupt remapping quirk tainting the kernel
  2014-03-31 14:18   ` Jean Delvare
@ 2014-03-31 15:07     ` Prarit Bhargava
  2014-03-31 15:28     ` Neil Horman
  1 sibling, 0 replies; 5+ messages in thread
From: Prarit Bhargava @ 2014-03-31 15:07 UTC
  To: Jean Delvare
  Cc: Neil Horman, Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas,
	linux-pci

On 03/31/2014 10:18 AM, Jean Delvare wrote:
> Hi Neil,
> 
> On Monday 31 March 2014 at 06:56 -0400, Neil Horman wrote:
>> On Mon, Mar 31, 2014 at 10:17:48AM +0200, Jean Delvare wrote:
>>> Hi Neil and all,
>>>
>>> I have (once again) a question about this commit:
>>>
>>> From: Neil Horman <nhorman@tuxdriver.com>
>>> Date: Tue, 16 Apr 2013 20:38:32 +0000
>>> Subject: iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
>>> Git-commit: 03bbcb2e7e292838bb0244f5a7816d194c911d62
>>>
>>> When interrupt remapping is disabled by this quirk, the kernel gets
>>> tainted. What is the rationale for doing that?
>>>
>>> The user can boot with intremap=off. That will also disable interrupt
>>> remapping, as the quirk does, but not taint the kernel. If this is
>>> considered OK then I fail to see why the quirk should behave differently
>>> and taint the kernel.
>>>
>>> Thanks,
>> The quirk is intended to flag to the user the fact that the BIOS has not followed
>> the recommended procedure that was laid out in the Intel-published errata
>> sheet.  Arguably you could say that we should still taint the kernel in the
>> event that intremap=off is specified, but it seems pragmatic not to do so,
>> as the use of that option suggests the administrator has asserted a workaround to
>> the problem that is identical to the fix (in the event that the BIOS vendor has
>> not released an update).
> 
> That doesn't really answer my question. While I understand that the
> preferred fix is that the BIOS disables the feature, how bad are we if
> it does not and the kernel has to do it?
> 
> We normally taint the kernel when the situation is such that debugging
> the kernel would be a waste of time. For example, because a binary
> driver was loaded, or a module was forcibly unloaded, etc. How does that
> apply here? 

There seems to be some misconception with various Enterprise Linux Support Teams
(plural emphasized ... I can only hope my support is paying attention) that
tainting only has one meaning.  In your definition we should only taint when
"debugging the kernel would be a waste of time".  I completely disagree with
that.  Many (if not most) of the time I as an engineer get only a stack trace
from a kernel (see pretty much every bug sent to LKML) that gives me useful
information on exactly what the kernel state was at the time of the stack trace.

In that stack trace there is an entry (for example)

CPU: 23 PID: 0 Comm: swapper/23 Not tainted 3.13.rc5+goredsox #1

which will/will not show TAINT entries.  These entries can be easily deciphered
from kernel/panic.c:

 *  'P' - Proprietary module has been loaded.
 *  'F' - Module has been forcibly loaded.
 *  'S' - SMP with CPUs not designed for SMP.
 *  'R' - User forced a module unload.
 *  'M' - System experienced a machine check exception.
 *  'B' - System has hit bad_page.
 *  'U' - Userspace-defined naughtiness.
 *  'D' - Kernel has oopsed before
 *  'A' - ACPI table overridden.
 *  'W' - Taint on warning.
 *  'C' - modules from drivers/staging are loaded.
 *  'I' - Working around severe firmware bug.
 *  'O' - Out-of-tree module has been loaded.

All of those are important to me as an engineer (and I'd like to add a few more
for support but that's really a RHEL thing).  Many of those, such as the "W" for
Taint on warning, are useful to me to let me know what the system state was when
the panic/oops/warning occurred.  They tell me if some critical situation has
occurred that led to the stack trace.
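
A minimal sketch of how the letters in such a trace line could be
decoded, assuming the "Tainted: ..." field is followed by the kernel
version.  The function name, the regular expression and the sample
lines in the usage note are hypothetical; the table is simply the list
quoted from kernel/panic.c above.  'G' is not in that list; it is
assumed here to be the filler printed when no proprietary module is
loaded, so it is skipped.

```python
import re
from typing import List, Tuple

# Letter -> meaning, copied from the list quoted above.
TAINT_FLAGS = {
    "P": "Proprietary module has been loaded.",
    "F": "Module has been forcibly loaded.",
    "S": "SMP with CPUs not designed for SMP.",
    "R": "User forced a module unload.",
    "M": "System experienced a machine check exception.",
    "B": "System has hit bad_page.",
    "U": "Userspace-defined naughtiness.",
    "D": "Kernel has oopsed before",
    "A": "ACPI table overridden.",
    "W": "Taint on warning.",
    "C": "modules from drivers/staging are loaded.",
    "I": "Working around severe firmware bug.",
    "O": "Out-of-tree module has been loaded.",
}

# Uppercase letters and padding spaces that follow "Tainted: ".
TAINT_FIELD = re.compile(r"Tainted: ([A-Z ]+)")


def decode_taint(trace_line: str) -> List[Tuple[str, str]]:
    """Return (letter, meaning) pairs for the taint flags in a trace line."""
    match = TAINT_FIELD.search(trace_line)
    if match is None:
        # "Not tainted", or no taint field at all.
        return []
    decoded = []
    for letter in match.group(1):
        if letter in (" ", "G"):
            # Padding, or the assumed "no proprietary module" filler.
            continue
        decoded.append((letter, TAINT_FLAGS.get(letter, "unknown flag")))
    return decoded
```

Called on the line above, decode_taint() returns an empty list.  A
made-up line such as "... Tainted: G        W  I 3.13.0 #1" yields the
entries for 'W' and 'I'.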

Taint does not and should not be taken as "we can't debug this." If you or your
support organization has interpreted taint in that manner, you should do all that
you can to inform them of the error of their ways.

tl;dr Tainting is supposed to aid in debugging, not prevent debugging.

> If the quirk kicks in, aren't we just as safe as if the BIOS
> had disabled the feature? If not, then I would like to understand why,
> and document it properly.

In this case, yes, you are just as safe as if the BIOS had disabled the
feature.  The quirk completely disables the subsystem IIRC.  nhorman, of course,
can confirm.

P.


* Re: Interrupt remapping quirk tainting the kernel
  2014-03-31 14:18   ` Jean Delvare
  2014-03-31 15:07     ` Prarit Bhargava
@ 2014-03-31 15:28     ` Neil Horman
  1 sibling, 0 replies; 5+ messages in thread
From: Neil Horman @ 2014-03-31 15:28 UTC
  To: Jean Delvare; +Cc: Joerg Roedel, Greg Kroah-Hartman, Bjorn Helgaas, linux-pci

On Mon, Mar 31, 2014 at 04:18:05PM +0200, Jean Delvare wrote:
> Hi Neil,
> 
> On Monday 31 March 2014 at 06:56 -0400, Neil Horman wrote:
> > On Mon, Mar 31, 2014 at 10:17:48AM +0200, Jean Delvare wrote:
> > > Hi Neil and all,
> > > 
> > > I have (once again) a question about this commit:
> > > 
> > > From: Neil Horman <nhorman@tuxdriver.com>
> > > Date: Tue, 16 Apr 2013 20:38:32 +0000
> > > Subject: iommu/vt-d: add quirk for broken interrupt remapping on 55XX chipsets
> > > Git-commit: 03bbcb2e7e292838bb0244f5a7816d194c911d62
> > > 
> > > When interrupt remapping is disabled by this quirk, the kernel gets
> > > tainted. What is the rationale for doing that?
> > > 
> > > The user can boot with intremap=off. That will also disable interrupt
> > > remapping, as the quirk does, but not taint the kernel. If this is
> > > considered OK then I fail to see why the quirk should behave differently
> > > and taint the kernel.
> > > 
> > > Thanks,
> > The quirk is intended to flag to the user the fact that the BIOS has not followed
> > the recommended procedure that was laid out in the Intel-published errata
> > sheet.  Arguably you could say that we should still taint the kernel in the
> > event that intremap=off is specified, but it seems pragmatic not to do so,
> > as the use of that option suggests the administrator has asserted a workaround to
> > the problem that is identical to the fix (in the event that the BIOS vendor has
> > not released an update).
> 
> That doesn't really answer my question. While I understand that the
> preferred fix is that the BIOS disables the feature, how bad are we if
> it does not and the kernel has to do it?
> 
For exactly the reason Prarit indicated: the way it's currently coded
gives me valuable information when debugging systems.  It tells me that the
customer is running with a BIOS that exposes a bug in the hardware, one that
we can work around dynamically but should still inform them of.

> We normally taint the kernel when the situation is such that debugging
> the kernel would be a waste of time.
Um, no.  That's absolutely not the reason we taint the kernel.  If you are using
the kernel taint as an excuse to shrug off support responsibility, you're doing
it wrong.

> For example, because a binary
> driver was loaded, or a module was forcibly unloaded, etc.
That just means that those things happened, which may be (and very likely are)
relevant to the debugging process.  It means that you need to ask the bug
reporter about it, get in touch with the module vendor, attempt to recreate
without the tainting actions, etc.  It by no means indicates debugging is
"useless".

> How does that
> apply here? If the quirk kicks in, aren't we just as safe as if the BIOS
> had disabled the feature? If not, then I would like to understand why,
> and document it properly.
> 
Yes, we are just as safe, but see above, the reasons we taint the kernel aren't
the reasons you think.

Regards
Neil

> Thanks,
> -- 
> Jean Delvare
> SUSE L3 Support
> 
> 
