linux-acpi.vger.kernel.org archive mirror
* Quirking acpi_enforce_resources
@ 2010-03-01 21:01 Chase Douglas
  2010-03-01 22:04 ` Matthew Garrett
  0 siblings, 1 reply; 6+ messages in thread
From: Chase Douglas @ 2010-03-01 21:01 UTC
  To: linux-acpi

Hello,

I've noticed there are a lot of bugs filed against Ubuntu due to the
changes that introduced strict resource checking between the acpi driver
and legacy hardware monitor drivers [1]. Although I understand the
reason for the strict checking, the legacy drivers seemed to work fine.
I'm unaware of any real issues that were caused by using them. Thus, I'm
wondering if it would be worthwhile to whitelist machines that are known
to work OK even with the resources being doubly held by both the acpi
driver and a legacy driver. It certainly does not seem prudent to set
acpi_enforce_resources=lax across the board, but the change to strict
resource checking seems to have done more harm than good for the average
user.

I noticed a similar thread at [2]. At the end it was proposed that a
whitelist be set up to handle this issue. Has anyone been working on
this?

Beyond the potential breakage of function, I'm also concerned by the
logging level of the message that informs the user of the resource
contention. Right now, KERN_ERR is used by default when a contention is
found. Basically, the message means "You tried to use a legacy driver,
but that's not a good idea, so I've prevented you from doing so." It
seems OK on the one hand to call this an error condition. On the other
hand, it pollutes Ubuntu's startup screen because it thinks there's a
real error occurring in the kernel. From this point of reference, it
seems better to emit the message at the KERN_WARNING level at all times,
since it's not really something that a user should be frightened of when
they see their computer spit out such a message at boot.
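
To make the log-level point concrete, here is a rough userspace sketch.
It assumes the usual rule that the console shows a message only when
its level is numerically lower than console_loglevel, and it assumes a
quiet boot that sets console_loglevel to 4. The helper name
shown_on_console and the LVL_* constants are illustrative, not kernel
symbols; their values mirror KERN_ERR (3), KERN_WARNING (4) and
KERN_INFO (6).

```c
#include <assert.h>

/* Numeric values behind the kernel's KERN_* prefixes (lower = more severe).
 * The LVL_* names are made up for this sketch. */
enum { LVL_ERR = 3, LVL_WARNING = 4, LVL_INFO = 6 };

/* Hypothetical helper: returns 1 if a message at msg_level would reach
 * the console for the given console_loglevel, 0 otherwise. */
int shown_on_console(int msg_level, int console_loglevel)
{
    assert(msg_level >= 0 && msg_level <= 7);
    return msg_level < console_loglevel;
}
```

Under those assumptions, shown_on_console(LVL_ERR, 4) is true while
shown_on_console(LVL_WARNING, 4) is false, which is why lowering the
severity would keep the message off a quiet boot screen.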

Any thoughts on these two ideas? I can work on some patches if either is
deemed worthwhile.

Thanks,
Chase Douglas

[1]
https://launchpad.net/+search?field.text=acpi_enforce_resources&field.actions.search=Search
[2] http://marc.info/?l=linux-acpi&m=125233061012344&w=2



* Re: Quirking acpi_enforce_resources
  2010-03-01 21:01 Quirking acpi_enforce_resources Chase Douglas
@ 2010-03-01 22:04 ` Matthew Garrett
  2010-03-01 22:15   ` Chase Douglas
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Garrett @ 2010-03-01 22:04 UTC
  To: Chase Douglas; +Cc: linux-acpi

On Mon, Mar 01, 2010 at 04:01:37PM -0500, Chase Douglas wrote:
> Hello,
> 
> I've noticed there are a lot of bugs filed against Ubuntu due to the
> changes that introduced strict resource checking between the acpi driver
> and legacy hardware monitor drivers [1]. Although I understand the
> reason for the strict checking, the legacy drivers seemed to work fine.
> I'm unaware of any real issues that were caused by using them.

They'll work absolutely fine until both ACPI and the native driver 
attempt to use the bus simultaneously, at which point there's a risk 
that you'll end up writing to the wrong register and causing hardware 
damage. The probability of this is tiny - you'd need two uncommon things 
to happen at exactly the same time. That doesn't mean it's safe.
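
A minimal sketch of the kind of collision described above. It assumes
a chip accessed through an index/data register pair, which is only one
of the ways such a conflict can arise; the struct, the register numbers
and the function names are all hypothetical and do not model any real
device or kernel API.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical chip: writing the index selects a register, writing the
 * data port stores into whichever register is currently selected. */
struct chip {
    uint8_t index;
    uint8_t regs[256];
};

/* Made-up register numbers for illustration only. */
enum { REG_FAN_DIVISOR = 0x10, REG_TEMP_LIMIT = 0x20 };

static void write_index(struct chip *c, uint8_t reg) { c->index = reg; }
static void write_data(struct chip *c, uint8_t val)  { c->regs[c->index] = val; }

/* The native driver does a two-step write. If firmware_interrupts is
 * nonzero, firmware reselects the index between the two steps.
 * Returns the register the driver's value actually landed in. */
uint8_t interleaved_write(struct chip *c, int firmware_interrupts)
{
    assert(c != 0);
    write_index(c, REG_FAN_DIVISOR);      /* driver selects its register  */
    if (firmware_interrupts)
        write_index(c, REG_TEMP_LIMIT);   /* firmware selects another one */
    write_data(c, 0x7f);                  /* driver writes, unaware       */
    return c->index;
}
```

With no interruption the value lands in REG_FAN_DIVISOR; with the
interleaving it lands in REG_TEMP_LIMIT instead, and neither side sees
an error.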

> Thus, I'm wondering if it would be worthwhile to whitelist machines 
> that are known to work OK even with the resources being doubly held by 
> both the acpi driver and a legacy driver. It certainly does not seem 
> prudent to set acpi_enforce_resources=lax across the board, but the 
> change to strict resource checking seems to have done more harm than 
> good for the average user.

The average user now has no chance of this causing a spurious system 
shutdown due to a false temperature reading, and also no chance of this 
causing invalid values to be written to a device causing it to brick the 
hardware. If users are willing to accept the (admittedly small) risk, 
they get to pass the argument. It's not reasonable for the upstream 
kernel to do this.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Quirking acpi_enforce_resources
  2010-03-01 22:04 ` Matthew Garrett
@ 2010-03-01 22:15   ` Chase Douglas
  2010-03-01 22:22     ` Matthew Garrett
  0 siblings, 1 reply; 6+ messages in thread
From: Chase Douglas @ 2010-03-01 22:15 UTC
  To: Matthew Garrett; +Cc: linux-acpi

On Mon, 2010-03-01 at 22:04 +0000, Matthew Garrett wrote:
> On Mon, Mar 01, 2010 at 04:01:37PM -0500, Chase Douglas wrote:
> > Thus, I'm wondering if it would be worthwhile to whitelist machines 
> > that are known to work OK even with the resources being doubly held by 
> > both the acpi driver and a legacy driver. It certainly does not seem 
> > prudent to set acpi_enforce_resources=lax across the board, but the 
> > change to strict resource checking seems to have done more harm than 
> > good for the average user.
> 
> The average user now has no chance of this causing a spurious system 
> shutdown due to a false temperature reading, and also no chance of this 
> causing invalid values to be written to a device causing it to brick the 
> hardware. If users are willing to accept the (admittedly small) risk, 
> they get to pass the argument. It's not reasonable for the upstream 
> kernel to do this.

I understand your points, but from the user's perspective they had
something that worked perfectly fine before, and now it doesn't. Have
there been any reports of anyone's hardware being adversely affected by
doubly acquiring the acpi region? Beyond that, have there been any
reports of any adverse effects of any kind?

I'm not advocating for enabling acpi_enforce_resources=lax across the
board. That would be foolish, especially since there is an existing acpi
driver that would be harmed. However, a whitelist of known-working
hardware would allow us to cater to users' needs while still being fairly
careful.

Thanks,
Chase



* Re: Quirking acpi_enforce_resources
  2010-03-01 22:15   ` Chase Douglas
@ 2010-03-01 22:22     ` Matthew Garrett
  2010-03-01 22:35       ` Chase Douglas
  0 siblings, 1 reply; 6+ messages in thread
From: Matthew Garrett @ 2010-03-01 22:22 UTC
  To: Chase Douglas; +Cc: linux-acpi

On Mon, Mar 01, 2010 at 05:15:26PM -0500, Chase Douglas wrote:

> I understand your points, but from the user's perspective they had
> something that worked perfectly fine before, and now it doesn't. Have
> there been any reports of anyone's hardware being adversely affected by
> doubly acquiring the acpi region? Beyond that, have there been any
> reports of any adverse effects of any kind?

I really don't understand your position here. Like I said, the 
probability of a collision between ACPI and the OS is low. On the other 
hand, the potential outcome of such a collision is hardware damage. This 
isn't even close to being something that should be considered.

> I'm not advocating for enabling acpi_enforce_resources=lax across the
> board. That would be foolish, especially since there is an existing acpi
> driver that would be harmed. However, a whitelist of known-working
> hardware would allow us to cater to users' needs while still being fairly
> careful.

How are you defining "known-working"? You've verified that the system 
management code on the hardware in question makes no accesses to the 
smbus?

-- 
Matthew Garrett | mjg59@srcf.ucam.org


* Re: Quirking acpi_enforce_resources
  2010-03-01 22:22     ` Matthew Garrett
@ 2010-03-01 22:35       ` Chase Douglas
  2010-03-01 22:45         ` Matthew Garrett
  0 siblings, 1 reply; 6+ messages in thread
From: Chase Douglas @ 2010-03-01 22:35 UTC
  To: Matthew Garrett; +Cc: linux-acpi

On Mon, 2010-03-01 at 22:22 +0000, Matthew Garrett wrote:
> On Mon, Mar 01, 2010 at 05:15:26PM -0500, Chase Douglas wrote:
> 
> > I understand your points, but from the user's perspective they had
> > something that worked perfectly fine before, and now it doesn't. Have
> > there been any reports of anyone's hardware being adversely affected by
> > doubly acquiring the acpi region? Beyond that, have there been any
> > reports of any adverse effects of any kind?
> 
> I really don't understand your position here. Like I said, the 
> probability of a collision between ACPI and the OS is low. On the other 
> hand, the potential outcome of such a collision is hardware damage. This 
> isn't even close to being something that should be considered.
> 
> > I'm not advocating for enabling acpi_enforce_resources=lax across the
> > board. That would be foolish, especially since there is an existing acpi
> > driver that would be harmed. However, a whitelist of known-working
> > hardware would allow us to cater to users' needs while still being fairly
> > careful.
> 
> How are you defining "known-working"? You've verified that the system 
> management code on the hardware in question makes no accesses to the 
> smbus?

I'm defining "known-working" by the number of bugs opened by users
whose drivers used to work fine but now don't.

As with many things, this is a risk-reward tradeoff. If there was even
one single instance of anyone being harmed by native hwmon drivers, I
wouldn't have attempted to bring this up. However, I am bringing this up
not necessarily because I believe this is the best way, but because I
think we should at least get a discussion out in the open so that others
can contemplate the issue.

If consensus is that the risk is not worth it, then I can go back to end
users and say "The gurus say this is too risky and could harm your
hardware, please wait for proper ACPI drivers." If the consensus is that
it will be too hard to determine a proper whitelist of working devices,
then I can go back with that as well. Right now though, there's no
stance that's been taken by anyone, so we just have users who are upset
that their hardware/software isn't working "right".

P.S. I also don't want to drown out a discussion on the log level of the
message produced. Please see the first message of this thread for
details.

Thanks,
Chase



* Re: Quirking acpi_enforce_resources
  2010-03-01 22:35       ` Chase Douglas
@ 2010-03-01 22:45         ` Matthew Garrett
  0 siblings, 0 replies; 6+ messages in thread
From: Matthew Garrett @ 2010-03-01 22:45 UTC
  To: Chase Douglas; +Cc: linux-acpi

On Mon, Mar 01, 2010 at 05:35:31PM -0500, Chase Douglas wrote:
> On Mon, 2010-03-01 at 22:22 +0000, Matthew Garrett wrote:
> > How are you defining "known-working"? You've verified that the system 
> > management code on the hardware in question makes no accesses to the 
> > smbus?
> 
> I'm defining "known-working" by the number of bugs opened by users
> whose drivers used to work fine but now don't.

You're defining "known-working" as "hasn't bricked any hardware yet, as 
far as we know, even though examining the code reveals that it's a 
possibility"? I'm not enthusiastic.

> As with many things, this is a risk-reward tradeoff. If there was even
> one single instance of anyone being harmed by native hwmon drivers, I
> wouldn't have attempted to bring this up. However, I am bringing this up
> not necessarily because I believe this is the best way, but because I
> think we should at least get a discussion out in the open so that others
> can contemplate the issue.

We've had several cases of people having critical thermal shutdowns and 
suffering data loss that were traced back to this issue.

> If consensus is that the risk is not worth it, then I can go back to end
> users and say "The gurus say this is too risky and could harm your
> hardware, please wait for proper ACPI drivers." If the consensus is that
> it will be too hard to determine a proper whitelist of working devices,
> then I can go back with that as well. Right now though, there's no
> stance that's been taken by anyone, so we just have users who are upset
> that their hardware/software isn't working "right".

The stance was made clear by the changing of the default value - if your 
system firmware says that the OS shouldn't touch these resources, the OS 
will not touch those resources.
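
The policy can be sketched roughly as follows. This is a simplified
userspace illustration, not the kernel's code: the enum, the function
name check_region and the claimed_by_acpi flag are hypothetical, and
the "no" mode is included on the assumption that the parameter accepts
it alongside the "strict" and "lax" values discussed in this thread.

```c
#include <assert.h>
#include <errno.h>

/* Illustrative stand-ins for the acpi_enforce_resources settings. */
enum enforce_mode { ENFORCE_NO, ENFORCE_LAX, ENFORCE_STRICT };

/* Hypothetical check: returns 0 if a native driver may claim a region,
 * or -EBUSY if the claim is refused because firmware declared the
 * region for its own use. */
int check_region(enum enforce_mode mode, int claimed_by_acpi)
{
    assert(mode == ENFORCE_NO || mode == ENFORCE_LAX ||
           mode == ENFORCE_STRICT);
    if (!claimed_by_acpi || mode == ENFORCE_NO)
        return 0;           /* no conflict, or checking disabled      */
    if (mode == ENFORCE_LAX)
        return 0;           /* conflict would be logged, but allowed  */
    return -EBUSY;          /* strict: the native driver is refused   */
}
```

In this sketch, only the strict mode turns a conflict into a refusal;
the lax mode leaves the risk with whoever chose to pass the argument.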

> P.S. I also don't want to drown out a discussion on the log level of the
> message produced. Please see the first message of this thread for
> details.

I agree that that should probably be KERN_INFO.

-- 
Matthew Garrett | mjg59@srcf.ucam.org


end of thread, other threads:[~2010-03-01 22:45 UTC | newest]
