public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Bjorn Helgaas <helgaas@kernel.org>
Cc: "Mariusz Tkaczyk" <mariusz.tkaczyk@linux.intel.com>,
	linux-pci@vger.kernel.org, "Christoph Hellwig" <hch@lst.de>,
	"Ilpo Järvinen" <ilpo.jarvinen@linux.intel.com>,
	"Stuart Hayes" <stuart.w.hayes@gmail.com>,
	"Arnd Bergmann" <arnd@arndb.de>,
	"Bjorn Helgaas" <bhelgaas@google.com>,
	"Dan Williams" <dan.j.williams@intel.com>,
	"Greg Kroah-Hartman" <gregkh@linuxfoundation.org>,
	"Keith Busch" <kbusch@kernel.org>,
	"Marek Behun" <marek.behun@nic.cz>, "Pavel Machek" <pavel@ucw.cz>,
	"Randy Dunlap" <rdunlap@infradead.org>,
	"Andy Shevchenko" <andriy.shevchenko@linux.intel.com>
Subject: Re: [PATCH v6 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
Date: Thu, 15 Aug 2024 07:45:09 +0200	[thread overview]
Message-ID: <Zr2V5XqTAMSiEDJ-@wunner.de> (raw)
In-Reply-To: <20240814214930.GA5507@bhelgaas>

On Wed, Aug 14, 2024 at 04:49:30PM -0500, Bjorn Helgaas wrote:
> On Wed, Aug 14, 2024 at 02:28:59PM +0200, Mariusz Tkaczyk wrote:
> > +	/*
> > +	 * Use lazy loading for active_indications to not play with initcalls.
> > +	 * It is needed to allow _DSM initialization on DELL platforms, where
> > +	 * ACPI_IPMI must be loaded first.
> > +	 */
> > +	unsigned int active_inds_initialized:1;
> 
> What's going on here?  I hope we can at least move this to the _DSM
> patch since it seems related to that, not to the NPEM capability.  I
> don't understand the initcall reference or what "lazy loading" means.

In previous iterations of this series, the status of all LEDs was
read on PCI device enumeration.  That was done so that when user space
reads the brightness is sysfs, it gets the correct value.  The value
is cached, it's not re-read from the register on every brightness read.

(It's not guaranteed that all LEDs are off on enumeration.  E.g. boot
firmware may have fiddled with them, or the enclosure itself may have
turned some of them on by itself, typically the "ok" LED.)

However Stuart reported issues when the _DSM interface is used on
Dell servers, because the _DSM requires IPMI drivers to access the
NPEM registers.  He got a ton of errors when LED status was read on
enumeration because that was simply too early.  Start of thread:

https://lore.kernel.org/all/05455f36-7027-4fd6-8af7-4fe8e483f25c@gmail.com/

The solution is to read LED status lazily, when brightness is read
or written for the first time through sysfs.  At that point, IPMI
drivers are typically loaded.  Stuart reported success with this
approach.  There is still a possibility that users may see issues
if they access brightness before IPMI drivers are loaded.  Those
drivers may be modules and user space might overzealously try to
access brightness before they're loaded.  Or user space may prevent
them from loading by blacklisting or not installing them.  In which
case users get to keep the pieces.

We discussed various alternative approaches in the above-linked thread
but concluded that this pragmatic solution is the simplest that does
the job for all but the most pathological use cases.

We wanted to make this work on Dell servers, but at the same time
minimize the contortions that we need to go through to accommodate
their quirky implementation.

The code uses lazy initialization of LED status even in the native
NPEM case because it would make the code more complex to use early
initialization for direct NPEM register access and lazy initialization
for _DSM-mediated register access.


> Is there some existing ACPI ordering that guarantees ACPI_IPMI happens
> first?  Why do we need some Dell-specific thing here?
> 
> What is ACPI_IPMI?  I guess it refers to the "acpi_ipmi" module,
> acpi_ipmi.c?

As it turned out in the above-linked thread, just forcing ACPI_IPMI=y
for NPEM is not sufficient because additional (Dell-specific) IPMI
drivers need to be loaded as well for NPEM register access to work
through _DSM.


> > +void pci_npem_create(struct pci_dev *dev)
> > +{
> > +	const struct npem_ops *ops = &npem_ops;
> > +	int pos = 0, ret;
> > +	u32 cap;
> > +
> > +	if (!npem_has_dsm(dev)) {
> > +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> > +		if (pos == 0)
> > +			return;
> > +
> > +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> > +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> > +			return;
> > +	} else {
> > +		/*
> > +		 * OS should use the DSM for LED control if it is available
> > +		 * PCI Firmware Spec r3.3 sec 4.7.
> > +		 */
> > +		return;
> > +	}
> 
> I know this is sort of a transient state since the next patch adds
> full _DSM support, but I do think (a) the fact that NPEM will stop
> working simply because firmware adds _DSM support is unexpected
> behavior, and (b) npem_has_dsm() and the other ACPI-related stuff
> would fit better in the next patch.  It's a little strange to have
> them mixed here.

PCI Firmware Spec r3.3 sec 4.7 says:

   "OSPM should use this _DSM when available. If this _DSM is not
    available, OSPM should use Native PCIe Enclosure Management (NPEM)
    or SCSI Enclosure Services (SES) instead, if available."

I realize that a "should" is not a "must", so Linux would in principle
be allowed to use direct register access despite presence of the _DSM.

However that doesn't feel safe.  If the _DSM is present, I think it's
fair to assume that the platform firmware wants to control at least
a portion of the LEDs itself.  Accessing those LEDs directly, behind the
platform firmware's back, may cause issues.  Not exposing the LEDs
to the user in the _DSM case therefore seems safer.

Which is why the ACPI stuff to query for _DSM presence is already in
this patch instead of the succeeding one.

Thanks,

Lukas

  reply	other threads:[~2024-08-15  5:50 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-08-14 12:28 [PATCH v6 0/3] PCIe Enclosure LED Management Mariusz Tkaczyk
2024-08-14 12:28 ` [PATCH v6 1/3] leds: Init leds class earlier Mariusz Tkaczyk
2024-08-14 12:28 ` [PATCH v6 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
2024-08-14 21:49   ` Bjorn Helgaas
2024-08-15  5:45     ` Lukas Wunner [this message]
2024-08-15 17:42       ` Bjorn Helgaas
2024-08-20 13:25         ` Mariusz Tkaczyk
2024-08-20 13:52         ` Lukas Wunner
2024-08-14 12:29 ` [PATCH v6 3/3] PCI/NPEM: Add _DSM PCIe SSD status LED management Mariusz Tkaczyk
2024-08-14 20:27 ` [PATCH v6 0/3] PCIe Enclosure LED Management Bjorn Helgaas
2024-08-21 14:02   ` Lee Jones

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Zr2V5XqTAMSiEDJ-@wunner.de \
    --to=lukas@wunner.de \
    --cc=andriy.shevchenko@linux.intel.com \
    --cc=arnd@arndb.de \
    --cc=bhelgaas@google.com \
    --cc=dan.j.williams@intel.com \
    --cc=gregkh@linuxfoundation.org \
    --cc=hch@lst.de \
    --cc=helgaas@kernel.org \
    --cc=ilpo.jarvinen@linux.intel.com \
    --cc=kbusch@kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=marek.behun@nic.cz \
    --cc=mariusz.tkaczyk@linux.intel.com \
    --cc=pavel@ucw.cz \
    --cc=rdunlap@infradead.org \
    --cc=stuart.w.hayes@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox