public inbox for linux-pci@vger.kernel.org
 help / color / mirror / Atom feed
* [PATCH v2 0/3] PCIe Enclosure LED Management
@ 2024-05-28 13:19 Mariusz Tkaczyk
  2024-05-28 13:19 ` [PATCH v2 1/3] leds: Init leds class earlier Mariusz Tkaczyk
                   ` (2 more replies)
  0 siblings, 3 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-05-28 13:19 UTC (permalink / raw)
  To: linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

Patchset is named as PCIe Enclosure LED Management because it adds two features:
- Native PCIe Enclosure Management (NPEM)
- PCIe SSD Status LED Management (DSM)

Both are pattern oriented standards, they tell which "indication" should blink.
It doesn't control physical LED or pattern visualization.

Overall, driver is simple but it was not simple to fit it into interfaces
we have in kernel (We considered leds and enclosure interfaces). It reuses
leds interface, this approach seems to be the best because:
- leds are actively maintained, no new interface added.
- leds do not require any extensions, enclosure needs to be adjusted first.

There are trade-offs:
- "brightness" is the name of sysfs file to control led. It is not
  natural to use brightness to set patterns, that is why multiple led
  devices are created (one per indication);
- Update of one led may affect other leds, led triggers may not work
  as expected.

This patchset was made in close collaboration with Lucas Wunner.

Before it can be merged, DSM tests are needed.
Stuart, please test it and let us know.

Changes from v1:
- Renamed "pattern" to indication.
- DSM support added (not tested yet!).
- fixed nits reported by Bjorn.

Cc: Arnd Bergmann <arnd@arndb.de>
Cc: Bjorn Helgaas <bhelgaas@google.com>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: Ilpo Jarvinen <ilpo.jarvinen@linux.intel.com>
Cc: Lukas Wunner <lukas@wunner.de>
Cc: Keith Busch <kbusch@kernel.org>
Cc: Marek Behun <marek.behun@nic.cz>
Cc: Pavel Machek <pavel@ucw.cz>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
Cc: Stuart Hayes <stuart.w.hayes@gmail.com>
Link: https://lore.kernel.org/linux-pci/20240215142345.6073-1-mariusz.tkaczyk@linux.intel.com/

Mariusz Tkaczyk (3):
  leds: Init leds class earlier
  PCI/NPEM: Add Native PCIe Enclosure Management support
  PCI/NPEM: Add _DSM PCIe SSD status LED management

 drivers/leds/led-class.c      |   2 +-
 drivers/pci/Kconfig           |   9 +
 drivers/pci/Makefile          |   1 +
 drivers/pci/npem.c            | 551 ++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h             |   8 +
 drivers/pci/probe.c           |   2 +
 drivers/pci/remove.c          |   2 +
 include/linux/pci.h           |   3 +
 include/uapi/linux/pci_regs.h |  34 +++
 9 files changed, 611 insertions(+), 1 deletion(-)
 create mode 100644 drivers/pci/npem.c

-- 
2.35.3


^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH v2 1/3] leds: Init leds class earlier
  2024-05-28 13:19 [PATCH v2 0/3] PCIe Enclosure LED Management Mariusz Tkaczyk
@ 2024-05-28 13:19 ` Mariusz Tkaczyk
  2024-05-29  4:22   ` Dan Williams
  2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
  2024-05-28 13:19 ` [PATCH v2 3/3] PCI/NPEM: Add _DSM PCIe SSD status LED management Mariusz Tkaczyk
  2 siblings, 1 reply; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-05-28 13:19 UTC (permalink / raw)
  To: linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

PCI subsystem is registered on subsys_initcall() same as leds class.
NPEM driver will require leds class, there is race possibility.

Register led class on arch_initcall() to avoid it.

Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
 drivers/leds/led-class.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/leds/led-class.c b/drivers/leds/led-class.c
index 24fcff682b24..bed7b6ea9b98 100644
--- a/drivers/leds/led-class.c
+++ b/drivers/leds/led-class.c
@@ -681,7 +681,7 @@ static void __exit leds_exit(void)
 	class_unregister(&leds_class);
 }
 
-subsys_initcall(leds_init);
+arch_initcall(leds_init);
 module_exit(leds_exit);
 
 MODULE_AUTHOR("John Lenz, Richard Purdie");
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-28 13:19 [PATCH v2 0/3] PCIe Enclosure LED Management Mariusz Tkaczyk
  2024-05-28 13:19 ` [PATCH v2 1/3] leds: Init leds class earlier Mariusz Tkaczyk
@ 2024-05-28 13:19 ` Mariusz Tkaczyk
  2024-05-28 13:55   ` Ilpo Järvinen
                     ` (2 more replies)
  2024-05-28 13:19 ` [PATCH v2 3/3] PCI/NPEM: Add _DSM PCIe SSD status LED management Mariusz Tkaczyk
  2 siblings, 3 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-05-28 13:19 UTC (permalink / raw)
  To: linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows
managing LED in storage enclosures. NPEM is indication oriented
and it does not give direct access to LED. Although each of
the indications *could* represent an individual LED, multiple
indications could also be represented as a single,
multi-color LED or a single LED blinking in a specific interval.
The specification leaves that open.

Each enabled indication (capability register bit on) is represented as a
ledclass_dev which can be controlled through sysfs. For every ledclass
device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
It is corresponding to NPEM control register (Indication bit on/off).

Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
device which has an NPEM Extended Capability and indication is enabled
in NPEM capability register. For example, these are leds created for
pcieport "10000:02:05.0" on my setup:

leds/
├── 10000:02:05.0:enclosure:fail
├── 10000:02:05.0:enclosure:locate
├── 10000:02:05.0:enclosure:ok
└── 10000:02:05.0:enclosure:rebuild

They can be also found in "/sys/class/leds" directory. Parent PCIe device
bdf is used to guarantee uniqueness across leds subsystem.

To enable/disable fail indication "brightness" file can be edited:
echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness

PCIe r6.1, sec 7.9.19.2 defines the possible indications.

Multiple indications for same parent PCIe device can conflict and
hardware may update them when processing new request. To avoid issues,
driver refresh all indications by reading back control register.

Driver is projected to be exclusive NPEM extended capability manager.
It waits up to 1 second after imposing new request, it doesn't verify if
controller is busy before write, assuming that mutex lock gives protection
from concurrent updates. Driver is not registered if _DSM LED management
is available.

NPEM is a PCIe extended capability so it should be registered in
pcie_init_capabilities() but it is not possible due to LED dependency.
Parent pci_device must be added earlier for led_classdev_register()
to be successful. NPEM does not require configuration on kernel side, it
is safe to register LED devices later.

Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1]
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
 drivers/pci/Kconfig           |   9 +
 drivers/pci/Makefile          |   1 +
 drivers/pci/npem.c            | 410 ++++++++++++++++++++++++++++++++++
 drivers/pci/pci.h             |   8 +
 drivers/pci/probe.c           |   2 +
 drivers/pci/remove.c          |   2 +
 include/linux/pci.h           |   3 +
 include/uapi/linux/pci_regs.h |  34 +++
 8 files changed, 469 insertions(+)
 create mode 100644 drivers/pci/npem.c

diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
index d35001589d88..e696e69ad516 100644
--- a/drivers/pci/Kconfig
+++ b/drivers/pci/Kconfig
@@ -143,6 +143,15 @@ config PCI_IOV
 
 	  If unsure, say N.
 
+config PCI_NPEM
+	bool "Native PCIe Enclosure Management"
+	depends on LEDS_CLASS=y
+	help
+	  Support for Native PCIe Enclosure Management. It allows managing LED
+	  indications in storage enclosures. Enclosure must support following
+	  indications: OK, Locate, Fail, Rebuild, other indications are
+	  optional.
+
 config PCI_PRI
 	bool "PCI PRI support"
 	select PCI_ATS
diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
index 175302036890..cd5f655d4be9 100644
--- a/drivers/pci/Makefile
+++ b/drivers/pci/Makefile
@@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
 obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
 obj-$(CONFIG_PCI_DOE)		+= doe.o
 obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
+obj-$(CONFIG_PCI_NPEM)		+= npem.o
 
 # Endpoint library must be initialized before its users
 obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
new file mode 100644
index 000000000000..a76a2044dab2
--- /dev/null
+++ b/drivers/pci/npem.c
@@ -0,0 +1,410 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * PCIe Enclosure management driver created for LED interfaces based on
+ * indications. It says *what indications* blink but does not specify *how*
+ * they blink - it is hardware defined.
+ *
+ * The driver name refers to Native PCIe Enclosure Management. It is
+ * first indication oriented standard with specification.
+ *
+ * Native PCIe Enclosure Management (NPEM)
+ *	PCIe Base Specification r6.1 sec 6.28
+ *	PCIe Base Specification r6.1 sec 7.9.19
+ *
+ * Copyright (c) 2023-2024 Intel Corporation
+ *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
+ */
+
+#include <linux/acpi.h>
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/iopoll.h>
+#include <linux/leds.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci_regs.h>
+#include <linux/types.h>
+#include <linux/uleds.h>
+
+#include "pci.h"
+
+struct indication {
+	u32 bit;
+	const char *name;
+};
+
+static const struct indication npem_indications[] = {
+	{PCI_NPEM_IND_OK,	"enclosure:ok"},
+	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
+	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
+	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
+	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
+	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
+	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
+	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
+	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
+	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
+	{PCI_NPEM_IND_SPEC_0,	"enclosure:specific_0"},
+	{PCI_NPEM_IND_SPEC_1,	"enclosure:specific_1"},
+	{PCI_NPEM_IND_SPEC_2,	"enclosure:specific_2"},
+	{PCI_NPEM_IND_SPEC_3,	"enclosure:specific_3"},
+	{PCI_NPEM_IND_SPEC_4,	"enclosure:specific_4"},
+	{PCI_NPEM_IND_SPEC_5,	"enclosure:specific_5"},
+	{PCI_NPEM_IND_SPEC_6,	"enclosure:specific_6"},
+	{PCI_NPEM_IND_SPEC_7,	"enclosure:specific_7"},
+	{0,			NULL}
+};
+
+#define for_each_indication(ind, inds) \
+	for (ind = inds; ind->bit; ind++)
+
+/* To avoid confusion, do not keep any special bits in indications */
+static u32 reg_to_indications(u32 caps, const struct indication *inds)
+{
+	const struct indication *ind;
+	u32 supported_indications = 0;
+
+	for_each_indication(ind, inds)
+		supported_indications |= ind->bit;
+
+	return caps & supported_indications;
+}
+
+/**
+ * struct npem_led - LED details
+ * @indication: indication details
+ * @npem: npem device
+ * @name: LED name
+ * @led: LED device
+ */
+struct npem_led {
+	const struct indication *indication;
+	struct npem *npem;
+	char name[LED_MAX_NAME_SIZE];
+	struct led_classdev led;
+};
+
+/**
+ * struct npem_ops - backend specific callbacks
+ * @inds: supported indications array
+ * @get_active_indications: get active indications
+ *	npem: npem device
+ *	buf: response buffer
+ * @set_active_indications: set new indications
+ *	npem: npem device
+ *	val: bit mask to set
+ *
+ * Handle communication with hardware. set_active_indications updates
+ * npem->active_indications.
+ */
+struct npem_ops {
+	const struct indication *inds;
+	int (*get_active_indications)(struct npem *npem, u32 *buf);
+	int (*set_active_indications)(struct npem *npem, u32 val);
+};
+
+/**
+ * struct npem - NPEM device properties
+ * @dev: PCIe device this driver is attached to
+ * @ops: Backend specific callbacks
+ * @npem_lock: to keep concurrent updates serialized
+ * @pos: NPEM capability offset (only relevant for NPEM direct register access,
+ *	 not DSM access method)
+ * @supported_indications: bit mask of supported indications
+ *			   non-indication and reserved bits are cleared
+ * @active_indications: bit mask of active indications
+ *			non-indication and reserved bits are cleared
+ * @led_cnt: Supported LEDs count
+ * @leds: supported LEDs
+ */
+struct npem {
+	struct pci_dev *dev;
+	const struct npem_ops *ops;
+	struct mutex npem_lock;
+	u16 pos;
+	u32 supported_indications;
+	u32 active_indications;
+	int led_cnt;
+	struct npem_led leds[];
+};
+
+static int npem_read_reg(struct npem *npem, u16 reg, u32 *val)
+{
+	int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val);
+
+	return pcibios_err_to_errno(ret);
+}
+
+static int npem_write_ctrl(struct npem *npem, u32 reg)
+{
+	int pos = npem->pos + PCI_NPEM_CTRL;
+	int ret = pci_write_config_dword(npem->dev, pos, reg);
+
+	return pcibios_err_to_errno(ret);
+}
+
+static int npem_get_active_indications(struct npem *npem, u32 *buf)
+{
+	int ret;
+
+	ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf);
+	if (ret)
+		return ret;
+
+	/* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */
+	if (!(*buf & PCI_NPEM_CTRL_ENABLE))
+		*buf = 0;
+
+	/* Filter out not supported indications in response */
+	*buf &= npem->supported_indications;
+	return 0;
+}
+
+static int npem_set_active_indications(struct npem *npem, u32 val)
+{
+	int ret, ret_val;
+	u32 cc_status;
+
+	lockdep_assert_held(&npem->npem_lock);
+
+	/* This bit is always required */
+	val |= PCI_NPEM_CTRL_ENABLE;
+	ret = npem_write_ctrl(npem, val);
+	if (ret)
+		return ret;
+
+	/*
+	 * For the case where a NPEM command has not completed immediately,
+	 * it is recommended that software not continuously “spin” on polling
+	 * the status register, but rather poll under interrupt at a reduced
+	 * rate; for example at 10 ms intervals.
+	 *
+	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM
+	 * Command Completed"
+	 */
+	ret = read_poll_timeout(npem_read_reg, ret_val,
+				ret_val || (cc_status & PCI_NPEM_STATUS_CC),
+				10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem,
+				PCI_NPEM_STATUS, &cc_status);
+	if (ret)
+		return ret_val;
+
+	/*
+	 * All writes to control register, including writes that do not change
+	 * the register value, are NPEM commands and should eventually result
+	 * in a command completion indication in the NPEM Status Register.
+	 *
+	 * PCIe Base Specification r6.1 sec 7.9.19.3
+	 *
+	 * Register may not be updated, or other conflicting bits may be
+	 * cleared. Spec is not strict here. Read NPEM Control register after
+	 * write to keep cache in-sync.
+	 */
+	return npem_get_active_indications(npem, &npem->active_indications);
+}
+
+static const struct npem_ops npem_ops = {
+	.inds = npem_indications,
+	.get_active_indications = npem_get_active_indications,
+	.set_active_indications = npem_set_active_indications,
+};
+
+#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b,  0x8c, 0xb7, 0x74, 0x7e,\
+			   0xd5, 0x1e, 0x19, 0x4d)
+#define GET_SUPPORTED_STATES_DSM	BIT(1)
+#define GET_STATE_DSM			BIT(2)
+#define SET_STATE_DSM			BIT(3)
+
+static const guid_t dsm_guid = DSM_GUID;
+
+static bool npem_has_dsm(struct pci_dev *pdev)
+{
+	acpi_handle handle;
+
+	handle = ACPI_HANDLE(&pdev->dev);
+	if (!handle)
+		return false;
+
+	return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM |
+			      GET_STATE_DSM | SET_STATE_DSM);
+}
+
+/*
+ * This function does not use ops->get_active_indications().
+ * It returns cached value, npem->npem_lock is held and it is safe.
+ */
+static enum led_brightness brightness_get(struct led_classdev *led)
+{
+	struct npem_led *nled = container_of(led, struct npem_led, led);
+	struct npem *npem = nled->npem;
+	int ret, val = LED_OFF;
+
+	ret = mutex_lock_interruptible(&npem->npem_lock);
+	if (ret)
+		return ret;
+
+	if (npem->active_indications & nled->indication->bit)
+		val = LED_ON;
+
+	mutex_unlock(&npem->npem_lock);
+
+	return val;
+}
+
+static int brightness_set(struct led_classdev *led,
+			  enum led_brightness brightness)
+{
+	struct npem_led *nled = container_of(led, struct npem_led, led);
+	struct npem *npem = nled->npem;
+	u32 indications;
+	int ret;
+
+	ret = mutex_lock_interruptible(&npem->npem_lock);
+	if (ret)
+		return ret;
+
+	if (brightness == LED_OFF)
+		indications = npem->active_indications & ~(nled->indication->bit);
+	else
+		indications = npem->active_indications | nled->indication->bit;
+
+	ret = npem->ops->set_active_indications(npem, indications);
+
+	mutex_unlock(&npem->npem_lock);
+	return ret;
+}
+
+static void npem_free(struct npem *npem)
+{
+	struct npem_led *nled;
+	int cnt;
+
+	for (cnt = 0; cnt < npem->led_cnt; cnt++) {
+		nled = &npem->leds[cnt];
+
+		if (nled->name[0])
+			led_classdev_unregister(&nled->led);
+	}
+
+	mutex_destroy(&npem->npem_lock);
+	kfree(npem);
+}
+
+static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled)
+{
+	struct led_classdev *led = &nled->led;
+	struct led_init_data init_data = {};
+	char *name = nled->name;
+	int ret;
+
+	init_data.devicename = pci_name(npem->dev);
+	init_data.default_label = nled->indication->name;
+
+	ret = led_compose_name(&npem->dev->dev, &init_data, name);
+	if (ret)
+		return ret;
+
+	led->name = name;
+	led->brightness_set_blocking = brightness_set;
+	led->brightness_get = brightness_get;
+	led->max_brightness = LED_ON;
+	led->default_trigger = "none";
+	led->flags = 0;
+
+	ret = led_classdev_register(&npem->dev->dev, led);
+	if (ret)
+		/* Clear the name to indicate that it is not registered. */
+		name[0] = 0;
+	return ret;
+}
+
+static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
+			 int pos, u32 caps)
+{
+	u32 supported = reg_to_indications(caps, ops->inds);
+	int supported_cnt = hweight32(supported);
+	const struct indication *indication;
+	struct npem_led *nled;
+	struct npem *npem;
+	int led_idx = 0;
+	u32 active;
+	int ret;
+
+	npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL);
+
+	if (!npem)
+		return -ENOMEM;
+
+	npem->supported_indications = supported;
+	npem->led_cnt = supported_cnt;
+	npem->pos = pos;
+	npem->dev = dev;
+	npem->ops = ops;
+
+	ret = ops->get_active_indications(npem, &active);
+	if (ret) {
+		npem_free(npem);
+		return -EACCES;
+	}
+
+	npem->active_indications = reg_to_indications(active, ops->inds);
+
+	/*
+	 * Do not take npem->npem_lock, get_brightness() is called on
+	 * registration path.
+	 */
+	mutex_init(&npem->npem_lock);
+
+	for_each_indication(indication, npem_indications) {
+		if ((npem->supported_indications & indication->bit) == 0)
+			/* Do not register unsupported indication */
+			continue;
+
+		nled = &npem->leds[led_idx++];
+		nled->indication = indication;
+		nled->npem = npem;
+
+		ret = pci_npem_set_led_classdev(npem, nled);
+		if (ret) {
+			npem_free(npem);
+			return ret;
+		}
+	}
+
+	dev->npem = npem;
+	return 0;
+}
+
+void pci_npem_remove(struct pci_dev *dev)
+{
+	if (dev->npem)
+		npem_free(dev->npem);
+}
+
+void pci_npem_create(struct pci_dev *dev)
+{
+	const struct npem_ops *ops = &npem_ops;
+	int pos = 0, ret;
+	u32 cap;
+
+	if (!npem_has_dsm(dev)) {
+		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
+		if (pos == 0)
+			return;
+
+		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
+		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
+			return;
+	} else {
+		/*
+		 * OS should use the DSM for LED control if it is available
+		 * PCI Firmware Spec r3.3 sec 4.7.
+		 */
+		return;
+	}
+
+	ret = pci_npem_init(dev, ops, pos, cap);
+	if (ret)
+		pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n",
+			ret);
+}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index fd44565c4756..9dea8c7353ab 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { }
 static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
 #endif
 
+#ifdef CONFIG_PCI_NPEM
+void pci_npem_create(struct pci_dev *dev);
+void pci_npem_remove(struct pci_dev *dev);
+#else
+static inline void pci_npem_create(struct pci_dev *dev) { }
+static inline void pci_npem_remove(struct pci_dev *dev) { }
+#endif
+
 /**
  * pci_dev_set_io_state - Set the new error state if possible.
  *
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 8e696e547565..b8ea6353e27a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 	dev->match_driver = false;
 	ret = device_add(&dev->dev);
 	WARN_ON(ret < 0);
+
+	pci_npem_create(dev);
 }
 
 struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d749ea8250d6..1436f9cf1fea 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
 	if (!dev->dev.kobj.parent)
 		return;
 
+	pci_npem_remove(dev);
+
 	device_del(&dev->dev);
 
 	down_write(&pci_bus_sem);
diff --git a/include/linux/pci.h b/include/linux/pci.h
index fb004fd4e889..c327c2dd4527 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -517,6 +517,9 @@ struct pci_dev {
 #endif
 #ifdef CONFIG_PCI_DOE
 	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
+#endif
+#ifdef CONFIG_PCI_NPEM
+	struct npem	*npem;		/* Native PCIe Enclosure Management */
 #endif
 	u16		acs_cap;	/* ACS Capability offset */
 	phys_addr_t	rom;		/* Physical address if not from BAR */
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index 94c00996e633..97e8b7ed9998 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -740,6 +740,7 @@
 #define PCI_EXT_CAP_ID_DVSEC	0x23	/* Designated Vendor-Specific */
 #define PCI_EXT_CAP_ID_DLF	0x25	/* Data Link Feature */
 #define PCI_EXT_CAP_ID_PL_16GT	0x26	/* Physical Layer 16.0 GT/s */
+#define PCI_EXT_CAP_ID_NPEM	0x29	/* Native PCIe Enclosure Management */
 #define PCI_EXT_CAP_ID_PL_32GT  0x2A    /* Physical Layer 32.0 GT/s */
 #define PCI_EXT_CAP_ID_DOE	0x2E	/* Data Object Exchange */
 #define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_DOE
@@ -1121,6 +1122,39 @@
 #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK		0x000000F0
 #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT	4
 
+/* Native PCIe Enclosure Management */
+#define PCI_NPEM_CAP	0x04 /* NPEM capability register */
+#define	 PCI_NPEM_CAP_CAPABLE		0x00000001 /* NPEM Capable */
+
+#define PCI_NPEM_CTRL	0x08 /* NPEM control register */
+#define	 PCI_NPEM_CTRL_ENABLE		0x00000001 /* NPEM Enable */
+
+#define PCI_NPEM_STATUS	0x0c /* NPEM status register */
+#define	 PCI_NPEM_STATUS_CC		0x00000001 /* NPEM Command completed */
+/*
+ * Native PCIe Enclosure Management indication bits and Reset command bit
+ * are corresponding for capability and control registers.
+ */
+#define  PCI_NPEM_CMD_RESET		0x00000002 /* NPEM Reset Command */
+#define  PCI_NPEM_IND_OK		0x00000004 /* NPEM indication OK */
+#define  PCI_NPEM_IND_LOCATE		0x00000008 /* NPEM indication Locate */
+#define  PCI_NPEM_IND_FAIL		0x00000010 /* NPEM indication Fail */
+#define  PCI_NPEM_IND_REBUILD		0x00000020 /* NPEM indication Rebuild */
+#define  PCI_NPEM_IND_PFA		0x00000040 /* NPEM indication Predicted Failure Analysis */
+#define  PCI_NPEM_IND_HOTSPARE		0x00000080 /* NPEM indication Hot Spare */
+#define  PCI_NPEM_IND_ICA		0x00000100 /* NPEM indication In Critical Array */
+#define  PCI_NPEM_IND_IFA		0x00000200 /* NPEM indication In Failed Array */
+#define  PCI_NPEM_IND_IDT		0x00000400 /* NPEM indication Invalid Device Type */
+#define  PCI_NPEM_IND_DISABLED		0x00000800 /* NPEM indication Disabled */
+#define  PCI_NPEM_IND_SPEC_0		0x00800000
+#define  PCI_NPEM_IND_SPEC_1		0x01000000
+#define  PCI_NPEM_IND_SPEC_2		0x02000000
+#define  PCI_NPEM_IND_SPEC_3		0x04000000
+#define  PCI_NPEM_IND_SPEC_4		0x08000000
+#define  PCI_NPEM_IND_SPEC_5		0x10000000
+#define  PCI_NPEM_IND_SPEC_6		0x20000000
+#define  PCI_NPEM_IND_SPEC_7		0x40000000
+
 /* Data Object Exchange */
 #define PCI_DOE_CAP		0x04    /* DOE Capabilities Register */
 #define  PCI_DOE_CAP_INT_SUP			0x00000001  /* Interrupt Support */
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH v2 3/3] PCI/NPEM: Add _DSM PCIe SSD status LED management
  2024-05-28 13:19 [PATCH v2 0/3] PCIe Enclosure LED Management Mariusz Tkaczyk
  2024-05-28 13:19 ` [PATCH v2 1/3] leds: Init leds class earlier Mariusz Tkaczyk
  2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
@ 2024-05-28 13:19 ` Mariusz Tkaczyk
  2 siblings, 0 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-05-28 13:19 UTC (permalink / raw)
  To: linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

Device Specific Method PCIe SSD Status LED Management (_DSM) defined in
PCI Firmware Spec r3.3 sec 4.7 provides a way to manage LEDs via ACPI.

The design is similar to NPEM defined in PCIe Base Specification r6.1
sec 6.28:
  - both standards are indication oriented,
  - _DSM supported bits are corresponding to NPEM capability
    register bits,
  - _DSM control bits are corresponding to NPEM control register
    bits.

_DSM does not support enclosure specific indications and special NPEM
commands NPEM_ENABLE and NPEM_RESET.

_DSM is implemented as a second op in NPEM driver. The standard used
in background is not presented to user. The interface is accessed same
as NPEM.

According to spec, _DSM has higher priority and availability  of _DSM
in not limited to devices with NPEM support. It is followed in
implementation.

Link: https://members.pcisig.com/wg/PCI-SIG/document/14025
Link: https://members.pcisig.com/wg/PCI-SIG/document/15350
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Stuart Hayes <stuart.w.hayes@gmail.com>
Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
---
 drivers/pci/npem.c | 147 ++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 144 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
index a76a2044dab2..a97e19fb50c9 100644
--- a/drivers/pci/npem.c
+++ b/drivers/pci/npem.c
@@ -11,6 +11,14 @@
  *	PCIe Base Specification r6.1 sec 6.28
  *	PCIe Base Specification r6.1 sec 7.9.19
  *
+ * _DSM Definitions for PCIe SSD Status LED
+ *	 PCI Firmware Specification, r3.3 sec 4.7
+ *
+ * Two backends are supported to manipulate indications:  Direct NPEM register
+ * access (npem_ops) and indirect access through the ACPI _DSM (dsm_ops).
+ * _DSM is used if supported, else NPEM.
+ *
+ * Copyright (c) 2021-2022 Dell Inc.
  * Copyright (c) 2023-2024 Intel Corporation
  *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
  */
@@ -55,6 +63,21 @@ static const struct indication npem_indications[] = {
 	{0,			NULL}
 };
 
+/* _DSM PCIe SSD LED States are corresponding to NPEM register values */
+static const struct indication dsm_indications[] = {
+	{PCI_NPEM_IND_OK,	"enclosure:ok"},
+	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
+	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
+	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
+	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
+	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
+	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
+	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
+	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
+	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
+	{0,			NULL}
+};
+
 #define for_each_indication(ind, inds) \
 	for (ind = inds; ind->bit; ind++)
 
@@ -229,6 +252,120 @@ static bool npem_has_dsm(struct pci_dev *pdev)
 			      GET_STATE_DSM | SET_STATE_DSM);
 }
 
+struct dsm_output {
+	u16 status;
+	u8 function_specific_err;
+	u8 vendor_specific_err;
+	u32 state;
+} __packed;
+
+/**
+ * dsm_evaluate() - send DSM PCIe SSD Status LED command
+ * @pdev: PCI device
+ * @dsm_func: DSM LED Function
+ * @output: buffer to copy DSM Response
+ * @value_to_set: value for SET_STATE_DSM function
+ *
+ * To not bother caller with ACPI context, the returned _DSM Output Buffer is
+ * copied.
+ */
+static int dsm_evaluate(struct pci_dev *pdev, u64 dsm_func,
+			struct dsm_output *output, u32 value_to_set)
+{
+	acpi_handle handle = ACPI_HANDLE(&pdev->dev);
+	union acpi_object *out_obj, arg3[2];
+	union acpi_object *arg3_p = NULL;
+
+	if (dsm_func == SET_STATE_DSM) {
+		arg3[0].type = ACPI_TYPE_PACKAGE;
+		arg3[0].package.count = 1;
+		arg3[0].package.elements = &arg3[1];
+
+		arg3[1].type = ACPI_TYPE_BUFFER;
+		arg3[1].buffer.length = 4;
+		arg3[1].buffer.pointer = (u8 *)&value_to_set;
+
+		arg3_p = arg3;
+	}
+
+	out_obj = acpi_evaluate_dsm_typed(handle, &dsm_guid, 0x1, dsm_func,
+					  arg3_p, ACPI_TYPE_BUFFER);
+	if (!out_obj)
+		return -EIO;
+
+	if (out_obj->buffer.length < sizeof(struct dsm_output)) {
+		ACPI_FREE(out_obj);
+		return -EIO;
+	}
+
+	memcpy(output, out_obj->buffer.pointer, sizeof(struct dsm_output));
+
+	ACPI_FREE(out_obj);
+	return 0;
+}
+
+static int dsm_get(struct pci_dev *pdev, u64 dsm_func, u32 *buf)
+{
+	struct dsm_output output;
+	int ret = dsm_evaluate(pdev, dsm_func, &output, 0);
+
+	if (ret)
+		return ret;
+
+	if (output.status != 0)
+		return -EIO;
+
+	*buf = output.state;
+	return 0;
+}
+
+static int dsm_get_active_indications(struct npem *npem, u32 *buf)
+{
+	int ret = dsm_get(npem->dev, GET_STATE_DSM, buf);
+
+	/* Filter out not supported indications in response */
+	*buf &= npem->supported_indications;
+	return ret;
+}
+
+static int dsm_set_active_indications(struct npem *npem, u32 value)
+{
+	struct dsm_output output;
+	int ret = dsm_evaluate(npem->dev, SET_STATE_DSM, &output, value);
+
+	if (ret)
+		return ret;
+
+	switch (output.status) {
+	case 4:
+		/*
+		 * Not all bits are set. If this bit is set, the platform
+		 * disregarded some or all of the request state changes. OSPM
+		 * should check the resulting PCIe SSD Status LED States to see
+		 * what, if anything, has changed.
+		 *
+		 * PCI Firmware Specification, r3.3 Table 4-19.
+		 */
+		if (output.function_specific_err != 1)
+			return -EIO;
+		fallthrough;
+	case 0:
+		break;
+	default:
+		return -EIO;
+	}
+
+	npem->active_indications = output.state;
+
+	return 0;
+}
+
+static const struct npem_ops dsm_ops = {
+	.inds = dsm_indications,
+	.get_active_indications = dsm_get_active_indications,
+	.set_active_indications = dsm_set_active_indications,
+};
+
 /*
  * This function does not use ops->get_active_indications().
  * It returns cached value, npem->npem_lock is held and it is safe.
@@ -400,11 +537,15 @@ void pci_npem_create(struct pci_dev *dev)
 		 * OS should use the DSM for LED control if it is available
 		 * PCI Firmware Spec r3.3 sec 4.7.
 		 */
-		return;
+		ret = dsm_get(dev, GET_SUPPORTED_STATES_DSM, &cap);
+		if (ret)
+			return;
+
+		ops = &dsm_ops;
 	}
 
 	ret = pci_npem_init(dev, ops, pos, cap);
 	if (ret)
-		pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n",
-			ret);
+		pci_err(dev, "Failed to register %s PCIe Enclosure Management driver, err: %d\n",
+			(ops == &dsm_ops ? "_DSM" : "Native"), ret);
 }
-- 
2.35.3


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
@ 2024-05-28 13:55   ` Ilpo Järvinen
  2024-06-14 13:40     ` Mariusz Tkaczyk
  2024-05-29  5:21   ` Dan Williams
  2024-06-14 21:06   ` stuart hayes
  2 siblings, 1 reply; 22+ messages in thread
From: Ilpo Järvinen @ 2024-05-28 13:55 UTC (permalink / raw)
  To: Mariusz Tkaczyk
  Cc: linux-pci, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Lukas Wunner, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

[-- Attachment #1: Type: text/plain, Size: 21892 bytes --]

On Tue, 28 May 2024, Mariusz Tkaczyk wrote:

> Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows
> managing LED in storage enclosures. NPEM is indication oriented
> and it does not give direct access to LED. Although each of
> the indications *could* represent an individual LED, multiple
> indications could also be represented as a single,
> multi-color LED or a single LED blinking in a specific interval.
> The specification leaves that open.
> 
> Each enabled indication (capability register bit on) is represented as a
> ledclass_dev which can be controlled through sysfs. For every ledclass
> device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
> It is corresponding to NPEM control register (Indication bit on/off).
> 
> Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
> device which has an NPEM Extended Capability and indication is enabled
> in NPEM capability register. For example, these are leds created for
> pcieport "10000:02:05.0" on my setup:
> 
> leds/
> ├── 10000:02:05.0:enclosure:fail
> ├── 10000:02:05.0:enclosure:locate
> ├── 10000:02:05.0:enclosure:ok
> └── 10000:02:05.0:enclosure:rebuild
> 
> They can be also found in "/sys/class/leds" directory. Parent PCIe device
> bdf is used to guarantee uniqueness across leds subsystem.
> 
> To enable/disable fail indication "brightness" file can be edited:
> echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
> echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness
> 
> PCIe r6.1, sec 7.9.19.2 defines the possible indications.
> 
> Multiple indications for same parent PCIe device can conflict and
> hardware may update them when processing new request. To avoid issues,
> driver refresh all indications by reading back control register.
> 
> Driver is projected to be exclusive NPEM extended capability manager.
> It waits up to 1 second after imposing new request, it doesn't verify if
> controller is busy before write, assuming that mutex lock gives protection
> from concurrent updates. Driver is not registered if _DSM LED management
> is available.
> 
> NPEM is a PCIe extended capability so it should be registered in
> pcie_init_capabilities() but it is not possible due to LED dependency.
> Parent pci_device must be added earlier for led_classdev_register()
> to be successful. NPEM does not require configuration on kernel side, it
> is safe to register LED devices later.
> 
> Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1]
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
>  drivers/pci/Kconfig           |   9 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/npem.c            | 410 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h             |   8 +
>  drivers/pci/probe.c           |   2 +
>  drivers/pci/remove.c          |   2 +
>  include/linux/pci.h           |   3 +
>  include/uapi/linux/pci_regs.h |  34 +++
>  8 files changed, 469 insertions(+)
>  create mode 100644 drivers/pci/npem.c
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index d35001589d88..e696e69ad516 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -143,6 +143,15 @@ config PCI_IOV
>  
>  	  If unsure, say N.
>  
> +config PCI_NPEM
> +	bool "Native PCIe Enclosure Management"
> +	depends on LEDS_CLASS=y
> +	help
> +	  Support for Native PCIe Enclosure Management. It allows managing LED
> +	  indications in storage enclosures. Enclosure must support following
> +	  indications: OK, Locate, Fail, Rebuild, other indications are
> +	  optional.
> +
>  config PCI_PRI
>  	bool "PCI PRI support"
>  	select PCI_ATS
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 175302036890..cd5f655d4be9 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>  obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
>  obj-$(CONFIG_PCI_DOE)		+= doe.o
>  obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
> +obj-$(CONFIG_PCI_NPEM)		+= npem.o
>  
>  # Endpoint library must be initialized before its users
>  obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
> diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
> new file mode 100644
> index 000000000000..a76a2044dab2
> --- /dev/null
> +++ b/drivers/pci/npem.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe Enclosure management driver created for LED interfaces based on
> + * indications. It says *what indications* blink but does not specify *how*
> + * they blink - it is hardware defined.
> + *
> + * The driver name refers to Native PCIe Enclosure Management. It is
> + * first indication oriented standard with specification.
> + *
> + * Native PCIe Enclosure Management (NPEM)
> + *	PCIe Base Specification r6.1 sec 6.28
> + *	PCIe Base Specification r6.1 sec 7.9.19
> + *
> + * Copyright (c) 2023-2024 Intel Corporation
> + *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/errno.h>
> +#include <linux/iopoll.h>
> +#include <linux/leds.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pci_regs.h>
> +#include <linux/types.h>
> +#include <linux/uleds.h>
> +
> +#include "pci.h"
> +
> +struct indication {
> +	u32 bit;
> +	const char *name;
> +};
> +
> +static const struct indication npem_indications[] = {
> +	{PCI_NPEM_IND_OK,	"enclosure:ok"},
> +	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
> +	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
> +	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
> +	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
> +	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
> +	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
> +	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
> +	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
> +	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
> +	{PCI_NPEM_IND_SPEC_0,	"enclosure:specific_0"},
> +	{PCI_NPEM_IND_SPEC_1,	"enclosure:specific_1"},
> +	{PCI_NPEM_IND_SPEC_2,	"enclosure:specific_2"},
> +	{PCI_NPEM_IND_SPEC_3,	"enclosure:specific_3"},
> +	{PCI_NPEM_IND_SPEC_4,	"enclosure:specific_4"},
> +	{PCI_NPEM_IND_SPEC_5,	"enclosure:specific_5"},
> +	{PCI_NPEM_IND_SPEC_6,	"enclosure:specific_6"},
> +	{PCI_NPEM_IND_SPEC_7,	"enclosure:specific_7"},
> +	{0,			NULL}
> +};
> +
> +#define for_each_indication(ind, inds) \
> +	for (ind = inds; ind->bit; ind++)
> +
> +/* To avoid confusion, do not keep any special bits in indications */
> +static u32 reg_to_indications(u32 caps, const struct indication *inds)
> +{
> +	const struct indication *ind;
> +	u32 supported_indications = 0;
> +
> +	for_each_indication(ind, inds)
> +		supported_indications |= ind->bit;
> +
> +	return caps & supported_indications;
> +}
> +
> +/**
> + * struct npem_led - LED details
> + * @indication: indication details
> + * @npem: npem device
> + * @name: LED name
> + * @led: LED device
> + */
> +struct npem_led {
> +	const struct indication *indication;
> +	struct npem *npem;
> +	char name[LED_MAX_NAME_SIZE];
> +	struct led_classdev led;
> +};
> +
> +/**
> + * struct npem_ops - backend specific callbacks
> + * @inds: supported indications array
> + * @get_active_indications: get active indications
> + *	npem: npem device
> + *	buf: response buffer
> + * @set_active_indications: set new indications
> + *	npem: npem device
> + *	val: bit mask to set
> + *
> + * Handle communication with hardware. set_active_indications updates
> + * npem->active_indications.
> + */
> +struct npem_ops {
> +	const struct indication *inds;
> +	int (*get_active_indications)(struct npem *npem, u32 *buf);
> +	int (*set_active_indications)(struct npem *npem, u32 val);
> +};
> +
> +/**
> + * struct npem - NPEM device properties
> + * @dev: PCIe device this driver is attached to
> + * @ops: Backend specific callbacks
> + * @npem_lock: to keep concurrent updates serialized
> + * @pos: NPEM capability offset (only relevant for NPEM direct register access,
> + *	 not DSM access method)
> + * @supported_indications: bit mask of supported indications
> + *			   non-indication and reserved bits are cleared
> + * @active_indications: bit mask of active indications
> + *			non-indication and reserved bits are cleared
> + * @led_cnt: Supported LEDs count
> + * @leds: supported LEDs
> + */
> +struct npem {
> +	struct pci_dev *dev;
> +	const struct npem_ops *ops;
> +	struct mutex npem_lock;
> +	u16 pos;
> +	u32 supported_indications;
> +	u32 active_indications;
> +	int led_cnt;
> +	struct npem_led leds[];
> +};
> +
> +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val)
> +{
> +	int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_write_ctrl(struct npem *npem, u32 reg)
> +{
> +	int pos = npem->pos + PCI_NPEM_CTRL;
> +	int ret = pci_write_config_dword(npem->dev, pos, reg);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_get_active_indications(struct npem *npem, u32 *buf)
> +{
> +	int ret;
> +
> +	ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf);
> +	if (ret)
> +		return ret;
> +
> +	/* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */
> +	if (!(*buf & PCI_NPEM_CTRL_ENABLE))
> +		*buf = 0;
> +
> +	/* Filter out not supported indications in response */
> +	*buf &= npem->supported_indications;
> +	return 0;
> +}
> +
> +static int npem_set_active_indications(struct npem *npem, u32 val)
> +{
> +	int ret, ret_val;
> +	u32 cc_status;
> +
> +	lockdep_assert_held(&npem->npem_lock);
> +
> +	/* This bit is always required */
> +	val |= PCI_NPEM_CTRL_ENABLE;
> +	ret = npem_write_ctrl(npem, val);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * For the case where a NPEM command has not completed immediately,
> +	 * it is recommended that software not continuously “spin” on polling
> +	 * the status register, but rather poll under interrupt at a reduced
> +	 * rate; for example at 10 ms intervals.
> +	 *
> +	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM
> +	 * Command Completed"
> +	 */
> +	ret = read_poll_timeout(npem_read_reg, ret_val,
> +				ret_val || (cc_status & PCI_NPEM_STATUS_CC),
> +				10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem,
> +				PCI_NPEM_STATUS, &cc_status);
> +	if (ret)
> +		return ret_val;
> +
> +	/*
> +	 * All writes to control register, including writes that do not change
> +	 * the register value, are NPEM commands and should eventually result
> +	 * in a command completion indication in the NPEM Status Register.
> +	 *
> +	 * PCIe Base Specification r6.1 sec 7.9.19.3
> +	 *
> +	 * Register may not be updated, or other conflicting bits may be
> +	 * cleared. Spec is not strict here. Read NPEM Control register after
> +	 * write to keep cache in-sync.
> +	 */
> +	return npem_get_active_indications(npem, &npem->active_indications);
> +}
> +
> +static const struct npem_ops npem_ops = {
> +	.inds = npem_indications,
> +	.get_active_indications = npem_get_active_indications,
> +	.set_active_indications = npem_set_active_indications,
> +};
> +
> +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b,  0x8c, 0xb7, 0x74, 0x7e,\
> +			   0xd5, 0x1e, 0x19, 0x4d)
> +#define GET_SUPPORTED_STATES_DSM	BIT(1)
> +#define GET_STATE_DSM			BIT(2)
> +#define SET_STATE_DSM			BIT(3)
> +
> +static const guid_t dsm_guid = DSM_GUID;
> +
> +static bool npem_has_dsm(struct pci_dev *pdev)
> +{
> +	acpi_handle handle;
> +
> +	handle = ACPI_HANDLE(&pdev->dev);
> +	if (!handle)
> +		return false;
> +
> +	return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM |
> +			      GET_STATE_DSM | SET_STATE_DSM);
> +}
> +
> +/*
> + * This function does not use ops->get_active_indications().
> + * It returns cached value, npem->npem_lock is held and it is safe.
> + */
> +static enum led_brightness brightness_get(struct led_classdev *led)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	int ret, val = LED_OFF;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (npem->active_indications & nled->indication->bit)
> +		val = LED_ON;
> +
> +	mutex_unlock(&npem->npem_lock);
> +
> +	return val;
> +}
> +
> +static int brightness_set(struct led_classdev *led,
> +			  enum led_brightness brightness)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	u32 indications;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (brightness == LED_OFF)
> +		indications = npem->active_indications & ~(nled->indication->bit);
> +	else
> +		indications = npem->active_indications | nled->indication->bit;
> +
> +	ret = npem->ops->set_active_indications(npem, indications);
> +
> +	mutex_unlock(&npem->npem_lock);
> +	return ret;
> +}
> +
> +static void npem_free(struct npem *npem)
> +{
> +	struct npem_led *nled;
> +	int cnt;
> +
> +	for (cnt = 0; cnt < npem->led_cnt; cnt++) {
> +		nled = &npem->leds[cnt];
> +
> +		if (nled->name[0])
> +			led_classdev_unregister(&nled->led);
> +	}
> +
> +	mutex_destroy(&npem->npem_lock);
> +	kfree(npem);
> +}
> +
> +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled)
> +{
> +	struct led_classdev *led = &nled->led;
> +	struct led_init_data init_data = {};
> +	char *name = nled->name;
> +	int ret;
> +
> +	init_data.devicename = pci_name(npem->dev);
> +	init_data.default_label = nled->indication->name;
> +
> +	ret = led_compose_name(&npem->dev->dev, &init_data, name);
> +	if (ret)
> +		return ret;
> +
> +	led->name = name;
> +	led->brightness_set_blocking = brightness_set;
> +	led->brightness_get = brightness_get;
> +	led->max_brightness = LED_ON;
> +	led->default_trigger = "none";
> +	led->flags = 0;
> +
> +	ret = led_classdev_register(&npem->dev->dev, led);
> +	if (ret)
> +		/* Clear the name to indicate that it is not registered. */
> +		name[0] = 0;
> +	return ret;
> +}
> +
> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> +			 int pos, u32 caps)
> +{
> +	u32 supported = reg_to_indications(caps, ops->inds);
> +	int supported_cnt = hweight32(supported);
> +	const struct indication *indication;
> +	struct npem_led *nled;
> +	struct npem *npem;
> +	int led_idx = 0;
> +	u32 active;
> +	int ret;
> +
> +	npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL);
> +
> +	if (!npem)
> +		return -ENOMEM;

Don't leave empty line between func and it's error handling.

> +
> +	npem->supported_indications = supported;
> +	npem->led_cnt = supported_cnt;
> +	npem->pos = pos;
> +	npem->dev = dev;
> +	npem->ops = ops;
> +
> +	ret = ops->get_active_indications(npem, &active);
> +	if (ret) {
> +		npem_free(npem);

Just call kfree() directly here. As is, you're calling mutex_destroy() 
before mutex_init().

> +		return -EACCES;
> +	}
> +
> +	npem->active_indications = reg_to_indications(active, ops->inds);
> +
> +	/*
> +	 * Do not take npem->npem_lock, get_brightness() is called on
> +	 * registration path.
> +	 */
> +	mutex_init(&npem->npem_lock);
> +
> +	for_each_indication(indication, npem_indications) {
> +		if ((npem->supported_indications & indication->bit) == 0)

if (!(...))

> +			/* Do not register unsupported indication */

This sounds quite obvious comment, I'd drop it.

> +			continue;
> +
> +		nled = &npem->leds[led_idx++];
> +		nled->indication = indication;
> +		nled->npem = npem;
> +
> +		ret = pci_npem_set_led_classdev(npem, nled);
> +		if (ret) {
> +			npem_free(npem);
> +			return ret;
> +		}
> +	}
> +
> +	dev->npem = npem;
> +	return 0;
> +}
> +
> +void pci_npem_remove(struct pci_dev *dev)
> +{
> +	if (dev->npem)
> +		npem_free(dev->npem);
> +}
> +
> +void pci_npem_create(struct pci_dev *dev)
> +{
> +	const struct npem_ops *ops = &npem_ops;
> +	int pos = 0, ret;
> +	u32 cap;
> +
> +	if (!npem_has_dsm(dev)) {
> +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> +		if (pos == 0)
> +			return;
> +
> +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> +			return;
> +	} else {
> +		/*
> +		 * OS should use the DSM for LED control if it is available
> +		 * PCI Firmware Spec r3.3 sec 4.7.
> +		 */
> +		return;
> +	}
> +
> +	ret = pci_npem_init(dev, ops, pos, cap);
> +	if (ret)
> +		pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n",
> +			ret);
> +}
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fd44565c4756..9dea8c7353ab 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { }
>  static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
>  #endif
>  
> +#ifdef CONFIG_PCI_NPEM
> +void pci_npem_create(struct pci_dev *dev);
> +void pci_npem_remove(struct pci_dev *dev);
> +#else
> +static inline void pci_npem_create(struct pci_dev *dev) { }
> +static inline void pci_npem_remove(struct pci_dev *dev) { }
> +#endif
> +
>  /**
>   * pci_dev_set_io_state - Set the new error state if possible.
>   *
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8e696e547565..b8ea6353e27a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  	dev->match_driver = false;
>  	ret = device_add(&dev->dev);
>  	WARN_ON(ret < 0);
> +
> +	pci_npem_create(dev);
>  }
>  
>  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index d749ea8250d6..1436f9cf1fea 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  	if (!dev->dev.kobj.parent)
>  		return;
>  
> +	pci_npem_remove(dev);
> +
>  	device_del(&dev->dev);
>  
>  	down_write(&pci_bus_sem);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index fb004fd4e889..c327c2dd4527 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -517,6 +517,9 @@ struct pci_dev {
>  #endif
>  #ifdef CONFIG_PCI_DOE
>  	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
> +#endif
> +#ifdef CONFIG_PCI_NPEM
> +	struct npem	*npem;		/* Native PCIe Enclosure Management */
>  #endif
>  	u16		acs_cap;	/* ACS Capability offset */
>  	phys_addr_t	rom;		/* Physical address if not from BAR */
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 94c00996e633..97e8b7ed9998 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -740,6 +740,7 @@
>  #define PCI_EXT_CAP_ID_DVSEC	0x23	/* Designated Vendor-Specific */
>  #define PCI_EXT_CAP_ID_DLF	0x25	/* Data Link Feature */
>  #define PCI_EXT_CAP_ID_PL_16GT	0x26	/* Physical Layer 16.0 GT/s */
> +#define PCI_EXT_CAP_ID_NPEM	0x29	/* Native PCIe Enclosure Management */
>  #define PCI_EXT_CAP_ID_PL_32GT  0x2A    /* Physical Layer 32.0 GT/s */
>  #define PCI_EXT_CAP_ID_DOE	0x2E	/* Data Object Exchange */
>  #define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_DOE
> @@ -1121,6 +1122,39 @@
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK		0x000000F0
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT	4
>  
> +/* Native PCIe Enclosure Management */
> +#define PCI_NPEM_CAP	0x04 /* NPEM capability register */
> +#define	 PCI_NPEM_CAP_CAPABLE		0x00000001 /* NPEM Capable */
> +
> +#define PCI_NPEM_CTRL	0x08 /* NPEM control register */
> +#define	 PCI_NPEM_CTRL_ENABLE		0x00000001 /* NPEM Enable */

I think the rest of the bits would belong here or after 
PCI_NPEM_CAP_CAPABLE.

> +#define PCI_NPEM_STATUS	0x0c /* NPEM status register */
> +#define	 PCI_NPEM_STATUS_CC		0x00000001 /* NPEM Command completed */
> +/*
> + * Native PCIe Enclosure Management indication bits and Reset command bit
> + * are corresponding for capability and control registers.
> + */
> +#define  PCI_NPEM_CMD_RESET		0x00000002 /* NPEM Reset Command */
> +#define  PCI_NPEM_IND_OK		0x00000004 /* NPEM indication OK */
> +#define  PCI_NPEM_IND_LOCATE		0x00000008 /* NPEM indication Locate */
> +#define  PCI_NPEM_IND_FAIL		0x00000010 /* NPEM indication Fail */
> +#define  PCI_NPEM_IND_REBUILD		0x00000020 /* NPEM indication Rebuild */
> +#define  PCI_NPEM_IND_PFA		0x00000040 /* NPEM indication Predicted Failure Analysis */
> +#define  PCI_NPEM_IND_HOTSPARE		0x00000080 /* NPEM indication Hot Spare */
> +#define  PCI_NPEM_IND_ICA		0x00000100 /* NPEM indication In Critical Array */
> +#define  PCI_NPEM_IND_IFA		0x00000200 /* NPEM indication In Failed Array */
> +#define  PCI_NPEM_IND_IDT		0x00000400 /* NPEM indication Invalid Device Type */
> +#define  PCI_NPEM_IND_DISABLED		0x00000800 /* NPEM indication Disabled */

> +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> +#define  PCI_NPEM_IND_SPEC_7		0x40000000

Are these off-by-one, shouldn't that field go all the way to the last bit 
which is 0x80000000 (the field is 31:24)??

-- 
 i.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] leds: Init leds class earlier
  2024-05-28 13:19 ` [PATCH v2 1/3] leds: Init leds class earlier Mariusz Tkaczyk
@ 2024-05-29  4:22   ` Dan Williams
  2024-06-14 14:08     ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2024-05-29  4:22 UTC (permalink / raw)
  To: Mariusz Tkaczyk, linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

Mariusz Tkaczyk wrote:
> PCI subsystem is registered on subsys_initcall() same as leds class.
> NPEM driver will require leds class, there is race possibility.

s/race possibility/an init-order conflict/

> Register led class on arch_initcall() to avoid it.

I assume a later patch selects CONFIG_LEDS_CLASS from CONFIG_PCI?
Otherwise this change will not help.

Another way to solve this without changing initcall levels is to just
make sure that the leds subsys initcall happens first, i.e.:

diff --git a/drivers/Makefile b/drivers/Makefile
index fe9ceb0d2288..d47b4186108a 100644
--- a/drivers/Makefile
+++ b/drivers/Makefile
@@ -16,6 +16,7 @@ obj-$(CONFIG_GENERIC_PHY)     += phy/
 obj-$(CONFIG_PINCTRL)          += pinctrl/
 obj-$(CONFIG_GPIOLIB)          += gpio/
 obj-y                          += pwm/
+obj-y                          += leds/
 
 obj-y                          += pci/
 
@@ -130,7 +131,6 @@ obj-$(CONFIG_CPU_IDLE)              += cpuidle/
 obj-y                          += mmc/
 obj-y                          += ufs/
 obj-$(CONFIG_MEMSTICK)         += memstick/
-obj-y                          += leds/
 obj-$(CONFIG_INFINIBAND)       += infiniband/
 obj-y                          += firmware/
 obj-$(CONFIG_CRYPTO)           += crypto/

...which may be more appropriate.

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
  2024-05-28 13:55   ` Ilpo Järvinen
@ 2024-05-29  5:21   ` Dan Williams
  2024-05-29  9:38     ` Lukas Wunner
  2024-06-19 14:28     ` Mariusz Tkaczyk
  2024-06-14 21:06   ` stuart hayes
  2 siblings, 2 replies; 22+ messages in thread
From: Dan Williams @ 2024-05-29  5:21 UTC (permalink / raw)
  To: Mariusz Tkaczyk, linux-pci
  Cc: Mariusz Tkaczyk, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Lukas Wunner, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko,
	Stuart Hayes

Mariusz Tkaczyk wrote:
> Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows
> managing LED in storage enclosures. NPEM is indication oriented
> and it does not give direct access to LED. Although each of
> the indications *could* represent an individual LED, multiple
> indications could also be represented as a single,
> multi-color LED or a single LED blinking in a specific interval.
> The specification leaves that open.
> 
> Each enabled indication (capability register bit on) is represented as a
> ledclass_dev which can be controlled through sysfs. For every ledclass
> device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
> It is corresponding to NPEM control register (Indication bit on/off).
> 
> Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
> device which has an NPEM Extended Capability and indication is enabled
> in NPEM capability register. For example, these are leds created for
> pcieport "10000:02:05.0" on my setup:
> 
> leds/
> ├── 10000:02:05.0:enclosure:fail
> ├── 10000:02:05.0:enclosure:locate
> ├── 10000:02:05.0:enclosure:ok
> └── 10000:02:05.0:enclosure:rebuild
> 
> They can be also found in "/sys/class/leds" directory. Parent PCIe device
> bdf is used to guarantee uniqueness across leds subsystem.
> 
> To enable/disable fail indication "brightness" file can be edited:
> echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
> echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness
> 
> PCIe r6.1, sec 7.9.19.2 defines the possible indications.
> 
> Multiple indications for same parent PCIe device can conflict and
> hardware may update them when processing new request. To avoid issues,
> driver refresh all indications by reading back control register.
> 
> Driver is projected to be exclusive NPEM extended capability manager.
> It waits up to 1 second after imposing new request, it doesn't verify if
> controller is busy before write, assuming that mutex lock gives protection
> from concurrent updates. Driver is not registered if _DSM LED management
> is available.
> 
> NPEM is a PCIe extended capability so it should be registered in
> pcie_init_capabilities() but it is not possible due to LED dependency.
> Parent pci_device must be added earlier for led_classdev_register()
> to be successful. NPEM does not require configuration on kernel side, it
> is safe to register LED devices later.
> 
> Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1]
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
>  drivers/pci/Kconfig           |   9 +
>  drivers/pci/Makefile          |   1 +
>  drivers/pci/npem.c            | 410 ++++++++++++++++++++++++++++++++++
>  drivers/pci/pci.h             |   8 +
>  drivers/pci/probe.c           |   2 +
>  drivers/pci/remove.c          |   2 +
>  include/linux/pci.h           |   3 +
>  include/uapi/linux/pci_regs.h |  34 +++
>  8 files changed, 469 insertions(+)
>  create mode 100644 drivers/pci/npem.c
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index d35001589d88..e696e69ad516 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -143,6 +143,15 @@ config PCI_IOV
>  
>  	  If unsure, say N.
>  
> +config PCI_NPEM
> +	bool "Native PCIe Enclosure Management"
> +	depends on LEDS_CLASS=y

I would have expected

    depends on NEW_LEDS
    select LEDS_CLASS

> +	help
> +	  Support for Native PCIe Enclosure Management. It allows managing LED
> +	  indications in storage enclosures. Enclosure must support following
> +	  indications: OK, Locate, Fail, Rebuild, other indications are
> +	  optional.
> +
>  config PCI_PRI
>  	bool "PCI PRI support"
>  	select PCI_ATS
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 175302036890..cd5f655d4be9 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>  obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
>  obj-$(CONFIG_PCI_DOE)		+= doe.o
>  obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
> +obj-$(CONFIG_PCI_NPEM)		+= npem.o
>  
>  # Endpoint library must be initialized before its users
>  obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
> diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
> new file mode 100644
> index 000000000000..a76a2044dab2
> --- /dev/null
> +++ b/drivers/pci/npem.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe Enclosure management driver created for LED interfaces based on
> + * indications. It says *what indications* blink but does not specify *how*
> + * they blink - it is hardware defined.
> + *
> + * The driver name refers to Native PCIe Enclosure Management. It is
> + * first indication oriented standard with specification.
> + *
> + * Native PCIe Enclosure Management (NPEM)
> + *	PCIe Base Specification r6.1 sec 6.28
> + *	PCIe Base Specification r6.1 sec 7.9.19
> + *
> + * Copyright (c) 2023-2024 Intel Corporation
> + *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/errno.h>
> +#include <linux/iopoll.h>
> +#include <linux/leds.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pci_regs.h>
> +#include <linux/types.h>
> +#include <linux/uleds.h>
> +
> +#include "pci.h"
> +
> +struct indication {
> +	u32 bit;
> +	const char *name;
> +};
> +
> +static const struct indication npem_indications[] = {
> +	{PCI_NPEM_IND_OK,	"enclosure:ok"},
> +	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
> +	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
> +	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
> +	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
> +	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
> +	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
> +	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
> +	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
> +	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
> +	{PCI_NPEM_IND_SPEC_0,	"enclosure:specific_0"},
> +	{PCI_NPEM_IND_SPEC_1,	"enclosure:specific_1"},
> +	{PCI_NPEM_IND_SPEC_2,	"enclosure:specific_2"},
> +	{PCI_NPEM_IND_SPEC_3,	"enclosure:specific_3"},
> +	{PCI_NPEM_IND_SPEC_4,	"enclosure:specific_4"},
> +	{PCI_NPEM_IND_SPEC_5,	"enclosure:specific_5"},
> +	{PCI_NPEM_IND_SPEC_6,	"enclosure:specific_6"},
> +	{PCI_NPEM_IND_SPEC_7,	"enclosure:specific_7"},
> +	{0,			NULL}
> +};
> +
> +#define for_each_indication(ind, inds) \
> +	for (ind = inds; ind->bit; ind++)
> +
> +/* To avoid confusion, do not keep any special bits in indications */

I am confused by this comment. What "special bits" is this referring to?

Make sure comments are something that can help you remember 5 to 10
years what this function is doing.

> +static u32 reg_to_indications(u32 caps, const struct indication *inds)
> +{
> +	const struct indication *ind;
> +	u32 supported_indications = 0;
> +
> +	for_each_indication(ind, inds)
> +		supported_indications |= ind->bit;
> +
> +	return caps & supported_indications;
> +}
> +
> +/**
> + * struct npem_led - LED details
> + * @indication: indication details
> + * @npem: npem device
> + * @name: LED name
> + * @led: LED device
> + */
> +struct npem_led {
> +	const struct indication *indication;
> +	struct npem *npem;
> +	char name[LED_MAX_NAME_SIZE];
> +	struct led_classdev led;
> +};
> +
> +/**
> + * struct npem_ops - backend specific callbacks
> + * @inds: supported indications array
> + * @get_active_indications: get active indications
> + *	npem: npem device
> + *	buf: response buffer
> + * @set_active_indications: set new indications
> + *	npem: npem device
> + *	val: bit mask to set
> + *
> + * Handle communication with hardware. set_active_indications updates
> + * npem->active_indications.
> + */
> +struct npem_ops {
> +	const struct indication *inds;

@inds is not an operation, it feels like something that belongs as
another member in 'struct npem'. What drove this data to join 'struct
npem_ops'?

> +	int (*get_active_indications)(struct npem *npem, u32 *buf);
> +	int (*set_active_indications)(struct npem *npem, u32 val);
> +};
> +
> +/**
> + * struct npem - NPEM device properties
> + * @dev: PCIe device this driver is attached to
> + * @ops: Backend specific callbacks
> + * @npem_lock: to keep concurrent updates serialized
> + * @pos: NPEM capability offset (only relevant for NPEM direct register access,
> + *	 not DSM access method)
> + * @supported_indications: bit mask of supported indications
> + *			   non-indication and reserved bits are cleared
> + * @active_indications: bit mask of active indications
> + *			non-indication and reserved bits are cleared
> + * @led_cnt: Supported LEDs count
> + * @leds: supported LEDs

It would help to describe the locking model here since the comment for
@npem_lock is not adding anything useful.

> + */
> +struct npem {
> +	struct pci_dev *dev;
> +	const struct npem_ops *ops;
> +	struct mutex npem_lock;
> +	u16 pos;
> +	u32 supported_indications;
> +	u32 active_indications;
> +	int led_cnt;
> +	struct npem_led leds[];
> +};
> +
> +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val)
> +{
> +	int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_write_ctrl(struct npem *npem, u32 reg)
> +{
> +	int pos = npem->pos + PCI_NPEM_CTRL;
> +	int ret = pci_write_config_dword(npem->dev, pos, reg);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_get_active_indications(struct npem *npem, u32 *buf)
> +{
> +	int ret;
> +
> +	ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf);
> +	if (ret)
> +		return ret;
> +
> +	/* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */
> +	if (!(*buf & PCI_NPEM_CTRL_ENABLE))

It feels odd to reuse @buf like this.

I would have a local @ctrl variable and before returning do:

   *buf = ctrl;

> +		*buf = 0;
> +
> +	/* Filter out not supported indications in response */
> +	*buf &= npem->supported_indications;
> +	return 0;
> +}
> +
> +static int npem_set_active_indications(struct npem *npem, u32 val)
> +{
> +	int ret, ret_val;
> +	u32 cc_status;
> +
> +	lockdep_assert_held(&npem->npem_lock);
> +
> +	/* This bit is always required */
> +	val |= PCI_NPEM_CTRL_ENABLE;
> +	ret = npem_write_ctrl(npem, val);

@val is too generic a name when it is known that this the control
register value, how about call it @ctrl directly?

> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * For the case where a NPEM command has not completed immediately,
> +	 * it is recommended that software not continuously “spin” on polling
> +	 * the status register, but rather poll under interrupt at a reduced
> +	 * rate; for example at 10 ms intervals.

I think the use of read_poll_timeout() obviates the need for this
comment. I.e. read_poll_timeout() is self documenting.

> +	 *
> +	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM
> +	 * Command Completed"
> +	 */
> +	ret = read_poll_timeout(npem_read_reg, ret_val,
> +				ret_val || (cc_status & PCI_NPEM_STATUS_CC),
> +				10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem,
> +				PCI_NPEM_STATUS, &cc_status);
> +	if (ret)
> +		return ret_val;
> +
> +	/*
> +	 * All writes to control register, including writes that do not change
> +	 * the register value, are NPEM commands and should eventually result
> +	 * in a command completion indication in the NPEM Status Register.
> +	 *
> +	 * PCIe Base Specification r6.1 sec 7.9.19.3
> +	 *
> +	 * Register may not be updated, or other conflicting bits may be
> +	 * cleared. Spec is not strict here. Read NPEM Control register after
> +	 * write to keep cache in-sync.
> +	 */
> +	return npem_get_active_indications(npem, &npem->active_indications);
> +}
> +
> +static const struct npem_ops npem_ops = {
> +	.inds = npem_indications,
> +	.get_active_indications = npem_get_active_indications,
> +	.set_active_indications = npem_set_active_indications,
> +};
> +
> +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b,  0x8c, 0xb7, 0x74, 0x7e,\
> +			   0xd5, 0x1e, 0x19, 0x4d)
> +#define GET_SUPPORTED_STATES_DSM	BIT(1)
> +#define GET_STATE_DSM			BIT(2)
> +#define SET_STATE_DSM			BIT(3)
> +
> +static const guid_t dsm_guid = DSM_GUID;
> +
> +static bool npem_has_dsm(struct pci_dev *pdev)
> +{
> +	acpi_handle handle;
> +
> +	handle = ACPI_HANDLE(&pdev->dev);
> +	if (!handle)
> +		return false;
> +
> +	return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM |
> +			      GET_STATE_DSM | SET_STATE_DSM);
> +}
> +
> +/*
> + * This function does not use ops->get_active_indications().
> + * It returns cached value, npem->npem_lock is held and it is safe.

What about this function would make a reader think it was unsafe? It is
evident that this function does not call ops->get_active_indications(),
just as it is evident that it does not call kfree().

Would a better description be:

"The status of each indicator is cached at init time and only updated at
write time. brightness_get() is only responsible for reflecting the last
written/cached value."

> + */
> +static enum led_brightness brightness_get(struct led_classdev *led)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	int ret, val = LED_OFF;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (npem->active_indications & nled->indication->bit)
> +		val = LED_ON;
> +
> +	mutex_unlock(&npem->npem_lock);
> +
> +	return val;
> +}
> +
> +static int brightness_set(struct led_classdev *led,
> +			  enum led_brightness brightness)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	u32 indications;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (brightness == LED_OFF)
> +		indications = npem->active_indications & ~(nled->indication->bit);
> +	else
> +		indications = npem->active_indications | nled->indication->bit;
> +
> +	ret = npem->ops->set_active_indications(npem, indications);
> +
> +	mutex_unlock(&npem->npem_lock);
> +	return ret;
> +}
> +
> +static void npem_free(struct npem *npem)
> +{
> +	struct npem_led *nled;
> +	int cnt;
> +
> +	for (cnt = 0; cnt < npem->led_cnt; cnt++) {
> +		nled = &npem->leds[cnt];
> +
> +		if (nled->name[0])
> +			led_classdev_unregister(&nled->led);
> +	}
> +
> +	mutex_destroy(&npem->npem_lock);
> +	kfree(npem);
> +}
> +
> +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled)
> +{
> +	struct led_classdev *led = &nled->led;
> +	struct led_init_data init_data = {};
> +	char *name = nled->name;
> +	int ret;
> +
> +	init_data.devicename = pci_name(npem->dev);
> +	init_data.default_label = nled->indication->name;
> +
> +	ret = led_compose_name(&npem->dev->dev, &init_data, name);
> +	if (ret)
> +		return ret;
> +
> +	led->name = name;
> +	led->brightness_set_blocking = brightness_set;
> +	led->brightness_get = brightness_get;
> +	led->max_brightness = LED_ON;
> +	led->default_trigger = "none";
> +	led->flags = 0;
> +
> +	ret = led_classdev_register(&npem->dev->dev, led);
> +	if (ret)
> +		/* Clear the name to indicate that it is not registered. */
> +		name[0] = 0;
> +	return ret;
> +}
> +
> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> +			 int pos, u32 caps)
> +{
> +	u32 supported = reg_to_indications(caps, ops->inds);
> +	int supported_cnt = hweight32(supported);
> +	const struct indication *indication;
> +	struct npem_led *nled;
> +	struct npem *npem;
> +	int led_idx = 0;
> +	u32 active;
> +	int ret;
> +
> +	npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL);
> +
> +	if (!npem)
> +		return -ENOMEM;
> +
> +	npem->supported_indications = supported;
> +	npem->led_cnt = supported_cnt;
> +	npem->pos = pos;
> +	npem->dev = dev;
> +	npem->ops = ops;
> +
> +	ret = ops->get_active_indications(npem, &active);
> +	if (ret) {
> +		npem_free(npem);
> +		return -EACCES;

get_active_indications() returns an errno, why translate to "Permission
denied" which is likely to make someone go cross-eyed wondering why root
is getting "Permission denied".

> +	}
> +
> +	npem->active_indications = reg_to_indications(active, ops->inds);
> +
> +	/*
> +	 * Do not take npem->npem_lock, get_brightness() is called on
> +	 * registration path.
> +	 */

Delete this comment. Just put a lockdep_assert_held() in
get_active_indications() and take the lock for caching the initial
state.

> +	mutex_init(&npem->npem_lock);
> +
> +	for_each_indication(indication, npem_indications) {
> +		if ((npem->supported_indications & indication->bit) == 0)
> +			/* Do not register unsupported indication */
> +			continue;
> +
> +		nled = &npem->leds[led_idx++];
> +		nled->indication = indication;
> +		nled->npem = npem;
> +
> +		ret = pci_npem_set_led_classdev(npem, nled);
> +		if (ret) {
> +			npem_free(npem);
> +			return ret;
> +		}
> +	}
> +
> +	dev->npem = npem;
> +	return 0;
> +}
> +
> +void pci_npem_remove(struct pci_dev *dev)
> +{
> +	if (dev->npem)
> +		npem_free(dev->npem);

Just put a NULL check in npem_free().

> +}
> +
> +void pci_npem_create(struct pci_dev *dev)
> +{
> +	const struct npem_ops *ops = &npem_ops;
> +	int pos = 0, ret;
> +	u32 cap;
> +
> +	if (!npem_has_dsm(dev)) {
> +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> +		if (pos == 0)
> +			return;
> +
> +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> +			return;
> +	} else {
> +		/*
> +		 * OS should use the DSM for LED control if it is available
> +		 * PCI Firmware Spec r3.3 sec 4.7.
> +		 */
> +		return;

I assume this is a TODO for the next patch that add DSM support?

> +	}
> +
> +	ret = pci_npem_init(dev, ops, pos, cap);
> +	if (ret)
> +		pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n",
> +			ret);
> +}
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fd44565c4756..9dea8c7353ab 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { }
>  static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
>  #endif
>  
> +#ifdef CONFIG_PCI_NPEM
> +void pci_npem_create(struct pci_dev *dev);
> +void pci_npem_remove(struct pci_dev *dev);
> +#else
> +static inline void pci_npem_create(struct pci_dev *dev) { }
> +static inline void pci_npem_remove(struct pci_dev *dev) { }
> +#endif
> +
>  /**
>   * pci_dev_set_io_state - Set the new error state if possible.
>   *
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8e696e547565..b8ea6353e27a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>  	dev->match_driver = false;
>  	ret = device_add(&dev->dev);
>  	WARN_ON(ret < 0);
> +
> +	pci_npem_create(dev);
>  }
>  
>  struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index d749ea8250d6..1436f9cf1fea 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
>  	if (!dev->dev.kobj.parent)
>  		return;
>  
> +	pci_npem_remove(dev);
> +
>  	device_del(&dev->dev);
>  
>  	down_write(&pci_bus_sem);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index fb004fd4e889..c327c2dd4527 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -517,6 +517,9 @@ struct pci_dev {
>  #endif
>  #ifdef CONFIG_PCI_DOE
>  	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
> +#endif
> +#ifdef CONFIG_PCI_NPEM
> +	struct npem	*npem;		/* Native PCIe Enclosure Management */
>  #endif
>  	u16		acs_cap;	/* ACS Capability offset */
>  	phys_addr_t	rom;		/* Physical address if not from BAR */
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 94c00996e633..97e8b7ed9998 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -740,6 +740,7 @@
>  #define PCI_EXT_CAP_ID_DVSEC	0x23	/* Designated Vendor-Specific */
>  #define PCI_EXT_CAP_ID_DLF	0x25	/* Data Link Feature */
>  #define PCI_EXT_CAP_ID_PL_16GT	0x26	/* Physical Layer 16.0 GT/s */
> +#define PCI_EXT_CAP_ID_NPEM	0x29	/* Native PCIe Enclosure Management */
>  #define PCI_EXT_CAP_ID_PL_32GT  0x2A    /* Physical Layer 32.0 GT/s */
>  #define PCI_EXT_CAP_ID_DOE	0x2E	/* Data Object Exchange */
>  #define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_DOE
> @@ -1121,6 +1122,39 @@
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK		0x000000F0
>  #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT	4
>  
> +/* Native PCIe Enclosure Management */
> +#define PCI_NPEM_CAP	0x04 /* NPEM capability register */
> +#define	 PCI_NPEM_CAP_CAPABLE		0x00000001 /* NPEM Capable */
> +
> +#define PCI_NPEM_CTRL	0x08 /* NPEM control register */
> +#define	 PCI_NPEM_CTRL_ENABLE		0x00000001 /* NPEM Enable */
> +
> +#define PCI_NPEM_STATUS	0x0c /* NPEM status register */
> +#define	 PCI_NPEM_STATUS_CC		0x00000001 /* NPEM Command completed */
> +/*
> + * Native PCIe Enclosure Management indication bits and Reset command bit
> + * are corresponding for capability and control registers.
> + */
> +#define  PCI_NPEM_CMD_RESET		0x00000002 /* NPEM Reset Command */
> +#define  PCI_NPEM_IND_OK		0x00000004 /* NPEM indication OK */
> +#define  PCI_NPEM_IND_LOCATE		0x00000008 /* NPEM indication Locate */
> +#define  PCI_NPEM_IND_FAIL		0x00000010 /* NPEM indication Fail */
> +#define  PCI_NPEM_IND_REBUILD		0x00000020 /* NPEM indication Rebuild */
> +#define  PCI_NPEM_IND_PFA		0x00000040 /* NPEM indication Predicted Failure Analysis */
> +#define  PCI_NPEM_IND_HOTSPARE		0x00000080 /* NPEM indication Hot Spare */
> +#define  PCI_NPEM_IND_ICA		0x00000100 /* NPEM indication In Critical Array */
> +#define  PCI_NPEM_IND_IFA		0x00000200 /* NPEM indication In Failed Array */
> +#define  PCI_NPEM_IND_IDT		0x00000400 /* NPEM indication Invalid Device Type */
> +#define  PCI_NPEM_IND_DISABLED		0x00000800 /* NPEM indication Disabled */
> +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> +#define  PCI_NPEM_IND_SPEC_7		0x40000000

Given no other driver needs this, I would define them locally in
drivers/pci/npem.c.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-29  5:21   ` Dan Williams
@ 2024-05-29  9:38     ` Lukas Wunner
  2024-06-12 11:40       ` Mariusz Tkaczyk
  2024-06-19 14:28     ` Mariusz Tkaczyk
  1 sibling, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2024-05-29  9:38 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mariusz Tkaczyk, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote:
> Mariusz Tkaczyk wrote:
> > +config PCI_NPEM
> > +	bool "Native PCIe Enclosure Management"
> > +	depends on LEDS_CLASS=y
> 
> I would have expected
> 
>     depends on NEW_LEDS
>     select LEDS_CLASS

Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else
does that.  Everyone else either selects both NEW_LEDS and LEDS_CLASS
or depends on both or depends on just LEDS_CLASS.

(Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both
seems pointless, so I'm not sure why some people do that.)

I guess it would be good to get guidance from leds maintainers what
the preferred modus operandi is.


> > +#define for_each_indication(ind, inds) \
> > +	for (ind = inds; ind->bit; ind++)
> > +
> > +/* To avoid confusion, do not keep any special bits in indications */
> 
> I am confused by this comment. What "special bits" is this referring to?

I think it's referring to bit 0 in the Status and Control register,
which is a master "NPEM Capable" and "NPEM Enable" bit.


> > +struct npem_ops {
> > +	const struct indication *inds;
> 
> @inds is not an operation, it feels like something that belongs as
> another member in 'struct npem'. What drove this data to join 'struct
> npem_ops'?

The native NPEM register interface supports enclosure-specific indications
which the DSM interface does not support.  So those indications are
present in the native npem_ops->inds and not present in the DSM
npem_ops->inds.


> > --- a/include/uapi/linux/pci_regs.h
> > +++ b/include/uapi/linux/pci_regs.h
[...]
> > +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> > +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> > +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> > +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> > +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> > +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> > +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> > +#define  PCI_NPEM_IND_SPEC_7		0x40000000
> 
> Given no other driver needs this, I would define them locally in
> drivers/pci/npem.c.

This is a uapi header, so could be used not just by other drivers
but by user space.

It's common to add spec-defined register bits to this header file
even if they're only used by a single source file in the kernel.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-29  9:38     ` Lukas Wunner
@ 2024-06-12 11:40       ` Mariusz Tkaczyk
  2024-06-13 16:11         ` stuart hayes
  0 siblings, 1 reply; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-06-12 11:40 UTC (permalink / raw)
  To: Lukas Wunner, Pavel Machek
  Cc: Dan Williams, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Randy Dunlap, Andy Shevchenko, Stuart Hayes

Hi,
Thanks for feedback Dan!

On Wed, 29 May 2024 11:38:12 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote:
> > Mariusz Tkaczyk wrote:  
> > > +config PCI_NPEM
> > > +	bool "Native PCIe Enclosure Management"
> > > +	depends on LEDS_CLASS=y  
> > 
> > I would have expected
> > 
> >     depends on NEW_LEDS
> >     select LEDS_CLASS  
> 
> Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else
> does that.  Everyone else either selects both NEW_LEDS and LEDS_CLASS
> or depends on both or depends on just LEDS_CLASS.
> 
> (Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both
> seems pointless, so I'm not sure why some people do that.)
> 
> I guess it would be good to get guidance from leds maintainers what
> the preferred modus operandi is.

Pavel, could you please advice?
I have no clue which way I should take so I prefer to keep current approach.

> 
> 
> > > +#define for_each_indication(ind, inds) \
> > > +	for (ind = inds; ind->bit; ind++)
> > > +
> > > +/* To avoid confusion, do not keep any special bits in indications */  
> > 
> > I am confused by this comment. What "special bits" is this referring to?  
> 
> I think it's referring to bit 0 in the Status and Control register,
> which is a master "NPEM Capable" and "NPEM Enable" bit.

Yes, there are 2 special bits for capability/control
NPEM_CAP_CAPABLE/NPEM_ENABLE and NPEM_CAP_RESET/NPEM_RESET.

I wanted to highlight that these bits are not included in the cache. I will try
to make it more precise in v3.

> 
> 
> > > +struct npem_ops {
> > > +	const struct indication *inds;  
> > 
> > @inds is not an operation, it feels like something that belongs as
> > another member in 'struct npem'. What drove this data to join 'struct
> > npem_ops'?  
> 
> The native NPEM register interface supports enclosure-specific indications
> which the DSM interface does not support.  So those indications are
> present in the native npem_ops->inds and not present in the DSM
> npem_ops->inds.

Yes, I need to differentiate DSM and NPEM indications. DSM has own indications
list.

> 
> 
> > > --- a/include/uapi/linux/pci_regs.h
> > > +++ b/include/uapi/linux/pci_regs.h  
> [...]
> > > +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> > > +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> > > +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> > > +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> > > +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> > > +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> > > +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> > > +#define  PCI_NPEM_IND_SPEC_7		0x40000000  
> > 
> > Given no other driver needs this, I would define them locally in
> > drivers/pci/npem.c.  
> 
> This is a uapi header, so could be used not just by other drivers
> but by user space.
> 
> It's common to add spec-defined register bits to this header file
> even if they're only used by a single source file in the kernel.
> 

I will stay with current state while waiting for Bjorn's voice here.

I will send v3 with fixes requested by Dan and Ilpo but I still need Stuart
feedback on DSM patch.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-12 11:40       ` Mariusz Tkaczyk
@ 2024-06-13 16:11         ` stuart hayes
  0 siblings, 0 replies; 22+ messages in thread
From: stuart hayes @ 2024-06-13 16:11 UTC (permalink / raw)
  To: Mariusz Tkaczyk, Lukas Wunner, Pavel Machek
  Cc: Dan Williams, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Randy Dunlap, Andy Shevchenko



On 6/12/2024 6:40 AM, Mariusz Tkaczyk wrote:
> Hi,
> Thanks for feedback Dan!
> 
> On Wed, 29 May 2024 11:38:12 +0200
> Lukas Wunner <lukas@wunner.de> wrote:
> 
>> On Tue, May 28, 2024 at 10:21:10PM -0700, Dan Williams wrote:
>>> Mariusz Tkaczyk wrote:
>>>> +config PCI_NPEM
>>>> +	bool "Native PCIe Enclosure Management"
>>>> +	depends on LEDS_CLASS=y
>>>
>>> I would have expected
>>>
>>>      depends on NEW_LEDS
>>>      select LEDS_CLASS
>>
>> Hm, a quick "git grep -C 2 'depends on NEW_LEDS'" shows that noone else
>> does that.  Everyone else either selects both NEW_LEDS and LEDS_CLASS
>> or depends on both or depends on just LEDS_CLASS.
>>
>> (Since LEDS_CLASS is constrained to "if NEW_LEDS", depending on both
>> seems pointless, so I'm not sure why some people do that.)
>>
>> I guess it would be good to get guidance from leds maintainers what
>> the preferred modus operandi is.
> 
> Pavel, could you please advice?
> I have no clue which way I should take so I prefer to keep current approach.
> 
>>
>>
>>>> +#define for_each_indication(ind, inds) \
>>>> +	for (ind = inds; ind->bit; ind++)
>>>> +
>>>> +/* To avoid confusion, do not keep any special bits in indications */
>>>
>>> I am confused by this comment. What "special bits" is this referring to?
>>
>> I think it's referring to bit 0 in the Status and Control register,
>> which is a master "NPEM Capable" and "NPEM Enable" bit.
> 
> Yes, there are 2 special bits for capability/control
> NPEM_CAP_CAPABLE/NPEM_ENABLE and NPEM_CAP_RESET/NPEM_RESET.
> 
> I wanted to highlight that these bits are not included in the cache. I will try
> to make it more precise in v3.
> 
>>
>>
>>>> +struct npem_ops {
>>>> +	const struct indication *inds;
>>>
>>> @inds is not an operation, it feels like something that belongs as
>>> another member in 'struct npem'. What drove this data to join 'struct
>>> npem_ops'?
>>
>> The native NPEM register interface supports enclosure-specific indications
>> which the DSM interface does not support.  So those indications are
>> present in the native npem_ops->inds and not present in the DSM
>> npem_ops->inds.
> 
> Yes, I need to differentiate DSM and NPEM indications. DSM has own indications
> list.
> 
>>
>>
>>>> --- a/include/uapi/linux/pci_regs.h
>>>> +++ b/include/uapi/linux/pci_regs.h
>> [...]
>>>> +#define  PCI_NPEM_IND_SPEC_0		0x00800000
>>>> +#define  PCI_NPEM_IND_SPEC_1		0x01000000
>>>> +#define  PCI_NPEM_IND_SPEC_2		0x02000000
>>>> +#define  PCI_NPEM_IND_SPEC_3		0x04000000
>>>> +#define  PCI_NPEM_IND_SPEC_4		0x08000000
>>>> +#define  PCI_NPEM_IND_SPEC_5		0x10000000
>>>> +#define  PCI_NPEM_IND_SPEC_6		0x20000000
>>>> +#define  PCI_NPEM_IND_SPEC_7		0x40000000
>>>
>>> Given no other driver needs this, I would define them locally in
>>> drivers/pci/npem.c.
>>
>> This is a uapi header, so could be used not just by other drivers
>> but by user space.
>>
>> It's common to add spec-defined register bits to this header file
>> even if they're only used by a single source file in the kernel.
>>
> 
> I will stay with current state while waiting for Bjorn's voice here.
> 
> I will send v3 with fixes requested by Dan and Ilpo but I still need Stuart
> feedback on DSM patch.
> 
> Thanks,
> Mariusz

I'm working on testing this now, sorry for the delay.
Thanks!  Stuart

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-28 13:55   ` Ilpo Järvinen
@ 2024-06-14 13:40     ` Mariusz Tkaczyk
  0 siblings, 0 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-06-14 13:40 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: linux-pci, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Lukas Wunner, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

On Tue, 28 May 2024 16:55:16 +0300 (EEST)
Ilpo Järvinen <ilpo.jarvinen@linux.intel.com> wrote:

> > +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> > +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> > +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> > +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> > +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> > +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> > +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> > +#define  PCI_NPEM_IND_SPEC_7		0x40000000  
> 
> Are these off-by-one, shouldn't that field go all the way to the last bit 
> which is 0x80000000 (the field is 31:24)??

Yes, I made a mistake. Good catch!

Mariusz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] leds: Init leds class earlier
  2024-05-29  4:22   ` Dan Williams
@ 2024-06-14 14:08     ` Lukas Wunner
  2024-06-14 14:13       ` Greg Kroah-Hartman
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2024-06-14 14:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: Mariusz Tkaczyk, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

On Tue, May 28, 2024 at 09:22:22PM -0700, Dan Williams wrote:
> Mariusz Tkaczyk wrote:
> > PCI subsystem is registered on subsys_initcall() same as leds class.
> > NPEM driver will require leds class, there is race possibility.
> > Register led class on arch_initcall() to avoid it.
> 
> Another way to solve this without changing initcall levels is to just
> make sure that the leds subsys initcall happens first, i.e.:
> 
> diff --git a/drivers/Makefile b/drivers/Makefile
> index fe9ceb0d2288..d47b4186108a 100644
> --- a/drivers/Makefile
> +++ b/drivers/Makefile
> @@ -16,6 +16,7 @@ obj-$(CONFIG_GENERIC_PHY)     += phy/
>  obj-$(CONFIG_PINCTRL)          += pinctrl/
>  obj-$(CONFIG_GPIOLIB)          += gpio/
>  obj-y                          += pwm/
> +obj-y                          += leds/
>  
>  obj-y                          += pci/
>  
> @@ -130,7 +131,6 @@ obj-$(CONFIG_CPU_IDLE)              += cpuidle/
>  obj-y                          += mmc/
>  obj-y                          += ufs/
>  obj-$(CONFIG_MEMSTICK)         += memstick/
> -obj-y                          += leds/
>  obj-$(CONFIG_INFINIBAND)       += infiniband/
>  obj-y                          += firmware/
>  obj-$(CONFIG_CRYPTO)           += crypto/

To me, this seems more fragile than changing the initcall level.

The risk I see is that someone comes along and rearranges the Makefile in,
say, alphabetic order because "why not?" and unwittingly breaks NPEM.

Changing initcall levels at least explicitly sets the order in stone.

However, perhaps a code comment is helpful to tell the casual
code reader why this particular initcall level is needed.

Something like...

/* LEDs may already be registered in subsys initcall level */

... may already be sufficient.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 1/3] leds: Init leds class earlier
  2024-06-14 14:08     ` Lukas Wunner
@ 2024-06-14 14:13       ` Greg Kroah-Hartman
  0 siblings, 0 replies; 22+ messages in thread
From: Greg Kroah-Hartman @ 2024-06-14 14:13 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dan Williams, Mariusz Tkaczyk, linux-pci, Arnd Bergmann,
	Bjorn Helgaas, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

On Fri, Jun 14, 2024 at 04:08:48PM +0200, Lukas Wunner wrote:
> On Tue, May 28, 2024 at 09:22:22PM -0700, Dan Williams wrote:
> > Mariusz Tkaczyk wrote:
> > > PCI subsystem is registered on subsys_initcall() same as leds class.
> > > NPEM driver will require leds class, there is race possibility.
> > > Register led class on arch_initcall() to avoid it.
> > 
> > Another way to solve this without changing initcall levels is to just
> > make sure that the leds subsys initcall happens first, i.e.:
> > 
> > diff --git a/drivers/Makefile b/drivers/Makefile
> > index fe9ceb0d2288..d47b4186108a 100644
> > --- a/drivers/Makefile
> > +++ b/drivers/Makefile
> > @@ -16,6 +16,7 @@ obj-$(CONFIG_GENERIC_PHY)     += phy/
> >  obj-$(CONFIG_PINCTRL)          += pinctrl/
> >  obj-$(CONFIG_GPIOLIB)          += gpio/
> >  obj-y                          += pwm/
> > +obj-y                          += leds/
> >  
> >  obj-y                          += pci/
> >  
> > @@ -130,7 +131,6 @@ obj-$(CONFIG_CPU_IDLE)              += cpuidle/
> >  obj-y                          += mmc/
> >  obj-y                          += ufs/
> >  obj-$(CONFIG_MEMSTICK)         += memstick/
> > -obj-y                          += leds/
> >  obj-$(CONFIG_INFINIBAND)       += infiniband/
> >  obj-y                          += firmware/
> >  obj-$(CONFIG_CRYPTO)           += crypto/
> 
> To me, this seems more fragile than changing the initcall level.
> 
> The risk I see is that someone comes along and rearranges the Makefile in,
> say, alphabetic order because "why not?" and unwittingly breaks NPEM.

If they do that, then we have worse problems as many of us "know" that
the order here matters a LOT.

> Changing initcall levels at least explicitly sets the order in stone.

For a while, until they change :)

> However, perhaps a code comment is helpful to tell the casual
> code reader why this particular initcall level is needed.
> 
> Something like...
> 
> /* LEDs may already be registered in subsys initcall level */

That would be good to have.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
  2024-05-28 13:55   ` Ilpo Järvinen
  2024-05-29  5:21   ` Dan Williams
@ 2024-06-14 21:06   ` stuart hayes
  2024-06-15 10:33     ` Lukas Wunner
  2 siblings, 1 reply; 22+ messages in thread
From: stuart hayes @ 2024-06-14 21:06 UTC (permalink / raw)
  To: Mariusz Tkaczyk, linux-pci
  Cc: Arnd Bergmann, Bjorn Helgaas, Dan Williams, Greg Kroah-Hartman,
	Ilpo Jarvinen, Lukas Wunner, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko



On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote:
> Native PCIe Enclosure Management (NPEM, PCIe r6.1 sec 6.28) allows
> managing LED in storage enclosures. NPEM is indication oriented
> and it does not give direct access to LED. Although each of
> the indications *could* represent an individual LED, multiple
> indications could also be represented as a single,
> multi-color LED or a single LED blinking in a specific interval.
> The specification leaves that open.
> 
> Each enabled indication (capability register bit on) is represented as a
> ledclass_dev which can be controlled through sysfs. For every ledclass
> device only 2 brightness states are allowed: LED_ON (1) or LED_OFF (0).
> It is corresponding to NPEM control register (Indication bit on/off).
> 
> Ledclass devices appear in sysfs as child devices (subdirectory) of PCI
> device which has an NPEM Extended Capability and indication is enabled
> in NPEM capability register. For example, these are leds created for
> pcieport "10000:02:05.0" on my setup:
> 
> leds/
> ├── 10000:02:05.0:enclosure:fail
> ├── 10000:02:05.0:enclosure:locate
> ├── 10000:02:05.0:enclosure:ok
> └── 10000:02:05.0:enclosure:rebuild
> 
> They can be also found in "/sys/class/leds" directory. Parent PCIe device
> bdf is used to guarantee uniqueness across leds subsystem.
> 
> To enable/disable fail indication "brightness" file can be edited:
> echo 1 > ./leds/10000:02:05.0:enclosure:fail/brightness
> echo 0 > ./leds/10000:02:05.0:enclosure:fail/brightness
> 
> PCIe r6.1, sec 7.9.19.2 defines the possible indications.
> 
> Multiple indications for same parent PCIe device can conflict and
> hardware may update them when processing new request. To avoid issues,
> driver refresh all indications by reading back control register.
> 
> Driver is projected to be exclusive NPEM extended capability manager.
> It waits up to 1 second after imposing new request, it doesn't verify if
> controller is busy before write, assuming that mutex lock gives protection
> from concurrent updates. Driver is not registered if _DSM LED management
> is available.
> 
> NPEM is a PCIe extended capability so it should be registered in
> pcie_init_capabilities() but it is not possible due to LED dependency.
> Parent pci_device must be added earlier for led_classdev_register()
> to be successful. NPEM does not require configuration on kernel side, it
> is safe to register LED devices later.
> 
> Link: https://members.pcisig.com/wg/PCI-SIG/document/19849 [1]
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> ---
>   drivers/pci/Kconfig           |   9 +
>   drivers/pci/Makefile          |   1 +
>   drivers/pci/npem.c            | 410 ++++++++++++++++++++++++++++++++++
>   drivers/pci/pci.h             |   8 +
>   drivers/pci/probe.c           |   2 +
>   drivers/pci/remove.c          |   2 +
>   include/linux/pci.h           |   3 +
>   include/uapi/linux/pci_regs.h |  34 +++
>   8 files changed, 469 insertions(+)
>   create mode 100644 drivers/pci/npem.c
> 
> diff --git a/drivers/pci/Kconfig b/drivers/pci/Kconfig
> index d35001589d88..e696e69ad516 100644
> --- a/drivers/pci/Kconfig
> +++ b/drivers/pci/Kconfig
> @@ -143,6 +143,15 @@ config PCI_IOV
>   
>   	  If unsure, say N.
>   
> +config PCI_NPEM
> +	bool "Native PCIe Enclosure Management"
> +	depends on LEDS_CLASS=y
> +	help
> +	  Support for Native PCIe Enclosure Management. It allows managing LED
> +	  indications in storage enclosures. Enclosure must support following
> +	  indications: OK, Locate, Fail, Rebuild, other indications are
> +	  optional.
> +
>   config PCI_PRI
>   	bool "PCI PRI support"
>   	select PCI_ATS
> diff --git a/drivers/pci/Makefile b/drivers/pci/Makefile
> index 175302036890..cd5f655d4be9 100644
> --- a/drivers/pci/Makefile
> +++ b/drivers/pci/Makefile
> @@ -34,6 +34,7 @@ obj-$(CONFIG_XEN_PCIDEV_FRONTEND) += xen-pcifront.o
>   obj-$(CONFIG_VGA_ARB)		+= vgaarb.o
>   obj-$(CONFIG_PCI_DOE)		+= doe.o
>   obj-$(CONFIG_PCI_DYNAMIC_OF_NODES) += of_property.o
> +obj-$(CONFIG_PCI_NPEM)		+= npem.o
>   
>   # Endpoint library must be initialized before its users
>   obj-$(CONFIG_PCI_ENDPOINT)	+= endpoint/
> diff --git a/drivers/pci/npem.c b/drivers/pci/npem.c
> new file mode 100644
> index 000000000000..a76a2044dab2
> --- /dev/null
> +++ b/drivers/pci/npem.c
> @@ -0,0 +1,410 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * PCIe Enclosure management driver created for LED interfaces based on
> + * indications. It says *what indications* blink but does not specify *how*
> + * they blink - it is hardware defined.
> + *
> + * The driver name refers to Native PCIe Enclosure Management. It is
> + * first indication oriented standard with specification.
> + *
> + * Native PCIe Enclosure Management (NPEM)
> + *	PCIe Base Specification r6.1 sec 6.28
> + *	PCIe Base Specification r6.1 sec 7.9.19
> + *
> + * Copyright (c) 2023-2024 Intel Corporation
> + *	Mariusz Tkaczyk <mariusz.tkaczyk@linux.intel.com>
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/bitops.h>
> +#include <linux/errno.h>
> +#include <linux/iopoll.h>
> +#include <linux/leds.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pci_regs.h>
> +#include <linux/types.h>
> +#include <linux/uleds.h>
> +
> +#include "pci.h"
> +
> +struct indication {
> +	u32 bit;
> +	const char *name;
> +};
> +
> +static const struct indication npem_indications[] = {
> +	{PCI_NPEM_IND_OK,	"enclosure:ok"},
> +	{PCI_NPEM_IND_LOCATE,	"enclosure:locate"},
> +	{PCI_NPEM_IND_FAIL,	"enclosure:fail"},
> +	{PCI_NPEM_IND_REBUILD,	"enclosure:rebuild"},
> +	{PCI_NPEM_IND_PFA,	"enclosure:pfa"},
> +	{PCI_NPEM_IND_HOTSPARE,	"enclosure:hotspare"},
> +	{PCI_NPEM_IND_ICA,	"enclosure:ica"},
> +	{PCI_NPEM_IND_IFA,	"enclosure:ifa"},
> +	{PCI_NPEM_IND_IDT,	"enclosure:idt"},
> +	{PCI_NPEM_IND_DISABLED,	"enclosure:disabled"},
> +	{PCI_NPEM_IND_SPEC_0,	"enclosure:specific_0"},
> +	{PCI_NPEM_IND_SPEC_1,	"enclosure:specific_1"},
> +	{PCI_NPEM_IND_SPEC_2,	"enclosure:specific_2"},
> +	{PCI_NPEM_IND_SPEC_3,	"enclosure:specific_3"},
> +	{PCI_NPEM_IND_SPEC_4,	"enclosure:specific_4"},
> +	{PCI_NPEM_IND_SPEC_5,	"enclosure:specific_5"},
> +	{PCI_NPEM_IND_SPEC_6,	"enclosure:specific_6"},
> +	{PCI_NPEM_IND_SPEC_7,	"enclosure:specific_7"},
> +	{0,			NULL}
> +};
> +
> +#define for_each_indication(ind, inds) \
> +	for (ind = inds; ind->bit; ind++)
> +
> +/* To avoid confusion, do not keep any special bits in indications */
> +static u32 reg_to_indications(u32 caps, const struct indication *inds)
> +{
> +	const struct indication *ind;
> +	u32 supported_indications = 0;
> +
> +	for_each_indication(ind, inds)
> +		supported_indications |= ind->bit;
> +
> +	return caps & supported_indications;
> +}
> +
> +/**
> + * struct npem_led - LED details
> + * @indication: indication details
> + * @npem: npem device
> + * @name: LED name
> + * @led: LED device
> + */
> +struct npem_led {
> +	const struct indication *indication;
> +	struct npem *npem;
> +	char name[LED_MAX_NAME_SIZE];
> +	struct led_classdev led;
> +};
> +
> +/**
> + * struct npem_ops - backend specific callbacks
> + * @inds: supported indications array
> + * @get_active_indications: get active indications
> + *	npem: npem device
> + *	buf: response buffer
> + * @set_active_indications: set new indications
> + *	npem: npem device
> + *	val: bit mask to set
> + *
> + * Handle communication with hardware. set_active_indications updates
> + * npem->active_indications.
> + */
> +struct npem_ops {
> +	const struct indication *inds;
> +	int (*get_active_indications)(struct npem *npem, u32 *buf);
> +	int (*set_active_indications)(struct npem *npem, u32 val);
> +};
> +
> +/**
> + * struct npem - NPEM device properties
> + * @dev: PCIe device this driver is attached to
> + * @ops: Backend specific callbacks
> + * @npem_lock: to keep concurrent updates serialized
> + * @pos: NPEM capability offset (only relevant for NPEM direct register access,
> + *	 not DSM access method)
> + * @supported_indications: bit mask of supported indications
> + *			   non-indication and reserved bits are cleared
> + * @active_indications: bit mask of active indications
> + *			non-indication and reserved bits are cleared
> + * @led_cnt: Supported LEDs count
> + * @leds: supported LEDs
> + */
> +struct npem {
> +	struct pci_dev *dev;
> +	const struct npem_ops *ops;
> +	struct mutex npem_lock;
> +	u16 pos;
> +	u32 supported_indications;
> +	u32 active_indications;
> +	int led_cnt;
> +	struct npem_led leds[];
> +};
> +
> +static int npem_read_reg(struct npem *npem, u16 reg, u32 *val)
> +{
> +	int ret = pci_read_config_dword(npem->dev, npem->pos + reg, val);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_write_ctrl(struct npem *npem, u32 reg)
> +{
> +	int pos = npem->pos + PCI_NPEM_CTRL;
> +	int ret = pci_write_config_dword(npem->dev, pos, reg);
> +
> +	return pcibios_err_to_errno(ret);
> +}
> +
> +static int npem_get_active_indications(struct npem *npem, u32 *buf)
> +{
> +	int ret;
> +
> +	ret = npem_read_reg(npem, PCI_NPEM_CTRL, buf);
> +	if (ret)
> +		return ret;
> +
> +	/* If PCI_NPEM_CTRL_ENABLE is not set then no indication should blink */
> +	if (!(*buf & PCI_NPEM_CTRL_ENABLE))
> +		*buf = 0;
> +
> +	/* Filter out not supported indications in response */
> +	*buf &= npem->supported_indications;
> +	return 0;
> +}
> +
> +static int npem_set_active_indications(struct npem *npem, u32 val)
> +{
> +	int ret, ret_val;
> +	u32 cc_status;
> +
> +	lockdep_assert_held(&npem->npem_lock);
> +
> +	/* This bit is always required */
> +	val |= PCI_NPEM_CTRL_ENABLE;
> +	ret = npem_write_ctrl(npem, val);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * For the case where a NPEM command has not completed immediately,
> +	 * it is recommended that software not continuously “spin” on polling
> +	 * the status register, but rather poll under interrupt at a reduced
> +	 * rate; for example at 10 ms intervals.
> +	 *
> +	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of NPEM
> +	 * Command Completed"
> +	 */
> +	ret = read_poll_timeout(npem_read_reg, ret_val,
> +				ret_val || (cc_status & PCI_NPEM_STATUS_CC),
> +				10 * USEC_PER_MSEC, USEC_PER_SEC, false, npem,
> +				PCI_NPEM_STATUS, &cc_status);
> +	if (ret)
> +		return ret_val;
> +
> +	/*
> +	 * All writes to control register, including writes that do not change
> +	 * the register value, are NPEM commands and should eventually result
> +	 * in a command completion indication in the NPEM Status Register.
> +	 *
> +	 * PCIe Base Specification r6.1 sec 7.9.19.3
> +	 *
> +	 * Register may not be updated, or other conflicting bits may be
> +	 * cleared. Spec is not strict here. Read NPEM Control register after
> +	 * write to keep cache in-sync.
> +	 */
> +	return npem_get_active_indications(npem, &npem->active_indications);
> +}
> +
> +static const struct npem_ops npem_ops = {
> +	.inds = npem_indications,
> +	.get_active_indications = npem_get_active_indications,
> +	.set_active_indications = npem_set_active_indications,
> +};
> +
> +#define DSM_GUID GUID_INIT(0x5d524d9d, 0xfff9, 0x4d4b,  0x8c, 0xb7, 0x74, 0x7e,\
> +			   0xd5, 0x1e, 0x19, 0x4d)
> +#define GET_SUPPORTED_STATES_DSM	BIT(1)
> +#define GET_STATE_DSM			BIT(2)
> +#define SET_STATE_DSM			BIT(3)
> +
> +static const guid_t dsm_guid = DSM_GUID;
> +
> +static bool npem_has_dsm(struct pci_dev *pdev)
> +{
> +	acpi_handle handle;
> +
> +	handle = ACPI_HANDLE(&pdev->dev);
> +	if (!handle)
> +		return false;
> +
> +	return acpi_check_dsm(handle, &dsm_guid, 0x1, GET_SUPPORTED_STATES_DSM |
> +			      GET_STATE_DSM | SET_STATE_DSM);
> +}
> +

I had to change the DSM definitions to be 1/2/3 rather than BIT(1)/BIT(2)/BIT(3), since
the numbers 1/2/3 (not 2/4/8) need to be passed to acpi_evaluate_dsm_typed()... I just
changed it to:

#define GET_SUPPORTED_STATES_DSM	1
#define GET_STATE_DSM			2
#define SET_STATE_DSM			3

...and

	return acpi_check_dsm(handle, &dsm_guid, 0x1,
			      BIT(GET_SUPPORTED_STATES_DSM) |
			      BIT(GET_STATE_DSM) | BIT(SET_STATE_DSM));

After I changed this (and one other thing, below), it worked great.

> +/*
> + * This function does not use ops->get_active_indications().
> + * It returns cached value, npem->npem_lock is held and it is safe.
> + */
> +static enum led_brightness brightness_get(struct led_classdev *led)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	int ret, val = LED_OFF;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (npem->active_indications & nled->indication->bit)
> +		val = LED_ON;
> +
> +	mutex_unlock(&npem->npem_lock);
> +
> +	return val;
> +}
> +
> +static int brightness_set(struct led_classdev *led,
> +			  enum led_brightness brightness)
> +{
> +	struct npem_led *nled = container_of(led, struct npem_led, led);
> +	struct npem *npem = nled->npem;
> +	u32 indications;
> +	int ret;
> +
> +	ret = mutex_lock_interruptible(&npem->npem_lock);
> +	if (ret)
> +		return ret;
> +
> +	if (brightness == LED_OFF)
> +		indications = npem->active_indications & ~(nled->indication->bit);
> +	else
> +		indications = npem->active_indications | nled->indication->bit;
> +
> +	ret = npem->ops->set_active_indications(npem, indications);
> +
> +	mutex_unlock(&npem->npem_lock);
> +	return ret;
> +}
> +
> +static void npem_free(struct npem *npem)
> +{
> +	struct npem_led *nled;
> +	int cnt;
> +
> +	for (cnt = 0; cnt < npem->led_cnt; cnt++) {
> +		nled = &npem->leds[cnt];
> +
> +		if (nled->name[0])
> +			led_classdev_unregister(&nled->led);
> +	}
> +
> +	mutex_destroy(&npem->npem_lock);
> +	kfree(npem);
> +}
> +
> +static int pci_npem_set_led_classdev(struct npem *npem, struct npem_led *nled)
> +{
> +	struct led_classdev *led = &nled->led;
> +	struct led_init_data init_data = {};
> +	char *name = nled->name;
> +	int ret;
> +
> +	init_data.devicename = pci_name(npem->dev);
> +	init_data.default_label = nled->indication->name;
> +
> +	ret = led_compose_name(&npem->dev->dev, &init_data, name);
> +	if (ret)
> +		return ret;
> +
> +	led->name = name;
> +	led->brightness_set_blocking = brightness_set;
> +	led->brightness_get = brightness_get;
> +	led->max_brightness = LED_ON;
> +	led->default_trigger = "none";
> +	led->flags = 0;
> +
> +	ret = led_classdev_register(&npem->dev->dev, led);
> +	if (ret)
> +		/* Clear the name to indicate that it is not registered. */
> +		name[0] = 0;
> +	return ret;
> +}
> +
> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> +			 int pos, u32 caps)
> +{
> +	u32 supported = reg_to_indications(caps, ops->inds);
> +	int supported_cnt = hweight32(supported);
> +	const struct indication *indication;
> +	struct npem_led *nled;
> +	struct npem *npem;
> +	int led_idx = 0;
> +	u32 active;
> +	int ret;
> +
> +	npem = kzalloc(struct_size(npem, leds, supported_cnt), GFP_KERNEL);
> +
> +	if (!npem)
> +		return -ENOMEM;
> +
> +	npem->supported_indications = supported;
> +	npem->led_cnt = supported_cnt;
> +	npem->pos = pos;
> +	npem->dev = dev;
> +	npem->ops = ops;
> +
> +	ret = ops->get_active_indications(npem, &active);
> +	if (ret) {
> +		npem_free(npem);
> +		return -EACCES;
> +	}

Failing pci_npem_init() if this ops->get_active_indications() fails will keep this
from working on most (all?) Dell servers, because the _DSM get/set functions use an
IPMI operation region to get/set the active LEDs, and this is getting run before the
IPMI drivers and acpi_ipmi module (which provides ACPI access to IPMI operation
regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)

For testing, I just changed this to ignore the error returned from
get_active_indications() if it is using DSM ops.  That still results in a string of
errors ("ACPI Error: Region IPMI (ID=7) has no handler" etc), but once the system is
up and I can log in, everything is loaded, and i can see & change brightness in sysfs.

I'm not sure if ignoring the error is the best fix for that.  Solutions I
can think of are:
(1) ignoring an error result from get_active_indications during init (as mentioned)
(2) providing a mechanism to trigger this driver to rescan a PCI device later from
     user space
(3) don't cache the active LEDs--just get the active states using NPEM/DSM when
     brightness is read, and do a get/modify/set when setting the brightness... then
     get_active_indications() wouldn't need to be called during init.


Thank you for including DSM support, Mariusz!

> +
> +	npem->active_indications = reg_to_indications(active, ops->inds);
> +
> +	/*
> +	 * Do not take npem->npem_lock, get_brightness() is called on
> +	 * registration path.
> +	 */
> +	mutex_init(&npem->npem_lock);
> +
> +	for_each_indication(indication, npem_indications) {
> +		if ((npem->supported_indications & indication->bit) == 0)
> +			/* Do not register unsupported indication */
> +			continue;
> +
> +		nled = &npem->leds[led_idx++];
> +		nled->indication = indication;
> +		nled->npem = npem;
> +
> +		ret = pci_npem_set_led_classdev(npem, nled);
> +		if (ret) {
> +			npem_free(npem);
> +			return ret;
> +		}
> +	}
> +
> +	dev->npem = npem;
> +	return 0;
> +}
> +
> +void pci_npem_remove(struct pci_dev *dev)
> +{
> +	if (dev->npem)
> +		npem_free(dev->npem);
> +}
> +
> +void pci_npem_create(struct pci_dev *dev)
> +{
> +	const struct npem_ops *ops = &npem_ops;
> +	int pos = 0, ret;
> +	u32 cap;
> +
> +	if (!npem_has_dsm(dev)) {
> +		pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_NPEM);
> +		if (pos == 0)
> +			return;
> +
> +		if (pci_read_config_dword(dev, pos + PCI_NPEM_CAP, &cap) != 0 ||
> +		    (cap & PCI_NPEM_CAP_CAPABLE) == 0)
> +			return;
> +	} else {
> +		/*
> +		 * OS should use the DSM for LED control if it is available
> +		 * PCI Firmware Spec r3.3 sec 4.7.
> +		 */
> +		return;
> +	}
> +
> +	ret = pci_npem_init(dev, ops, pos, cap);
> +	if (ret)
> +		pci_err(dev, "Failed to register PCIe Enclosure Management driver, err: %d\n",
> +			ret);
> +}
> diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
> index fd44565c4756..9dea8c7353ab 100644
> --- a/drivers/pci/pci.h
> +++ b/drivers/pci/pci.h
> @@ -333,6 +333,14 @@ static inline void pci_doe_destroy(struct pci_dev *pdev) { }
>   static inline void pci_doe_disconnected(struct pci_dev *pdev) { }
>   #endif
>   
> +#ifdef CONFIG_PCI_NPEM
> +void pci_npem_create(struct pci_dev *dev);
> +void pci_npem_remove(struct pci_dev *dev);
> +#else
> +static inline void pci_npem_create(struct pci_dev *dev) { }
> +static inline void pci_npem_remove(struct pci_dev *dev) { }
> +#endif
> +
>   /**
>    * pci_dev_set_io_state - Set the new error state if possible.
>    *
> diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
> index 8e696e547565..b8ea6353e27a 100644
> --- a/drivers/pci/probe.c
> +++ b/drivers/pci/probe.c
> @@ -2582,6 +2582,8 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
>   	dev->match_driver = false;
>   	ret = device_add(&dev->dev);
>   	WARN_ON(ret < 0);
> +
> +	pci_npem_create(dev);
>   }
>   
>   struct pci_dev *pci_scan_single_device(struct pci_bus *bus, int devfn)
> diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
> index d749ea8250d6..1436f9cf1fea 100644
> --- a/drivers/pci/remove.c
> +++ b/drivers/pci/remove.c
> @@ -33,6 +33,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
>   	if (!dev->dev.kobj.parent)
>   		return;
>   
> +	pci_npem_remove(dev);
> +
>   	device_del(&dev->dev);
>   
>   	down_write(&pci_bus_sem);
> diff --git a/include/linux/pci.h b/include/linux/pci.h
> index fb004fd4e889..c327c2dd4527 100644
> --- a/include/linux/pci.h
> +++ b/include/linux/pci.h
> @@ -517,6 +517,9 @@ struct pci_dev {
>   #endif
>   #ifdef CONFIG_PCI_DOE
>   	struct xarray	doe_mbs;	/* Data Object Exchange mailboxes */
> +#endif
> +#ifdef CONFIG_PCI_NPEM
> +	struct npem	*npem;		/* Native PCIe Enclosure Management */
>   #endif
>   	u16		acs_cap;	/* ACS Capability offset */
>   	phys_addr_t	rom;		/* Physical address if not from BAR */
> diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
> index 94c00996e633..97e8b7ed9998 100644
> --- a/include/uapi/linux/pci_regs.h
> +++ b/include/uapi/linux/pci_regs.h
> @@ -740,6 +740,7 @@
>   #define PCI_EXT_CAP_ID_DVSEC	0x23	/* Designated Vendor-Specific */
>   #define PCI_EXT_CAP_ID_DLF	0x25	/* Data Link Feature */
>   #define PCI_EXT_CAP_ID_PL_16GT	0x26	/* Physical Layer 16.0 GT/s */
> +#define PCI_EXT_CAP_ID_NPEM	0x29	/* Native PCIe Enclosure Management */
>   #define PCI_EXT_CAP_ID_PL_32GT  0x2A    /* Physical Layer 32.0 GT/s */
>   #define PCI_EXT_CAP_ID_DOE	0x2E	/* Data Object Exchange */
>   #define PCI_EXT_CAP_ID_MAX	PCI_EXT_CAP_ID_DOE
> @@ -1121,6 +1122,39 @@
>   #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_MASK		0x000000F0
>   #define  PCI_PL_16GT_LE_CTRL_USP_TX_PRESET_SHIFT	4
>   
> +/* Native PCIe Enclosure Management */
> +#define PCI_NPEM_CAP	0x04 /* NPEM capability register */
> +#define	 PCI_NPEM_CAP_CAPABLE		0x00000001 /* NPEM Capable */
> +
> +#define PCI_NPEM_CTRL	0x08 /* NPEM control register */
> +#define	 PCI_NPEM_CTRL_ENABLE		0x00000001 /* NPEM Enable */
> +
> +#define PCI_NPEM_STATUS	0x0c /* NPEM status register */
> +#define	 PCI_NPEM_STATUS_CC		0x00000001 /* NPEM Command completed */
> +/*
> + * Native PCIe Enclosure Management indication bits and Reset command bit
> + * are corresponding for capability and control registers.
> + */
> +#define  PCI_NPEM_CMD_RESET		0x00000002 /* NPEM Reset Command */
> +#define  PCI_NPEM_IND_OK		0x00000004 /* NPEM indication OK */
> +#define  PCI_NPEM_IND_LOCATE		0x00000008 /* NPEM indication Locate */
> +#define  PCI_NPEM_IND_FAIL		0x00000010 /* NPEM indication Fail */
> +#define  PCI_NPEM_IND_REBUILD		0x00000020 /* NPEM indication Rebuild */
> +#define  PCI_NPEM_IND_PFA		0x00000040 /* NPEM indication Predicted Failure Analysis */
> +#define  PCI_NPEM_IND_HOTSPARE		0x00000080 /* NPEM indication Hot Spare */
> +#define  PCI_NPEM_IND_ICA		0x00000100 /* NPEM indication In Critical Array */
> +#define  PCI_NPEM_IND_IFA		0x00000200 /* NPEM indication In Failed Array */
> +#define  PCI_NPEM_IND_IDT		0x00000400 /* NPEM indication Invalid Device Type */
> +#define  PCI_NPEM_IND_DISABLED		0x00000800 /* NPEM indication Disabled */
> +#define  PCI_NPEM_IND_SPEC_0		0x00800000
> +#define  PCI_NPEM_IND_SPEC_1		0x01000000
> +#define  PCI_NPEM_IND_SPEC_2		0x02000000
> +#define  PCI_NPEM_IND_SPEC_3		0x04000000
> +#define  PCI_NPEM_IND_SPEC_4		0x08000000
> +#define  PCI_NPEM_IND_SPEC_5		0x10000000
> +#define  PCI_NPEM_IND_SPEC_6		0x20000000
> +#define  PCI_NPEM_IND_SPEC_7		0x40000000
> +
>   /* Data Object Exchange */
>   #define PCI_DOE_CAP		0x04    /* DOE Capabilities Register */
>   #define  PCI_DOE_CAP_INT_SUP			0x00000001  /* Interrupt Support */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-14 21:06   ` stuart hayes
@ 2024-06-15 10:33     ` Lukas Wunner
  2024-06-18  8:56       ` Mariusz Tkaczyk
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2024-06-15 10:33 UTC (permalink / raw)
  To: stuart hayes
  Cc: Mariusz Tkaczyk, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Dan Williams, Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko

On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote:
> On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote:
> > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> > +			 int pos, u32 caps)
> > +{
[...]
> > +	ret = ops->get_active_indications(npem, &active);
> > +	if (ret) {
> > +		npem_free(npem);
> > +		return -EACCES;
> > +	}
> 
> Failing pci_npem_init() if this ops->get_active_indications() fails
> will keep this from working on most (all?) Dell servers, because the
> _DSM get/set functions use an IPMI operation region to get/set the
> active LEDs, and this is getting run before the IPMI drivers and
> acpi_ipmi module (which provides ACPI access to IPMI operation
> regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)

CONFIG_ACPI_IPMI is tristate.  Even if it's built-in, the
module_initcall() becomes a device_initcall().

PCI enumeration happens from a subsys_initcall(), way earlier
than device_initcall().

If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in
drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away?


> (2) providing a mechanism to trigger this driver to rescan a PCI
>     device later from user space

If this was a regular device driver and -EPROBE_DEFER was returned if
IPMI drivers aren't loaded yet, then this would be easy to solve.
But neither is the case here.

Of course it's possible to exercise the "remove" and "rescan" attributes
in sysfs to re-enumerate the device but that's not a great solution.


> (3) don't cache the active LEDs--just get the active states using
>     NPEM/DSM when brightness is read, and do a get/modify/set when
>     setting the brightness... then get_active_indications() wouldn't
>     need to be called during init.

Not good.  The LEDs are published in sysfs from a subsys_initcall().
Brightness changes through sysfs could in theory immediately happen
once they're published.  If acpi_ipmi is a module and gets loaded way
later, we'd still have to cope with Set State or Get State DSMs going
nowhere.

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-15 10:33     ` Lukas Wunner
@ 2024-06-18  8:56       ` Mariusz Tkaczyk
  2024-06-18 17:13         ` Lukas Wunner
  2024-06-18 18:50         ` stuart hayes
  0 siblings, 2 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-06-18  8:56 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: stuart hayes, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Dan Williams, Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko

On Sat, 15 Jun 2024 12:33:45 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote:
> > On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote:  
> > > +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> > > +			 int pos, u32 caps)
> > > +{  
> [...]
> > > +	ret = ops->get_active_indications(npem, &active);
> > > +	if (ret) {
> > > +		npem_free(npem);
> > > +		return -EACCES;
> > > +	}  
> > 
> > Failing pci_npem_init() if this ops->get_active_indications() fails
> > will keep this from working on most (all?) Dell servers, because the
> > _DSM get/set functions use an IPMI operation region to get/set the
> > active LEDs, and this is getting run before the IPMI drivers and
> > acpi_ipmi module (which provides ACPI access to IPMI operation
> > regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)  
> 
> CONFIG_ACPI_IPMI is tristate.  Even if it's built-in, the
> module_initcall() becomes a device_initcall().
> 
> PCI enumeration happens from a subsys_initcall(), way earlier
> than device_initcall().
> 
> If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in
> drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away?

That seems to be the best option. Please test Lukas proposal and let me know.
Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment
about initcall)?

+config PCI_NPEM
+	bool "Native PCIe Enclosure Management"
+	depends on LEDS_CLASS=y
+	depends on ACPI_IPMI=y

> 
> 
> > (2) providing a mechanism to trigger this driver to rescan a PCI
> >     device later from user space  
> 
> If this was a regular device driver and -EPROBE_DEFER was returned if
> IPMI drivers aren't loaded yet, then this would be easy to solve.
> But neither is the case here.
> 
> Of course it's possible to exercise the "remove" and "rescan" attributes
> in sysfs to re-enumerate the device but that's not a great solution.

We cannot expect from users to know and do that. If we cannot configure driver,
we should not register it. We have to guarantee that IMPI commands are
available at the point we are using them.

There is not better place to add _DSM device than its enumeration and I have a
feeling than sooner or later someone else will reach this problem so it would
be better for community to solve it now.

> 
> 
> > (3) don't cache the active LEDs--just get the active states using
> >     NPEM/DSM when brightness is read, and do a get/modify/set when
> >     setting the brightness... then get_active_indications() wouldn't
> >     need to be called during init.  
> 
> Not good.  The LEDs are published in sysfs from a subsys_initcall().
> Brightness changes through sysfs could in theory immediately happen
> once they're published.  If acpi_ipmi is a module and gets loaded way
> later, we'd still have to cope with Set State or Get State DSMs going
> nowhere.
> 

Agree. I can do it and it should be safe but it is not addressing the issue.
We would limit time race window but we will not close it.

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-18  8:56       ` Mariusz Tkaczyk
@ 2024-06-18 17:13         ` Lukas Wunner
  2024-06-18 18:50         ` stuart hayes
  1 sibling, 0 replies; 22+ messages in thread
From: Lukas Wunner @ 2024-06-18 17:13 UTC (permalink / raw)
  To: Mariusz Tkaczyk
  Cc: stuart hayes, linux-pci, Arnd Bergmann, Bjorn Helgaas,
	Dan Williams, Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko

On Tue, Jun 18, 2024 at 10:56:53AM +0200, Mariusz Tkaczyk wrote:
> On Sat, 15 Jun 2024 12:33:45 +0200 Lukas Wunner <lukas@wunner.de> wrote:
>> > On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote:
> > > Failing pci_npem_init() if this ops->get_active_indications() fails
> > > will keep this from working on most (all?) Dell servers, because the
> > > _DSM get/set functions use an IPMI operation region to get/set the
> > > active LEDs, and this is getting run before the IPMI drivers and
> > > acpi_ipmi module (which provides ACPI access to IPMI operation
> > > regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)  
> > 
> > CONFIG_ACPI_IPMI is tristate.  Even if it's built-in, the
> > module_initcall() becomes a device_initcall().
> > 
> > PCI enumeration happens from a subsys_initcall(), way earlier
> > than device_initcall().
> > 
> > If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in
> > drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away?
> 
> That seems to be the best option. Please test Lukas proposal and let me know.
> Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment
> about initcall)?
> 
> +config PCI_NPEM
> +	bool "Native PCIe Enclosure Management"
> +	depends on LEDS_CLASS=y
> +	depends on ACPI_IPMI=y

This would effectively disallow NPEM on non-ACPI systems.

I think what you want instead is to allow either ACPI_IPMI=y or
ACPI_IPMI=n, but not ACPI_IPMI=m, so:

	depends on ACPI_IPMI!=m

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-18  8:56       ` Mariusz Tkaczyk
  2024-06-18 17:13         ` Lukas Wunner
@ 2024-06-18 18:50         ` stuart hayes
  2024-06-18 19:32           ` Dan Williams
  1 sibling, 1 reply; 22+ messages in thread
From: stuart hayes @ 2024-06-18 18:50 UTC (permalink / raw)
  To: Mariusz Tkaczyk, Lukas Wunner
  Cc: linux-pci, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko



On 6/18/2024 3:56 AM, Mariusz Tkaczyk wrote:
> On Sat, 15 Jun 2024 12:33:45 +0200
> Lukas Wunner <lukas@wunner.de> wrote:
> 
>> On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote:
>>> On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote:
>>>> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
>>>> +			 int pos, u32 caps)
>>>> +{
>> [...]
>>>> +	ret = ops->get_active_indications(npem, &active);
>>>> +	if (ret) {
>>>> +		npem_free(npem);
>>>> +		return -EACCES;
>>>> +	}
>>>
>>> Failing pci_npem_init() if this ops->get_active_indications() fails
>>> will keep this from working on most (all?) Dell servers, because the
>>> _DSM get/set functions use an IPMI operation region to get/set the
>>> active LEDs, and this is getting run before the IPMI drivers and
>>> acpi_ipmi module (which provides ACPI access to IPMI operation
>>> regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)
>>
>> CONFIG_ACPI_IPMI is tristate.  Even if it's built-in, the
>> module_initcall() becomes a device_initcall().
>>
>> PCI enumeration happens from a subsys_initcall(), way earlier
>> than device_initcall().
>>
>> If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in
>> drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away?
> 
> That seems to be the best option. Please test Lukas proposal and let me know.
> Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment
> about initcall)?
> 
> +config PCI_NPEM
> +	bool "Native PCIe Enclosure Management"
> +	depends on LEDS_CLASS=y
> +	depends on ACPI_IPMI=y
> 

I tried it just to be sure, but changing the module_initcall() to an
arch_initcall() in acpi_ipmi.c (and compiling it into the kernel) doesn't
help... acpi_ipmi is loaded before npem, but the underlying IPMI drivers
that acpi_ipmi uses to provide the IPMI operation region in ACPI aren't
loaded until later... acpi_ipmi needs the IPMI device.

I'll try to think of another solution.  I don't see how to get the IPMI
drivers to load before PCI devices are enumerated, so it seems to me that
the only way to get it to work from the moment the LEDs show up in sysfs
is to somehow delay the npem driver initialization of the LEDs (at least
for _DSM) or use something to trigger a rescan later.

I notice that acpi_ipmi provides a function to wait for it to have an
IPMI device registered (acpi_wait_for_acpi_ipmi()), which is used by
acpi_power_meter.c.  It would be kind of awkward to use that... it just
just waits for a completion (with a 2 second timeout)--it isn't a notifier
or callback.  On my system, the npem driver failed to run a _DSM at
0.72 seconds, and ipmi_si driver didn't initialize the IPMI controller
until 9.54 seconds.

>>
>>> (2) providing a mechanism to trigger this driver to rescan a PCI
>>>      device later from user space
>>
>> If this was a regular device driver and -EPROBE_DEFER was returned if
>> IPMI drivers aren't loaded yet, then this would be easy to solve.
>> But neither is the case here.
>>
>> Of course it's possible to exercise the "remove" and "rescan" attributes
>> in sysfs to re-enumerate the device but that's not a great solution.
> 
> We cannot expect from users to know and do that. If we cannot configure driver,
> we should not register it. We have to guarantee that IMPI commands are
> available at the point we are using them.
> 
> There is not better place to add _DSM device than its enumeration and I have a
> feeling than sooner or later someone else will reach this problem so it would
> be better for community to solve it now.
> 
>>
>>
>>> (3) don't cache the active LEDs--just get the active states using
>>>      NPEM/DSM when brightness is read, and do a get/modify/set when
>>>      setting the brightness... then get_active_indications() wouldn't
>>>      need to be called during init.
>>
>> Not good.  The LEDs are published in sysfs from a subsys_initcall().
>> Brightness changes through sysfs could in theory immediately happen
>> once they're published.  If acpi_ipmi is a module and gets loaded way
>> later, we'd still have to cope with Set State or Get State DSMs going
>> nowhere.
>>
> 
> Agree. I can do it and it should be safe but it is not addressing the issue.
> We would limit time race window but we will not close it.
> 
> Thanks,
> Mariusz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-18 18:50         ` stuart hayes
@ 2024-06-18 19:32           ` Dan Williams
  2024-06-19  9:08             ` Lukas Wunner
  0 siblings, 1 reply; 22+ messages in thread
From: Dan Williams @ 2024-06-18 19:32 UTC (permalink / raw)
  To: stuart hayes, Mariusz Tkaczyk, Lukas Wunner
  Cc: linux-pci, Arnd Bergmann, Bjorn Helgaas, Dan Williams,
	Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko

stuart hayes wrote:
> 
> 
> On 6/18/2024 3:56 AM, Mariusz Tkaczyk wrote:
> > On Sat, 15 Jun 2024 12:33:45 +0200
> > Lukas Wunner <lukas@wunner.de> wrote:
> > 
> >> On Fri, Jun 14, 2024 at 04:06:14PM -0500, stuart hayes wrote:
> >>> On 5/28/2024 8:19 AM, Mariusz Tkaczyk wrote:
> >>>> +static int pci_npem_init(struct pci_dev *dev, const struct npem_ops *ops,
> >>>> +			 int pos, u32 caps)
> >>>> +{
> >> [...]
> >>>> +	ret = ops->get_active_indications(npem, &active);
> >>>> +	if (ret) {
> >>>> +		npem_free(npem);
> >>>> +		return -EACCES;
> >>>> +	}
> >>>
> >>> Failing pci_npem_init() if this ops->get_active_indications() fails
> >>> will keep this from working on most (all?) Dell servers, because the
> >>> _DSM get/set functions use an IPMI operation region to get/set the
> >>> active LEDs, and this is getting run before the IPMI drivers and
> >>> acpi_ipmi module (which provides ACPI access to IPMI operation
> >>> regions) get loaded.  (GET_SUPPORTED_STATES works without IPMI.)
> >>
> >> CONFIG_ACPI_IPMI is tristate.  Even if it's built-in, the
> >> module_initcall() becomes a device_initcall().
> >>
> >> PCI enumeration happens from a subsys_initcall(), way earlier
> >> than device_initcall().
> >>
> >> If you set CONFIG_ACPI_IPMI=y and change the module_initcall() in
> >> drivers/acpi/acpi_ipmi.c to arch_initcall(), does the issue go away?
> > 
> > That seems to be the best option. Please test Lukas proposal and let me know.
> > Shouldn't I make a dependency to ACPI_IPMI in Kconfig (with optional comment
> > about initcall)?
> > 
> > +config PCI_NPEM
> > +	bool "Native PCIe Enclosure Management"
> > +	depends on LEDS_CLASS=y
> > +	depends on ACPI_IPMI=y
> > 
> 
> I tried it just to be sure, but changing the module_initcall() to an
> arch_initcall() in acpi_ipmi.c (and compiling it into the kernel) doesn't
> help... acpi_ipmi is loaded before npem, but the underlying IPMI drivers
> that acpi_ipmi uses to provide the IPMI operation region in ACPI aren't
> loaded until later... acpi_ipmi needs the IPMI device.
> 
> I'll try to think of another solution.  I don't see how to get the IPMI
> drivers to load before PCI devices are enumerated, so it seems to me that
> the only way to get it to work from the moment the LEDs show up in sysfs
> is to somehow delay the npem driver initialization of the LEDs (at least
> for _DSM) or use something to trigger a rescan later.
> 
> I notice that acpi_ipmi provides a function to wait for it to have an
> IPMI device registered (acpi_wait_for_acpi_ipmi()), which is used by
> acpi_power_meter.c.  It would be kind of awkward to use that... it just
> just waits for a completion (with a 2 second timeout)--it isn't a notifier
> or callback.  On my system, the npem driver failed to run a _DSM at
> 0.72 seconds, and ipmi_si driver didn't initialize the IPMI controller
> until 9.54 seconds.

It strikes me that playing these initcall games is a losing battle and
that this case would be best served by late loading of NPEM
functionality.

Something similar is happening with PCI device security where the
enabling depends on a third-party driver for a platform
"security-manager" (TSM) to arrive.

The approach there is to make the functionality independent of
device-discovery vs TSM driver load order. So, if the TSM driver is
loaded early then pci_init_capabilities() can immediately enable the
functionality. If the TSM driver is loaded *after* some devices have already
gone through pci_init_capabilities(), then that event is responsible for
doing for_each_pci_dev() to catch up on devices that missed their
initial chance to turn on device security details.

So, for NPEM, the thought would be to implement the same rendezvous
flow, i.e. s/TSM/NPEM/.

I am an overdue for a refresh of the TSM patches, but you can see the
last posting here:

http://lore.kernel.org/171291190324.3532867.13480405752065082171.stgit@dwillia2-xfh.jf.intel.com

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-18 19:32           ` Dan Williams
@ 2024-06-19  9:08             ` Lukas Wunner
  2024-06-19 12:07               ` Mariusz Tkaczyk
  0 siblings, 1 reply; 22+ messages in thread
From: Lukas Wunner @ 2024-06-19  9:08 UTC (permalink / raw)
  To: Dan Williams
  Cc: stuart hayes, Mariusz Tkaczyk, linux-pci, Arnd Bergmann,
	Bjorn Helgaas, Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko

On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote:
> It strikes me that playing these initcall games is a losing battle and
> that this case would be best served by late loading of NPEM
> functionality.
> 
> Something similar is happening with PCI device security where the
> enabling depends on a third-party driver for a platform
> "security-manager" (TSM) to arrive.
> 
> The approach there is to make the functionality independent of
> device-discovery vs TSM driver load order. So, if the TSM driver is
> loaded early then pci_init_capabilities() can immediately enable the
> functionality. If the TSM driver is loaded *after* some devices have already
> gone through pci_init_capabilities(), then that event is responsible for
> doing for_each_pci_dev() to catch up on devices that missed their
> initial chance to turn on device security details.
> 
> So, for NPEM, the thought would be to implement the same rendezvous
> flow, i.e. s/TSM/NPEM/.

A different viewpoint is that these issues are caused by the
"division of labor" between OS kernel and platform firmware.

In the NPEM case, Dell servers require the OS to call firmware
to change LEDs.  But before OS can do that, OS has to initialize
a certain other interface with firmware.

In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to
ask firmware to perform certain authentication steps with devices,
wherefore OS has to provide another interface to facilitate
communication with the device.

It's a complexity nightmare exacerbated by vendor-specific quirks.

Which is why I'm arguing that firmware functionality (e.g. TDX module)
should be constrained to the absolute minimum and the OS should be
in control of as much as possible.  That's the approach Apple has
been following as it's the only way to achieve their close interplay
between hardware and software without making things too complex.

It seems what's keeping this series from working on Dell servers is
primarily that the driver wants to read out LED status on probe.
So I've recommended to Mariusz off-list to do that lazily if possible,
i.e. on first read of a LED's status.

Then if users do try to read or write LED status on Dell servers without
loading IPMI modules first, they get to keep the pieces, sorry. :(


> I am an overdue for a refresh of the TSM patches

No hurry, there's a refresh of the OS-owned PCI device authentication
coming up before end of this month.

I'm taking my "TDX Connect heretic" hat off now. :)

Thanks,

Lukas

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-06-19  9:08             ` Lukas Wunner
@ 2024-06-19 12:07               ` Mariusz Tkaczyk
  0 siblings, 0 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-06-19 12:07 UTC (permalink / raw)
  To: Lukas Wunner
  Cc: Dan Williams, stuart hayes, linux-pci, Arnd Bergmann,
	Bjorn Helgaas, Greg Kroah-Hartman, Ilpo Jarvinen, Keith Busch,
	Marek Behun, Pavel Machek, Randy Dunlap, Andy Shevchenko

On Wed, 19 Jun 2024 11:08:26 +0200
Lukas Wunner <lukas@wunner.de> wrote:

> On Tue, Jun 18, 2024 at 12:32:33PM -0700, Dan Williams wrote:
> > It strikes me that playing these initcall games is a losing battle and
> > that this case would be best served by late loading of NPEM
> > functionality.
> > 
> > Something similar is happening with PCI device security where the
> > enabling depends on a third-party driver for a platform
> > "security-manager" (TSM) to arrive.
> > 
> > The approach there is to make the functionality independent of
> > device-discovery vs TSM driver load order. So, if the TSM driver is
> > loaded early then pci_init_capabilities() can immediately enable the
> > functionality. If the TSM driver is loaded *after* some devices have already
> > gone through pci_init_capabilities(), then that event is responsible for
> > doing for_each_pci_dev() to catch up on devices that missed their
> > initial chance to turn on device security details.
> > 
> > So, for NPEM, the thought would be to implement the same rendezvous
> > flow, i.e. s/TSM/NPEM/.  
> 
> A different viewpoint is that these issues are caused by the
> "division of labor" between OS kernel and platform firmware.
> 
> In the NPEM case, Dell servers require the OS to call firmware
> to change LEDs.  But before OS can do that, OS has to initialize
> a certain other interface with firmware.
> 
> In the TSM case, Intel TDX Connect or AMD SEV-TIO require OS to
> ask firmware to perform certain authentication steps with devices,
> wherefore OS has to provide another interface to facilitate
> communication with the device.
> 
> It's a complexity nightmare exacerbated by vendor-specific quirks.
> 
> Which is why I'm arguing that firmware functionality (e.g. TDX module)
> should be constrained to the absolute minimum and the OS should be
> in control of as much as possible.  That's the approach Apple has
> been following as it's the only way to achieve their close interplay
> between hardware and software without making things too complex.
> 
> It seems what's keeping this series from working on Dell servers is
> primarily that the driver wants to read out LED status on probe.
> So I've recommended to Mariusz off-list to do that lazily if possible,
> i.e. on first read of a LED's status.
> 
> Then if users do try to read or write LED status on Dell servers without
> loading IPMI modules first, they get to keep the pieces, sorry. :(

> 
Initially, I thought that Dan suggestion is the best option but after taking
into account use cases of the driver and times provided by Stuart - lazy
loading wins.

As a led application maintainer, I can accept fact that I cannot impose led for
a while and errors will be reported, that is fine. I can left a hint why it is
happening to user.

I would be a nightmare to get new LED controller after some time if LED
interface appearance is delayed. It is much worse from user perspective because
no device means that I have no information in userland. I cannot determine if
something is going to be up soon so I will report disks as not supported -
unnecessary maintenance hell. I may receive a lot of issues.

Stuart, please give me some time to apply suggestions and introduce lazy
approach. I'm working on it!

Thanks,
Mariusz

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support
  2024-05-29  5:21   ` Dan Williams
  2024-05-29  9:38     ` Lukas Wunner
@ 2024-06-19 14:28     ` Mariusz Tkaczyk
  1 sibling, 0 replies; 22+ messages in thread
From: Mariusz Tkaczyk @ 2024-06-19 14:28 UTC (permalink / raw)
  To: Dan Williams
  Cc: linux-pci, Arnd Bergmann, Bjorn Helgaas, Greg Kroah-Hartman,
	Ilpo Jarvinen, Lukas Wunner, Keith Busch, Marek Behun,
	Pavel Machek, Randy Dunlap, Andy Shevchenko, Stuart Hayes

On Tue, 28 May 2024 22:21:10 -0700
Dan Williams <dan.j.williams@intel.com> wrote:

> > +
> > +	/*
> > +	 * For the case where a NPEM command has not completed immediately,
> > +	 * it is recommended that software not continuously “spin” on
> > polling
> > +	 * the status register, but rather poll under interrupt at a
> > reduced
> > +	 * rate; for example at 10 ms intervals.  
> 
> I think the use of read_poll_timeout() obviates the need for this
> comment. I.e. read_poll_timeout() is self documenting.
> 

This is not describing how read_poll_timeout works but why I need it at all.
I don't think that there is and ever will be a controller that needs a time to
complete the NPEM request. From my experience this is immediate action. This is
explanation why I have to keep it this way even if it has almost no sense :)

Mariusz

> > +	 *
> > +	 * PCIe r6.1 sec 6.28 "Implementation Note: Software Polling of
> > NPEM
> > +	 * Command Completed"
> > +	 */
> > +	ret = read_poll_timeout(npem_read_reg, ret_val,
> > +				ret_val || (cc_status &
> > PCI_NPEM_STATUS_CC),
> > +				10 * USEC_PER_MSEC, USEC_PER_SEC, false,
> > npem,
> > +				PCI_NPEM_STATUS, &cc_status);
> > +	if (ret)
> > +		return ret_val;

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2024-06-19 14:28 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-05-28 13:19 [PATCH v2 0/3] PCIe Enclosure LED Management Mariusz Tkaczyk
2024-05-28 13:19 ` [PATCH v2 1/3] leds: Init leds class earlier Mariusz Tkaczyk
2024-05-29  4:22   ` Dan Williams
2024-06-14 14:08     ` Lukas Wunner
2024-06-14 14:13       ` Greg Kroah-Hartman
2024-05-28 13:19 ` [PATCH v2 2/3] PCI/NPEM: Add Native PCIe Enclosure Management support Mariusz Tkaczyk
2024-05-28 13:55   ` Ilpo Järvinen
2024-06-14 13:40     ` Mariusz Tkaczyk
2024-05-29  5:21   ` Dan Williams
2024-05-29  9:38     ` Lukas Wunner
2024-06-12 11:40       ` Mariusz Tkaczyk
2024-06-13 16:11         ` stuart hayes
2024-06-19 14:28     ` Mariusz Tkaczyk
2024-06-14 21:06   ` stuart hayes
2024-06-15 10:33     ` Lukas Wunner
2024-06-18  8:56       ` Mariusz Tkaczyk
2024-06-18 17:13         ` Lukas Wunner
2024-06-18 18:50         ` stuart hayes
2024-06-18 19:32           ` Dan Williams
2024-06-19  9:08             ` Lukas Wunner
2024-06-19 12:07               ` Mariusz Tkaczyk
2024-05-28 13:19 ` [PATCH v2 3/3] PCI/NPEM: Add _DSM PCIe SSD status LED management Mariusz Tkaczyk

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox