linux-pci.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: Lukas Wunner <lukas@wunner.de>
To: Tony Hutter <hutter2@llnl.gov>
Cc: Bjorn Helgaas <helgaas@kernel.org>,
	corey@minyard.net, alok.a.tiwari@oracle.com,
	mariusz.tkaczyk@linux.intel.com, minyard@acm.org,
	linux-pci@vger.kernel.org,
	openipmi-developer@lists.sourceforge.net,
	Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v6] Introduce Cray ClusterStor E1000 NVMe slot LED driver
Date: Thu, 30 Oct 2025 10:13:57 +0100	[thread overview]
Message-ID: <aQMsVUCBDF7ZUSK-@wunner.de> (raw)
In-Reply-To: <d485bd74-e49d-4c89-b986-1b45c93e7975@llnl.gov>

On Wed, Oct 08, 2025 at 04:48:22PM -0700, Tony Hutter wrote:
> +++ b/Documentation/ABI/testing/sysfs-bus-pci
> @@ -231,6 +231,27 @@ Description:
>  		    - scXX contains the device subclass;
>  		    - iXX contains the device class programming interface.
>  
> +What:		/sys/bus/pci/slots/.../attention
> +Date:		February 2025

Please update the "Date" timestamp to the month of submission to the list.

> +Contact:	linux-pci@vger.kernel.org
> +Description:
> +		The attention attribute is used to read or write the attention
> +		status for an enclosure slot.  This is often used to set the
> +		slot LED value on a NVMe storage enclosure.
> +
> +		Common values:
> +		0 = OFF
> +		1 = ON
> +		2 = blink (ampere, ibmphp, pciehp, rpaphp, shpchp)
> +
> +		Using the pciehp_craye1k extensions:
> +		0 = fault LED OFF, locate LED OFF
> +		1 = fault LED ON,  locate LED OFF
> +		2 = fault LED OFF, locate LED ON
> +		3 = fault LED ON,  locate LED ON

An end user might not be able to understand all these driver names,
so I'd omit "(ampere, ibmphp, pciehp, rpaphp, shpchp)",
and replace "pciehp_craye1k" with "Cray ClusterStor E1000".

> +++ b/drivers/pci/hotplug/pciehp_core.c
> @@ -73,6 +73,13 @@ static int init_slot(struct controller *ctrl)
>  		ops->get_attention_status = pciehp_get_raw_indicator_status;
>  		ops->set_attention_status = pciehp_set_raw_indicator_status;
>  	}
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE_CRAY_E1000
> +	if (is_craye1k_slot(ctrl)) {
> +		/* slots 1-24 on Cray E1000s are controlled differently */
> +		ops->get_attention_status = craye1k_get_attention_status;
> +		ops->set_attention_status = craye1k_set_attention_status;
> +	}
> +#endif

Per section 21 of Documentation/process/coding-style.rst,
please avoid the use of #ifdef and instead add an #else section
to pciehp.h with an inline stub for is_craye1k_slot() which returns false
and #define craye1k_{get,set}_attention_status to NULL.

Do these hotplug slots on the Cray E1000 have the Attention Indicator
Present bit set in the Slot Capabilities register?  If they do not,
please use "if else" instead of "if".  Right now you're overwriting
the default {get,set}_attention_status pointers, which only makes
sense if ATTN_LED(ctrl) is true.

When is craye1k_new_smi() called?  Is it guaranteed to be called
before the first invocation of craye1k_{get,set}_attention_status()?

I'm asking because craye1k_{get,set}_attention_status() do not
contain any checks whether the craye1k_global struct has been
populated.  You would need such checks if craye1k_new_smi() may
run later than the "attention" attribute appearing in sysfs.

Actually craye1k_new_smi() may fail, e.g. because of kzalloc()
returning NULL, so I think you'll need additional checks in any case.
However if craye1k_new_smi() is guaranteed to run before init_slot(),
you could check for an uninitialized craye1k_global struct already in
is_craye1k_slot().

Note that just checking "craye1k_global == NULL" is insufficient as
craye1k_new_smi() might run concurrently to
craye1k_{get,set}_attention_status().

I note that is_craye1k_slot() calls dmi_match(), so you should add
"depends on DMI" to the Kconfig entry you're adding.

> @@ -376,8 +383,16 @@ int __init pcie_hp_init(void)
>  
>  	retval = pcie_port_service_register(&hpdriver_portdrv);
>  	pr_debug("pcie_port_service_register = %d\n", retval);
> -	if (retval)
> +	if (retval) {
>  		pr_debug("Failure to register service\n");
> +		return retval;
> +	}
> +
> +#ifdef CONFIG_HOTPLUG_PCI_PCIE_CRAY_E1000
> +	retval = craye1k_init();
> +	if (retval)
> +		pr_debug("Failure to register Cray E1000 extensions");
> +#endif

Another #ifdef that should be eliminated.

You also need to annotate craye1k_init() with __init.

> +++ b/drivers/pci/hotplug/pciehp_craye1k.c
[...]
> +#include <linux/debugfs.h>
> +#include <linux/delay.h>
> +#include <linux/errno.h>
> +#include <linux/dmi.h>
> +#include <linux/ipmi.h>
> +#include <linux/module.h>
> +#include <linux/pci.h>
> +#include <linux/pci_hotplug.h>
> +#include <linux/random.h>
> +#include "pciehp.h"

dmi.h isn't inserted in alphabetical order.

> +	/* debugfs stats */
> +	u64 check_primary;
> +	u64 check_primary_failed;
> +	u64 was_already_primary;
> +	u64 was_not_already_primary;
> +	u64 set_primary;
> +	u64 set_initial_primary_failed;
> +	u64 set_primary_failed;
> +	u64 set_led_locate_failed;
> +	u64 set_led_fault_failed;
> +	u64 set_led_readback_failed;
> +	u64 set_led_failed;
> +	u64 get_led_failed;
> +	u64 completion_timeout;
> +	u64 wrong_msgid;
> +	u64 request_failed;

I'm wondering if having all this debug code in mainline is useful?
Is the IPMI interface this brittle?  Was this just necessary
for initial development and may not be useful going forward?

> +static void craye1k_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data)
> +{
> +	struct craye1k *craye1k = user_msg_data;
> +
> +	if (msg->msgid != craye1k->tx_msg_id) {
> +		craye1k->wrong_msgid++;
> +		if (craye1k->print_errors) {
> +			dev_warn_ratelimited(craye1k->dev, "rx msgid %d != %d",
> +					     (int)msg->msgid,
> +					     (int)craye1k->tx_msg_id);

Why use casts instead of %ld in the format string?

> +static void craye1k_smi_gone(int iface)
> +{
> +	pr_warn("craye1k: Got unexpected smi_gone, iface=%d", iface);
> +}

Why not kfree() the craye1k struct here, tear down debugfs etc?

Thanks,

Lukas

      reply	other threads:[~2025-10-30  9:13 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2025-10-08 23:48 [PATCH v6] Introduce Cray ClusterStor E1000 NVMe slot LED driver Tony Hutter
2025-10-30  9:13 ` Lukas Wunner [this message]

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=aQMsVUCBDF7ZUSK-@wunner.de \
    --to=lukas@wunner.de \
    --cc=alok.a.tiwari@oracle.com \
    --cc=corey@minyard.net \
    --cc=helgaas@kernel.org \
    --cc=hutter2@llnl.gov \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-pci@vger.kernel.org \
    --cc=mariusz.tkaczyk@linux.intel.com \
    --cc=minyard@acm.org \
    --cc=openipmi-developer@lists.sourceforge.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).