netdev.vger.kernel.org archive mirror
From: Simon Horman <horms@kernel.org>
To: Konrad Knitter <konrad.knitter@intel.com>
Cc: intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	przemyslaw.kitszel@intel.com, netdev@vger.kernel.org,
	kuba@kernel.org, pabeni@redhat.com, edumazet@google.com,
	davem@davemloft.net, andrew+netdev@lunn.ch,
	Sharon Haroni <sharon.haroni@intel.com>,
	Nicholas Nunley <nicholas.d.nunley@intel.com>,
	Brett Creeley <brett.creeley@intel.com>
Subject: Re: [PATCH iwl-next v2] ice: fw and port health status
Date: Mon, 9 Dec 2024 11:13:59 +0000
Message-ID: <20241209111359.GA2581@kernel.org>
In-Reply-To: <20241204122738.114511-1-konrad.knitter@intel.com>

On Wed, Dec 04, 2024 at 01:27:38PM +0100, Konrad Knitter wrote:
> Firmware generates either global or port-specific health events.
> 
> The driver shall subscribe to health status events from firmware on
> supported FW versions (>= 1.7.6).
> The driver shall expose them under dedicated health reporters; two new
> reporters are introduced:
> - FW health reporter shall represent global events (problems with the
> image, recovery mode);
> - Port health reporter shall represent port-specific events (module
> failure).
> 
> Firmware only reports problems when they are detected; it does not store
> an active fault list.
> The driver will hold only the last global and the last port-specific event.
> The driver will report all events via devlink health report,
> so when multiple events arrive from the same source they can be reviewed
> using the devlink autodump feature.
> 
> $ devlink health
> 
> pci/0000:b1:00.3:
>   reporter fw
>     state healthy error 0 recover 0 auto_dump true
>   reporter port
>     state error error 1 recover 0 last_dump_date 2024-03-17
> 	last_dump_time 09:29:29 auto_dump true
> 
> $ devlink health diagnose pci/0000:b1:00.3 reporter port
> 
>   Syndrome: 262
>   Description: Module is not present.
>   Possible Solution: Check that the module is inserted correctly.
>   Port Number: 0
> 
> Tested on Intel Corporation Ethernet Controller E810-C for SFP
> 
> Co-developed-by: Sharon Haroni <sharon.haroni@intel.com>
> Signed-off-by: Sharon Haroni <sharon.haroni@intel.com>
> Co-developed-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
> Signed-off-by: Nicholas Nunley <nicholas.d.nunley@intel.com>
> Co-developed-by: Brett Creeley <brett.creeley@intel.com>
> Signed-off-by: Brett Creeley <brett.creeley@intel.com>
> Signed-off-by: Konrad Knitter <konrad.knitter@intel.com>
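
The commit message says the driver keeps only the last global and the last port-specific event, and the quoted handler further below also checks the firmware's element count against the buffer length before walking it. A minimal userspace sketch of that logic, not the driver code: struct health_elem_sketch, struct health_state_sketch, process_health_events() and the SRC_* values are hypothetical stand-ins (the real element uses little-endian fields, and the real ICE_AQC_HEALTH_STATUS_* values may differ), and the lookup of known status codes that the patch performs is omitted.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical host-byte-order stand-in for one health status element;
 * the real struct ice_aqc_health_status_elem uses __le16/__le32 fields. */
struct health_elem_sketch {
	uint16_t event_source;
	uint16_t status_code;
};

/* Assumed source values, for illustration only. */
enum { SRC_PF = 1, SRC_PORT = 2, SRC_GLOBAL = 3 };

/* Only the most recent event per reporter is kept: a later event
 * overwrites the earlier one, mirroring the commit message. */
struct health_state_sketch {
	struct health_elem_sketch last_fw;
	struct health_elem_sketch last_port;
	unsigned int fw_reports;
	unsigned int port_reports;
};

/* Returns -1 if @count elements do not fit in @buf_len bytes, otherwise
 * the number of elements routed to one of the two reporters. */
static int process_health_events(struct health_state_sketch *st,
				 const struct health_elem_sketch *elems,
				 uint16_t count, size_t buf_len)
{
	int routed = 0;

	/* Reject a count that would read past the end of the buffer. */
	if (count > buf_len / sizeof(*elems))
		return -1;

	for (uint16_t i = 0; i < count; i++) {
		switch (elems[i].event_source) {
		case SRC_GLOBAL:
			st->last_fw = elems[i];
			st->fw_reports++;
			routed++;
			break;
		case SRC_PF:
		case SRC_PORT:
			st->last_port = elems[i];
			st->port_reports++;
			routed++;
			break;
		default:
			/* Unknown source: not routed to either reporter. */
			break;
		}
	}
	return routed;
}
```

Because each slot is simply overwritten, the devlink report issued per event is what preserves history; the stored copy only backs the most recent diagnose output.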

Hi Konrad,

Some minor feedback from my side.

> diff --git a/drivers/net/ethernet/intel/ice/devlink/health.c b/drivers/net/ethernet/intel/ice/devlink/health.c

...

> +/**
> + * ice_process_health_status_event - Process the health status event from FW
> + * @pf: pointer to the PF structure
> + * @event: event structure containing the Health Status Event opcode
> + *
> + * Decode the Health Status Events and print the associated messages
> + */
> +void ice_process_health_status_event(struct ice_pf *pf, struct ice_rq_event_info *event)
> +{
> +	const struct ice_aqc_health_status_elem *health_info;
> +	u16 count;
> +
> +	health_info = (struct ice_aqc_health_status_elem *)event->msg_buf;
> +	count = le16_to_cpu(event->desc.params.get_health_status.health_status_count);
> +
> +	if (count > (event->buf_len / sizeof(*health_info))) {
> +		dev_err(ice_pf_to_dev(pf), "Received a health status event with invalid element count\n");
> +		return;
> +	}
> +
> +	for (int i = 0; i < count; i++) {
> +		const struct ice_health_status *health_code;
> +		u16 status_code;
> +
> +		status_code = le16_to_cpu(health_info->health_status_code);
> +		health_code = ice_get_health_status(status_code);
> +
> +		if (health_code) {
> +			switch (health_info->event_source) {
> +			case ICE_AQC_HEALTH_STATUS_GLOBAL:
> +				pf->health_reporters.fw_status = *health_info;
> +				devlink_health_report(pf->health_reporters.fw,
> +						      "FW syndrome reported", NULL);
> +				break;
> +			case ICE_AQC_HEALTH_STATUS_PF:
> +			case ICE_AQC_HEALTH_STATUS_PORT:
> +				pf->health_reporters.port_status = *health_info;
> +				devlink_health_report(pf->health_reporters.port,
> +						      "Port syndrome reported", NULL);
> +				break;
> +			default:
> +				dev_err(ice_pf_to_dev(pf), "Health code with unknown source\n");
> +			}

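The type of health_info->event_source is __le16,
but here it is compared against host byte order values.
That doesn't seem correct: on a big-endian host none of these
cases would match. Perhaps it should be converted using
le16_to_cpu(), as is already done in the else branch below.

Flagged by Sparse.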
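
To illustrate the byte-order point: the field holds little-endian bytes, so the value has to be converted before it is compared with host-order constants. A minimal userspace sketch, assuming hypothetical stand-ins: le16_to_cpu_sketch() plays the role of the kernel's le16_to_cpu(), and the SRC_* values replace ICE_AQC_HEALTH_STATUS_*, whose real values may differ. In the driver this would presumably amount to switching on le16_to_cpu(health_info->event_source).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for le16_to_cpu(): decode a 16-bit value stored
 * little-endian in memory, independent of the host byte order. */
static uint16_t le16_to_cpu_sketch(const uint8_t raw[2])
{
	return (uint16_t)(raw[0] | (raw[1] << 8));
}

/* Assumed values, for illustration only. */
enum { SRC_PF = 1, SRC_PORT = 2, SRC_GLOBAL = 3 };

enum reporter_sketch { REP_NONE, REP_FW, REP_PORT };

/* Convert first, then compare against host-order constants. */
static enum reporter_sketch pick_reporter(const uint8_t raw_source[2])
{
	switch (le16_to_cpu_sketch(raw_source)) {
	case SRC_GLOBAL:
		return REP_FW;
	case SRC_PF:
	case SRC_PORT:
		return REP_PORT;
	default:
		return REP_NONE;
	}
}

/* What the unconverted switch effectively does: reinterpret the bytes in
 * host order. On a big-endian host, bytes {0x03, 0x00} read as 0x0300. */
static uint16_t raw_native_read(const uint8_t raw[2])
{
	uint16_t v;

	memcpy(&v, raw, sizeof(v));
	return v;
}
```

On a little-endian host raw_native_read() and le16_to_cpu_sketch() agree, so the missing conversion is invisible in testing there; the __le16 annotation is what lets Sparse catch it.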

> +		} else {
> +			u32 data1, data2;
> +			u16 source;
> +
> +			source = le16_to_cpu(health_info->event_source);
> +			data1 = le32_to_cpu(health_info->internal_data1);
> +			data2 = le32_to_cpu(health_info->internal_data2);
> +			dev_dbg(ice_pf_to_dev(pf),
> +				"Received internal health status code 0x%08x, source: 0x%08x, data1: 0x%08x, data2: 0x%08x",
> +				status_code, source, data1, data2);
> +		}
> +		health_info++;
> +	}
> +}
> +
>  /**
>   * ice_devlink_health_report - boilerplate to call given @reporter
>   *

...

> diff --git a/drivers/net/ethernet/intel/ice/ice_common.c b/drivers/net/ethernet/intel/ice/ice_common.c
> index faba09b9d880..9c61318d3027 100644
> --- a/drivers/net/ethernet/intel/ice/ice_common.c
> +++ b/drivers/net/ethernet/intel/ice/ice_common.c
> @@ -6047,6 +6047,44 @@ bool ice_is_phy_caps_an_enabled(struct ice_aqc_get_phy_caps_data *caps)
>  	return false;
>  }
>  
> +/**
> + * ice_is_fw_health_report_supported

Please consider including a short description here.

Flagged by ./scripts/kernel-doc -Wall -none
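
kernel-doc expects the function name in the header line to be followed by " - " and a one-line summary. The sketch below shows that layout on a hypothetical helper, fw_api_min_ver_sketch(), which assumes that ice_is_fw_api_min_ver() performs a lexicographic major/minor/patch comparison; struct fw_api_ver_sketch and the helper name are illustrative, not the ice driver's.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the FW API version fields kept in struct ice_hw. */
struct fw_api_ver_sketch {
	uint8_t maj;
	uint8_t min;
	uint8_t patch;
};

/**
 * fw_api_min_ver_sketch - check that the FW API version is at least maj.min.patch
 * @ver: version reported by firmware
 * @maj: required major version
 * @min: required minor version
 * @patch: required patch version
 *
 * Compares lexicographically: major first, then minor, then patch.
 *
 * Return: true if @ver >= maj.min.patch, false otherwise
 */
static bool fw_api_min_ver_sketch(const struct fw_api_ver_sketch *ver,
				  uint8_t maj, uint8_t min, uint8_t patch)
{
	if (ver->maj != maj)
		return ver->maj > maj;
	if (ver->min != min)
		return ver->min > min;
	return ver->patch >= patch;
}
```

For example, taking the 1.7.6 threshold quoted in the commit message as the required version, a firmware reporting 1.8.0 passes the check while 1.7.5 does not.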

> + * @hw: pointer to the hardware structure
> + *
> + * Return: true if firmware supports health status reports,
> + * false otherwise
> + */
> +bool ice_is_fw_health_report_supported(struct ice_hw *hw)
> +{
> +	return ice_is_fw_api_min_ver(hw, ICE_FW_API_HEALTH_REPORT_MAJ,
> +				     ICE_FW_API_HEALTH_REPORT_MIN,
> +				     ICE_FW_API_HEALTH_REPORT_PATCH);
> +}

...


Thread overview: 4+ messages
2024-12-04 12:27 [PATCH iwl-next v2] ice: fw and port health status Konrad Knitter
2024-12-04 12:34 ` [Intel-wired-lan] " Paul Menzel
2024-12-04 12:54   ` Paul Menzel
2024-12-09 11:13 ` Simon Horman [this message]