netdev.vger.kernel.org archive mirror
From: Simon Horman <horms@kernel.org>
To: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
Cc: intel-wired-lan@lists.osuosl.org, anthony.l.nguyen@intel.com,
	netdev@vger.kernel.org, Kelvin Kang <kelvin.kang@intel.com>,
	Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Subject: Re: [PATCH iwl-net v5] i40e: fix: remove needless retries of NVM update
Date: Thu, 27 Jun 2024 18:33:51 +0100
Message-ID: <20240627173351.GH3104@kernel.org>
In-Reply-To: <20240625184953.621684-1-aleksandr.loktionov@intel.com>

On Tue, Jun 25, 2024 at 08:49:53PM +0200, Aleksandr Loktionov wrote:
> Remove wrong EIO to EAGAIN conversion and pass all errors as is.
> 
> After commit 230f3d53a547 ("i40e: remove i40e_status"), which should only
> replace F/W specific error codes with Linux kernel generic, all EIO errors
> suddenly started to be converted into EAGAIN which leads nvmupdate to retry
> until it timeouts and sometimes fails after more than 20 minutes in the
> middle of NVM update, so NVM becomes corrupted.
> 
> The bug affects users only at the time when they try to update NVM, and
> only F/W versions that generate errors while nvmupdate. For example, X710DA2
> with 0x8000ECB7 F/W is affected, but there are probably more...
> 
> Command for reproduction is just NVM update:
>  ./nvmupdate64
> 
> In the log instead of:
>  i40e_nvmupd_exec_aq err I40E_ERR_ADMIN_QUEUE_ERROR aq_err I40E_AQ_RC_ENOMEM)
> appears:
>  i40e_nvmupd_exec_aq err -EIO aq_err I40E_AQ_RC_ENOMEM
>  i40e: eeprom check failed (-5), Tx/Rx traffic disabled
> 
> The problematic code did silently convert EIO into EAGAIN which forced
> nvmupdate to ignore EAGAIN error and retry the same operation until timeout.
> That's why NVM update takes 20+ minutes to finish with the fail in the end.
> 
> Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

Hi Aleksandr,

Maybe I'm reading things wrong, but I have concerns :(

Amongst other things, the cited commit:
1. Maps a number of different I40E_ERR_* values to -EIO; and
2. Replaces checks against different I40E_ERR_* values with checks against -EIO

My concern is that the code may now incorrectly match against -EIO
in cases where it would not previously have matched, when more
specific error codes were used.

In the case at hand:
1. -EIO is returned in place of I40E_ERR_ADMIN_QUEUE_ERROR
2. i40e_aq_rc_to_posix checks for -EIO in place of I40E_ERR_ADMIN_QUEUE_TIMEOUT

As you point out, we are now in a bad place.
Which your patch addresses.

But what about a different case where:
1. -EIO is returned in place of I40E_ERR_ADMIN_QUEUE_TIMEOUT
2. i40e_aq_rc_to_posix checks for -EIO in place of I40E_ERR_ADMIN_QUEUE_TIMEOUT

In this scenario, the code without your patch is correct,
and with your patch it seems incorrect.

Perhaps only the scenario you are fixing occurs.
If so, all good. But it's not obvious to me that is the case.

I'm likewise concerned by other conditions on -EIO
introduced by the cited commit.
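
To make the concern concrete, here is a minimal userspace sketch, not
driver code. The OLD_* enum, its numeric values, and the helper names are
hypothetical stand-ins for the driver-private codes; the only thing taken
from the discussion above is that two distinct codes may both end up as -EIO.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the driver-private status codes that
 * existed before the cited commit. Values are illustrative only. */
enum old_status {
	OLD_OK = 0,
	OLD_ERR_ADMIN_QUEUE_ERROR = -1,
	OLD_ERR_ADMIN_QUEUE_TIMEOUT = -2,
};

/* Assumed post-conversion mapping for the scenario described above:
 * both admin queue failures collapse into the same errno. */
static int old_to_errno(enum old_status s)
{
	switch (s) {
	case OLD_OK:
		return 0;
	case OLD_ERR_ADMIN_QUEUE_ERROR:
	case OLD_ERR_ADMIN_QUEUE_TIMEOUT:
		return -EIO;
	}
	return -EIO;
}

/* Before the conversion: only a timeout asks the caller to retry. */
static bool wants_retry_before(enum old_status s)
{
	return s == OLD_ERR_ADMIN_QUEUE_TIMEOUT;
}

/* After the conversion: the check can no longer tell the two apart,
 * so every -EIO asks the caller to retry. */
static bool wants_retry_after(int err)
{
	return err == -EIO;
}
```

With these definitions, wants_retry_before(OLD_ERR_ADMIN_QUEUE_ERROR) is
false while wants_retry_after(old_to_errno(OLD_ERR_ADMIN_QUEUE_ERROR)) is
true. For the timeout code both are true, and that is the case where
dropping the check would change behaviour.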

> ---
> v4->v5 commit message update
> https://lore.kernel.org/netdev/20240618132111.3193963-1-aleksandr.loktionov@intel.com/T/#u
> v3->v4 commit message update
> v2->v3 commit message typos
> v1->v2 commit message update
> ---
>  drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
>  1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> index ee86d2c..55b5bb8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
>  		-EFBIG,      /* I40E_AQ_RC_EFBIG */
>  	};
>  
> -	/* aq_rc is invalid if AQ timed out */
> -	if (aq_ret == -EIO)
> -		return -EAGAIN;
> -

Perhaps it has already been covered, but with this change the aq_ret
argument of this function is no longer used.  It could be removed as a
follow-up for iwl-next.
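
As a rough userspace sketch of what such a follow-up might look like:
the function name, the ARRAY_LEN macro, and the abbreviated three-entry
table below are assumptions for illustration, not the driver's actual
table or a proposed patch.

```c
#include <errno.h>

#define ARRAY_LEN(a) (sizeof(a) / sizeof((a)[0]))

/* Hypothetical single-argument variant: with the -EIO special case
 * gone, only the AQ return code is needed for the lookup. */
static int aq_rc_to_posix_sketch(int aq_rc)
{
	/* Abbreviated placeholder table; the real one has more entries. */
	static const int aq_to_posix[] = {
		0,        /* index 0: success */
		-EPERM,   /* index 1 */
		-ENOENT,  /* index 2 */
	};

	/* The cast makes negative values fail the bounds check too. */
	if (!((unsigned int)aq_rc < ARRAY_LEN(aq_to_posix)))
		return -ERANGE;

	return aq_to_posix[aq_rc];
}
```

Callers would then pass only the AQ return code, for example
aq_rc_to_posix_sketch(1), and out-of-range values would still map
to -ERANGE as they do today.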

>  	if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0]))))
>  		return -ERANGE;
>  
> -- 
> 2.25.1
> 
> 
