netdev.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate
@ 2024-06-12 11:04 Aleksandr Loktionov
  2024-06-13 14:45 ` [Intel-wired-lan] " Brelinski, Tony
                   ` (2 more replies)
  0 siblings, 3 replies; 5+ messages in thread
From: Aleksandr Loktionov @ 2024-06-12 11:04 UTC (permalink / raw)
  To: intel-wired-lan, anthony.l.nguyen, aleksandr.loktionov
  Cc: netdev, Kelvin Kang, Arkadiusz Kubalewski

The bug affects users only at the time when they try to update NVM, and
only F/W versions that generate errors while nvmupdate. For example X710DA2
with 0x8000ECB7 F/W is affected, but there are probably more...

After 230f3d53a547 patch, which should only replace F/W specific error codes
with Linux kernel generic, all EIO errors started to be converted into EAGAIN
which leads nvmupdate to retry until it timeouts and sometimes fails after
more than 20 minutes in the middle of NVM update, so NVM becomes corrupted.

Remove wrong EIO to EGAIN conversion and pass all errors as is.

Fixes: 230f3d53a547 ("i40e: remove i40e_status")
Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
---
reproduction:
./nvmupdate64

v2->v3 commit messege typos
v1->v2 commit message update
---
 drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
index ee86d2c..55b5bb8 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
@@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
 		-EFBIG,      /* I40E_AQ_RC_EFBIG */
 	};
 
-	/* aq_rc is invalid if AQ timed out */
-	if (aq_ret == -EIO)
-		return -EAGAIN;
-
 	if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0]))))
 		return -ERANGE;
 
-- 
2.25.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* RE: [Intel-wired-lan] [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate
  2024-06-12 11:04 [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate Aleksandr Loktionov
@ 2024-06-13 14:45 ` Brelinski, Tony
  2024-06-13 21:27 ` Tony Nguyen
  2024-06-14  7:43 ` [Intel-wired-lan] " Paul Menzel
  2 siblings, 0 replies; 5+ messages in thread
From: Brelinski, Tony @ 2024-06-13 14:45 UTC (permalink / raw)
  To: Loktionov, Aleksandr, intel-wired-lan@lists.osuosl.org,
	Nguyen, Anthony L, Loktionov, Aleksandr
  Cc: netdev@vger.kernel.org, Kang, Kelvin, Kubalewski, Arkadiusz

> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of
> Aleksandr Loktionov
> Sent: Wednesday, June 12, 2024 4:04 AM
> To: intel-wired-lan@lists.osuosl.org; Nguyen, Anthony L
> <anthony.l.nguyen@intel.com>; Loktionov, Aleksandr
> <aleksandr.loktionov@intel.com>
> Cc: netdev@vger.kernel.org; Kang, Kelvin <kelvin.kang@intel.com>;
> Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>
> Subject: [Intel-wired-lan] [PATCH iwl-net v3] i40e: fix hot issue NVM content is
> corrupted after nvmupdate
>
> The bug affects users only at the time when they try to update NVM, and only
> F/W versions that generate errors while nvmupdate. For example X710DA2
> with 0x8000ECB7 F/W is affected, but there are probably more...
>
> After 230f3d53a547 patch, which should only replace F/W specific error codes
> with Linux kernel generic, all EIO errors started to be converted into EAGAIN
> which leads nvmupdate to retry until it timeouts and sometimes fails after
> more than 20 minutes in the middle of NVM update, so NVM becomes
> corrupted.
>
> Remove wrong EIO to EGAIN conversion and pass all errors as is.
>
> Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> reproduction:
> ./nvmupdate64
>
> v2->v3 commit messege typos
> v1->v2 commit message update
> ---
>  drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
>  1 file changed, 4 deletions(-)

Tested-by: Tony Brelinski <tony.brelinski@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate
  2024-06-12 11:04 [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate Aleksandr Loktionov
  2024-06-13 14:45 ` [Intel-wired-lan] " Brelinski, Tony
@ 2024-06-13 21:27 ` Tony Nguyen
  2024-06-14  6:57   ` Loktionov, Aleksandr
  2024-06-14  7:43 ` [Intel-wired-lan] " Paul Menzel
  2 siblings, 1 reply; 5+ messages in thread
From: Tony Nguyen @ 2024-06-13 21:27 UTC (permalink / raw)
  To: Aleksandr Loktionov, intel-wired-lan
  Cc: netdev, Kelvin Kang, Arkadiusz Kubalewski, Przemek Kitszel

On 6/12/2024 4:04 AM, Aleksandr Loktionov wrote:

As Przemek has pointed out...

"hot issue" doesn't necessarily carry the same meaning; better to just 
drop that out of the title.

> The bug affects users only at the time when they try to update NVM, and
> only F/W versions that generate errors while nvmupdate. For example X710DA2
> with 0x8000ECB7 F/W is affected, but there are probably more...
> 
> After 230f3d53a547 patch, which should only replace F/W specific error codes

Could you cite the commit in the preferred style of SHA +title?

I'd suggest

'After commit 230f3d53a547 ("i40e: remove i40e_status"),'

Thanks,
Tony

> with Linux kernel generic, all EIO errors started to be converted into EAGAIN
> which leads nvmupdate to retry until it timeouts and sometimes fails after
> more than 20 minutes in the middle of NVM update, so NVM becomes corrupted.
> 
> Remove wrong EIO to EGAIN conversion and pass all errors as is.
> 
> Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* RE: [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate
  2024-06-13 21:27 ` Tony Nguyen
@ 2024-06-14  6:57   ` Loktionov, Aleksandr
  0 siblings, 0 replies; 5+ messages in thread
From: Loktionov, Aleksandr @ 2024-06-14  6:57 UTC (permalink / raw)
  To: Nguyen, Anthony L, intel-wired-lan@lists.osuosl.org
  Cc: netdev@vger.kernel.org, Kang, Kelvin, Kubalewski, Arkadiusz,
	Kitszel, Przemyslaw



> -----Original Message-----
> From: Nguyen, Anthony L <anthony.l.nguyen@intel.com>
> Sent: Thursday, June 13, 2024 11:27 PM
> To: Loktionov, Aleksandr <aleksandr.loktionov@intel.com>; intel-wired-
> lan@lists.osuosl.org
> Cc: netdev@vger.kernel.org; Kang, Kelvin <kelvin.kang@intel.com>;
> Kubalewski, Arkadiusz <arkadiusz.kubalewski@intel.com>; Kitszel,
> Przemyslaw <przemyslaw.kitszel@intel.com>
> Subject: Re: [PATCH iwl-net v3] i40e: fix hot issue NVM content is
> corrupted after nvmupdate
> 
> On 6/12/2024 4:04 AM, Aleksandr Loktionov wrote:
> 
> As Przemek has pointed out...
> 
> "hot issue" doesn't necessarily carry the same meaning; better to just
> drop that out of the title.
> 
Does it make sense to change the commit title and start a new mailing list thread when it was already tested and ready to merge?

> > The bug affects users only at the time when they try to update NVM,
> > and only F/W versions that generate errors while nvmupdate. For
> > example X710DA2 with 0x8000ECB7 F/W is affected, but there are
> probably more...
> >
> > After 230f3d53a547 patch, which should only replace F/W specific
> error
> > codes
> 
> Could you cite the commit in the preferred style of SHA +title?
Please see Fixed: tag below

> 
> I'd suggest
> 
> 'After commit 230f3d53a547 ("i40e: remove i40e_status"),'
> 
Is it really needed to mention it twice in the commit message?

> Thanks,
> Tony
> 
> > with Linux kernel generic, all EIO errors started to be converted
> into
> > EAGAIN which leads nvmupdate to retry until it timeouts and
> sometimes
> > fails after more than 20 minutes in the middle of NVM update, so NVM
> becomes corrupted.
> >
> > Remove wrong EIO to EGAIN conversion and pass all errors as is.
> >
> > Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> > Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
> > Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
> > Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> > Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Intel-wired-lan] [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate
  2024-06-12 11:04 [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate Aleksandr Loktionov
  2024-06-13 14:45 ` [Intel-wired-lan] " Brelinski, Tony
  2024-06-13 21:27 ` Tony Nguyen
@ 2024-06-14  7:43 ` Paul Menzel
  2 siblings, 0 replies; 5+ messages in thread
From: Paul Menzel @ 2024-06-14  7:43 UTC (permalink / raw)
  To: Aleksandr Loktionov
  Cc: anthony.l.nguyen, intel-wired-lan, netdev, Kelvin Kang,
	Arkadiusz Kubalewski

Dear Aleksandr,


Am 12.06.24 um 13:04 schrieb Aleksandr Loktionov:
> The bug affects users only at the time when they try to update NVM, and
> only F/W versions that generate errors while nvmupdate. For example X710DA2
> with 0x8000ECB7 F/W is affected, but there are probably more...
> 
> After 230f3d53a547 patch, which should only replace F/W specific error codes
> with Linux kernel generic, all EIO errors started to be converted into EAGAIN
> which leads nvmupdate to retry until it timeouts and sometimes fails after
> more than 20 minutes in the middle of NVM update, so NVM becomes corrupted.
> 
> Remove wrong EIO to EGAIN conversion and pass all errors as is.

I am still not convinced your change is correct with this statement. The 
blamed commit converts the error

```
diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h 
b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
index ee394aacef4d..267f2e0a21ce 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
+++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
@@ -5,7 +5,6 @@
  #define _I40E_ADMINQ_H_

  #include "i40e_osdep.h"
-#include "i40e_status.h"
  #include "i40e_adminq_cmd.h"

  #define I40E_ADMINQ_DESC(R, i)   \
@@ -117,7 +116,7 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, 
int aq_rc)
         };

         /* aq_rc is invalid if AQ timed out */
-       if (aq_ret == I40E_ERR_ADMIN_QUEUE_TIMEOUT)
+       if (aq_ret == -EIO)
                 return -EAGAIN;

         if (!((u32)aq_rc < (sizeof(aq_to_posix) / 
sizeof((aq_to_posix)[0]))))
```

> Fixes: 230f3d53a547 ("i40e: remove i40e_status")
> Co-developed-by: Kelvin Kang <kelvin.kang@intel.com>
> Signed-off-by: Kelvin Kang <kelvin.kang@intel.com>
> Reviewed-by: Arkadiusz Kubalewski <arkadiusz.kubalewski@intel.com>
> Signed-off-by: Aleksandr Loktionov <aleksandr.loktionov@intel.com>
> ---
> reproduction:
> ./nvmupdate64

Would be nice to have in the commit message, and also any error messages 
it throws.

> v2->v3 commit messege typos
> v1->v2 commit message update
> ---
>   drivers/net/ethernet/intel/i40e/i40e_adminq.h | 4 ----
>   1 file changed, 4 deletions(-)
> 
> diff --git a/drivers/net/ethernet/intel/i40e/i40e_adminq.h b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> index ee86d2c..55b5bb8 100644
> --- a/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> +++ b/drivers/net/ethernet/intel/i40e/i40e_adminq.h
> @@ -109,10 +109,6 @@ static inline int i40e_aq_rc_to_posix(int aq_ret, int aq_rc)
>   		-EFBIG,      /* I40E_AQ_RC_EFBIG */
>   	};
>   
> -	/* aq_rc is invalid if AQ timed out */
> -	if (aq_ret == -EIO)
> -		return -EAGAIN;
> -

… but you remove the check entirely. Why is that correct?

>   	if (!((u32)aq_rc < (sizeof(aq_to_posix) / sizeof((aq_to_posix)[0]))))
>   		return -ERANGE;
>   


Kind regards,

Paul

^ permalink raw reply related	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-06-14  7:44 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-06-12 11:04 [PATCH iwl-net v3] i40e: fix hot issue NVM content is corrupted after nvmupdate Aleksandr Loktionov
2024-06-13 14:45 ` [Intel-wired-lan] " Brelinski, Tony
2024-06-13 21:27 ` Tony Nguyen
2024-06-14  6:57   ` Loktionov, Aleksandr
2024-06-14  7:43 ` [Intel-wired-lan] " Paul Menzel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).