public inbox for linux-nvme@lists.infradead.org
 help / color / mirror / Atom feed
* [PATCH] nvme: avoid bogus CRTO values
@ 2023-09-12 21:47 Keith Busch
  2023-09-13 10:50 ` Christoph Hellwig
                   ` (2 more replies)
  0 siblings, 3 replies; 11+ messages in thread
From: Keith Busch @ 2023-09-12 21:47 UTC (permalink / raw)
  To: linux-nvme, hch
  Cc: sagi, Keith Busch, Cláudio Sampaio, Felix Yan, stable

From: Keith Busch <kbusch@kernel.org>

Some devices are reporting controller ready mode support, but return 0
for CRTO. These devices require a much higher time to ready than that,
so they are failing to initialize after the driver starter preferring
that value over CAP.TO.

The spec requires that CAP.TO match the appropritate CRTO value, or be
set to 0xff if CRTO is larger than that. This means that CAP.TO can be
used to validate if CRTO is reliable, and provides an appropriate
fallback for setting the timeout value if not. Use whichever is larger.

Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
Reported-by: Cláudio Sampaio <patola@gmail.com>
Reported-by: Felix Yan <felixonmars@archlinux.org>
Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org>
Cc: stable@vger.kernel.org
Signed-off-by: Keith Busch <kbusch@kernel.org>
---
 drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++----------------
 1 file changed, 29 insertions(+), 19 deletions(-)

diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
index 37b6fa7466620..4adc0b2f12f1e 100644
--- a/drivers/nvme/host/core.c
+++ b/drivers/nvme/host/core.c
@@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 	else
 		ctrl->ctrl_config = NVME_CC_CSS_NVM;
 
-	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
-		u32 crto;
-
-		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
-		if (ret) {
-			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
-				ret);
-			return ret;
-		}
-
-		if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
-			ctrl->ctrl_config |= NVME_CC_CRIME;
-			timeout = NVME_CRTO_CRIMT(crto);
-		} else {
-			timeout = NVME_CRTO_CRWMT(crto);
-		}
-	} else {
-		timeout = NVME_CAP_TIMEOUT(ctrl->cap);
-	}
+	if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
+		ctrl->ctrl_config |= NVME_CC_CRIME;
 
 	ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
 	ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
@@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
 	if (ret)
 		return ret;
 
+	/* CAP value may change after initial CC write */
+	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
+	if (ret)
+		return ret;
+
+	timeout = NVME_CAP_TIMEOUT(ctrl->cap);
+	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
+		u32 crto;
+
+		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
+		if (ret) {
+			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
+				ret);
+			return ret;
+		}
+
+		/*
+		 * CRTO should always be greater or equal to CAP.TO, but some
+		 * devices are known to get this wrong. Use the larger of the
+		 * two values.
+		 */
+		if (ctrl->ctrl_config & NVME_CC_CRIME)
+			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
+		else
+			timeout = max(timeout, NVME_CRTO_CRWMT(crto));
+	}
+
 	ctrl->ctrl_config |= NVME_CC_ENABLE;
 	ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config);
 	if (ret)
-- 
2.34.1



^ permalink raw reply related	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-12 21:47 [PATCH] nvme: avoid bogus CRTO values Keith Busch
@ 2023-09-13 10:50 ` Christoph Hellwig
  2023-09-13 15:15   ` Keith Busch
  2023-09-13 10:53 ` Sagi Grimberg
  2023-09-13 12:25 ` Niklas Cassel
  2 siblings, 1 reply; 11+ messages in thread
From: Christoph Hellwig @ 2023-09-13 10:50 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme, hch, sagi, Keith Busch, Cláudio Sampaio,
	Felix Yan, stable

On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Some devices are reporting controller ready mode support, but return 0
> for CRTO. These devices require a much higher time to ready than that,
> so they are failing to initialize after the driver starter preferring
> that value over CAP.TO.
> 
> The spec requires that CAP.TO match the appropritate CRTO value, or be
> set to 0xff if CRTO is larger than that. This means that CAP.TO can be
> used to validate if CRTO is reliable, and provides an appropriate
> fallback for setting the timeout value if not. Use whichever is larger.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
> Reported-by: Cláudio Sampaio <patola@gmail.com>
> Reported-by: Felix Yan <felixonmars@archlinux.org>
> Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++----------------
>  1 file changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c

> +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)

I don't think the NVME_CAP_CRMS_CRWMS check here makes sense, this
should only need the NVME_CAP_CRMS_CRIMS one.

> +	timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
> +		u32 crto;
> +
> +		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
> +		if (ret) {
> +			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
> +				ret);
> +			return ret;
> +		}
> +
> +		/*
> +		 * CRTO should always be greater or equal to CAP.TO, but some
> +		 * devices are known to get this wrong. Use the larger of the
> +		 * two values.
> +		 */
> +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> +		else
> +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));

Should we at least log a harmless one-liner warning if the new timeouts
are too small?


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-12 21:47 [PATCH] nvme: avoid bogus CRTO values Keith Busch
  2023-09-13 10:50 ` Christoph Hellwig
@ 2023-09-13 10:53 ` Sagi Grimberg
  2023-09-13 12:25 ` Niklas Cassel
  2 siblings, 0 replies; 11+ messages in thread
From: Sagi Grimberg @ 2023-09-13 10:53 UTC (permalink / raw)
  To: Keith Busch, linux-nvme, hch
  Cc: Keith Busch, Cláudio Sampaio, Felix Yan, stable

Reviewed-by: Sagi Grimberg <sagi@grimberg.me>


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-12 21:47 [PATCH] nvme: avoid bogus CRTO values Keith Busch
  2023-09-13 10:50 ` Christoph Hellwig
  2023-09-13 10:53 ` Sagi Grimberg
@ 2023-09-13 12:25 ` Niklas Cassel
  2023-09-13 15:20   ` Keith Busch
  2 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2023-09-13 12:25 UTC (permalink / raw)
  To: Keith Busch
  Cc: linux-nvme@lists.infradead.org, hch@lst.de, sagi@grimberg.me,
	Keith Busch, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
> From: Keith Busch <kbusch@kernel.org>
> 
> Some devices are reporting controller ready mode support, but return 0
> for CRTO. These devices require a much higher time to ready than that,
> so they are failing to initialize after the driver starter preferring
> that value over CAP.TO.
> 
> The spec requires that CAP.TO match the appropritate CRTO value, or be
> set to 0xff if CRTO is larger than that. This means that CAP.TO can be
> used to validate if CRTO is reliable, and provides an appropriate
> fallback for setting the timeout value if not. Use whichever is larger.
> 
> Link: https://bugzilla.kernel.org/show_bug.cgi?id=217863
> Reported-by: Cláudio Sampaio <patola@gmail.com>
> Reported-by: Felix Yan <felixonmars@archlinux.org>
> Based-on-a-patch-by: Felix Yan <felixonmars@archlinux.org>
> Cc: stable@vger.kernel.org
> Signed-off-by: Keith Busch <kbusch@kernel.org>
> ---
>  drivers/nvme/host/core.c | 48 ++++++++++++++++++++++++----------------
>  1 file changed, 29 insertions(+), 19 deletions(-)
> 
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index 37b6fa7466620..4adc0b2f12f1e 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -2245,25 +2245,8 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
>  	else
>  		ctrl->ctrl_config = NVME_CC_CSS_NVM;
>  
> -	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
> -		u32 crto;
> -
> -		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
> -		if (ret) {
> -			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
> -				ret);
> -			return ret;
> -		}
> -
> -		if (ctrl->cap & NVME_CAP_CRMS_CRIMS) {
> -			ctrl->ctrl_config |= NVME_CC_CRIME;
> -			timeout = NVME_CRTO_CRIMT(crto);
> -		} else {
> -			timeout = NVME_CRTO_CRWMT(crto);
> -		}
> -	} else {
> -		timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> -	}
> +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
> +		ctrl->ctrl_config |= NVME_CC_CRIME;
>  
>  	ctrl->ctrl_config |= (NVME_CTRL_PAGE_SHIFT - 12) << NVME_CC_MPS_SHIFT;
>  	ctrl->ctrl_config |= NVME_CC_AMS_RR | NVME_CC_SHN_NONE;
> @@ -2277,6 +2260,33 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl)
>  	if (ret)
>  		return ret;
>  
> +	/* CAP value may change after initial CC write */
> +	ret = ctrl->ops->reg_read64(ctrl, NVME_REG_CAP, &ctrl->cap);
> +	if (ret)
> +		return ret;
> +
> +	timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
> +		u32 crto;
> +
> +		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
> +		if (ret) {
> +			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
> +				ret);
> +			return ret;
> +		}
> +
> +		/*
> +		 * CRTO should always be greater or equal to CAP.TO, but some
> +		 * devices are known to get this wrong. Use the larger of the
> +		 * two values.
> +		 */
> +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> +		else
> +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));

I saw the original bug report.
But wasn't the problem that these were compared before NVME_CC_CRIME had
been written?

i.e. is this max() check still needed for the bug reporter's NVMe drive,
after NVME_CC_CRIME was been written and CAP has been re-read?
(If so, would a quirk be better?)


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-13 10:50 ` Christoph Hellwig
@ 2023-09-13 15:15   ` Keith Busch
  0 siblings, 0 replies; 11+ messages in thread
From: Keith Busch @ 2023-09-13 15:15 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Keith Busch, linux-nvme, sagi, Cláudio Sampaio, Felix Yan,
	stable

On Wed, Sep 13, 2023 at 12:50:15PM +0200, Christoph Hellwig wrote:
> On Tue, Sep 12, 2023 at 02:47:33PM -0700, Keith Busch wrote:
> >  1 file changed, 29 insertions(+), 19 deletions(-)
> > 
> > diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> 
> > +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS)
> 
> I don't think the NVME_CAP_CRMS_CRWMS check here makes sense, this
> should only need the NVME_CAP_CRMS_CRIMS one.

I think you're right, but we currently only enable CRIME if both are
set, so that would be a functional change snuck into this sanity check.
 
> > +	timeout = NVME_CAP_TIMEOUT(ctrl->cap);
> > +	if (ctrl->cap & NVME_CAP_CRMS_CRWMS) {
> > +		u32 crto;
> > +
> > +		ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto);
> > +		if (ret) {
> > +			dev_err(ctrl->device, "Reading CRTO failed (%d)\n",
> > +				ret);
> > +			return ret;
> > +		}
> > +
> > +		/*
> > +		 * CRTO should always be greater or equal to CAP.TO, but some
> > +		 * devices are known to get this wrong. Use the larger of the
> > +		 * two values.
> > +		 */
> > +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> > +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> > +		else
> > +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));
> 
> Should we at least log a harmless one-liner warning if the new timeouts
> are too small?

Felix had something like that in an earlier patch, but it would print on
every reset. That could get a bit spammy on the kernel logs, but I can
add a dev_warn_once() instead.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-13 12:25 ` Niklas Cassel
@ 2023-09-13 15:20   ` Keith Busch
  2023-09-13 16:14     ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2023-09-13 15:20 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
> > +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> > +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> > +		else
> > +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));
> 
> I saw the original bug report.
> But wasn't the problem that these were compared before NVME_CC_CRIME had
> been written?
>
> i.e. is this max() check still needed for the bug reporter's NVMe drive,
> after NVME_CC_CRIME was been written and CAP has been re-read?
> (If so, would a quirk be better?)

The values reported in CRTO don't change with the CC writes. It's only
CAP.TO that can change, so we still can't rely on CRTO at any point in
the sequence for these devices.
 


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-13 15:20   ` Keith Busch
@ 2023-09-13 16:14     ` Niklas Cassel
  2023-09-13 16:46       ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2023-09-13 16:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

Hello Keith,

On Wed, Sep 13, 2023 at 08:20:09AM -0700, Keith Busch wrote:
> On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
> > > +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> > > +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> > > +		else
> > > +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));
> > 
> > I saw the original bug report.
> > But wasn't the problem that these were compared before NVME_CC_CRIME had
> > been written?
> >
> > i.e. is this max() check still needed for the bug reporter's NVMe drive,
> > after NVME_CC_CRIME was been written and CAP has been re-read?
> > (If so, would a quirk be better?)
> 
> The values reported in CRTO don't change with the CC writes. It's only
> CAP.TO that can change, so we still can't rely on CRTO at any point in
> the sequence for these devices.

Yes, I know that CRTO (which is the new register), doesn't change.
It is supposed to have to values:
CRTO.CRIMT
CRTO.CRWMT

The hack to modify CAP.TO to be in sync with either CRTO.CRIMT or
CRTO.CRWMT is really ugly.

Considering that CRTO.CRIMT and CRTO.CRWMT are also 16-bit,
so wider than CAP.TO, suggests that software should read these
if available, i.e. no need to ever read CAP.TO if
CAP.CRMS.CRWMS is 1 or if CAP.CRMS.CRIMS is 1.

CRTO.CRIMT is allowed to be 0, if CAP.CRMS.CRIMS is 0.
CRTO.CRWMT is not allowed to be 0 if CAP.CRMS.CRWMS is 1.
(and this bit should be set to 1 according to NVMe 2.0).

Additionally, NVMe 2.0b, Figure 35, clearly states register
CRTO as Mandatory for I/O and Admin controllers.


I guess that I just can't understand how a controller can set
bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT.
Is it such a silly spec violation, that they seem to have not
bothered to read what this bit means at all.

Such a controller is so obviously broken that it deserves a quirk.

Although, I understand that using quirks is not always the best
solution for end users...


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-13 16:14     ` Niklas Cassel
@ 2023-09-13 16:46       ` Keith Busch
  2023-09-15 21:31         ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2023-09-13 16:46 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Wed, Sep 13, 2023 at 04:14:44PM +0000, Niklas Cassel wrote:
> Hello Keith,
> 
> On Wed, Sep 13, 2023 at 08:20:09AM -0700, Keith Busch wrote:
> > On Wed, Sep 13, 2023 at 12:25:29PM +0000, Niklas Cassel wrote:
> > > > +		if (ctrl->ctrl_config & NVME_CC_CRIME)
> > > > +			timeout = max(timeout, NVME_CRTO_CRIMT(crto));
> > > > +		else
> > > > +			timeout = max(timeout, NVME_CRTO_CRWMT(crto));
> > > 
> > > I saw the original bug report.
> > > But wasn't the problem that these were compared before NVME_CC_CRIME had
> > > been written?
> > >
> > > i.e. is this max() check still needed for the bug reporter's NVMe drive,
> > > after NVME_CC_CRIME was been written and CAP has been re-read?
> > > (If so, would a quirk be better?)
> > 
> > The values reported in CRTO don't change with the CC writes. It's only
> > CAP.TO that can change, so we still can't rely on CRTO at any point in
> > the sequence for these devices.
> 
> Yes, I know that CRTO (which is the new register), doesn't change.
> It is supposed to have to values:
> CRTO.CRIMT
> CRTO.CRWMT
> 
> The hack to modify CAP.TO to be in sync with either CRTO.CRIMT or
> CRTO.CRWMT is really ugly.

This is just a sanity check to ensure the device complies with spec,
specifically this part for CAP.TO:

  this field shall be set to:
   a) the value in Controller Ready With Media Timeout (CRTO.CRWMT); or
   b) FFh if CRTO.CRWMT is greater than FFh.

If CRWMT is smaller than CAP.TO, then the device is not spec compliant.
Yeah, the driver should prefer CRTO values, but if they're clearly
unreliable, we ought to fallback to a longer wait since we can't trust
the lower value. What's the worst that can happen? We wait longer than
necessary for a device that never becomes ready? That's totally
acceptable, IMO.
 
> Considering that CRTO.CRIMT and CRTO.CRWMT are also 16-bit,
> so wider than CAP.TO, suggests that software should read these
> if available, i.e. no need to ever read CAP.TO if
> CAP.CRMS.CRWMS is 1 or if CAP.CRMS.CRIMS is 1.
> 
> CRTO.CRIMT is allowed to be 0, if CAP.CRMS.CRIMS is 0.
> CRTO.CRWMT is not allowed to be 0 if CAP.CRMS.CRWMS is 1.
> (and this bit should be set to 1 according to NVMe 2.0).

I don't see any reason why CRTO.CRWMT can't legitimately be 0. I can
conceive of an implementation ready for commands immediately after CC.EN
0->1 without a delay. In this device case, though, it's just wrong.

> Additionally, NVMe 2.0b, Figure 35, clearly states register
> CRTO as Mandatory for I/O and Admin controllers.
 
> I guess that I just can't understand how a controller can set
> bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT.
> Is it such a silly spec violation, that they seem to have not
> bothered to read what this bit means at all.

Yep, but broken controllers are broken. They used to work in Linux
before the driver started preferring the recommended CRTO, so we gotta
fix 'em.

> Such a controller is so obviously broken that it deserves a quirk.
> 
> Although, I understand that using quirks is not always the best
> solution for end users...

If we have a way to sanity check for spec non-compliance, I would prefer
doing that generically rather than quirk specific devices. Coincidently,
I received an internal report just yesterday of a 2nd vendor with
similar breakage, so this might be a common fuck up. Let's not put users
and maintainers through the hassle if we can resolve it just once.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-13 16:46       ` Keith Busch
@ 2023-09-15 21:31         ` Niklas Cassel
  2023-09-15 21:50           ` Keith Busch
  0 siblings, 1 reply; 11+ messages in thread
From: Niklas Cassel @ 2023-09-15 21:31 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Wed, Sep 13, 2023 at 09:46:08AM -0700, Keith Busch wrote:
>  
> > I guess that I just can't understand how a controller can set
> > bit CAP.CRMS.CRWMS to 1, without setting a value in CRTO.CRWMT.
> > Is it such a silly spec violation, that they seem to have not
> > bothered to read what this bit means at all.
> 
> Yep, but broken controllers are broken. They used to work in Linux
> before the driver started preferring the recommended CRTO, so we gotta
> fix 'em.

I'm not saying that these controllers shouldn't work with Linux.
However, these controller used to work with CC.CRIME == 0, so perhaps
we should continue to use them that way?


My reasoning is that if the controller did not even manage to get the
most fundamental part of this new feature correctly (i.e. defining the
actual timeout values for this feature), then perhaps we shouldn't
assume that the rest of the feature, setting the NRDY bit, sending the
AER, etc., is implemented correctly either.

When CC.CRIME == 1, both fields in CRTO actually have a defined meaning:
CRTO.CRIMT is the maximum time that host software should wait for the
controller to become ready.
CRTO.CRWMT is the maximum time that host software should wait for all
attached namespaces to become ready.

So having both fields defined to zero, or rather, to have both fields
defined to a value smaller than CAP.TO, regardless of CC.CRIME value,
is quite bad.

So perhaps it is better to keep CC.CRIME == 0 for such controllers.


> If we have a way to sanity check for spec non-compliance, I would prefer
> doing that generically rather than quirk specific devices.

It's not going to be beautiful, but one way could be to:
-check CAP.CRMS.CRIMS, if it is set to 1:
-write CC.CRIME == 1,
-re-read CAP register, since it can change depending on CC.CRIME (urgh)
-check if CRTO.CRIMT is less than CAP.TO, if so:
-write CC.CRIME == 0 (disable the feature since it is obviously broken)
-re-read CAP register, since it can change depending on CC.CRIME (urgh)


For the record, I'm obviously happy with whatever decision you make,
I'm just giving my two cents...


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-15 21:31         ` Niklas Cassel
@ 2023-09-15 21:50           ` Keith Busch
  2023-09-15 22:14             ` Niklas Cassel
  0 siblings, 1 reply; 11+ messages in thread
From: Keith Busch @ 2023-09-15 21:50 UTC (permalink / raw)
  To: Niklas Cassel
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Fri, Sep 15, 2023 at 09:31:44PM +0000, Niklas Cassel wrote:
> I'm not saying that these controllers shouldn't work with Linux.
> However, these controller used to work with CC.CRIME == 0, so perhaps
> we should continue to use them that way?

Perhaps I missed something, but I didn't get any indication that these
controllers were reporting CRIMS capability. They're just reporting
CRWMS as far as I know, so CRIME doesn't apply. I only included that
case in the patch for completeness.
 
> So having both fields defined to zero, or rather, to have both fields
> defined to a value smaller than CAP.TO, regardless of CC.CRIME value,
> is quite bad.
> 
> So perhaps it is better to keep CC.CRIME == 0 for such controllers.
> 
> 
> > If we have a way to sanity check for spec non-compliance, I would prefer
> > doing that generically rather than quirk specific devices.
> 
> It's not going to be beautiful, but one way could be to:
> -check CAP.CRMS.CRIMS, if it is set to 1:
> -write CC.CRIME == 1,
> -re-read CAP register, since it can change depending on CC.CRIME (urgh)
> -check if CRTO.CRIMT is less than CAP.TO, if so:
> -write CC.CRIME == 0 (disable the feature since it is obviously broken)
> -re-read CAP register, since it can change depending on CC.CRIME (urgh)

There is a corner case that I am somewhat purposefully ignoring: CAP.TO
is worst case for both CC.EN 0->1 and 1->0, whereas both CRTO values are
only for the 0->1 transition. It's entirely possible some implementation
needs a longer 1->0 transition, so CAP.TO *could* validly be greater
than the either CRTO value. I just don't see that happening in practice,
and even if we do encounter such a device, waiting a little longer on
init for a broken controller doesn't make any situation worse.


^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: [PATCH] nvme: avoid bogus CRTO values
  2023-09-15 21:50           ` Keith Busch
@ 2023-09-15 22:14             ` Niklas Cassel
  0 siblings, 0 replies; 11+ messages in thread
From: Niklas Cassel @ 2023-09-15 22:14 UTC (permalink / raw)
  To: Keith Busch
  Cc: Keith Busch, linux-nvme@lists.infradead.org, hch@lst.de,
	sagi@grimberg.me, Cláudio Sampaio, Felix Yan,
	stable@vger.kernel.org

On Fri, Sep 15, 2023 at 02:50:20PM -0700, Keith Busch wrote:
> On Fri, Sep 15, 2023 at 09:31:44PM +0000, Niklas Cassel wrote:
> > I'm not saying that these controllers shouldn't work with Linux.
> > However, these controller used to work with CC.CRIME == 0, so perhaps
> > we should continue to use them that way?
> 
> Perhaps I missed something, but I didn't get any indication that these
> controllers were reporting CRIMS capability. They're just reporting
> CRWMS as far as I know, so CRIME doesn't apply. I only included that
> case in the patch for completeness.

Yes, you are right of course.

My brain started thinking about this issue when relaxing...
but I guess that my brain got some details wrong when context
switching to this problem again after too much idle...

Have a nice weekend!


Kind regards,
Niklas

^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-09-15 22:15 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-12 21:47 [PATCH] nvme: avoid bogus CRTO values Keith Busch
2023-09-13 10:50 ` Christoph Hellwig
2023-09-13 15:15   ` Keith Busch
2023-09-13 10:53 ` Sagi Grimberg
2023-09-13 12:25 ` Niklas Cassel
2023-09-13 15:20   ` Keith Busch
2023-09-13 16:14     ` Niklas Cassel
2023-09-13 16:46       ` Keith Busch
2023-09-15 21:31         ` Niklas Cassel
2023-09-15 21:50           ` Keith Busch
2023-09-15 22:14             ` Niklas Cassel

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox