* [PATCH 0/1] nvme: add retry for media not ready error
@ 2024-09-30 16:48 gjoyce
2024-09-30 16:48 ` [PATCH 1/1] nvme: retry security commands if media not ready gjoyce
0 siblings, 1 reply; 11+ messages in thread
From: gjoyce @ 2024-09-30 16:48 UTC (permalink / raw)
To: linux-nvme
Cc: kbusch, axboe, hch, sagi, hare, dwagner, msuchanek,
jonathan.derrick, okozina, nilay, gjoyce
From: Greg Joyce <gjoyce@linux.ibm.com>
The NVMe host driver sets the config NVME_CC_CRIME if the capablities
NVME_CAP_CRMS_CRWMS and NVME_CAP_CRMS_CRIMS are both set. This config is
written to the controller before NVME_CC_ENABLE is asserted.
The function nvme_wait_ready() then waits for the controller to indicate
CSTS.RDY. Subsequently, SED Opal init code issues a TCG level 0 discovery
command to determine if the NVMe drive is SED capable. However, there is a
period of time between CSTS.RDY and when the commands listed in Figure 103
may not be successfully executed. If the controller is not ready to process
these commands it may return a status code of Admin Command Media Not Ready.
This code is returned for the level 0 discovery and the NVMe driver marks
the drive as not being SED capable even if it is capable. Subsequent calls
to sed_ioctl() all result in failure and a CLI is unable to manage the drive.
The patch proposed here retries security read/write commands if they fail
with NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY. The maximum retry period is
capped by the value provided by the controller value NVME_CRTO_CRIMT.
------------ relevant NVMe Base Specification version 2.0 passages-----------
Controller Ready Independent of Media Enable (CRIME):
This field controls the controller ready mode. The controller ready
mode is determined by the state of this bit at the time the controller
is enabled by transitioning the CC.EN bit from ‘0’ to ‘1’.
Figure 36: Controller Capabilities
Controllers that support the CRTO register (refer to Figure 62) are
able to indicate larger timeouts for enabling the controller. Host
software should use the value in CRTO.CRWMT or CRTO.CRIMT depending
on the controller ready mode indicated by CC.CRIME to determine the
worst-case timeout for CSTS.RDY to transition from ‘0’ to ‘1’ after
CC.EN transitions from ‘0’ to ‘1’.
Figure 94: Status Code – Generic Command Status Values
Admin Command Media Not Ready: (24h)
The Admin command requires access to media and the media is not ready.
The Do Not Retry bit indicates whether re-issuing the command at a
later time may succeed. This status code shall only be returned:
a) or Admin commands; and
b) if the controller is in Controller Ready Independent of
Media mode (CC.CRIME is bit is set to ‘1’).
Figure 103: Admin Commands Permitted to Return a Status Code of Admin Command
Media Not Ready
Includes Security Send and Security Receive that are used by
TCG Level 0 Discovery.
Greg Joyce (1):
nvme: retry security commands if media not ready
drivers/nvme/host/core.c | 82 ++++++++++++++++++++++++++--------------
1 file changed, 54 insertions(+), 28 deletions(-)
--
gjoyce@linux.ibm.com
^ permalink raw reply [flat|nested] 11+ messages in thread* [PATCH 1/1] nvme: retry security commands if media not ready 2024-09-30 16:48 [PATCH 0/1] nvme: add retry for media not ready error gjoyce @ 2024-09-30 16:48 ` gjoyce 2024-10-02 8:16 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: gjoyce @ 2024-09-30 16:48 UTC (permalink / raw) To: linux-nvme Cc: kbusch, axboe, hch, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay, gjoyce From: Greg Joyce <gjoyce@linux.ibm.com> If NVME_CC_CRIME is set, security send/receive commands may fail with status NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY until the controller is able to accepts these commands. Retry the command if this failure occurs. Signed-off-by: Greg Joyce <gjoyce@linux.ibm.com> --- drivers/nvme/host/core.c | 82 ++++++++++++++++++++++++++-------------- 1 file changed, 54 insertions(+), 28 deletions(-) diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c index ba6508455e18..fc4bbf1adf9f 100644 --- a/drivers/nvme/host/core.c +++ b/drivers/nvme/host/core.c @@ -2320,12 +2320,53 @@ static int nvme_get_unique_id(struct gendisk *disk, u8 id[16], return nvme_ns_get_unique_id(disk->private_data, id, type); } +static u32 nvme_get_timeout(struct nvme_ctrl *ctrl) +{ + u32 timeout; + int ret; + + timeout = NVME_CAP_TIMEOUT(ctrl->cap); + if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { + u32 crto, ready_timeout; + + ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); + if (ret) { + dev_err(ctrl->device, "Reading CRTO failed (%d)\n", + ret); + return ret; + } + + /* + * CRTO should always be greater or equal to CAP.TO, but some + * devices are known to get this wrong. Use the larger of the + * two values. + */ + if (ctrl->ctrl_config & NVME_CC_CRIME) + ready_timeout = NVME_CRTO_CRIMT(crto); + else + ready_timeout = NVME_CRTO_CRWMT(crto); + + if (ready_timeout < timeout) { + dev_warn_once(ctrl->device, "bad crto:%x cap:%llx\n", + crto, ctrl->cap); + } else + timeout = ready_timeout; + } + return timeout; +} + #ifdef CONFIG_BLK_SED_OPAL static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t len, bool send) { struct nvme_ctrl *ctrl = data; struct nvme_command cmd = { }; + u32 timeout; + unsigned long timeout_jiffies; + int ret; + + timeout = nvme_get_timeout(ctrl); + timeout_jiffies = jiffies + timeout * HZ; if (send) cmd.common.opcode = nvme_admin_security_send; @@ -2335,8 +2376,19 @@ static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t l cmd.common.cdw10 = cpu_to_le32(((u32)secp) << 24 | ((u32)spsp) << 8); cmd.common.cdw11 = cpu_to_le32(len); - return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len, + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len, NVME_QID_ANY, NVME_SUBMIT_AT_HEAD); + while (ret == NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY) { + if (time_after(jiffies, timeout_jiffies)) { + dev_err(ctrl->device, + "Device media not ready; aborting\n"); + return -ENODEV; + } + ssleep(1); + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, + len, NVME_QID_ANY, NVME_SUBMIT_AT_HEAD); + } + return ret; } static void nvme_configure_opal(struct nvme_ctrl *ctrl, bool was_suspended) @@ -2473,33 +2525,7 @@ int nvme_enable_ctrl(struct nvme_ctrl *ctrl) if (ret) return ret; - timeout = NVME_CAP_TIMEOUT(ctrl->cap); - if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { - u32 crto, ready_timeout; - - ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); - if (ret) { - dev_err(ctrl->device, "Reading CRTO failed (%d)\n", - ret); - return ret; - } - - /* - * CRTO should always be greater or equal to CAP.TO, but some - * devices are known to get this wrong. Use the larger of the - * two values. - */ - if (ctrl->ctrl_config & NVME_CC_CRIME) - ready_timeout = NVME_CRTO_CRIMT(crto); - else - ready_timeout = NVME_CRTO_CRWMT(crto); - - if (ready_timeout < timeout) - dev_warn_once(ctrl->device, "bad crto:%x cap:%llx\n", - crto, ctrl->cap); - else - timeout = ready_timeout; - } + timeout = nvme_get_timeout(ctrl); ctrl->ctrl_config |= NVME_CC_ENABLE; ret = ctrl->ops->reg_write32(ctrl, NVME_REG_CC, ctrl->ctrl_config); -- gjoyce@linux.ibm.com ^ permalink raw reply related [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-09-30 16:48 ` [PATCH 1/1] nvme: retry security commands if media not ready gjoyce @ 2024-10-02 8:16 ` Christoph Hellwig 2024-10-02 16:51 ` Greg Joyce 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2024-10-02 8:16 UTC (permalink / raw) To: gjoyce Cc: linux-nvme, kbusch, axboe, hch, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Mon, Sep 30, 2024 at 11:48:43AM -0500, gjoyce@linux.ibm.com wrote: > +static u32 nvme_get_timeout(struct nvme_ctrl *ctrl) get_timeout feels a bit too generic for this specific controller/media ready timeout. > + timeout = NVME_CAP_TIMEOUT(ctrl->cap); > + if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { > + u32 crto, ready_timeout; > + > + ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, &crto); > + if (ret) { > + dev_err(ctrl->device, "Reading CRTO failed (%d)\n", > + ret); > + return ret; > + } And we really should be caching these values instead of reading the register for every security command. > + u32 timeout; > + unsigned long timeout_jiffies; > + int ret; > + > + timeout = nvme_get_timeout(ctrl); > + timeout_jiffies = jiffies + timeout * HZ; > > if (send) > cmd.common.opcode = nvme_admin_security_send; > @@ -2335,8 +2376,19 @@ static int nvme_sec_submit(void *data, u16 spsp, u8 secp, void *buffer, size_t l > cmd.common.cdw10 = cpu_to_le32(((u32)secp) << 24 | ((u32)spsp) << 8); > cmd.common.cdw11 = cpu_to_le32(len); > > - return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len, > + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, len, > NVME_QID_ANY, NVME_SUBMIT_AT_HEAD); > + while (ret == NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY) { > + if (time_after(jiffies, timeout_jiffies)) { > + dev_err(ctrl->device, > + "Device media not ready; aborting\n"); > + return -ENODEV; > + } > + ssleep(1); > + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, buffer, > + len, NVME_QID_ANY, NVME_SUBMIT_AT_HEAD); > + } And this also feels a bit odd in that it doesn't catch NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY when it should be ready. I think just marking when the controller is past the timeout and only doing the retry until then might be the better approach. And maybe we should have it in the __nvme_submit_sync_cmd helper for admin command as Security Send/Receive aren't the only commands with this issue. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-02 8:16 ` Christoph Hellwig @ 2024-10-02 16:51 ` Greg Joyce 2024-10-03 12:43 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Greg Joyce @ 2024-10-02 16:51 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Wed, 2024-10-02 at 10:16 +0200, Christoph Hellwig wrote: > On Mon, Sep 30, 2024 at 11:48:43AM -0500, gjoyce@linux.ibm.com wrote: > > +static u32 nvme_get_timeout(struct nvme_ctrl *ctrl) > > get_timeout feels a bit too generic for this specific > controller/media > ready timeout. Agreed. I'll change the name to something more specific. > > > + timeout = NVME_CAP_TIMEOUT(ctrl->cap); > > + if (ctrl->cap & NVME_CAP_CRMS_CRWMS) { > > + u32 crto, ready_timeout; > > + > > + ret = ctrl->ops->reg_read32(ctrl, NVME_REG_CRTO, > > &crto); > > + if (ret) { > > + dev_err(ctrl->device, "Reading CRTO failed > > (%d)\n", > > + ret); > > + return ret; > > + } > > And we really should be caching these values instead of reading the > register for every security command. If we cache the timeout value(s) in nvme_ctrl then it may be possible to just eliminate nvme_get_timeout() entirely. Is this the approach that you were thinking? > > > + u32 timeout; > > + unsigned long timeout_jiffies; > > + int ret; > > + > > + timeout = nvme_get_timeout(ctrl); > > + timeout_jiffies = jiffies + timeout * HZ; > > > > if (send) > > cmd.common.opcode = nvme_admin_security_send; > > @@ -2335,8 +2376,19 @@ static int nvme_sec_submit(void *data, u16 > > spsp, u8 secp, void *buffer, size_t l > > cmd.common.cdw10 = cpu_to_le32(((u32)secp) << 24 | > > ((u32)spsp) << 8); > > cmd.common.cdw11 = cpu_to_le32(len); > > > > - return __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, > > buffer, len, > > + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, NULL, > > buffer, len, > > NVME_QID_ANY, NVME_SUBMIT_AT_HEAD); > > + while (ret == NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY) { > > + if (time_after(jiffies, timeout_jiffies)) { > > + dev_err(ctrl->device, > > + "Device media not ready; > > aborting\n"); > > + return -ENODEV; > > + } > > + ssleep(1); > > + ret = __nvme_submit_sync_cmd(ctrl->admin_q, &cmd, > > NULL, buffer, > > + len, NVME_QID_ANY, > > NVME_SUBMIT_AT_HEAD); > > + } > > And this also feels a bit odd in that it doesn't catch > NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY when it should be ready. > I think just marking when the controller is past the timeout and > only doing the retry until then might be the better approach. And > maybe we should have it in the __nvme_submit_sync_cmd helper for > admin command as Security Send/Receive aren't the only commands with > this issue. > I'll look at the spec some more but you're probably correct that when the status code transitions from NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY to success, then NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY should not be returned again. I'm a little confused about what you're saying about the timeout. nvme_enable_ctrl() does determine the correct timeout value and passes it to nvme_wait_ready() but NVME_CSTS_RDY is set well before the media is ready (if CC.CRIME is set). Unfortunately there doesn't seem to be any controller status that indicates when the media is ready. I thought about having nvme_wait_ready() wait the whole timeout if CC.CRIME is set, but I think that is contrary to intent of CC.CRIME. And on the SSD that I'm looking at the timeout is 15 seconds which would be a pretty big hit to boot time. I was looking to solve the specific case of the security commands but certainly any of the commands in Figure 103 could have the same issue. The heart of the problem is that CC.CRIME support is not complete. I'll see if I can address that in a more comprehensive manner. -Greg ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-02 16:51 ` Greg Joyce @ 2024-10-03 12:43 ` Christoph Hellwig 2024-10-03 13:30 ` Greg Joyce 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2024-10-03 12:43 UTC (permalink / raw) To: Greg Joyce Cc: Christoph Hellwig, linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Wed, Oct 02, 2024 at 11:51:56AM -0500, Greg Joyce wrote: > If we cache the timeout value(s) in nvme_ctrl then it may be possible > to just eliminate nvme_get_timeout() entirely. Is this the approach > that you were thinking? Yes. > I'm a little confused about what you're saying about the timeout. > nvme_enable_ctrl() does determine the correct timeout value and passes > it to nvme_wait_ready() but NVME_CSTS_RDY is set well before the media > is ready (if CC.CRIME is set). Unfortunately there doesn't seem to be > any controller status that indicates when the media is ready. I thought > about having nvme_wait_ready() wait the whole timeout if CC.CRIME is > set, but I think that is contrary to intent of CC.CRIME. And on the SSD > that I'm looking at the timeout is 15 seconds which would be a pretty > big hit to boot time. What I mean is to simply set a timer for the controller ready timeout, fire it when we set CC.EN and then the time sets a new media ready flag in ctrl->flags. Then only retry when this flag is not set. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-03 12:43 ` Christoph Hellwig @ 2024-10-03 13:30 ` Greg Joyce 2024-10-03 14:41 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Greg Joyce @ 2024-10-03 13:30 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Thu, 2024-10-03 at 14:43 +0200, Christoph Hellwig wrote: > On Wed, Oct 02, 2024 at 11:51:56AM -0500, Greg Joyce wrote: > > If we cache the timeout value(s) in nvme_ctrl then it may be > > possible > > to just eliminate nvme_get_timeout() entirely. Is this the approach > > that you were thinking? > > Yes. > > > I'm a little confused about what you're saying about the timeout. > > nvme_enable_ctrl() does determine the correct timeout value and > > passes > > it to nvme_wait_ready() but NVME_CSTS_RDY is set well before the > > media > > is ready (if CC.CRIME is set). Unfortunately there doesn't seem to > > be > > any controller status that indicates when the media is ready. I > > thought > > about having nvme_wait_ready() wait the whole timeout if CC.CRIME > > is > > set, but I think that is contrary to intent of CC.CRIME. And on the > > SSD > > that I'm looking at the timeout is 15 seconds which would be a > > pretty > > big hit to boot time. > > What I mean is to simply set a timer for the controller ready > timeout, > fire it when we set CC.EN and then the time sets a new media ready > flag in ctrl->flags. Then only retry when this flag is not set. > Thanks. I think that leads me to ask a larger question. If we always wait until media ready, does the driver really need to set NVME_CC_CRIME? Currently it only gets set if both the NVME_CAP_CRMS_CRWMS and NVME_CAP_CRMS_CRIMS are set. So the NVME_CC_CRIME config option is only set based on drive capabilities and not any user configuration. If NVME_CC_CRIME is not set, then the controller doesn't assert ready until the media is ready and this issue does not occur. So the question is, should the fix be to just remove the NVMe driver code that sets NVME_CC_CRIME? -Greg ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-03 13:30 ` Greg Joyce @ 2024-10-03 14:41 ` Christoph Hellwig 2024-10-03 23:35 ` Greg Joyce 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2024-10-03 14:41 UTC (permalink / raw) To: Greg Joyce Cc: Christoph Hellwig, linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Thu, Oct 03, 2024 at 08:30:21AM -0500, Greg Joyce wrote: > Thanks. I think that leads me to ask a larger question. If we always > wait until media ready, We don't generally wait - the namespaces only become online when they are ready, but otherwise we should be up. Unfortunately the technical working group allowed some admin command to return media not ready, probably to shoe horn existing implementation. But this generally is a bad quality of implementation and there is no good reason for it if the little bit of media used by these comes from a separate pool that doesn't take as long to recover. Or in other words, if you see this error someonone at IBM messed up writing their purchase spec.. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-03 14:41 ` Christoph Hellwig @ 2024-10-03 23:35 ` Greg Joyce 2024-10-04 5:41 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Greg Joyce @ 2024-10-03 23:35 UTC (permalink / raw) To: Christoph Hellwig Cc: linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Thu, 2024-10-03 at 16:41 +0200, Christoph Hellwig wrote: > On Thu, Oct 03, 2024 at 08:30:21AM -0500, Greg Joyce wrote: > > Thanks. I think that leads me to ask a larger question. If we > > always > > wait until media ready, > > We don't generally wait - the namespaces only become online when they > are ready, but otherwise we should be up. Unfortunately the > technical > working group allowed some admin command to return media not ready, > probably to shoe horn existing implementation. But this generally > is a bad quality of implementation and there is no good reason for it > if the little bit of media used by these comes from a separate pool > that doesn't take as long to recover. Or in other words, if you see > this error someonone at IBM messed up writing their purchase spec.. > I agree, I wonder about the value/wisdom of the CC.CRIME capability especially since there is no way to read status that indicates that the media is ready for the Figure 103 commands. But it is a defined feature and setting it does cause CSTS.RDY to be asserted before the media is ready. The Kioxia CM7 drive does set both CRWMS and CRIMS (CRMS=11b). And these lines in the NVMe driver thus set CC.CRIME: if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & NVME_CAP_CRMS_CRIMS) ctrl->ctrl_config |= NVME_CC_CRIME; After reading more of the spec and driver code and discussions here, I suggest that those two lines be removed. This has the effect of returning to the NVMe version 1.4 behavior and CSTS.RDY will not be asserted until the media is ready for commands. Greg ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-03 23:35 ` Greg Joyce @ 2024-10-04 5:41 ` Christoph Hellwig 2024-10-04 7:22 ` Nilay Shroff 0 siblings, 1 reply; 11+ messages in thread From: Christoph Hellwig @ 2024-10-04 5:41 UTC (permalink / raw) To: Greg Joyce Cc: Christoph Hellwig, linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina, nilay On Thu, Oct 03, 2024 at 06:35:30PM -0500, Greg Joyce wrote: > I agree, I wonder about the value/wisdom of the CC.CRIME capability > especially since there is no way to read status that indicates that the > media is ready for the Figure 103 commands. But it is a defined feature > and setting it does cause CSTS.RDY to be asserted before the media is > ready. > > The Kioxia CM7 drive does set both CRWMS and CRIMS (CRMS=11b). And > these lines in the NVMe driver thus set CC.CRIME: > > if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & > NVME_CAP_CRMS_CRIMS) > ctrl->ctrl_config |= NVME_CC_CRIME; > > After reading more of the spec and driver code and discussions here, I > suggest that those two lines be removed. This has the effect of > returning to the NVMe version 1.4 behavior and CSTS.RDY will not be > asserted until the media is ready for commands. Well, it is a useful feature unless random admin commands return not ready. Which got weaseld into the spec, but really should not happen to make the feature useful. So I think we'll need to put the workaround in instead of messing up the proper implementations of the feature that aren't this silly. And make sure the big companies put that into their purchase specs. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-04 5:41 ` Christoph Hellwig @ 2024-10-04 7:22 ` Nilay Shroff 2024-10-04 12:24 ` Christoph Hellwig 0 siblings, 1 reply; 11+ messages in thread From: Nilay Shroff @ 2024-10-04 7:22 UTC (permalink / raw) To: Christoph Hellwig, Greg Joyce Cc: linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina On 10/4/24 11:11, Christoph Hellwig wrote: > On Thu, Oct 03, 2024 at 06:35:30PM -0500, Greg Joyce wrote: >> I agree, I wonder about the value/wisdom of the CC.CRIME capability >> especially since there is no way to read status that indicates that the >> media is ready for the Figure 103 commands. But it is a defined feature >> and setting it does cause CSTS.RDY to be asserted before the media is >> ready. >> >> The Kioxia CM7 drive does set both CRWMS and CRIMS (CRMS=11b). And >> these lines in the NVMe driver thus set CC.CRIME: >> >> if (ctrl->cap & NVME_CAP_CRMS_CRWMS && ctrl->cap & >> NVME_CAP_CRMS_CRIMS) >> ctrl->ctrl_config |= NVME_CC_CRIME; >> >> After reading more of the spec and driver code and discussions here, I >> suggest that those two lines be removed. This has the effect of >> returning to the NVMe version 1.4 behavior and CSTS.RDY will not be >> asserted until the media is ready for commands. > > Well, it is a useful feature unless random admin commands return > not ready. Which got weaseld into the spec, but really should not > happen to make the feature useful. So I think we'll need to put the > workaround in instead of messing up the proper implementations of > the feature that aren't this silly. And make sure the big companies > put that into their purchase specs. > How about retrying the command, from nvme_retry_req(), if the return status of the security (or for that matter any admin) command suggests to retry the request (i.e. DNR bit in the status is cleared to 0 and return status of the command is NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY)? Basically, nvme_retry_req() shall calculate the "delay" for retrying request. We should first mark the timestamp the moment we enable the controller (i.e. time when CC.EN bit is set) and later nvme_retry_req() can use it to calculate the time remaining for media to be ready. And use this remaining time as "dealy" for re-queuing the failed request. Yes we also need to set the flag NVME_SUBMIT_RETRY while submitting request in nvme_sec_submit() so that nvme_decide_disposition() allow retrying the request if it fails. Thanks, --Nilay ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: [PATCH 1/1] nvme: retry security commands if media not ready 2024-10-04 7:22 ` Nilay Shroff @ 2024-10-04 12:24 ` Christoph Hellwig 0 siblings, 0 replies; 11+ messages in thread From: Christoph Hellwig @ 2024-10-04 12:24 UTC (permalink / raw) To: Nilay Shroff Cc: Christoph Hellwig, Greg Joyce, linux-nvme, kbusch, axboe, sagi, hare, dwagner, msuchanek, jonathan.derrick, okozina On Fri, Oct 04, 2024 at 12:52:50PM +0530, Nilay Shroff wrote: > How about retrying the command, from nvme_retry_req(), if the return status > of the security (or for that matter any admin) command suggests to retry > the request (i.e. DNR bit in the status is cleared to 0 and return status of > the command is NVME_SC_ADMIN_COMMAND_MEDIA_NOT_READY)? It doesn't sound too horrible, but I'd have to see the result to judge it. ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2024-10-04 12:24 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2024-09-30 16:48 [PATCH 0/1] nvme: add retry for media not ready error gjoyce 2024-09-30 16:48 ` [PATCH 1/1] nvme: retry security commands if media not ready gjoyce 2024-10-02 8:16 ` Christoph Hellwig 2024-10-02 16:51 ` Greg Joyce 2024-10-03 12:43 ` Christoph Hellwig 2024-10-03 13:30 ` Greg Joyce 2024-10-03 14:41 ` Christoph Hellwig 2024-10-03 23:35 ` Greg Joyce 2024-10-04 5:41 ` Christoph Hellwig 2024-10-04 7:22 ` Nilay Shroff 2024-10-04 12:24 ` Christoph Hellwig
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox