From: Seungwon Jeon <tgih.jun@samsung.com>
To: 'Sujit Reddy Thumma' <sthumma@codeaurora.org>
Cc: 'Vinayak Holikatti' <vinholikatti@gmail.com>,
'Santosh Y' <santoshsy@gmail.com>,
"'James E.J. Bottomley'" <JBottomley@parallels.com>,
linux-scsi@vger.kernel.org, linux-arm-msm@vger.kernel.org
Subject: RE: [PATCH V3 4/4] scsi: ufs: Improve UFS fatal error handling
Date: Wed, 24 Jul 2013 22:39:33 +0900 [thread overview]
Message-ID: <001a01ce8873$3aed9b70$b0c8d250$%jun@samsung.com> (raw)
In-Reply-To: <51EEA40E.9010506@codeaurora.org>
On Wed, July 24, 2013, Sujit Reddy Thumma wrote:
> On 7/23/2013 2:04 PM, Seungwon Jeon wrote:
> > On Sat, July 20, 2013, Sujit Reddy Thumma wrote:
> >> On 7/19/2013 7:28 PM, Seungwon Jeon wrote:
> >>> On Tue, July 09, 2013, Sujit Reddy Thumma wrote:
> >>>> Error handling in UFS driver is broken and resets the host controller
> >>>> for fatal errors without re-initialization. Correct the fatal error
> >>>> handling sequence according to UFS Host Controller Interface (HCI)
> >>>> v1.1 specification.
> >>>>
> >>>> o Upon determining fatal error condition the host controller may hang
> >>>> forever until a reset is applied, so just retrying the command doesn't
> >>>> work without a reset. So, the reset is applied in the driver context
> >>>> in a separate work and SCSI mid-layer isn't informed until reset is
> >>>> applied.
> >>>>
> >>>> o Processed requests which are completed without error are reported to
> >>>> SCSI layer as successful and any pending commands that are not started
> >>>> yet or are not cause of the error are re-queued into scsi midlayer queue.
> >>>> For the command that caused error, host controller or device is reset
> >>>> and DID_ERROR is returned for command retry after applying reset.
> >>>>
> >>>> o SCSI is informed about the expected Unit-Attentioni exception from the
> >>> Attention'i', typo.
> >> Okay.
> >>
> >>>
> >>>> device for the immediate command after a reset so that the SCSI layer
> >>>> take necessary steps to establish communication with the device.
> >>>>
> >>>> Signed-off-by: Sujit Reddy Thumma <sthumma@codeaurora.org>
> >>>> ---
> >>>> drivers/scsi/ufs/ufshcd.c | 349 +++++++++++++++++++++++++++++++++++---------
> >>>> drivers/scsi/ufs/ufshcd.h | 2 +
> >>>> drivers/scsi/ufs/ufshci.h | 19 ++-
> >>>> 3 files changed, 295 insertions(+), 75 deletions(-)
> >>>>
> >>>> diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> >>>> index b4c9910..2a3874f 100644
> >>>> --- a/drivers/scsi/ufs/ufshcd.c
> >>>> +++ b/drivers/scsi/ufs/ufshcd.c
> >>>> @@ -80,6 +80,14 @@ enum {
> >>>> UFSHCD_EH_DEVICE_RESET_PENDING = (1 << 1),
> >>>> };
> >>>>
> >>>> +/* UFSHCD UIC layer error flags */
> >>>> +enum {
> >>>> + UFSHCD_UIC_DL_PA_INIT_ERROR = (1 << 0), /* Data link layer error */
> >>>> + UFSHCD_UIC_NL_ERROR = (1 << 1), /* Network layer error */
> >>>> + UFSHCD_UIC_TL_ERROR = (1 << 2), /* Transport Layer error */
> >>>> + UFSHCD_UIC_DME_ERROR = (1 << 3), /* DME error */
> >>>> +};
> >>>> +
> >>>> /* Interrupt configuration options */
> >>>> enum {
> >>>> UFSHCD_INT_DISABLE,
> >>>> @@ -108,6 +116,7 @@ enum {
> >>>>
> >>>> static void ufshcd_tmc_handler(struct ufs_hba *hba);
> >>>> static void ufshcd_async_scan(void *data, async_cookie_t cookie);
> >>>> +static int ufshcd_reset_and_restore(struct ufs_hba *hba);
> >>>>
> >>>> /*
> >>>> * ufshcd_wait_for_register - wait for register value to change
> >>>> @@ -1605,9 +1614,6 @@ static int ufshcd_make_hba_operational(struct ufs_hba *hba)
> >>>> goto out;
> >>>> }
> >>>>
> >>>> - if (hba->ufshcd_state == UFSHCD_STATE_RESET)
> >>>> - scsi_unblock_requests(hba->host);
> >>>> -
> >>>> out:
> >>>> return err;
> >>>> }
> >>>> @@ -1733,66 +1739,6 @@ static int ufshcd_validate_dev_connection(struct ufs_hba *hba)
> >>>> }
> >>>>
> >>>> /**
> >>>> - * ufshcd_do_reset - reset the host controller
> >>>> - * @hba: per adapter instance
> >>>> - *
> >>>> - * Returns SUCCESS/FAILED
> >>>> - */
> >>>> -static int ufshcd_do_reset(struct ufs_hba *hba)
> >>>> -{
> >>>> - struct ufshcd_lrb *lrbp;
> >>>> - unsigned long flags;
> >>>> - int tag;
> >>>> -
> >>>> - /* block commands from midlayer */
> >>>> - scsi_block_requests(hba->host);
> >>>> -
> >>>> - spin_lock_irqsave(hba->host->host_lock, flags);
> >>>> - hba->ufshcd_state = UFSHCD_STATE_RESET;
> >>>> -
> >>>> - /* send controller to reset state */
> >>>> - ufshcd_hba_stop(hba);
> >>>> - spin_unlock_irqrestore(hba->host->host_lock, flags);
> >>>> -
> >>>> - /* abort outstanding commands */
> >>>> - for (tag = 0; tag < hba->nutrs; tag++) {
> >>>> - if (test_bit(tag, &hba->outstanding_reqs)) {
> >>>> - lrbp = &hba->lrb[tag];
> >>>> - if (lrbp->cmd) {
> >>>> - scsi_dma_unmap(lrbp->cmd);
> >>>> - lrbp->cmd->result = DID_RESET << 16;
> >>>> - lrbp->cmd->scsi_done(lrbp->cmd);
> >>>> - lrbp->cmd = NULL;
> >>>> - clear_bit_unlock(tag, &hba->lrb_in_use);
> >>>> - }
> >>>> - }
> >>>> - }
> >>>> -
> >>>> - /* complete device management command */
> >>>> - if (hba->dev_cmd.complete)
> >>>> - complete(hba->dev_cmd.complete);
> >>>> -
> >>>> - /* clear outstanding request/task bit maps */
> >>>> - hba->outstanding_reqs = 0;
> >>>> - hba->outstanding_tasks = 0;
> >>>> -
> >>>> - /* Host controller enable */
> >>>> - if (ufshcd_hba_enable(hba)) {
> >>>> - dev_err(hba->dev,
> >>>> - "Reset: Controller initialization failed\n");
> >>>> - return FAILED;
> >>>> - }
> >>>> -
> >>>> - if (ufshcd_link_startup(hba)) {
> >>>> - dev_err(hba->dev,
> >>>> - "Reset: Link start-up failed\n");
> >>>> - return FAILED;
> >>>> - }
> >>>> -
> >>>> - return SUCCESS;
> >>>> -}
> >>>> -
> >>>> -/**
> >>>> * ufshcd_slave_alloc - handle initial SCSI device configurations
> >>>> * @sdev: pointer to SCSI device
> >>>> *
> >>>> @@ -1809,6 +1755,9 @@ static int ufshcd_slave_alloc(struct scsi_device *sdev)
> >>>> sdev->use_10_for_ms = 1;
> >>>> scsi_set_tag_type(sdev, MSG_SIMPLE_TAG);
> >>>>
> >>>> + /* allow SCSI layer to restart the device in case of errors */
> >>>> + sdev->allow_restart = 1;
> >>>> +
> >>>> /*
> >>>> * Inform SCSI Midlayer that the LUN queue depth is same as the
> >>>> * controller queue depth. If a LUN queue depth is less than the
> >>>> @@ -2013,6 +1962,9 @@ ufshcd_transfer_rsp_status(struct ufs_hba *hba, struct ufshcd_lrb *lrbp)
> >>>> case OCS_ABORTED:
> >>>> result |= DID_ABORT << 16;
> >>>> break;
> >>>> + case OCS_INVALID_COMMAND_STATUS:
> >>>> + result |= DID_REQUEUE << 16;
> >>>> + break;
> >>>> case OCS_INVALID_CMD_TABLE_ATTR:
> >>>> case OCS_INVALID_PRDT_ATTR:
> >>>> case OCS_MISMATCH_DATA_BUF_SIZE:
> >>>> @@ -2405,42 +2357,295 @@ out:
> >>>> return err;
> >>>> }
> >>>>
> >>>> +static void ufshcd_decide_eh_xfer_req(struct ufs_hba *hba, u32 ocs)
> >>>> +{
> >>>> + switch (ocs) {
> >>>> + case OCS_SUCCESS:
> >>>> + case OCS_INVALID_COMMAND_STATUS:
> >>>> + break;
> >>>> + case OCS_MISMATCH_DATA_BUF_SIZE:
> >>>> + case OCS_MISMATCH_RESP_UPIU_SIZE:
> >>>> + case OCS_PEER_COMM_FAILURE:
> >>>> + case OCS_FATAL_ERROR:
> >>>> + case OCS_ABORTED:
> >>>> + case OCS_INVALID_CMD_TABLE_ATTR:
> >>>> + case OCS_INVALID_PRDT_ATTR:
> >>>> + ufshcd_set_host_reset_pending(hba);
> >>> Should host be reset on ocs error, including below ufshcd_decide_eh_task_req?
> >>> It's just overall command status.
> >>
> >> Yes, the error handling section in the UFS 1.1 spec. mentions so.
> > If host's reset is required, it should be allowed in fatal situation.
> > Deciding with OCS field seems not proper. There is no mentions for that in spec.
> > If I have a wrong information, please let it clear.
> >
>
> You can refer to section 8.3 of HCI spec.
> On fatal errors the controller h/w will have to update the OCS field of
> the command that caused error and then raise an fatal error interrupt.
> The s/w reads the OCS value and determine commands that are in error
> and then carry out reset.
I don't think so.
OCS field can be updated regardless of fatal error.
As mentioned previously, your implementations are gathering all errors into 'ufshcd_fatal_err_handler'.
It means that non-fatal error is also handled and if any OCS value, host reset will be reserved.
>
> >>
> >>>
> >>>> + break;
> >>>> + default:
> >>>> + dev_err(hba->dev, "%s: unknown OCS 0x%x\n",
> >>>> + __func__, ocs);
> >>>> + BUG();
> >>>> + }
> >>>> +}
> >>>> +
> >>>> +static void ufshcd_decide_eh_task_req(struct ufs_hba *hba, u32 ocs)
> >>>> +{
> >>>> + switch (ocs) {
> >>>> + case OCS_TMR_SUCCESS:
> >>>> + case OCS_TMR_INVALID_COMMAND_STATUS:
> >>>> + break;
> >>>> + case OCS_TMR_MISMATCH_REQ_SIZE:
> >>>> + case OCS_TMR_MISMATCH_RESP_SIZE:
> >>>> + case OCS_TMR_PEER_COMM_FAILURE:
> >>>> + case OCS_TMR_INVALID_ATTR:
> >>>> + case OCS_TMR_ABORTED:
> >>>> + case OCS_TMR_FATAL_ERROR:
> >>>> + ufshcd_set_host_reset_pending(hba);
> >>>> + break;
> >>>> + default:
> >>>> + dev_err(hba->dev, "%s: uknown TMR OCS 0x%x\n",
> >>>> + __func__, ocs);
> >>>> + BUG();
> >>>> + }
> >>>> +}
> >>>> +
> >>>> /**
> >>>> - * ufshcd_fatal_err_handler - handle fatal errors
> >>>> + * ufshcd_error_autopsy_transfer_req() - reads OCS field of failed command and
> >>>> + * decide error handling
> >>>> * @hba: per adapter instance
> >>>> + * @err_xfer: bit mask for transfer request errors
> >>>> + *
> >>>> + * Iterate over completed transfer requests and
> >>>> + * set error handling flags.
> >>>> + */
> >>>> +static void
> >>>> +ufshcd_error_autopsy_transfer_req(struct ufs_hba *hba, u32 *err_xfer)
> >>>> +{
> >>>> + unsigned long completed;
> >>>> + u32 doorbell;
> >>>> + int index;
> >>>> + int ocs;
> >>>> +
> >>>> + if (!err_xfer)
> >>>> + goto out;
> >>>> +
> >>>> + doorbell = ufshcd_readl(hba, REG_UTP_TRANSFER_REQ_DOOR_BELL);
> >>>> + completed = doorbell ^ (u32)hba->outstanding_reqs;
> >>>> +
> >>>> + for (index = 0; index < hba->nutrs; index++) {
> >>>> + if (test_bit(index, &completed)) {
> >>>> + ocs = ufshcd_get_tr_ocs(&hba->lrb[index]);
> >>>> + if ((ocs == OCS_SUCCESS) ||
> >>>> + (ocs == OCS_INVALID_COMMAND_STATUS))
> >>>> + continue;
> >>>> +
> >>>> + *err_xfer |= (1 << index);
> >>>> + ufshcd_decide_eh_xfer_req(hba, ocs);
> >>>> + }
> >>>> + }
> >>>> +out:
> >>>> + return;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * ufshcd_error_autopsy_task_req() - reads OCS field of failed command and
> >>>> + * decide error handling
> >>>> + * @hba: per adapter instance
> >>>> + * @err_tm: bit mask for task management errors
> >>>> + *
> >>>> + * Iterate over completed task management requests and
> >>>> + * set error handling flags.
> >>>> + */
> >>>> +static void
> >>>> +ufshcd_error_autopsy_task_req(struct ufs_hba *hba, u32 *err_tm)
> >>>> +{
> >>>> + unsigned long completed;
> >>>> + u32 doorbell;
> >>>> + int index;
> >>>> + int ocs;
> >>>> +
> >>>> + if (!err_tm)
> >>>> + goto out;
> >>>> +
> >>>> + doorbell = ufshcd_readl(hba, REG_UTP_TASK_REQ_DOOR_BELL);
> >>>> + completed = doorbell ^ (u32)hba->outstanding_tasks;
> >>>> +
> >>>> + for (index = 0; index < hba->nutmrs; index++) {
> >>>> + if (test_bit(index, &completed)) {
> >>>> + struct utp_task_req_desc *tm_descp;
> >>>> +
> >>>> + tm_descp = hba->utmrdl_base_addr;
> >>>> + ocs = ufshcd_get_tmr_ocs(&tm_descp[index]);
> >>>> + if ((ocs == OCS_TMR_SUCCESS) ||
> >>>> + (ocs == OCS_TMR_INVALID_COMMAND_STATUS))
> >>>> + continue;
> >>>> +
> >>>> + *err_tm |= (1 << index);
> >>>> + ufshcd_decide_eh_task_req(hba, ocs);
> >>>> + }
> >>>> + }
> >>>> +
> >>>> +out:
> >>>> + return;
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * ufshcd_fatal_err_handler - handle fatal errors
> >>>> + * @work: pointer to work structure
> >>>> */
> >>>> static void ufshcd_fatal_err_handler(struct work_struct *work)
> >>>> {
> >>>> struct ufs_hba *hba;
> >>>> + unsigned long flags;
> >>>> + u32 err_xfer = 0;
> >>>> + u32 err_tm = 0;
> >>>> + int err;
> >>>> +
> >>>> hba = container_of(work, struct ufs_hba, feh_workq);
> >>>>
> >>>> pm_runtime_get_sync(hba->dev);
> >>>> - /* check if reset is already in progress */
> >>>> - if (hba->ufshcd_state != UFSHCD_STATE_RESET)
> >>>> - ufshcd_do_reset(hba);
> >>>> + spin_lock_irqsave(hba->host->host_lock, flags);
> >>>> + if (hba->ufshcd_state == UFSHCD_STATE_RESET) {
> >>>> + /* complete processed requests and exit */
> >>>> + ufshcd_transfer_req_compl(hba);
> >>>> + ufshcd_tmc_handler(hba);
> >>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> >>>> + pm_runtime_put_sync(hba->dev);
> >>>> + return;
> >>> Host driver is here with finishing 'scsi_block_requests'.
> >>> 'scsi_unblock_requests' can be called somewhere?
> >>
> >> No, but it can be possible that SCSI command timeout which triggers
> >> device/host reset and fatal error handler race each other.
> > Sorry, I didn't get your meaning exactly.
> > I saw that scsi_block_requests is done before ufshcd_fatal_err_handler is scheduled.
> > If device or host was requested from scsi mid-layer just before ufshcd_fatal_err_handler,
> > ufshcd_fatal_err_handler will be out through if statement. Then, there is nowhere to call
> scsi_unblock_requests
> > though device/host reset is done successfully.
>
> You are right, this should return with scsi_unblock_requests()
> called and there is no need to complete the processed requests as we
> might be in middle of something else while the RESET is in progress.
>
>
> >>
> >>>
> >>>> + }
> >>>> +
> >>>> + hba->ufshcd_state = UFSHCD_STATE_RESET;
> >>>> + ufshcd_error_autopsy_transfer_req(hba, &err_xfer);
> >>>> + ufshcd_error_autopsy_task_req(hba, &err_tm);
> >>>> +
> >>>> + /*
> >>>> + * Complete successful and pending transfer requests.
> >>>> + * DID_REQUEUE is returned for pending requests as they have
> >>>> + * nothing to do with error'ed request and SCSI layer should
> >>>> + * not treat them as errors and decrement retry count.
> >>>> + */
> >>>> + hba->outstanding_reqs &= ~err_xfer;
> >>>> + ufshcd_transfer_req_compl(hba);
> >>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> >>>> + ufshcd_complete_pending_reqs(hba);
> >>>> + spin_lock_irqsave(hba->host->host_lock, flags);
> >>>> + hba->outstanding_reqs |= err_xfer;
> >>> Hmm... error handling seems so complicated.
> >>> To simplify it, how about below?
> >>>
> >>> 1. If requests(transfer or task management) are completed, finish them with success/failure.
> >> This is what we are trying to do above.
> >>
> >>> 2. If there are pending requests, abort them.
> >> No, if a fatal error is occurred it is possible that host controller is
> >> freez'ed we are not sure if it can take task management commands and
> >> execute them.
> > I meant that aborting the request by clearing corresponding UTMRLCLR/UTMRLCLR.
> >
>
> I am doing the same in this patch -
> 1) Return to SCSI the successful commands.
> 2) Clear the pending (but not cause of error) commands by writing into
> UTMRLCLR/UTRCLR registers. So scsi_host_result = DID_REQUEUE
> 3) Reset and return the commands that "caused error" to SCSI with
> DID_ERROR.
>
> Am I doing anything extra than what you have suggested?
If some are cleared, let me review more.
>
> >>
> >>> 3. If fatal error, reset.
> >>>
> >>
> >>
> >>>> +
> >>>> + /* Complete successful and pending task requests */
> >>>> + hba->outstanding_tasks &= ~err_tm;
> >>>> + ufshcd_tmc_handler(hba);
> >>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> >>>> + ufshcd_complete_pending_tasks(hba);
> >>>> + spin_lock_irqsave(hba->host->host_lock, flags);
> >>>> +
> >>>> + hba->outstanding_tasks |= err_tm;
> >>>> +
> >>>> + /*
> >>>> + * Controller may generate multiple fatal errors, handle
> >>>> + * errors based on severity.
> >>>> + * 1) DEVICE_FATAL_ERROR
> >>>> + * 2) SYSTEM_BUS/CONTROLLER_FATAL_ERROR
> >>>> + * 3) UIC_ERROR
> >>>> + */
> >>>> + if (hba->errors & DEVICE_FATAL_ERROR) {
> >>>> + /*
> >>>> + * Some HBAs may not clear UTRLDBR/UTMRLDBR or update
> >>>> + * OCS field on device fatal error.
> >>>> + */
> >>>> + ufshcd_set_host_reset_pending(hba);
> >>> In DEVICE_FATAL_ERROR, ufshcd_device_reset_pending is right?
> >>
> >> It looks so, but the spec. mentions to reset the host as well (8.3.6).
> > Do you pointing below?
> > [8.3.6. Device Errors are fatal errors. ...the host software shall reset the device too.]
> >
>
> I meant "8.3.6: When this condition occurs, host software shall follow
> the same procedure for UIC error handling as described in 8.2.2,". There
> is an error in the spec. it was not 8.2.2 but 8.3.2 for UIC error
> handling. So going by 8.3.2 HCE needs to be toggled.
I feel like 8.3.2 of spec. makes it difficult to identifying 'device fatal error' with a fatal UIC error.
It needs to clarify these.
Anyway, I found some descriptions related to host' reset.
5.3.1 Device Fatal Error Status (DFES):
...
If the error occurs, host SW should reset the host controller.
I's explicit. If spec. saying is right, we would reset host.
>
>
>
> >>
> >>>
> >>>> + } else if (hba->errors & (SYSTEM_BUS_FATAL_ERROR |
> >>>> + CONTROLLER_FATAL_ERROR)) {
> >>>> + /* eh flags should be set in err autopsy based on OCS values */
> >>>> + if (!hba->eh_flags)
> >>>> + WARN(1, "%s: fatal error without error handling\n",
> >>>> + dev_name(hba->dev));
> >>>> + } else if (hba->errors & UIC_ERROR) {
> >>>> + if (hba->uic_error & UFSHCD_UIC_DL_PA_INIT_ERROR) {
> >>>> + /* fatal error - reset controller */
> >>>> + ufshcd_set_host_reset_pending(hba);
> >>>> + } else if (hba->uic_error & (UFSHCD_UIC_NL_ERROR |
> >>>> + UFSHCD_UIC_TL_ERROR |
> >>>> + UFSHCD_UIC_DME_ERROR)) {
> >>>> + /* non-fatal, report error to SCSI layer */
> >>>> + if (!hba->eh_flags) {
> >>>> + spin_unlock_irqrestore(
> >>>> + hba->host->host_lock, flags);
> >>>> + ufshcd_complete_pending_reqs(hba);
> >>>> + ufshcd_complete_pending_tasks(hba);
> >>>> + spin_lock_irqsave(hba->host->host_lock, flags);
> >>>> + }
> >>>> + }
> >>>> + }
> >>>> + spin_unlock_irqrestore(hba->host->host_lock, flags);
> >>>> +
> >>>> + if (hba->eh_flags) {
> >>>> + err = ufshcd_reset_and_restore(hba);
> >>>> + if (err) {
> >>>> + ufshcd_clear_host_reset_pending(hba);
> >>>> + ufshcd_clear_device_reset_pending(hba);
> >>>> + dev_err(hba->dev, "%s: reset and restore failed\n",
> >>>> + __func__);
> >>>> + hba->ufshcd_state = UFSHCD_STATE_ERROR;
> >>>> + }
> >>>> + /*
> >>>> + * Inform scsi mid-layer that we did reset and allow to handle
> >>>> + * Unit Attention properly.
> >>>> + */
> >>>> + scsi_report_bus_reset(hba->host, 0);
> >>>> + hba->errors = 0;
> >>>> + hba->uic_error = 0;
> >>>> + }
> >>>> + scsi_unblock_requests(hba->host);
> >>>> pm_runtime_put_sync(hba->dev);
> >>>> }
> >>>>
> >>>> /**
> >>>> - * ufshcd_err_handler - Check for fatal errors
> >>>> - * @work: pointer to a work queue structure
> >>>> + * ufshcd_update_uic_error - check and set fatal UIC error flags.
> >>>> + * @hba: per-adapter instance
> >>>> */
> >>>> -static void ufshcd_err_handler(struct ufs_hba *hba)
> >>>> +static void ufshcd_update_uic_error(struct ufs_hba *hba)
> >>>> {
> >>>> u32 reg;
> >>>>
> >>>> + /* PA_INIT_ERROR is fatal and needs UIC reset */
> >>>> + reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER);
> >>>> + if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT)
> >>>> + hba->uic_error |= UFSHCD_UIC_DL_PA_INIT_ERROR;
> >>>> +
> >>>> + /* UIC NL/TL/DME errors needs software retry */
> >>>> + reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_NETWORK_LAYER);
> >>>> + if (reg)
> >>>> + hba->uic_error |= UFSHCD_UIC_NL_ERROR;
> >>>> +
> >>>> + reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_TRANSPORT_LAYER);
> >>>> + if (reg)
> >>>> + hba->uic_error |= UFSHCD_UIC_TL_ERROR;
> >>>> +
> >>>> + reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DME);
> >>>> + if (reg)
> >>>> + hba->uic_error |= UFSHCD_UIC_DME_ERROR;
> >>> REG_UIC_ERROR_CODE_PHY_ADAPTER_LAYER is not handled.
> >>
> >> UFS spec. mentions that it is non-fatal error and UIC recovers
> >> by itself and doesn't need software intervention.
> > Ok.
> >
> >>
> >>>
> >>>> +
> >>>> + dev_dbg(hba->dev, "%s: UIC error flags = 0x%08x\n",
> >>>> + __func__, hba->uic_error);
> >>>> +}
> >>>> +
> >>>> +/**
> >>>> + * ufshcd_err_handler - Check for fatal errors
> >>>> + * @hba: per-adapter instance
> >>>> + */
> >>>> +static void ufshcd_err_handler(struct ufs_hba *hba)
> >>>> +{
> >>>> if (hba->errors & INT_FATAL_ERRORS)
> >>>> goto fatal_eh;
> >>>>
> >>>> if (hba->errors & UIC_ERROR) {
> >>>> - reg = ufshcd_readl(hba, REG_UIC_ERROR_CODE_DATA_LINK_LAYER);
> >>>> - if (reg & UIC_DATA_LINK_LAYER_ERROR_PA_INIT)
> >>>> + hba->uic_error = 0;
> >>>> + ufshcd_update_uic_error(hba);
> >>>> + if (hba->uic_error)
> >>> Except UFSHCD_UIC_DL_PA_INIT_ERROR, it's not fatal. Should it go to fatal_eh?
> >>
> >> Please see the UIC error handling in ufshcd_fatal_err_handler(), others
> >> need software intervention so I combined it with fatal_eh to complete
> >> the requests and report to SCSI.
> > As gathering all error(fatal, non-fatal)handling into origin one, it makes confused.
> > Then, I would be better to rename ufshcd_fatal_err_handler.
> >
>
> Yeah, ufshcd_err_handler is apt but it is already consumed. Probably,
> ufshcd_err_handler -> ufshcd_check_errors
> ufshcd_fatal_err_handler -> ufshcd_err_handler
> rename would be fine?
I like it.
Thanks,
Seungwon Jeon
next prev parent reply other threads:[~2013-07-24 13:39 UTC|newest]
Thread overview: 27+ messages / expand[flat|nested] mbox.gz Atom feed top
2013-07-09 9:16 [PATCH V3 0/4] scsi: ufs: Improve UFS error handling Sujit Reddy Thumma
2013-07-09 9:16 ` [PATCH V3 1/4] scsi: ufs: Fix broken task management command implementation Sujit Reddy Thumma
2013-07-09 10:42 ` merez
2013-07-19 13:56 ` Seungwon Jeon
2013-07-19 18:26 ` Sujit Reddy Thumma
2013-07-23 8:24 ` Seungwon Jeon
2013-07-23 15:40 ` Sujit Reddy Thumma
2013-07-09 9:16 ` [PATCH V3 2/4] scsi: ufs: Fix hardware race conditions while aborting a command Sujit Reddy Thumma
2013-07-09 10:42 ` merez
2013-07-19 13:56 ` Seungwon Jeon
2013-07-19 18:26 ` Sujit Reddy Thumma
2013-07-09 9:16 ` [PATCH V3 3/4] scsi: ufs: Fix device and host reset methods Sujit Reddy Thumma
2013-07-09 10:43 ` merez
2013-07-19 13:57 ` Seungwon Jeon
2013-07-19 18:26 ` Sujit Reddy Thumma
2013-07-23 8:27 ` Seungwon Jeon
2013-07-23 15:40 ` Sujit Reddy Thumma
2013-07-24 13:39 ` Seungwon Jeon
2013-07-29 9:45 ` Sujit Reddy Thumma
2013-07-09 9:16 ` [PATCH V3 4/4] scsi: ufs: Improve UFS fatal error handling Sujit Reddy Thumma
2013-07-09 10:43 ` merez
2013-07-19 13:58 ` Seungwon Jeon
2013-07-19 18:26 ` Sujit Reddy Thumma
2013-07-23 8:34 ` Seungwon Jeon
2013-07-23 15:41 ` Sujit Reddy Thumma
2013-07-24 13:39 ` Seungwon Jeon [this message]
2013-07-29 9:45 ` Sujit Reddy Thumma
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to='001a01ce8873$3aed9b70$b0c8d250$%jun@samsung.com' \
--to=tgih.jun@samsung.com \
--cc=JBottomley@parallels.com \
--cc=linux-arm-msm@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=santoshsy@gmail.com \
--cc=sthumma@codeaurora.org \
--cc=vinholikatti@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).