public inbox for linux-kernel@vger.kernel.org
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, rnayak@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	saravanak@google.com, salyzyn@google.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Nitin Rawat <nitirawa@codeaurora.org>,
	Tomas Winkler <tomas.winkler@intel.com>,
	Bean Huo <beanhuo@micron.com>,
	Satya Tangirala <satyat@google.com>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/4] scsi: ufs: Fix up and simplify error recovery mechanism
Date: Tue, 14 Jul 2020 12:26:06 +0800	[thread overview]
Message-ID: <47e7a4ec9a0404bc6d01818fcdad90eb@codeaurora.org> (raw)
In-Reply-To: <fe00619c-f337-397f-9ccf-7babda095210@acm.org>

Hi Bart,

On 2020-07-14 11:52, Bart Van Assche wrote:
> On 2020-07-13 19:28, Can Guo wrote:
>> o Queue eh_work on a single threaded workqueue to avoid concurrency 
>> between
>>   eh_works.
> 
> Please use another approach (mutex?) to serialize error handling. There 
> are
> already way too workqueues in a running Linux system.
> 
>> o According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs 
>> when
>>   the link is broken. This actaully applies to any power mode change
>>   operations. In this change, if a power mode change operation 
>> (including
>>   AH8 enter/exit) fails, mark the link state as UIC_LINK_BROKEN_STATE 
>> and
>>   schedule eh_work. eh_work needs to do full reset and restore to 
>> recover
>>   the link back to active. Before the link state is recovered to 
>> active by
>>   eh_work, any power mode change attempts just return -ENOLINK to 
>> avoid
>>   consecutive HW error.
>> 
>> o To avoid concurrency between eh_work and link recovery, remove link
>>   recovery from hibern8 enter/exit func. If hibern8 enter/exit func 
>> fails,
>>   simply return error code and let eh_work run in parallel.
>> 
>> o Recover UFS hba runtime PM error in eh_work. If 
>> ufschd_suspend/resume
>>   fails due to UFS error, e.g. hibern8 enter/exit error and SSU cmd 
>> error,
>>   the runtime PM framework saves the error to dev.power.runtime_error.
>>   After that, hba runtime suspend/resume would not be invoked anymore 
>> until
>>   dev.power.runtime_error is cleared. The runtime PM error can be 
>> recovered
>>   in eh_work by calling pm_runtime_set_active() after reset and 
>> restore
>>   succeeds. Meanwhile, if pm_runtime_set_active() returns no error, 
>> which
>>   means dev.power.runtime_error is cleared, we also need to explicitly
>>   resume those scsi devices under hba in case any of them has failed 
>> to be
>>   resumed due to hba runtime resume error.
>> 
>> o Fix a racing problem between eh_work and ufshcd_suspend/resume. In 
>> the
>>   old code, it blocks scsi requests before schedules eh_work, but when
>>   eh_work calls pm_runtime_get_sync(), if ufshcd_suspend/resume is 
>> sending
>>   a scsi cmd, most likely the SSU cmd, pm_runtime_get_sync() will 
>> never
>>   return because scsi requests were blocked. To fix this racing 
>> problem,
>>   o Don't block scsi requests before schedule eh_work, but let eh_work
>>     block scsi requests when eh_work is ready to start error recovery.
>>   o Meanwhile, if eh_work is schueduled due to fatal error, don't 
>> requeue
>>     the scsi cmds sent from ufshcd_suspend/resume path, but simply let 
>> the
>>     scsi cmds fail. If the scsi cmds fail, hba runtime suspend/resume 
>> fails
>>     too, but it does hurt since eh_work recovers hba runtime PM error.
>> 
>> o Move host/regs dump in ufshcd_check_errors() to eh_work because 
>> heavy
>>   dump in IRQ context can lead to stability issues. In addition, some 
>> clean
>>   up in ufshcd_print_host_regs() and ufshcd_print_host_state().
> 
> The above list is a long list. To me that is a sign that this patch 
> needs to
> be split into multiple patches.
> 
> Thanks,
> 
> Bart.

Sure, will split it into a few patches.

Thanks,

Can Guo.

  reply	other threads:[~2020-07-14  4:28 UTC|newest]

Thread overview: 13+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
     [not found] <1594693693-22466-1-git-send-email-cang@codeaurora.org>
2020-07-14  2:28 ` [PATCH v2 1/4] scsi: ufs: Add checks before setting clk-gating states Can Guo
2020-07-14  3:38   ` Bart Van Assche
2020-07-14  4:11     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 2/4] scsi: ufs: Fix imbalanced scsi_block_reqs_cnt caused by ufshcd_hold() Can Guo
2020-07-14  3:41   ` Bart Van Assche
2020-07-14  4:11     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 3/4] ufs: ufs-qcom: Fix a few BUGs in func ufs_qcom_dump_dbg_regs() Can Guo
2020-07-14  3:47   ` Bart Van Assche
2020-07-14  4:17     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 4/4] scsi: ufs: Fix up and simplify error recovery mechanism Can Guo
2020-07-14  3:52   ` Bart Van Assche
2020-07-14  4:26     ` Can Guo [this message]
2020-07-14  9:13       ` Can Guo

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47e7a4ec9a0404bc6d01818fcdad90eb@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=nitirawa@codeaurora.org \
    --cc=rnayak@codeaurora.org \
    --cc=salyzyn@google.com \
    --cc=saravanak@google.com \
    --cc=satyat@google.com \
    --cc=stanley.chu@mediatek.com \
    --cc=tomas.winkler@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox