All of lore.kernel.org
 help / color / mirror / Atom feed
From: Can Guo <cang@codeaurora.org>
To: Bart Van Assche <bvanassche@acm.org>
Cc: asutoshd@codeaurora.org, nguyenb@codeaurora.org,
	hongwus@codeaurora.org, rnayak@codeaurora.org,
	linux-scsi@vger.kernel.org, kernel-team@android.com,
	saravanak@google.com, salyzyn@google.com,
	Alim Akhtar <alim.akhtar@samsung.com>,
	Avri Altman <avri.altman@wdc.com>,
	"James E.J. Bottomley" <jejb@linux.ibm.com>,
	"Martin K. Petersen" <martin.petersen@oracle.com>,
	Stanley Chu <stanley.chu@mediatek.com>,
	Nitin Rawat <nitirawa@codeaurora.org>,
	Tomas Winkler <tomas.winkler@intel.com>,
	Bean Huo <beanhuo@micron.com>,
	Satya Tangirala <satyat@google.com>,
	open list <linux-kernel@vger.kernel.org>
Subject: Re: [PATCH v2 4/4] scsi: ufs: Fix up and simplify error recovery mechanism
Date: Tue, 14 Jul 2020 12:26:06 +0800	[thread overview]
Message-ID: <47e7a4ec9a0404bc6d01818fcdad90eb@codeaurora.org> (raw)
In-Reply-To: <fe00619c-f337-397f-9ccf-7babda095210@acm.org>

Hi Bart,

On 2020-07-14 11:52, Bart Van Assche wrote:
> On 2020-07-13 19:28, Can Guo wrote:
>> o Queue eh_work on a single threaded workqueue to avoid concurrency 
>> between
>>   eh_works.
> 
> Please use another approach (mutex?) to serialize error handling. There 
> are
> already way too workqueues in a running Linux system.
> 
>> o According to the UFSHCI JEDEC spec, hibern8 enter/exit error occurs 
>> when
>>   the link is broken. This actaully applies to any power mode change
>>   operations. In this change, if a power mode change operation 
>> (including
>>   AH8 enter/exit) fails, mark the link state as UIC_LINK_BROKEN_STATE 
>> and
>>   schedule eh_work. eh_work needs to do full reset and restore to 
>> recover
>>   the link back to active. Before the link state is recovered to 
>> active by
>>   eh_work, any power mode change attempts just return -ENOLINK to 
>> avoid
>>   consecutive HW error.
>> 
>> o To avoid concurrency between eh_work and link recovery, remove link
>>   recovery from hibern8 enter/exit func. If hibern8 enter/exit func 
>> fails,
>>   simply return error code and let eh_work run in parallel.
>> 
>> o Recover UFS hba runtime PM error in eh_work. If 
>> ufschd_suspend/resume
>>   fails due to UFS error, e.g. hibern8 enter/exit error and SSU cmd 
>> error,
>>   the runtime PM framework saves the error to dev.power.runtime_error.
>>   After that, hba runtime suspend/resume would not be invoked anymore 
>> until
>>   dev.power.runtime_error is cleared. The runtime PM error can be 
>> recovered
>>   in eh_work by calling pm_runtime_set_active() after reset and 
>> restore
>>   succeeds. Meanwhile, if pm_runtime_set_active() returns no error, 
>> which
>>   means dev.power.runtime_error is cleared, we also need to explicitly
>>   resume those scsi devices under hba in case any of them has failed 
>> to be
>>   resumed due to hba runtime resume error.
>> 
>> o Fix a racing problem between eh_work and ufshcd_suspend/resume. In 
>> the
>>   old code, it blocks scsi requests before schedules eh_work, but when
>>   eh_work calls pm_runtime_get_sync(), if ufshcd_suspend/resume is 
>> sending
>>   a scsi cmd, most likely the SSU cmd, pm_runtime_get_sync() will 
>> never
>>   return because scsi requests were blocked. To fix this racing 
>> problem,
>>   o Don't block scsi requests before schedule eh_work, but let eh_work
>>     block scsi requests when eh_work is ready to start error recovery.
>>   o Meanwhile, if eh_work is schueduled due to fatal error, don't 
>> requeue
>>     the scsi cmds sent from ufshcd_suspend/resume path, but simply let 
>> the
>>     scsi cmds fail. If the scsi cmds fail, hba runtime suspend/resume 
>> fails
>>     too, but it does hurt since eh_work recovers hba runtime PM error.
>> 
>> o Move host/regs dump in ufshcd_check_errors() to eh_work because 
>> heavy
>>   dump in IRQ context can lead to stability issues. In addition, some 
>> clean
>>   up in ufshcd_print_host_regs() and ufshcd_print_host_state().
> 
> The above list is a long list. To me that is a sign that this patch 
> needs to
> be split into multiple patches.
> 
> Thanks,
> 
> Bart.

Sure, will split it into a few patches.

Thanks,

Can Guo.

  reply	other threads:[~2020-07-14  4:27 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-07-14  2:28 [PATCH v2 0/4] Fix up and simplify error recovery mechanism Can Guo
2020-07-14  2:28 ` [PATCH v2 1/4] scsi: ufs: Add checks before setting clk-gating states Can Guo
2020-07-14  3:38   ` Bart Van Assche
2020-07-14  4:11     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 2/4] scsi: ufs: Fix imbalanced scsi_block_reqs_cnt caused by ufshcd_hold() Can Guo
2020-07-14  3:41   ` Bart Van Assche
2020-07-14  4:11     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 3/4] ufs: ufs-qcom: Fix a few BUGs in func ufs_qcom_dump_dbg_regs() Can Guo
2020-07-14  3:47   ` Bart Van Assche
2020-07-14  4:17     ` Can Guo
2020-07-14  2:28 ` [PATCH v2 4/4] scsi: ufs: Fix up and simplify error recovery mechanism Can Guo
2020-07-14  3:52   ` Bart Van Assche
2020-07-14  4:26     ` Can Guo [this message]
2020-07-14  9:13       ` Can Guo
  -- strict thread matches above, loose matches on Subject: below --
2020-07-14 20:46 kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=47e7a4ec9a0404bc6d01818fcdad90eb@codeaurora.org \
    --to=cang@codeaurora.org \
    --cc=alim.akhtar@samsung.com \
    --cc=asutoshd@codeaurora.org \
    --cc=avri.altman@wdc.com \
    --cc=beanhuo@micron.com \
    --cc=bvanassche@acm.org \
    --cc=hongwus@codeaurora.org \
    --cc=jejb@linux.ibm.com \
    --cc=kernel-team@android.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-scsi@vger.kernel.org \
    --cc=martin.petersen@oracle.com \
    --cc=nguyenb@codeaurora.org \
    --cc=nitirawa@codeaurora.org \
    --cc=rnayak@codeaurora.org \
    --cc=salyzyn@google.com \
    --cc=saravanak@google.com \
    --cc=satyat@google.com \
    --cc=stanley.chu@mediatek.com \
    --cc=tomas.winkler@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.