From: Stanley Chu <stanley.chu@mediatek.com>
To: Can Guo <cang@codeaurora.org>,
"avri.altman@wdc.com" <avri.altman@wdc.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"martin.petersen@oracle.com" <martin.petersen@oracle.com>,
"Andy Teng (鄧如宏)" <Andy.Teng@mediatek.com>,
"jejb@linux.ibm.com" <jejb@linux.ibm.com>,
"Chun-Hung Wu (巫駿宏)" <Chun-hung.Wu@mediatek.com>,
"Kuohong Wang (王國鴻)" <kuohong.wang@mediatek.com>,
"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"CC Chou (周志杰)" <cc.chou@mediatek.com>,
"linux-mediatek@lists.infradead.org"
<linux-mediatek@lists.infradead.org>,
"Peter Wang (王信友)" <peter.wang@mediatek.com>,
"alim.akhtar@samsung.com" <alim.akhtar@samsung.com>,
"matthias.bgg@gmail.com" <matthias.bgg@gmail.com>,
"asutoshd@codeaurora.org" <asutoshd@codeaurora.org>,
"Chaotian Jing (井朝天)" <Chaotian.Jing@mediatek.com>,
"bvanassche@acm.org" <bvanassche@acm.org>,
"linux-arm-kernel@lists.infradead.org"
<linux-arm-kernel@lists.infradead.org>,
"beanhuo@micron.com" <beanhuo@micron.com>
Subject: Re: [SPAM]Re: [PATCH v2] scsi: ufs: Fix possible infinite loop in ufshcd_hold
Date: Tue, 4 Aug 2020 13:27:30 +0800 [thread overview]
Message-ID: <1596518850.27829.5.camel@mtkswgap22> (raw)
In-Reply-To: <1596095961.17247.51.camel@mtkswgap22>
Hi Avri,
Would you have any suggestions for this patch?
We need this patch to enable UFSHCD_CAP_HIBERN8_WITH_CLK_GATING in
MediaTek platform.
Thanks a lot,
Stanley Chu
On Thu, 2020-07-30 at 15:59 +0800, Stanley Chu wrote:
> Hi Can,
>
> On Wed, 2020-07-29 at 18:53 +0800, Can Guo wrote:
> > Hi Stanley,
> >
> > On 2020-07-29 18:26, Stanley Chu wrote:
> > > Hi Can,
> > >
> > > On Wed, 2020-07-29 at 16:43 +0800, Can Guo wrote:
> > >> Hi Stanley,
> > >>
> > >> On 2020-07-29 10:40, Stanley Chu wrote:
> > >> > In ufshcd_suspend(), after clk-gating is suspended and link is set
> > >> > as Hibern8 state, ufshcd_hold() is still possibly invoked before
> > >> > ufshcd_suspend() returns. For example, MediaTek's suspend vops may
> > >> > issue UIC commands which would call ufshcd_hold() during the command
> > >> > issuing flow.
> > >> >
> > >> > Now if UFSHCD_CAP_HIBERN8_WITH_CLK_GATING capability is enabled,
> > >> > then ufshcd_hold() may enter infinite loops because there is no
> > >> > clk-ungating work scheduled or pending. In this case, ufshcd_hold()
> > >> > shall just bypass, and keep the link as Hibern8 state.
> > >> >
> > >>
> > >> The infinite loop is expected as ufshcd_hold is called again after
> > >> link is put to hibern8 state, so in QCOM's code, we never do this.
> > >
> > > Sadly MediaTek have to do this to make our UniPro to enter low-power
> > > mode.
> > >
> > >> The cap UFSHCD_CAP_HIBERN8_WITH_CLK_GATING means UIC link state
> > >> must not be HIBERN8 after ufshcd_hold(async=false) returns.
> > >
> > > If driver is not in PM scenarios, e.g., suspended, above statement
> > > shall
> > > be always followed. But two obvious violations are existed,
> > >
> > > 1. In ufshcd_suspend(), link is set as HIBERN8 behind ufshcd_hold()
> > > 2. In ufshcd_resume(), link is set back as Active before
> > > ufshcd_release() is invoked
> > >
> > > So as my understanding, special conditions are allowed in PM scenarios,
> > > and this is why "hba->clk_gating.is_suspended" is introduced. By this
> > > thought, I used "hba->clk_gating.is_suspended" in this patch as the
> > > mandatory condition to allow ufshcd_hold() usage in vendor suspend and
> > > resume callbacks.
> > >
> > >
> > >> Instead of bailing out from that loop, which makes the logic of
> > >> ufshcd_hold and clk gating even more complex, how about removing
> > >> ufshcd_hold/release from ufshcd_send_uic_cmd()? I think they are
> > >> redundant and we should never send DME cmds if clocks/powers are
> > >> not ready. I mean callers should make sure they are ready to send
> > >> DME cmds (and only callers know when), but not leave that job to
> > >> ufshcd_send_uic_cmd(). It is convenient to remove ufshcd_hold/
> > >> release from ufshcd_send_uic_cmd() as there are not many places
> > >> sending DME cmds without holding the clocks, ufs_bsg.c is one.
> > >> And I have tested my idea on my setup, it worked well for me.
> > >> Another benefit is that it also allows us to use DME cmds
> > >> in clk gating/ungating contexts if we need to in the future.
> > >>
> > >
> > > Brilliant idea! But this may not solve problems if vendor callbacks
> > > need
> > > more than UIC commands in the future.
> > >
> > > This simple patch could make all vendor operations on UFSHCI in PM
> > > callbacks possible with UFSHCD_CAP_HIBERN8_WITH_CLK_GATING enabled, and
> > > again, it allows those operations in PM scenarios only.
> > >
> >
> > Other than UIC cmds, I can only think of device manangement cmds (like
> > query).
> > If device management cmds come into the way in the future, we fix it as
> > well.
> > I mean that is the right thing to do in my opinion - just like we don't
> > call
> > pm_runtime_get_sync() in ufshcd_send_uic_cmd().
> >
> > I can understand that you want a simple/quick fix to get it work for you
> > once
> > for all, but from my point of view, debugging clk gating/ungating really
> > takes
> > huge efforts sometime (I've spent a lot of time on it). Some flash
> > vendors also
> > use it in their own driver widely which makes some failure scenes even
> > harder to
> > undertand/debug. So the first thing comes to my head is that we should
> > avoid
> > making it more complex or giving it more exceptions.
> >
> > From functionality point of view, it looks ok to me. It is just that I
> > cannot
> > predict it won't cause new problems since the clk gating/ungating
> > sequeces are
> > like magic in some use cases sometime.
>
> Thanks for the functionality review.
>
> I totally understand what you mentioned above about the clk-gating
> debugging because we also spent lots of time for issue analysis.
>
> I just finished some fault injection for this patch in our platform, the
> results are fine.
>
> The active window of this patch is limited: Starting from
> ufshcd_link_state_transition() in ufshcd_suspend to ufshcd_vops_resume()
> in ufshcd_resume() because the link is back to LINKUP state in MediaTek
> resume callback. So I was focus on injecting errors in our callbacks
> between this period and most of injected fails triggered host and device
> reset flow.
>
> For example,
> Suspend: UniPro PowerDownControl timeout
> Resume: hba_enable timeout
> Resume: UniPro PowerDownControl timeout
> Resume: HIBERN8 Exit timeout
>
> Hope these tests can ease your concerns.
>
> Thanks,
> Stanley Chu
>
> >
> > Thanks,
> >
> > Can Guo.
> >
> > >> Please let me know your idea, thanks.
> > >>
> > >> Can Guo.
> > >
> > > Thanks,
> > > Stanley Chu
> > >
> > >>
> > >> > Signed-off-by: Stanley Chu <stanley.chu@mediatek.com>
> > >> > Signed-off-by: Andy Teng <andy.teng@mediatek.com>
> > >> >
> > >> > ---
> > >> >
> > >> > Changes since v1:
> > >> > - Fix return value: Use unique bool variable to get the result of
> > >> > flush_work(). Thcan prevent incorrect returned value, i.e., rc, if
> > >> > flush_work() returns true
> > >> > - Fix commit message
> > >> >
> > >> > ---
> > >> > drivers/scsi/ufs/ufshcd.c | 5 ++++-
> > >> > 1 file changed, 4 insertions(+), 1 deletion(-)
> > >> >
> > >> > diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
> > >> > index 577cc0d7487f..acba2271c5d3 100644
> > >> > --- a/drivers/scsi/ufs/ufshcd.c
> > >> > +++ b/drivers/scsi/ufs/ufshcd.c
> > >> > @@ -1561,6 +1561,7 @@ static void ufshcd_ungate_work(struct work_struct
> > >> > *work)
> > >> > int ufshcd_hold(struct ufs_hba *hba, bool async)
> > >> > {
> > >> > int rc = 0;
> > >> > + bool flush_result;
> > >> > unsigned long flags;
> > >> >
> > >> > if (!ufshcd_is_clkgating_allowed(hba))
> > >> > @@ -1592,7 +1593,9 @@ int ufshcd_hold(struct ufs_hba *hba, bool async)
> > >> > break;
> > >> > }
> > >> > spin_unlock_irqrestore(hba->host->host_lock, flags);
> > >> > - flush_work(&hba->clk_gating.ungate_work);
> > >> > + flush_result = flush_work(&hba->clk_gating.ungate_work);
> > >> > + if (hba->clk_gating.is_suspended && !flush_result)
> > >> > + goto out;
> > >> > spin_lock_irqsave(hba->host->host_lock, flags);
> > >> > goto start;
> > >> > }
>
_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
next prev parent reply other threads:[~2020-08-04 5:29 UTC|newest]
Thread overview: 8+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-07-29 2:40 [PATCH v2] scsi: ufs: Fix possible infinite loop in ufshcd_hold Stanley Chu
2020-07-29 8:43 ` Can Guo
2020-07-29 10:26 ` Stanley Chu
2020-07-29 10:53 ` Can Guo
2020-07-30 7:59 ` Stanley Chu
2020-08-04 5:27 ` Stanley Chu [this message]
2020-08-04 6:10 ` [SPAM]Re: " Avri Altman
2020-08-07 0:39 ` Stanley Chu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1596518850.27829.5.camel@mtkswgap22 \
--to=stanley.chu@mediatek.com \
--cc=Andy.Teng@mediatek.com \
--cc=Chaotian.Jing@mediatek.com \
--cc=Chun-hung.Wu@mediatek.com \
--cc=alim.akhtar@samsung.com \
--cc=asutoshd@codeaurora.org \
--cc=avri.altman@wdc.com \
--cc=beanhuo@micron.com \
--cc=bvanassche@acm.org \
--cc=cang@codeaurora.org \
--cc=cc.chou@mediatek.com \
--cc=jejb@linux.ibm.com \
--cc=kuohong.wang@mediatek.com \
--cc=linux-arm-kernel@lists.infradead.org \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-mediatek@lists.infradead.org \
--cc=linux-scsi@vger.kernel.org \
--cc=martin.petersen@oracle.com \
--cc=matthias.bgg@gmail.com \
--cc=peter.wang@mediatek.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox