From: ygardi@codeaurora.org
To: "Hannes Reinecke" <hare@suse.de>
Cc: "Yaniv Gardi" <ygardi@codeaurora.org>,
james.bottomley@hansenpartnership.com,
linux-kernel@vger.kernel.org, linux-scsi@vger.kernel.org,
linux-arm-msm@vger.kernel.org, santoshsy@gmail.com,
linux-scsi-owner@vger.kernel.org,
"Gilad Broner" <gbroner@codeaurora.org>,
"Vinayak Holikatti" <vinholikatti@gmail.com>,
"James E.J. Bottomley" <jbottomley@odin.com>,
"Martin K. Petersen" <martin.petersen@oracle.com>
Subject: Re: [PATCH v7 03/17] scsi: ufs: implement scsi host timeout handler
Date: Tue, 8 Mar 2016 13:36:35 -0000
Message-ID: <44a072f304240a3b19440e13c53a76df.squirrel@us.codeaurora.org>
In-Reply-To: <56DECD23.7090704@suse.de>
> On 03/08/2016 01:35 PM, Yaniv Gardi wrote:
>> A race condition exists between request requeueing and scsi layer
>> error handling:
>> When UFS driver queuecommand returns a busy status for a request,
>> it will be requeued and its tag will be freed and set to -1.
>> At the same time it is possible that the request will timeout and
>> scsi layer will start error handling for it. The scsi layer reuses
>> the request and its tag to send error related commands to the device,
>> however its tag is no longer valid.
>> As this request was never really sent to the device, there is no
>> point to start error handling with the device.
>> Implement the scsi error handling timeout callback and bypass SCSI
>> error handling for request that were not actually sent to the device.
>> For such requests simply reset the block layer timer. Otherwise, let
>> SCSI layer perform the usual error handling.
>>
>> Reviewed-by: Dolev Raviv <draviv@codeaurora.org>
>> Signed-off-by: Gilad Broner <gbroner@codeaurora.org>
>> Signed-off-by: Yaniv Gardi <ygardi@codeaurora.org>
>>
>> ---
>> drivers/scsi/ufs/ufshcd.c | 36 ++++++++++++++++++++++++++++++++++++
>> 1 file changed, 36 insertions(+)
>>
> Having a timeout handler is always a good idea, even though this
> doesn't do anything here.
> Are we sure that the requests will return eventually?
> Does the UFS spec provide for a command abort?
>
I'm sorry, but I believe you are wrong in this case.
This timeout handler is doing exactly what we intend it to do,
and it has already been tested and verified to fix the race condition I
explained a few threads back.
If the SCSI command was dispatched to UFS and actually sent, let the usual
SCSI error handling handle it (return value is BLK_EH_NOT_HANDLED).
But if the SCSI command was not actually sent to the UFS device, then
return BLK_EH_RESET_TIMER and reset the timer, so we don't get an
>>unjustified<< timeout for a command that was never dispatched.
Also, I will paste the race-condition scenario again, in case anyone is
interested:
----------
I will describe a race condition that happened to us a while ago and that
was quite difficult to understand and fix.
So, this patch is not about the "busy" status returned to the SCSI dispatch
routine. It's about the abort triggered after 30 seconds.
Imagine a request being queued and sent to the SCSI layer, and then to the
UFS driver. A timer, initialized to 30 seconds, starts ticking.
But the request is never sent to the UFS device, as queuecommand() returns
"SCSI_MLQUEUE_HOST_BUSY" (which is normal behavior).
So now the request should be re-queued, and its timer should be reset.
(REMEMBER THIS POINT, let's call it "POINT A")
BUT a context switch happens before it's actually re-queued, and the CPU
moves on to other tasks, doing other things for 30 seconds. Yes, it sounds
crazy, but it did happen.
NOW the timeout handler is invoked, and the scsi_abort() routine starts
executing (since 30 seconds passed with no completion).
So far, so good.
But hey, another context switch happens, right at the beginning of the
scsi_abort() routine, before anything useful happens. (this is "POINT B")
So now the context goes back to "POINT A", to the blk_requeue_request()
routine, which calls:
blk_delete_timer(rq); (which does nothing because the timer has already
expired)
and then it calls:
blk_queue_end_tag()
which places "-1" in the tag field of the request, marking the request as
"not tagged yet".
However, a context switch happens again, and we are back in the
scsi_abort() routine ("POINT B"), which now needs to abort this very
request. But hey, what it sees in the "tag" field is "-1", which is
obviously wrong.
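To make the ordering easier to follow, here is a small userspace sketch
that replays the interleaving above in a fixed order. It is only a model
under my own assumptions: struct model_request and all model_* functions
are hypothetical stand-ins, not kernel code, and the context switches are
replaced by plain sequential calls.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, stripped-down request: only the fields the race touches. */
struct model_request {
	int tag;		/* -1 means "not tagged yet" */
	bool timer_pending;	/* the 30 second timer is still armed */
};

/* The timer expires: from here on the abort path is committed to run. */
static void model_timer_expires(struct model_request *rq)
{
	rq->timer_pending = false;
}

/*
 * Stand-in for the requeue path at POINT A.
 * Returns true only if the timer was cancelled before it expired.
 */
static bool model_requeue(struct model_request *rq)
{
	bool cancelled = rq->timer_pending;	/* too late if already expired */

	rq->timer_pending = false;
	rq->tag = -1;				/* the tag is released */
	return cancelled;
}

/* Stand-in for the abort path at POINT B: it needs the request's tag. */
static int model_abort_reads_tag(const struct model_request *rq)
{
	return rq->tag;
}

/* Replay the interleaving and return the tag that the abort path sees. */
static int model_replay_race(int initial_tag)
{
	struct model_request rq = { .tag = initial_tag, .timer_pending = true };

	assert(initial_tag >= 0);
	model_timer_expires(&rq);		/* 30 s pass before the requeue */
	bool cancelled = model_requeue(&rq);	/* context is back at POINT A */
	assert(!cancelled);			/* deleting the timer did nothing */
	return model_abort_reads_tag(&rq);	/* context is back at POINT B */
}
```

Whatever valid tag the request started with, model_replay_race() hands the
abort path a tag of -1, which is the situation described above.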
This patch fixes this very rare race condition:
1. Upon timeout, blk_rq_timed_out() is called.
2. It then calls rq_timed_out_fn(), which eventually calls
the new callback introduced in this patch: "ufshcd_eh_timed_out()".
3. This routine returns the right flag:
BLK_EH_NOT_HANDLED or BLK_EH_RESET_TIMER.
4. blk_rq_timed_out() checks the returned value:
in case of BLK_EH_NOT_HANDLED, it is handled normally, meaning scsi_abort()
is called;
in case of BLK_EH_RESET_TIMER, a new timer is started and scsi_abort() is
never called.
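The four steps can be sketched in userspace C roughly as follows. This is
a model under stated assumptions, not the patch itself: the enum mirrors
only the two return values named above, the outstanding-slot bitmap is my
assumption about how "actually sent to the device" could be tracked, and
the block layer and SCSI layer are collapsed into one function. All model_*
names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for BLK_EH_NOT_HANDLED and BLK_EH_RESET_TIMER. */
enum eh_timer_return { EH_NOT_HANDLED, EH_RESET_TIMER };

#define MODEL_NUTRS 32	/* hypothetical number of transfer slots */

struct model_request {
	int timer_restarts;	/* how often the timer was re-armed */
	int aborts_started;	/* how often error handling was started */
};

/* Hypothetical host state: one slot per request sent to the device. */
struct model_hba {
	unsigned long outstanding;	/* bit i set: slot i is in flight */
	const struct model_request *slot[MODEL_NUTRS];
};

/* Record that rq was really handed to the device in slot "tag". */
static void model_send(struct model_hba *hba, int tag,
		       const struct model_request *rq)
{
	assert(tag >= 0 && tag < MODEL_NUTRS);
	hba->outstanding |= 1UL << tag;
	hba->slot[tag] = rq;
}

/* Step 3: the timeout callback picks the flag to return. */
static enum eh_timer_return
model_eh_timed_out(const struct model_hba *hba, const struct model_request *rq)
{
	for (int tag = 0; tag < MODEL_NUTRS; tag++) {
		if ((hba->outstanding & (1UL << tag)) && hba->slot[tag] == rq)
			return EH_NOT_HANDLED;	/* it was sent: normal handling */
	}
	return EH_RESET_TIMER;			/* never sent: just re-arm */
}

/* Steps 1, 2 and 4: call the callback and act on the returned flag. */
static void model_rq_timed_out(const struct model_hba *hba,
			       struct model_request *rq)
{
	switch (model_eh_timed_out(hba, rq)) {
	case EH_RESET_TIMER:
		rq->timer_restarts++;	/* new timer, abort path is skipped */
		break;
	case EH_NOT_HANDLED:
		rq->aborts_started++;	/* usual error handling takes over */
		break;
	}
}
```

A request that was recorded with model_send() ends up with aborts_started
incremented, while a request that only bounced off a busy queuecommand()
gets its timer restarted and never reaches the abort path.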
Hope that helps.
Regards,
Yaniv
> Cheers,
>
> Hannes
> --
> Dr. Hannes Reinecke Teamlead Storage & Networking
> hare@suse.de +49 911 74053 688
> SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
> GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
> HRB 21284 (AG Nürnberg)
>