From: James Bottomley <James.Bottomley@HansenPartnership.com>
To: Rajat Jain <rajatja@google.com>
Cc: linux-scsi@vger.kernel.org, linux-kernel@vger.kernel.org,
rajatxjain@gmail.com
Subject: Re: [PATCH] scsi: Avoid potential infinite eh_timeout_handler() loop
Date: Thu, 04 Jun 2015 13:27:21 -0700 [thread overview]
Message-ID: <1433449641.1571.50.camel@HansenPartnership.com> (raw)
In-Reply-To: <1433443222-8260-1-git-send-email-rajatja@google.com>
On Thu, 2015-06-04 at 11:40 -0700, Rajat Jain wrote:
> Each cmd timeout should result in scmd->retries++. Currently it happens
> just only before a command is requeued back. However, if the LLD
> eh_timed_out() handler asks to reset timer back again, then also it should
> be incremented because effectively LLD will be given a full time period
> (SD_TIMEOUT = 30 secs!) to attempt to complete the command.
>
> Why this is a problem:
>
> => Currently the SCSI low level transport drivers can provide
> eh_timeout_handler() calls (for e.g. iscsi provides this) to deal
> with command timeouts.
>
> => The eh_timeout_handler() can return BLK_EH_RESET_TIMER that causes the
> SCSI / block layer to reset the timer, thus giving more time to the
> LLD.
>
> => Currently a LLD can potentially loop infinitely on a command if it
> always keeps on returning BLK_EH_RESET_TIMER.
>
> * => Other than choking its own devices, if the command that is stuck is a
> command issued during sd_probe_async() (e.g. a partition table scan),
> then it impacts all the disks because no other disks can be removed
> from the system until sd_probe_async() returns. (sd_remove waits on
> async_synchronize_full_domain(...))
>
> => This problem actually resulted in the situation mentioned above,
> whereby no disks in the system (on other scsi hosts) could be removed,
> because of a stuck scsi command to read the partition tables of an
> unrelated problematic disk during probe. The threads were stuck at:
>
> schedule+0x312/0x7a0
> async_synchronize_cookie_domain+0xb8/0x115
> ? __wake_up_bit+0x40/0x40
> async_synchronize_full_domain+0x15/0x17
> sd_remove+0x5f/0x135
> __device_release_driver+0x8a/0xe0
> device_release_driver+0x23/0x30
> bus_remove_device+0x10f/0x123
> device_del+0x132/0x18e
> __scsi_remove_device+0x56/0xb6
> scsi_remove_device+0x26/0x33
> scsi_remove_target+0x12d/0x1a0
> ...
>
> What this patch does:
> => Ensure that any quests to reset the timer are accounted for, so that
> there is a finite upper bound on the time that a command is tried.
> Once allowed number of retries is reached, we proceed to standard
> error handling procedure (abort etc.) by scheduling the command
> for EH.
>
> Signed-off-by: Rajat Jain <rajatja@google.com>
This is actually wrong. Originally the code you're suggesting did exist
and it used to cause us to time out far too early on some conditions.
Now scmd->retries is for specific things that shouldn't be retried too
often. Anything else appears to retry forever but in fact there's a
specific check (in the softirq and io_completion) to check that a
retryable failure hasn't taken longer than (cmd->allowed + 1) *
req->timeout.
This means effectively that nothing in SCSI is allowed to retry forever.
James
next prev parent reply other threads:[~2015-06-04 20:27 UTC|newest]
Thread overview: 3+ messages / expand[flat|nested] mbox.gz Atom feed top
2015-06-04 18:40 [PATCH] scsi: Avoid potential infinite eh_timeout_handler() loop Rajat Jain
2015-06-04 20:27 ` James Bottomley [this message]
2015-06-04 21:49 ` Rajat Jain
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=1433449641.1571.50.camel@HansenPartnership.com \
--to=james.bottomley@hansenpartnership.com \
--cc=linux-kernel@vger.kernel.org \
--cc=linux-scsi@vger.kernel.org \
--cc=rajatja@google.com \
--cc=rajatxjain@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox