From: Bart Van Assche <bvanassche@acm.org>
To: Hannes Reinecke <hare@suse.de>
Cc: linux-scsi <linux-scsi@vger.kernel.org>
Subject: Re: SCSI LLDs, the SCSI error handler and host resource lifetime
Date: Wed, 21 Nov 2012 13:26:35 +0100
Message-ID: <50ACC87B.5010401@acm.org>
In-Reply-To: <50AC808D.1060700@suse.de>

On 11/21/12 08:19, Hannes Reinecke wrote:
> On 11/20/2012 03:24 PM, Bart Van Assche wrote:
>> If I interpret the SCSI error handler source code correctly, then
>> scsi_unjam_host() may proceed concurrently with scsi_remove_host().
>> This means that the LLD eh_abort_handler callback may get invoked after
>> scsi_remove_host() has finished. At least the SRP initiator (ib_srp)
>> cleans up resources necessary for aborting commands as soon as
>> scsi_remove_host() returns. That looks like a race condition to me. As
>> far as I can see, it is only safe to clean up such resources after the
>> EH thread has been stopped. Any opinions about adding an additional
>> callback for this purpose in struct scsi_host_template?
>>
>> Note: it doesn't look like a good idea to me to let scsi_remove_host()
>> wait until error recovery has finished, since scsi_remove_host() may get
>> invoked from the context of a workqueue. If any work related to SCSI
>> error handling gets queued on that same workqueue, letting
>> scsi_remove_host() wait for the error handler to finish might result in
>> a deadlock.
>>
>> The patch below is a request-for-comments (RFC) patch that not only adds
>> a callback to struct scsi_host_template but also fixes a (hard to
>> trigger) race condition in ib_srp: it prevents ib_destroy_cm_id() from
>> freeing the IB RC connection while srp_send_tsk_mgmt() is still using it.
>>
> Hmm.
> This would still mean that the EH thread runs until it has finished,
> which can take _A LOT_ of time (we're speaking hours here).
> I would rather have an additional return code in the various
> scsi_try_XXX functions to terminate the loop quickly.
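To make the deadlock argument in the quoted note concrete, here is a toy userspace model. It shows a workqueue that executes one item at a time in FIFO order. This is an assumption-laden sketch, not the kernel workqueue API:

- toy_wq, toy_queue() and toy_run() are hypothetical names.
- Real kernel workqueues may run items concurrently, depending on how they were allocated.

In the model, a "remove host" item that waits for an error-handling item queued behind it can never make progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model of an ordered (one item at a time, FIFO) workqueue.
 * Hypothetical names; NOT the kernel workqueue API.
 */
#define MAX_WORK 8

struct toy_work {
    const char *name;
    int waits_for;   /* index of the item this one blocks on, or -1 */
    bool done;
};

struct toy_wq {
    struct toy_work items[MAX_WORK];
    size_t n;
};

/* Append a work item; returns its index, or -1 if the queue is full. */
static int toy_queue(struct toy_wq *wq, const char *name, int waits_for)
{
    if (wq->n == MAX_WORK)
        return -1;
    wq->items[wq->n] = (struct toy_work){ name, waits_for, false };
    return (int)wq->n++;
}

/*
 * Run items strictly in order on a single worker. Returns the index of
 * the first item that would block forever (the item it waits for has
 * not run and cannot run until this one returns), or -1 if all items
 * completed.
 */
static int toy_run(struct toy_wq *wq)
{
    for (size_t i = 0; i < wq->n; i++) {
        struct toy_work *w = &wq->items[i];
        if (w->waits_for >= 0 && !wq->items[w->waits_for].done)
            return (int)i;   /* real code would sleep here forever */
        w->done = true;
    }
    return -1;
}
```

If the error-handling item is queued ahead of the waiting item, toy_run() returns -1 because every item completes. The caller of scsi_remove_host() presumably cannot rely on such an ordering, which is why the note avoids making scsi_remove_host() wait at all.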

How about combining both approaches? I think the additional callback is
needed anyway to prevent the race condition explained above. Making the
SCSI EH stop sooner after scsi_remove_host() has been invoked looks
like a good idea to me, but I'm not sure that change alone is sufficient.
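Below is a minimal userspace sketch of combining both approaches, with a pthread standing in for the EH thread. Every name in it is hypothetical and is not taken from the RFC patch or from the SCSI midlayer: model_host, model_remove_host(), model_eh_abort(), model_teardown() and the eh_stopped callback. The two approaches map onto the model as follows:

- The "removing" flag models the early-exit return code suggested for the scsi_try_XXX functions.
- eh_stopped models the proposed scsi_host_template callback. It is invoked only after the EH thread has been joined.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Userspace model only; all names are hypothetical, not kernel APIs. */
struct model_host {
    atomic_bool removing;     /* set by model_remove_host() */
    int *abort_ctx;           /* resource the LLD abort handler needs */
    int pending_cmds;         /* commands EH still has to abort */
    int aborts_sent;
    void (*eh_stopped)(struct model_host *); /* hypothetical callback */
    pthread_t eh_task;        /* stands in for the EH thread */
};

/* LLD abort handler: dereferences abort_ctx, so it must still be valid. */
static int model_eh_abort(struct model_host *h)
{
    if (atomic_load(&h->removing))
        return -1;            /* hypothetical "host is going away" code */
    /* The flag may flip right here; abort_ctx is still valid only
     * because it is freed after the EH thread has been joined. */
    (*h->abort_ctx)++;
    h->aborts_sent++;
    return 0;
}

/* Models the EH loop: stop early once the abort handler reports removal. */
static void *eh_thread(void *arg)
{
    struct model_host *h = arg;
    while (h->pending_cmds > 0) {
        if (model_eh_abort(h) < 0)
            break;
        h->pending_cmds--;
    }
    return NULL;
}

/* Models scsi_remove_host(): marks removal, does not wait for EH. */
static void model_remove_host(struct model_host *h)
{
    atomic_store(&h->removing, true);
}

/* LLD implementation of the hypothetical callback: free EH resources. */
static void lld_eh_stopped(struct model_host *h)
{
    free(h->abort_ctx);
    h->abort_ctx = NULL;
}

/* Teardown order: remove host, stop (join) the EH thread, then callback. */
static void model_teardown(struct model_host *h)
{
    model_remove_host(h);
    pthread_join(h->eh_task, NULL);
    if (h->eh_stopped)
        h->eh_stopped(h);
}

static int model_start(struct model_host *h, int cmds)
{
    atomic_init(&h->removing, false);
    h->abort_ctx = calloc(1, sizeof(*h->abort_ctx));
    if (!h->abort_ctx)
        return -1;
    h->pending_cmds = cmds;
    h->aborts_sent = 0;
    h->eh_stopped = lld_eh_stopped;
    if (pthread_create(&h->eh_task, NULL, eh_thread, h) != 0) {
        free(h->abort_ctx);
        h->abort_ctx = NULL;
        return -1;
    }
    return 0;
}
```

By itself, the flag only shortens error recovery. model_eh_abort() can read removing == false, get preempted, and then dereference abort_ctx after model_remove_host() has returned. In this model that window is harmless, because abort_ctx is freed only from the callback after pthread_join(). That mirrors the argument that the early-exit change alone may not be sufficient.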
Bart.