From: Hannes Reinecke <hare@suse.de>
To: Kevin Groeneveld <KGroeneveld@lenbrook.com>,
"JBottomley@odin.com" <JBottomley@odin.com>
Cc: "linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"festevam@gmail.com" <festevam@gmail.com>,
"richard.zhu@freescale.com" <richard.zhu@freescale.com>,
"arnd@arndb.de" <arnd@arndb.de>,
"linux@arm.linux.org.uk" <linux@arm.linux.org.uk>
Subject: Re: [PATCH] scsi: fix hang in scsi error handling
Date: Fri, 17 Jul 2015 08:02:48 +0200
Message-ID: <55A89A88.9000901@suse.de>
In-Reply-To: <BF6B9AADDDF11740967545E971C7C0DE02234A@MAIL1.pickering.lenbrook.com>
On 07/16/2015 08:55 PM, Kevin Groeneveld wrote:
>> -----Original Message-----
>> From: Hannes Reinecke [mailto:hare@suse.de]
>> Sent: July-16-15 7:11 AM
>>> When the hang occurs shost->host_busy == 2 and shost->host_failed == 1
>>> in the scsi_eh_wakeup function. However this function only wakes the
>>> error handler if host_busy == host_failed.
>>>
>> Which just means that one command is still outstanding, and we need to wait
>> for it to complete.
>> But see below...
>
> So the root cause of the hang is maybe that the second command never
> completes? Maybe host_failed being non-zero is blocking something in the
> port multiplier code?
>
Yes, I think that's one of the reasons. You really should
investigate what happens to the second command.
(It might well be that the second command is issued _before_ the
first command completes, effectively creating a livelock.)
>> Hmm.
>> I am really not sure about this.
>
> I wasn't sure either, that is one reason why I posted the patch.
>
>> 'host_busy' indicates the number of outstanding commands, and
>> 'host_failed' is the number of commands which have failed (on the grounds
>> that failed commands are considered outstanding, too).
>>
>> So the first hunk would change the behaviour from 'start SCSI EH once all
>> commands are completed or failed' to 'start SCSI EH for _any_ command if
>> scsi_eh_wakeup is called'
>> (note that host_failed might be '0'...).
>> Which doesn't sound right.
>
> So could the patch create any problems by starting the EH any time
> scsi_eh_wakeup is called? Or is it just inefficient?
>
The patch will play havoc with the SCSI EH code, as by the time SCSI
EH is working on the list of failed commands the host is assumed to
be stopped. So there cannot be any out-of-band modifications to the
list of failed commands.
With your patch, commands might fail _while SCSI EH is active_,
so the list of failed commands will be modified during SCSI EH.
As the SCSI EH code doesn't take any locks on that list, things will
become very tangled after that.
>> I guess this needs further debugging to get to the bottom of it.
>
> Any suggestions on things I could try?
>
Enable SCSI logging (or SCSI tracing) and figure out what happens to
the second command.
> The fact that the problem goes away when I only enable one CPU core makes
> me think there is a race happening somewhere.
>
Yeah, most definitely. But I doubt it's in the error handler; it's
more likely somewhere else.
Cheers,
Hannes
--
Dr. Hannes Reinecke zSeries & Storage
hare@suse.de +49 911 74053 688
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: F. Imendörffer, J. Smithard, J. Guild, D. Upmanyu, G. Norton
HRB 21284 (AG Nürnberg)
Thread overview: 6+ messages
2015-07-15 12:47 [PATCH] scsi: fix hang in scsi error handling Kevin Groeneveld
2015-07-16 11:11 ` Hannes Reinecke
2015-07-16 18:55 ` Kevin Groeneveld
2015-07-17 6:02 ` Hannes Reinecke [this message]
2015-07-27 10:38 ` Hannes Reinecke
2015-07-27 15:31 ` Kevin Groeneveld