From: Bart Van Assche <Bart.VanAssche@wdc.com>
To: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"abdhalee@linux.vnet.ibm.com" <abdhalee@linux.vnet.ibm.com>,
	"brking@linux.vnet.ibm.com" <brking@linux.vnet.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"sachinp@linux.vnet.ibm.com" <sachinp@linux.vnet.ibm.com>,
	"linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>,
	"mpe@ellerman.id.au" <mpe@ellerman.id.au>
Subject: Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
Date: Fri, 18 Aug 2017 21:41:13 +0000
Message-ID: <1503092473.2622.17.camel@wdc.com>
In-Reply-To: <71fb9c1b-9f3f-acdc-8bb5-aa1240aea763@linux.vnet.ibm.com>

On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I understand what is going on and why Bart's patch is causing problems for ipr.
> I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix
> in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck
> in ipr. The issue is that we keep retrying failed commands until we finally run out of time.
> This is what I see:
> 
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, whereas previously we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>    quite a long time...
> 9. Finally, the command has been retried for so long that we trip over the overall retry timer
>    in scsi_softirq_done and we time out the command.
> 
> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
> the retry counter in the scsi command, as this will likely cause issues with other drivers.
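
Steps 4 to 7 above can be illustrated with a small user-space model. This
is only a sketch, not kernel code: struct toy_cmd, toy_prep() and toy_run()
are hypothetical names, the dontprep flag stands in for RQF_DONTPREP, and
the device is assumed to fail every dispatch the way ipr does with
DID_ERROR. The retry check mirrors the "++retries <= allowed" test
described in step 4.

```c
#include <stdbool.h>

/* Toy model of a SCSI command; all names are hypothetical. */
struct toy_cmd {
	int retries;	/* stands in for scmd->retries */
	int allowed;	/* stands in for scmd->allowed */
	bool dontprep;	/* stands in for RQF_DONTPREP */
};

/* Prep step: (re)initializes the command unless it is marked as prepared. */
static void toy_prep(struct toy_cmd *cmd)
{
	if (cmd->dontprep)
		return;
	cmd->retries = 0;	/* this is where the retry counter is lost */
	cmd->dontprep = true;
}

/*
 * Dispatch a command that the device always fails. Returns the number of
 * dispatches performed before the retry logic gives up, capped at
 * max_dispatches (the cap plays the role of the overall timeout).
 */
static int toy_run(bool clear_dontprep_on_requeue, int allowed,
		   int max_dispatches)
{
	struct toy_cmd cmd = { .retries = 0, .allowed = allowed,
			       .dontprep = false };
	int dispatches = 0;

	while (dispatches < max_dispatches) {
		toy_prep(&cmd);
		dispatches++;
		/* Device fails the command; decide whether to retry. */
		if (++cmd.retries > cmd.allowed)
			break;	/* retries exhausted, stop requeueing */
		if (clear_dontprep_on_requeue)
			cmd.dontprep = false;	/* forces prep to run again */
	}
	return dispatches;
}
```

With allowed set to 5, toy_run(false, 5, 1000) stops after 6 dispatches,
while toy_run(true, 5, 1000) only stops when it hits the cap of 1000,
because every requeue resets the counter to zero.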

Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered
changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION
CODES commands with the appropriate check condition code instead of DID_ERROR?
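
As a rough illustration of that suggestion, the sketch below fills in
fixed-format sense data for a CHECK CONDITION. It is an assumption that
the intended sense data is ILLEGAL REQUEST with INVALID COMMAND OPERATION
CODE (sense key 0x5, ASC 0x20, ASCQ 0x00); the message above does not name
a specific code. The helper and the TOY_* macros are hypothetical
user-space names, not ipr or SCSI midlayer API.

```c
#include <stdint.h>
#include <string.h>

#define TOY_SENSE_LEN			18	/* fixed-format sense data size */
#define TOY_STATUS_CHECK_CONDITION	0x02	/* SCSI status byte */
#define TOY_KEY_ILLEGAL_REQUEST		0x05	/* sense key */
#define TOY_ASC_INVALID_OPCODE		0x20	/* INVALID COMMAND OPERATION CODE */

/*
 * Hypothetical helper: fill in sense data for an unsupported command and
 * return the SCSI status byte to report with it.
 */
static uint8_t toy_fail_unsupported_opcode(uint8_t sense[TOY_SENSE_LEN])
{
	memset(sense, 0, TOY_SENSE_LEN);
	sense[0] = 0x70;			/* fixed format, current error */
	sense[2] = TOY_KEY_ILLEGAL_REQUEST;
	sense[7] = TOY_SENSE_LEN - 8;		/* additional sense length */
	sense[12] = TOY_ASC_INVALID_OPCODE;	/* ASC */
	sense[13] = 0x00;			/* ASCQ */
	return TOY_STATUS_CHECK_CONDITION;
}
```

The point of reporting ILLEGAL REQUEST rather than a host error is that
it tells the initiator the command itself is not supported, so retrying
it cannot help.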

Thanks,

Bart.

Thread overview: 22+ messages
2017-08-16 17:00 [BUG][bisected 270065e] linux-next fails to boot on powerpc Abdul Haleem
2017-08-16 17:21 ` Bart Van Assche
2017-08-16 23:18   ` Brian King
2017-08-17 15:52     ` Bart Van Assche
2017-08-18 21:04       ` Brian King
2017-08-18 21:17         ` [PATCH] ipr: Set no_report_opcodes for RAID arrays Brian King
2017-08-21 20:22           ` Martin K. Petersen
2017-08-18 21:41         ` Bart Van Assche [this message]
2017-08-18 21:57           ` [BUG][bisected 270065e] linux-next fails to boot on powerpc Brian King
2017-08-18 22:13             ` Bart Van Assche
2017-08-21 22:11               ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Brian King
2017-08-21 22:13                 ` [PATCH 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time Brian King
2017-08-21 22:16                   ` Brian King
2017-08-21 22:40                   ` [PATCHv2 " Brian King
2017-08-22  6:51                   ` [PATCH " hch
2017-08-21 22:14                 ` [PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn Brian King
2017-08-22  6:51                   ` hch
2017-08-22  6:42                 ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Abdul Haleem
2017-08-21 20:27             ` [BUG][bisected 270065e] linux-next fails to boot on powerpc Martin K. Petersen
2017-08-17  1:33   ` Michael Ellerman
2017-08-16 20:25 ` Bart Van Assche
2017-08-17  7:06   ` Michael Ellerman
