From: Bart Van Assche <Bart.VanAssche@wdc.com>
To: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
"abdhalee@linux.vnet.ibm.com" <abdhalee@linux.vnet.ibm.com>,
"brking@linux.vnet.ibm.com" <brking@linux.vnet.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
"hch@lst.de" <hch@lst.de>,
"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
"sachinp@linux.vnet.ibm.com" <sachinp@linux.vnet.ibm.com>,
"linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
"hare@suse.com" <hare@suse.com>,
"mpe@ellerman.id.au" <mpe@ellerman.id.au>
Subject: Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
Date: Fri, 18 Aug 2017 21:41:13 +0000
Message-ID: <1503092473.2622.17.camel@wdc.com>
In-Reply-To: <71fb9c1b-9f3f-acdc-8bb5-aa1240aea763@linux.vnet.ibm.com>
On Fri, 2017-08-18 at 16:04 -0500, Brian King wrote:
> I think I have an understanding of what is going on and why Bart's patch is causing problems for ipr.
> I can work around the boot hang in ipr, but ultimately I think we need to figure out a fix
> in scsi / block. I added some tracing and confirmed it's not a matter of commands getting stuck
> in ipr. The issue is we are retrying failed commands until we finally run out of time. This is
> what I see:
>
> 1. sd_revalidate_disk calls scsi_report_opcode
> 2. ipr RAID arrays don't support MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES
> 3. ipr returns the command with DID_ERROR
> 4. scsi_decide_disposition goes to maybe_retry, increments scmd->retries, and returns NEEDS_RETRY
> 5. scsi_softirq_done calls scsi_queue_insert to requeue the command, which calls scsi_mq_requeue_cmd
> 6. With Bart's change, we then clear RQF_DONTPREP in this path, while prior we did not
> 7. This results in the command getting scmd->retries zeroed out when it gets re-queued,
>    since we go through prep again and we lose our retry counter, resulting in lots and lots of retries.
> 8. Since the default command timeout for an ipr RAID array is 120 seconds, these retries go on for
>    quite a long time...
> 9. Finally, the command has been retried so long we trip over the overall retry timer
>    in scsi_softirq_done and we time out the command.
>
> I'll follow up with a patch to ipr to work around the hang, but I think we need to somehow preserve
> the retry counter in the scsi command, as this will likely cause issues with other drivers.
Hello Brian,

Thanks for the detailed analysis. This is very helpful. Have you considered
changing the ipr driver so that it terminates REPORT SUPPORTED OPERATION
CODES commands with a CHECK CONDITION status and the appropriate sense code
instead of DID_ERROR?

Thanks,

Bart.
Thread overview: 31+ messages
2017-08-16 17:00 [BUG][bisected 270065e] linux-next fails to boot on powerpc Abdul Haleem
2017-08-16 17:21 ` Bart Van Assche
2017-08-16 23:18 ` Brian King
2017-08-17 15:52 ` Bart Van Assche
2017-08-18 21:04 ` Brian King
2017-08-18 21:17 ` [PATCH] ipr: Set no_report_opcodes for RAID arrays Brian King
2017-08-21 20:22 ` Martin K. Petersen
2017-08-18 21:41 ` Bart Van Assche [this message]
2017-08-18 21:57 ` Brian King
2017-08-18 22:13 ` Bart Van Assche
2017-08-21 22:11 ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Brian King
2017-08-21 22:13 ` [PATCH 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time Brian King
2017-08-21 22:16 ` Brian King
2017-08-21 22:40 ` [PATCHv2 " Brian King
2017-08-22 6:51 ` [PATCH " hch
2017-08-21 22:14 ` [PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn Brian King
2017-08-22 6:51 ` hch
2017-08-22 6:42 ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Abdul Haleem
2017-08-21 20:27 ` [BUG][bisected 270065e] linux-next fails to boot on powerpc Martin K. Petersen
2017-08-17 1:33 ` Michael Ellerman
2017-08-16 20:25 ` Bart Van Assche
2017-08-17 7:06 ` Michael Ellerman