public inbox for linux-kernel@vger.kernel.org
From: Bart Van Assche <Bart.VanAssche@wdc.com>
To: "linuxppc-dev@lists.ozlabs.org" <linuxppc-dev@lists.ozlabs.org>,
	"abdhalee@linux.vnet.ibm.com" <abdhalee@linux.vnet.ibm.com>,
	"brking@linux.vnet.ibm.com" <brking@linux.vnet.ibm.com>
Cc: "linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"hch@lst.de" <hch@lst.de>,
	"linux-scsi@vger.kernel.org" <linux-scsi@vger.kernel.org>,
	"sfr@canb.auug.org.au" <sfr@canb.auug.org.au>,
	"linux-next@vger.kernel.org" <linux-next@vger.kernel.org>,
	"hare@suse.com" <hare@suse.com>,
	"sachinp@linux.vnet.ibm.com" <sachinp@linux.vnet.ibm.com>,
	"mpe@ellerman.id.au" <mpe@ellerman.id.au>
Subject: Re: [BUG][bisected 270065e] linux-next fails to boot on powerpc
Date: Thu, 17 Aug 2017 15:52:45 +0000
Message-ID: <1502985161.2615.8.camel@wdc.com>
In-Reply-To: <2f686064-3e32-df8d-134f-962b5181da9d@linux.vnet.ibm.com>

On Wed, 2017-08-16 at 18:18 -0500, Brian King wrote:
> On 08/16/2017 12:21 PM, Bart Van Assche wrote:
> > On Wed, 2017-08-16 at 22:30 +0530, Abdul Haleem wrote:
> > > As of next-20170809, linux-next on powerpc hangs during boot with the
> > > trace message below.
> > > 
> > > [ ... ]
> > > 
> > > A bisection resulted in first bad commit (270065e92 - scsi: scsi-mq:
> > > Always unprepare ...) in the merge branch 'scsi/for-next'
> > > 
> > > The system boots fine when the commit below is reverted:
> > > 
> > > commit 270065e92c317845d69095ec8e3d18616b5b39d5
> > > Author: Bart Van Assche <bart.vanassche@wdc.com>
> > > Date:   Thu Aug 3 14:40:14 2017 -0700
> > > 
> > >     scsi: scsi-mq: Always unprepare before requeuing a request
> > 
> > Hello Brian and Michael,
> > 
> > Do you agree that this probably indicates a bug in the PowerPC block driver
> > that is used to access the boot disk? Anyway, since a solution is not yet
> > available, I will submit a revert for this patch.
> 
> I've been looking at this a bit and can recreate the issue, but haven't
> found the root cause yet. If I do a sysrq-w while the system is hung
> during boot, I see this:
> 
> [   25.561523] Workqueue: events_unbound async_run_entry_fn
> [   25.561527] Call Trace:
> [   25.561529] [c0000001697873f0] [c000000169701600] 0xc000000169701600 (unreliable)
> [   25.561534] [c0000001697875c0] [c00000000001ab78] __switch_to+0x2e8/0x430
> [   25.561539] [c000000169787620] [c00000000091ccb0] __schedule+0x310/0xa00
> [   25.561543] [c0000001697876f0] [c00000000091d3e0] schedule+0x40/0xb0
> [   25.561548] [c000000169787720] [c000000000921e40] schedule_timeout+0x200/0x430
> [   25.561553] [c000000169787810] [c00000000091db10] io_schedule_timeout+0x30/0x70
> [   25.561558] [c000000169787840] [c00000000091e978] wait_for_common_io.constprop.3+0x178/0x280
> [   25.561563] [c0000001697878c0] [c00000000047f7ec] blk_execute_rq+0x7c/0xd0
> [   25.561567] [c000000169787910] [c000000000614cd0] scsi_execute+0x100/0x230
> [   25.561572] [c000000169787990] [c00000000060d29c] scsi_report_opcode+0xbc/0x170
> [   25.561577] [c000000169787a50] [d000000004fe6404] sd_revalidate_disk+0xe04/0x1620 [sd_mod]
> [   25.561583] [c000000169787b80] [d000000004fe6d84] sd_probe_async+0xb4/0x230 [sd_mod]
> [   25.561588] [c000000169787c00] [c00000000010fc44] async_run_entry_fn+0x74/0x210
> [   25.561593] [c000000169787c90] [c000000000102f48] process_one_work+0x198/0x480
> [   25.561598] [c000000169787d30] [c0000000001032b8] worker_thread+0x88/0x510
> [   25.561603] [c000000169787dc0] [c00000000010b030] kthread+0x160/0x1a0
> [   25.561608] [c000000169787e30] [c00000000000b3a4] ret_from_kernel_thread+0x5c/0xb8
> 
> I noticed that we are commonly stuck in scsi_report_opcode. Since ipr RAID arrays don't support
> the MAINTENANCE_IN / MI_REPORT_SUPPORTED_OPERATION_CODES command, I tried setting sdev->no_report_opcodes = 1
> in ipr's slave_configure. This seems to eliminate the boot hang for me, but it only works around
> the issue. Since this command is not supported by ipr, it should fail with ILLEGAL REQUEST.
> When I'm hung at this point, there is nothing outstanding to the adapter / driver. I'll continue
> debugging...
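The trace above shows the sd probe thread parked in blk_execute_rq(), waiting for a completion that never arrives. What follows is a minimal user-space sketch of that wait pattern. It assumes, based on the block layer code of that period as I recall it and not verified against next-20170809, that blk_execute_rq() loops on wait_for_completion_io_timeout() using half the hung-task timeout as the slice, and that it never gives up on the request. struct completion, init_completion() and complete() mirror kernel names but are reimplemented here with pthreads. execute_and_wait() and its max_rounds bound are hypothetical and exist only so the model can terminate.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <time.h>

/* User-space stand-in for the kernel's struct completion (model only). */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool done;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = false;
}

/* Models the end_io callback signalling the waiter when the request finishes. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = true;
	pthread_cond_broadcast(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Returns true if the completion fired, false if timeout_ms elapsed first. */
static bool wait_for_completion_timeout_ms(struct completion *c, long timeout_ms)
{
	struct timespec deadline;
	int rc = 0;
	bool done;

	assert(timeout_ms >= 0);
	clock_gettime(CLOCK_REALTIME, &deadline);
	deadline.tv_sec += timeout_ms / 1000;
	deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
	if (deadline.tv_nsec >= 1000000000L) {
		deadline.tv_sec++;
		deadline.tv_nsec -= 1000000000L;
	}

	pthread_mutex_lock(&c->lock);
	/* Loop to absorb spurious wakeups until done or the deadline passes. */
	while (!c->done && rc != ETIMEDOUT)
		rc = pthread_cond_timedwait(&c->cond, &c->lock, &deadline);
	done = c->done;
	pthread_mutex_unlock(&c->lock);
	return done;
}

/*
 * Models the wait loop in blk_execute_rq(): a timed-out slice only
 * restarts the wait and never aborts the request. Returns how many
 * slices timed out before completion, or -1 if the request was still
 * pending after max_rounds slices (the kernel would keep waiting forever).
 */
static int execute_and_wait(struct completion *c, long slice_ms, int max_rounds)
{
	for (int round = 0; round < max_rounds; round++)
		if (wait_for_completion_timeout_ms(c, slice_ms))
			return round;
	return -1;
}
```

A timed-out slice only restarts the wait. So a request that the driver neither completes nor fails shows up as a silent hang like the one in the sysrq-w output, not as an I/O error.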
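The workaround works because scsi_report_opcode() returns early when sdev->no_report_opcodes is set, before any command is sent. Below is a user-space model of that check and of the CDB the function would otherwise build. struct mock_scsi_device and build_report_opcode_cdb() are hypothetical. As far as I recall, the real function also rejects devices whose SCSI level is below SPC-3, which this model omits. The byte layout follows the SPC REPORT SUPPORTED OPERATION CODES definition: opcode 0xa3, service action 0x0c, 12-byte CDB, big-endian allocation length in bytes 6..9.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Values from SPC; the kernel names them in <scsi/scsi_proto.h>. */
#define MAINTENANCE_IN                      0xa3
#define MI_REPORT_SUPPORTED_OPERATION_CODES 0x0c

/* Hypothetical stand-in for the one struct scsi_device field involved. */
struct mock_scsi_device {
	bool no_report_opcodes;	/* what the ipr workaround sets in slave_configure */
};

/*
 * Build the REPORT SUPPORTED OPERATION CODES CDB asking about one opcode,
 * or return -EINVAL without touching cdb if the device is flagged as not
 * supporting the command.
 */
static int build_report_opcode_cdb(const struct mock_scsi_device *sdev,
				   uint8_t cdb[12], uint8_t opcode,
				   uint32_t alloc_len)
{
	assert(sdev != NULL && cdb != NULL);
	if (sdev->no_report_opcodes)
		return -EINVAL;

	memset(cdb, 0, 12);
	cdb[0] = MAINTENANCE_IN;
	cdb[1] = MI_REPORT_SUPPORTED_OPERATION_CODES;
	cdb[2] = 1;				/* REPORTING OPTIONS: one command, by opcode */
	cdb[3] = opcode;			/* REQUESTED OPERATION CODE */
	cdb[6] = (uint8_t)(alloc_len >> 24);	/* ALLOCATION LENGTH, big-endian */
	cdb[7] = (uint8_t)(alloc_len >> 16);
	cdb[8] = (uint8_t)(alloc_len >> 8);
	cdb[9] = (uint8_t)alloc_len;
	return 0;
}
```

With the flag set, the MAINTENANCE IN command that ipr would have to reject is never issued. That explains why the workaround avoids the hang without revealing why the rejected command never completed.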

(+linux-scsi)

Hello Brian,

Is kernel debugging enabled on your test system? Is lockdep enabled?
In any case, a stack trace like the one above usually means that a request
got stuck in a block or SCSI driver (ipr in this case). Information about
pending requests, including the SCSI CDB, is available under
/sys/kernel/debug/block (see also commit 0eebd005dd07 ("scsi: Implement
blk_mq_ops.show_rq()")).
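Here is a minimal sketch for collecting that information in one pass: walk a debugfs directory and print every readable, non-empty file. It assumes debugfs is mounted at /sys/kernel/debug. It makes no assumption about the individual file names or their format. Each file is read instead of checked with stat(), because debugfs attributes typically report a size of 0. dump_debugfs_dir() is a hypothetical helper, and its output is truncated at 4 KiB per file.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <ftw.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/stat.h>

static int nonempty_files;	/* nftw() callbacks get no user-data pointer */

static int show_file(const char *path, const struct stat *st, int type,
		     struct FTW *ftwbuf)
{
	char buf[4096];
	size_t n;
	FILE *f;

	(void)st;
	(void)ftwbuf;
	if (type != FTW_F)
		return 0;	/* skip directories, symlinks, unreadable entries */
	f = fopen(path, "r");
	if (!f)
		return 0;	/* some debugfs attributes are write-only */
	n = fread(buf, 1, sizeof(buf) - 1, f);
	fclose(f);
	if (n == 0)
		return 0;	/* empty, or reading is not supported */
	buf[n] = '\0';
	nonempty_files++;
	printf("== %s\n%s", path, buf);
	if (buf[n - 1] != '\n')
		putchar('\n');
	return 0;
}

/* Returns the number of non-empty files printed, or -1 if the walk failed. */
static int dump_debugfs_dir(const char *root)
{
	assert(root != NULL);
	nonempty_files = 0;
	if (nftw(root, show_file, 16, FTW_PHYS) != 0)
		return -1;
	return nonempty_files;
}
```

While the system is stuck, run it as root against the hung disk's queue, for example dump_debugfs_dir("/sys/kernel/debug/block/sda"). The device name is only an example. The files that list pending requests should include the CDB that is hanging.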

Bart.


Thread overview: 22+ messages
2017-08-16 17:00 [BUG][bisected 270065e] linux-next fails to boot on powerpc Abdul Haleem
2017-08-16 17:21 ` Bart Van Assche
2017-08-16 23:18   ` Brian King
2017-08-17 15:52     ` Bart Van Assche [this message]
2017-08-18 21:04       ` Brian King
2017-08-18 21:17         ` [PATCH] ipr: Set no_report_opcodes for RAID arrays Brian King
2017-08-21 20:22           ` Martin K. Petersen
2017-08-18 21:41         ` [BUG][bisected 270065e] linux-next fails to boot on powerpc Bart Van Assche
2017-08-18 21:57           ` Brian King
2017-08-18 22:13             ` Bart Van Assche
2017-08-21 22:11               ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Brian King
2017-08-21 22:13                 ` [PATCH 1/2] scsi: Move scsi_cmd->jiffies_at_alloc initialization to allocation time Brian King
2017-08-21 22:16                   ` Brian King
2017-08-21 22:40                   ` [PATCHv2 " Brian King
2017-08-22  6:51                   ` [PATCH " hch
2017-08-21 22:14                 ` [PATCH 2/2] scsi: Preserve retry counter through scsi_prep_fn Brian King
2017-08-22  6:51                   ` hch
2017-08-22  6:42                 ` [PATCH 0/2] Allow scsi_prep_fn to occur for retried commands Abdul Haleem
2017-08-21 20:27             ` [BUG][bisected 270065e] linux-next fails to boot on powerpc Martin K. Petersen
2017-08-17  1:33   ` Michael Ellerman
2017-08-16 20:25 ` Bart Van Assche
2017-08-17  7:06   ` Michael Ellerman
