From: "tj@kernel.org" <tj@kernel.org>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>,
"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling
Date: Thu, 8 Feb 2018 09:00:10 -0800 [thread overview]
Message-ID: <20180208170010.GP695913@devbig577.frc2.facebook.com> (raw)
In-Reply-To: <1518107501.3611.19.camel@wdc.com>
Hello, Bart.
On Thu, Feb 08, 2018 at 04:31:43PM +0000, Bart Van Assche wrote:
> > That sounds more like a scsi hotplug bug than an issue in the timeout
> > code unless we messed up @req pointer to begin with.
>
> I don't think that this is related to SCSI hotplugging: this crash does not
> occur with the v4.15 block layer core and it does not occur with my timeout
> handler rework patch applied either. I think that means that we cannot
> exclude the block layer core timeout handler rework as a possible cause.
>
> The disassembler output is as follows:
>
> (gdb) disas /s scsi_times_out
> Dump of assembler code for function scsi_times_out:
> drivers/scsi/scsi_error.c:
> 282 {
> 0x0000000000005bd0 <+0>: push %r13
> 0x0000000000005bd2 <+2>: push %r12
> 0x0000000000005bd4 <+4>: push %rbp
> ./include/linux/blk-mq.h:
> 300 return rq + 1;
> 0x0000000000005bd5 <+5>: lea 0x178(%rdi),%rbp
> drivers/scsi/scsi_error.c:
> 282 {
> 0x0000000000005bdc <+12>: push %rbx
> 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285 struct Scsi_Host *host = scmd->device->host;
> 0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax
> 282 {
> 0x0000000000005be4 <+20>: mov %rdi,%rbx
> 283 struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284 enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285 struct Scsi_Host *host = scmd->device->host;
> 0x0000000000005be7 <+23>: mov (%rax),%r13
> 0x0000000000005bea <+26>: nopl 0x0(%rax,%rax,1)
> [ ... ]
> (gdb) print /x sizeof(struct request)
> $2 = 0x178
> (gdb) print &(((struct scsi_cmnd*)0)->device)
> $4 = (struct scsi_device **) 0x38 <scsi_cmd_get_serial+8>
> (gdb) print &(((struct scsi_device*)0)->host)
> $5 = (struct Scsi_Host **) 0x0
>
> The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The
> instruction at that address tries to dereference scsi_cmnd.device (%rax). The
> register dump shows that that pointer has the value NULL. The only function I
> know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only
> caller of that function in the SCSI core is scsi_initialize_rq(). That function
> has two callers, namely scsi_init_command() and blk_get_request(). However,
> the scsi_cmnd.device pointer is not cleared when a request finishes. This is
> why I think that the above crash report indicates that scsi_times_out() was
> called for a request that was being reinitialized and not by device hotplugging.
I could be misreading it but scsi_cmnd->device dereference should be
the following.
0x0000000000005bdd <+13>: mov 0x1b0(%rdi),%rax
%rdi is @req, 0x1b0(%rdi) seems to be the combined arithmetic of
blk_mq_rq_to_pdu() and ->device dereference - 0x178 + 0x38. The
faulting access is (%rax), which is deref'ing host from device.
Thanks.
--
tejun
next prev parent reply other threads:[~2018-02-08 17:00 UTC|newest]
Thread overview: 28+ messages / expand[flat|nested] mbox.gz Atom feed top
2018-02-07 1:11 [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling Bart Van Assche
2018-02-07 17:06 ` Tejun Heo
2018-02-07 17:27 ` Bart Van Assche
2018-02-07 17:35 ` tj
2018-02-07 18:14 ` Bart Van Assche
2018-02-07 20:07 ` tj
2018-02-07 23:48 ` Bart Van Assche
2018-02-08 1:09 ` Bart Van Assche
2018-02-08 15:39 ` tj
2018-02-08 15:40 ` tj
2018-02-08 16:31 ` Bart Van Assche
2018-02-08 17:00 ` tj [this message]
2018-02-08 17:10 ` Bart Van Assche
2018-02-08 17:19 ` tj
2018-02-08 17:37 ` Bart Van Assche
2018-02-08 17:40 ` tj
2018-02-08 17:48 ` Bart Van Assche
2018-02-08 17:54 ` tj
2018-02-13 21:20 ` tj
2018-02-14 16:58 ` Bart Van Assche
2018-02-18 13:11 ` tj
2018-02-21 18:53 ` Bart Van Assche
2018-02-21 19:21 ` tj
2018-02-21 22:55 ` Bart Van Assche
2018-02-07 19:03 ` Bart Van Assche
2018-02-07 20:09 ` tj
2018-02-07 21:02 ` Bart Van Assche
2018-02-07 21:40 ` tj
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20180208170010.GP695913@devbig577.frc2.facebook.com \
--to=tj@kernel.org \
--cc=Bart.VanAssche@wdc.com \
--cc=axboe@kernel.dk \
--cc=hch@lst.de \
--cc=linux-block@vger.kernel.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.