public inbox for linux-block@vger.kernel.org
 help / color / mirror / Atom feed
From: "tj@kernel.org" <tj@kernel.org>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling
Date: Thu, 8 Feb 2018 09:00:10 -0800	[thread overview]
Message-ID: <20180208170010.GP695913@devbig577.frc2.facebook.com> (raw)
In-Reply-To: <1518107501.3611.19.camel@wdc.com>

Hello, Bart.

On Thu, Feb 08, 2018 at 04:31:43PM +0000, Bart Van Assche wrote:
> > That sounds more like a scsi hotplug bug than an issue in the timeout
> > code unless we messed up @req pointer to begin with.
> 
> I don't think that this is related to SCSI hotplugging: this crash does not
> occur with the v4.15 block layer core and it does not occur with my timeout
> handler rework patch applied either. I think that means that we cannot
> exclude the block layer core timeout handler rework as a possible cause.
> 
> The disassembler output is as follows:
> 
> (gdb) disas /s scsi_times_out
> Dump of assembler code for function scsi_times_out:
> drivers/scsi/scsi_error.c:
> 282     {
>    0x0000000000005bd0 <+0>:     push   %r13
>    0x0000000000005bd2 <+2>:     push   %r12
>    0x0000000000005bd4 <+4>:     push   %rbp
> ./include/linux/blk-mq.h:
> 300             return rq + 1;
>    0x0000000000005bd5 <+5>:     lea    0x178(%rdi),%rbp
> drivers/scsi/scsi_error.c:
> 282     {
>    0x0000000000005bdc <+12>:    push   %rbx
> 283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285             struct Scsi_Host *host = scmd->device->host;
>    0x0000000000005bdd <+13>:    mov    0x1b0(%rdi),%rax
> 282     {
>    0x0000000000005be4 <+20>:    mov    %rdi,%rbx
> 283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285             struct Scsi_Host *host = scmd->device->host;
>    0x0000000000005be7 <+23>:    mov    (%rax),%r13
>    0x0000000000005bea <+26>:    nopl   0x0(%rax,%rax,1)
> [ ... ]
> (gdb) print /x sizeof(struct request)
> $2 = 0x178
> (gdb) print &(((struct scsi_cmnd*)0)->device)
> $4 = (struct scsi_device **) 0x38 <scsi_cmd_get_serial+8>
> (gdb) print &(((struct scsi_device*)0)->host)       
> $5 = (struct Scsi_Host **) 0x0
> 
> The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The
> instruction at that address tries to dereference scsi_cmnd.device (%rax). The
> register dump shows that that pointer has the value NULL. The only function I
> know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only
> caller of that function in the SCSI core is scsi_initialize_rq(). That function
> has two callers, namely scsi_init_command() and blk_get_request(). However,
> the scsi_cmnd.device pointer is not cleared when a request finishes. This is
> why I think that the above crash report indicates that scsi_times_out() was
> called for a request that was being reinitialized and not by device hotplugging.

I could be misreading it but scsi_cmnd->device dereference should be
the following.

    0x0000000000005bdd <+13>:    mov    0x1b0(%rdi),%rax

%rdi is @req, 0x1b0(%rdi) seems to be the combined arithmetic of
blk_mq_rq_to_pdu() and ->device dereference - 0x178 + 0x38.  The
faulting access is (%rax), which is deref'ing host from device.

Thanks.

-- 
tejun

  reply	other threads:[~2018-02-08 17:00 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-07  1:11 [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling Bart Van Assche
2018-02-07 17:06 ` Tejun Heo
2018-02-07 17:27   ` Bart Van Assche
2018-02-07 17:35     ` tj
2018-02-07 18:14       ` Bart Van Assche
2018-02-07 20:07         ` tj
2018-02-07 23:48           ` Bart Van Assche
2018-02-08  1:09             ` Bart Van Assche
2018-02-08 15:39               ` tj
2018-02-08 15:40                 ` tj
2018-02-08 16:31                 ` Bart Van Assche
2018-02-08 17:00                   ` tj [this message]
2018-02-08 17:10                     ` Bart Van Assche
2018-02-08 17:19                       ` tj
2018-02-08 17:37                         ` Bart Van Assche
2018-02-08 17:40                           ` tj
2018-02-08 17:48                             ` Bart Van Assche
2018-02-08 17:54                               ` tj
2018-02-13 21:20                   ` tj
2018-02-14 16:58                     ` Bart Van Assche
2018-02-18 13:11                       ` tj
2018-02-21 18:53                         ` Bart Van Assche
2018-02-21 19:21                           ` tj
2018-02-21 22:55                             ` Bart Van Assche
2018-02-07 19:03   ` Bart Van Assche
2018-02-07 20:09     ` tj
2018-02-07 21:02       ` Bart Van Assche
2018-02-07 21:40         ` tj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180208170010.GP695913@devbig577.frc2.facebook.com \
    --to=tj@kernel.org \
    --cc=Bart.VanAssche@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox