All of lore.kernel.org
 help / color / mirror / Atom feed
From: "tj@kernel.org" <tj@kernel.org>
To: Bart Van Assche <Bart.VanAssche@wdc.com>
Cc: "hch@lst.de" <hch@lst.de>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"axboe@kernel.dk" <axboe@kernel.dk>
Subject: Re: [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling
Date: Thu, 8 Feb 2018 09:00:10 -0800	[thread overview]
Message-ID: <20180208170010.GP695913@devbig577.frc2.facebook.com> (raw)
In-Reply-To: <1518107501.3611.19.camel@wdc.com>

Hello, Bart.

On Thu, Feb 08, 2018 at 04:31:43PM +0000, Bart Van Assche wrote:
> > That sounds more like a scsi hotplug bug than an issue in the timeout
> > code unless we messed up @req pointer to begin with.
> 
> I don't think that this is related to SCSI hotplugging: this crash does not
> occur with the v4.15 block layer core and it does not occur with my timeout
> handler rework patch applied either. I think that means that we cannot
> exclude the block layer core timeout handler rework as a possible cause.
> 
> The disassembler output is as follows:
> 
> (gdb) disas /s scsi_times_out
> Dump of assembler code for function scsi_times_out:
> drivers/scsi/scsi_error.c:
> 282     {
>    0x0000000000005bd0 <+0>:     push   %r13
>    0x0000000000005bd2 <+2>:     push   %r12
>    0x0000000000005bd4 <+4>:     push   %rbp
> ./include/linux/blk-mq.h:
> 300             return rq + 1;
>    0x0000000000005bd5 <+5>:     lea    0x178(%rdi),%rbp
> drivers/scsi/scsi_error.c:
> 282     {
>    0x0000000000005bdc <+12>:    push   %rbx
> 283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285             struct Scsi_Host *host = scmd->device->host;
>    0x0000000000005bdd <+13>:    mov    0x1b0(%rdi),%rax
> 282     {
>    0x0000000000005be4 <+20>:    mov    %rdi,%rbx
> 283             struct scsi_cmnd *scmd = blk_mq_rq_to_pdu(req);
> 284             enum blk_eh_timer_return rtn = BLK_EH_NOT_HANDLED;
> 285             struct Scsi_Host *host = scmd->device->host;
>    0x0000000000005be7 <+23>:    mov    (%rax),%r13
>    0x0000000000005bea <+26>:    nopl   0x0(%rax,%rax,1)
> [ ... ]
> (gdb) print /x sizeof(struct request)
> $2 = 0x178
> (gdb) print &(((struct scsi_cmnd*)0)->device)
> $4 = (struct scsi_device **) 0x38 <scsi_cmd_get_serial+8>
> (gdb) print &(((struct scsi_device*)0)->host)       
> $5 = (struct Scsi_Host **) 0x0
> 
> The crash is reported at address scsi_times_out+0x17 == scsi_times_out+23. The
> instruction at that address tries to dereference scsi_cmnd.device (%rax). The
> register dump shows that that pointer has the value NULL. The only function I
> know of that clears the scsi_cmnd.device pointer is scsi_req_init(). The only
> caller of that function in the SCSI core is scsi_initialize_rq(). That function
> has two callers, namely scsi_init_command() and blk_get_request(). However,
> the scsi_cmnd.device pointer is not cleared when a request finishes. This is
> why I think that the above crash report indicates that scsi_times_out() was
> called for a request that was being reinitialized and not by device hotplugging.

I could be misreading it but scsi_cmnd->device dereference should be
the following.

    0x0000000000005bdd <+13>:    mov    0x1b0(%rdi),%rax

%rdi is @req, 0x1b0(%rdi) seems to be the combined arithmetic of
blk_mq_rq_to_pdu() and ->device dereference - 0x178 + 0x38.  The
faulting access is (%rax), which is deref'ing host from device.

Thanks.

-- 
tejun

  reply	other threads:[~2018-02-08 17:00 UTC|newest]

Thread overview: 28+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-02-07  1:11 [PATCH v2] blk-mq: Fix race between resetting the timer and completion handling Bart Van Assche
2018-02-07 17:06 ` Tejun Heo
2018-02-07 17:27   ` Bart Van Assche
2018-02-07 17:35     ` tj
2018-02-07 18:14       ` Bart Van Assche
2018-02-07 20:07         ` tj
2018-02-07 23:48           ` Bart Van Assche
2018-02-08  1:09             ` Bart Van Assche
2018-02-08 15:39               ` tj
2018-02-08 15:40                 ` tj
2018-02-08 16:31                 ` Bart Van Assche
2018-02-08 17:00                   ` tj [this message]
2018-02-08 17:10                     ` Bart Van Assche
2018-02-08 17:19                       ` tj
2018-02-08 17:37                         ` Bart Van Assche
2018-02-08 17:40                           ` tj
2018-02-08 17:48                             ` Bart Van Assche
2018-02-08 17:54                               ` tj
2018-02-13 21:20                   ` tj
2018-02-14 16:58                     ` Bart Van Assche
2018-02-18 13:11                       ` tj
2018-02-21 18:53                         ` Bart Van Assche
2018-02-21 19:21                           ` tj
2018-02-21 22:55                             ` Bart Van Assche
2018-02-07 19:03   ` Bart Van Assche
2018-02-07 20:09     ` tj
2018-02-07 21:02       ` Bart Van Assche
2018-02-07 21:40         ` tj

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20180208170010.GP695913@devbig577.frc2.facebook.com \
    --to=tj@kernel.org \
    --cc=Bart.VanAssche@wdc.com \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-block@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.