From: Bart Van Assche <Bart.VanAssche@sandisk.com>
To: "hch@lst.de" <hch@lst.de>, "axboe@kernel.dk" <axboe@kernel.dk>
Cc: "torvalds@linux-foundation.org" <torvalds@linux-foundation.org>,
	"linux-kernel@vger.kernel.org" <linux-kernel@vger.kernel.org>,
	"linux-block@vger.kernel.org" <linux-block@vger.kernel.org>,
	"snitzer@redhat.com" <snitzer@redhat.com>
Subject: Re: [GIT PULL] Block pull request for- 4.11-rc1
Date: Fri, 24 Feb 2017 17:39:43 +0000
Message-ID: <1487957968.2575.6.camel@sandisk.com>
In-Reply-To: <f9b3b2ca-b19b-7c4a-52bb-789c3e2b39e5@kernel.dk>

On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote:
> On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> > On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
> > > > That said, we will look into this again, of course. Christoph, any idea?
> > > 
> > > No idea really - this seems so far away from the code touched, and there
> > > are no obvious signs for a memory scramble from another object touched
> > > that I think if it really bisects down to that issue it must be a timing
> > > issue.
> > > 
> > > But reading Bart's message again:  Did you actually bisect it down
> > > to this commit?  Or just test the whole tree?  Between the 4.10-rc5
> > > merge and all the block tree there might be a few more likely suspects
> > > like the scsi bdi lifetime fixes that James mentioned.
> > 
> > Hello Christoph,
> > 
> > As far as I know Jens does not rebase his trees so we can use the commit
> > date to check which patch went in when. From the first of Jan's bdi patches:
> > 
> > CommitDate: Thu Feb 2 08:18:41 2017 -0700
> > 
> > So the bdi patches went in several days after I reported the general protection
> > fault issue.
> > 
> > In an e-mail of January 30th I wrote the following: "Running the srp-test
> > software against kernel 4.9.6 and kernel 4.10-rc5 went fine.  With your
> > for-4.11/block branch (commit 400f73b23f457a) however I just ran into
> > the following warning: [ ... ]" That means that I did not hit the crash with
> > Jens' for-4.11/block branch but only with the for-next branch. The patches
> > on Jens' for-next branch after that commit that were applied before I ran
> > my test are:
> > 
> > $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> > 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
> > 
> > Do you see any patch in the above list that does not belong to the "split
> > scsi passthrough fields out of struct request" series and that could have
> > caused the reported behavior change?
> 
> Bart, since you are the only one that can reproduce this, can you just bisect
> your way through that series?

Hello Jens,

Since Christoph also has access to IB hardware I will leave it to Christoph
to do the bisect. Anyway, I just reproduced this crash with Linus' current
tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq
(see also https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: 0000 [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G          I     4.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:ffffc90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
[ 1629.921372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
[ 1629.921401] RBP: ffffc90001bdbdc0 R08: ffff8803a3858d48 R09: 0000000000000000
[ 1629.921429] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1629.921458] R13: 0000000000000000 R14: ffffffff81c05120 R15: 0000000000000004
[ 1629.921489] FS:  0000000000000000(0000) GS:ffff88046ef80000(0000) knlGS:0000000000000000
[ 1629.921520] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1629.921547] CR2: 00007fb6324486b8 CR3: 0000000001c0f000 CR4: 00000000001406e0
[ 1629.921576] Call Trace:
[ 1629.921605]  dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637]  blk_done_softirq+0x88/0xa0
[ 1629.921663]  __do_softirq+0xba/0x4c0
[ 1629.921744]  run_ksoftirqd+0x1a/0x50
[ 1629.921769]  smpboot_thread_fn+0x123/0x1e0
[ 1629.921797]  kthread+0x107/0x140
[ 1629.921944]  ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85 
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod] RSP: ffffc90001bdbda8

$ gdb drivers/md/dm-mod.ko
(gdb) list *(rq_completed+0x12)    
0xdf62 is in rq_completed (drivers/md/dm-rq.c:187).
182      * the md may be freed in dm_put() at the end of this function.
183      * Or do dm_get() before calling this function and dm_put() later.
184      */
185     static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
186     {
187             struct request_queue *q = md->queue;
188             unsigned long flags;
189
190             atomic_dec(&md->pending[rw]);
191
(gdb) disas rq_completed  
Dump of assembler code for function rq_completed:
   0x000000000000df50 <+0>:     push   %rbp
   0x000000000000df51 <+1>:     movslq %esi,%rsi
   0x000000000000df54 <+4>:     mov    %rsp,%rbp
   0x000000000000df57 <+7>:     push   %r13
   0x000000000000df59 <+9>:     mov    %edx,%r13d
   0x000000000000df5c <+12>:    push   %r12
   0x000000000000df5e <+14>:    push   %rbx
   0x000000000000df5f <+15>:    mov    %rdi,%rbx
   0x000000000000df62 <+18>:    mov    0x270(%rdi),%r12
[ ... ]

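So the crash is caused by the load of md->queue at dm-rq.c:187: md (%rdi) is
0x6b6b6b6b6b6b6b6b and the load reads from offset 0x270 of that address. Since
0x6b is the byte that slab poisoning writes into freed objects, I think this
means the crash is caused by a use-after-free.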

Bart.
