From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
From: Bart Van Assche
To: "hch@lst.de" , "axboe@kernel.dk"
CC: "torvalds@linux-foundation.org" , "linux-kernel@vger.kernel.org" , "linux-block@vger.kernel.org" , "snitzer@redhat.com"
Subject: Re: [GIT PULL] Block pull request for- 4.11-rc1
Date: Fri, 24 Feb 2017 17:39:43 +0000
Message-ID: <1487957968.2575.6.camel@sandisk.com>
References: <1D08B61A9CF0974AA09887BE32D889DA0A74F3@ULS-OP-MBXIP03.sdcorp.global.sandisk.com> <633d226d-cec3-a01f-a069-ffff307e9715@kernel.dk> <20170220073539.GA17687@lst.de> <1D08B61A9CF0974AA09887BE32D889DA0A8CBD@ULS-OP-MBXIP03.sdcorp.global.sandisk.com>
In-Reply-To:
Content-Type: text/plain; charset="iso-8859-1"
MIME-Version: 1.0
Return-Path: Bart.VanAssche@sandisk.com
List-ID:

On Mon, 2017-02-20 at 09:32 -0700, Jens Axboe wrote:
> On 02/20/2017 09:16 AM, Bart Van Assche wrote:
> > On 02/19/2017 11:35 PM, Christoph Hellwig wrote:
> > > On Sun, Feb 19, 2017 at 06:15:41PM -0700, Jens Axboe wrote:
> > > > That said, we will look into this again, of course. Christoph, any idea?
> > >
> > > No idea really - this seems so far away from the code touched, and there
> > > are no obvious signs for a memory scramble from another object touched
> > > that I think if it really bisects down to that issue it must be a timing
> > > issue.
> > >
> > > But reading Bart's message again: Did you actually bisect it down
> > > to this commit? Or just test the whole tree? Between the 4.10-rc5
> > > merge and all the block tree there might be a few more likely suspects
> > > like the scsi bdi lifetime fixes that James mentioned.
> >
> > Hello Christoph,
> >
> > As far as I know Jens does not rebase his trees so we can use the commit
> > date to check which patch went in when. From the first of Jan's bdi patches:
> >
> > CommitDate: Thu Feb 2 08:18:41 2017 -0700
> >
> > So the bdi patches went in several days after I reported the general protection
> > fault issue.
> >
> > In an e-mail of January 30th I wrote the following: "Running the srp-test
> > software against kernel 4.9.6 and kernel 4.10-rc5 went fine. With your
> > for-4.11/block branch (commit 400f73b23f457a) however I just ran into
> > the following warning: [ ... ]" That means that I did not hit the crash with
> > Jens' for-4.11/block branch but only with the for-next branch. The patches
> > on Jens' for-next branch after that commit that were applied before I ran
> > my test are:
> >
> > $ PAGER= git log --format=oneline 400f73b23f457a..fb045ca25cc7 block drivers/md/dm{,-mpath,-table}.[ch]
> > fb045ca25cc7b6d46368ab8221774489c2a81648 block: don't assign cmd_flags in __blk_rq_prep_clone
> > 82ed4db499b8598f16f8871261bff088d6b0597f block: split scsi_request out of struct request
> > 8ae94eb65be9425af4d57a4f4cfebfdf03081e93 block/bsg: move queue creation into bsg_setup_queue
> > eb8db831be80692bf4bda3dfc55001daf64ec299 dm: always defer request allocation to the owner of the request_queue
> > 6d247d7f71d1fa4b66a5f4da7b1daa21510d529b block: allow specifying size for extra command data
> > 5ea708d15a928f7a479987704203616d3274c03b block: simplify blk_init_allocated_queue
> > e6f7f93d58de74700f83dd0547dd4306248a093d block: fix elevator init check
> > f924ba70c1b12706c6679d793202e8f4c125f7ae Merge branch 'for-4.11/block' into for-4.11/rq-refactor
> > 88a7503376f4f3bf303c809d1a389739e1205614 blk-mq: Remove unused variable
> > bef13315e990fd3d3fb4c39013aefd53f06c3657 block: don't try to discard from __blkdev_issue_zeroout
> > f99e86485cc32cd16e5cc97f9bb0474f28608d84 block: Rename blk_queue_zone_size and bdev_zone_size
> >
> > Do you see any patch in the above list that does not belong to the "split
> > scsi passthrough fields out of struct request" series and that could have
> > caused the reported behavior change?
>
> Bart, since you are the only one that can reproduce this, can you just bisect
> your way through that series?
Hello Jens,

Since Christoph also has access to IB hardware I will leave it to Christoph
to do the bisect. Anyway, I just reproduced this crash with Linus' current
tree (commit f1ef09fde17f) by running srp-test/run_tests -r 10 -t 02-sq-on-mq
(see also https://github.com/bvanassche/srp-test):

[ 1629.920553] general protection fault: 0000 [#1] SMP
[ 1629.921193] CPU: 6 PID: 46 Comm: ksoftirqd/6 Tainted: G I 4.10.0-dbg+ #1
[ 1629.921289] RIP: 0010:rq_completed+0x12/0x90 [dm_mod]
[ 1629.921316] RSP: 0018:ffffc90001bdbda8 EFLAGS: 00010246
[ 1629.921344] RAX: 0000000000000000 RBX: 6b6b6b6b6b6b6b6b RCX: 0000000000000000
[ 1629.921372] RDX: 0000000000000000 RSI: 0000000000000000 RDI: 6b6b6b6b6b6b6b6b
[ 1629.921401] RBP: ffffc90001bdbdc0 R08: ffff8803a3858d48 R09: 0000000000000000
[ 1629.921429] R10: 0000000000000000 R11: 0000000000000000 R12: 0000000000000000
[ 1629.921458] R13: 0000000000000000 R14: ffffffff81c05120 R15: 0000000000000004
[ 1629.921489] FS: 0000000000000000(0000) GS:ffff88046ef80000(0000) knlGS:0000000000000000
[ 1629.921520] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 1629.921547] CR2: 00007fb6324486b8 CR3: 0000000001c0f000 CR4: 00000000001406e0
[ 1629.921576] Call Trace:
[ 1629.921605] dm_softirq_done+0xe6/0x1e0 [dm_mod]
[ 1629.921637] blk_done_softirq+0x88/0xa0
[ 1629.921663] __do_softirq+0xba/0x4c0
[ 1629.921744] run_ksoftirqd+0x1a/0x50
[ 1629.921769] smpboot_thread_fn+0x123/0x1e0
[ 1629.921797] kthread+0x107/0x140
[ 1629.921944] ret_from_fork+0x2e/0x40
[ 1629.921972] Code: ff ff 31 f6 48 89 c7 e8 ed 96 2f e1 5d c3 90 66 2e 0f 1f 84 00 00 00 00 00 55 48 63 f6 48 89 e5 41 55 41 89 d5 41 54 53 48 89 fb <4c> 8b a7 70 02 00 00 f0 ff 8c b7 38 03 00 00 e8 3a 43 ff ff 85
[ 1629.922093] RIP: rq_completed+0x12/0x90 [dm_mod] RSP: ffffc90001bdbda8

$ gdb drivers/md/dm-mod.ko
(gdb) list *(rq_completed+0x12)
0xdf62 is in rq_completed (drivers/md/dm-rq.c:187).
182      * the md may be freed in dm_put() at the end of this function.
183      * Or do dm_get() before calling this function and dm_put() later.
184      */
185     static void rq_completed(struct mapped_device *md, int rw, bool run_queue)
186     {
187             struct request_queue *q = md->queue;
188             unsigned long flags;
189
190             atomic_dec(&md->pending[rw]);
191
(gdb) disas rq_completed
Dump of assembler code for function rq_completed:
   0x000000000000df50 <+0>:     push   %rbp
   0x000000000000df51 <+1>:     movslq %esi,%rsi
   0x000000000000df54 <+4>:     mov    %rsp,%rbp
   0x000000000000df57 <+7>:     push   %r13
   0x000000000000df59 <+9>:     mov    %edx,%r13d
   0x000000000000df5c <+12>:    push   %r12
   0x000000000000df5e <+14>:    push   %rbx
   0x000000000000df5f <+15>:    mov    %rdi,%rbx
   0x000000000000df62 <+18>:    mov    0x270(%rdi),%r12
[ ... ]

So the crash is caused by an attempt to dereference address 0x6b6b6b6b6b6b6b6b
at offset 0x270. I think this means the crash is caused by a use-after-free.

Bart.