From: Mike Snitzer
Subject: Re: blk-mq request allocation stalls
Date: Tue, 13 Jan 2015 09:17:46 -0500
Message-ID: <20150113141746.GA30411@redhat.com>
References: <54B3F78D.2020704@kernel.dk> <54B3FE89.200@sandisk.com> <54B3FFAE.4070609@kernel.dk> <54B40E8A.6010005@kernel.dk> <20150112191138.GC21518@redhat.com> <20150112202113.GA23234@redhat.com> <54B50F9D.2040000@sandisk.com>
In-Reply-To: <54B50F9D.2040000@sandisk.com>
Reply-To: device-mapper development
To: Bart Van Assche
Cc: Keith Busch, Jens Axboe, device-mapper development, Jun'ichi Nomura, Christoph Hellwig

On Tue, Jan 13 2015 at 7:29am -0500,
Bart Van Assche wrote:

> On 01/12/15 21:22, Mike Snitzer wrote:
> > FYI, I staged Keith's patch here:
> > https://git.kernel.org/cgit/linux/kernel/git/device-mapper/linux-dm.git/commit/?h=dm-for-3.20-blk-mq&id=7004ddf2462df38c6e3232ac020ed6ff655cc07e
> >
> > Bart, this is the tip of the linux-dm.git "dm-for-3.20-blk-mq" branch.
> > Please test, it should hopefully take care of the stall you've been
> > seeing.
>
> Hello Mike,
>
> In the quick test I ran the I/O stalls were indeed gone. Thanks :-)

Good news, followed by a new mole rearing its head ;)

> However, I hit another issue while running I/O on top of a multipath
> device (on a kernel with lockdep and SLUB memory poisoning enabled):
>
> NMI watchdog: BUG: soft lockup - CPU#7 stuck for 23s! [kdmwork-253:0:3116]
> CPU: 7 PID: 3116 Comm: kdmwork-253:0 Tainted: G W 3.19.0-rc4-debug+ #1
> Call Trace:
> [] kmem_cache_alloc+0x28e/0x2c0
> [] alloc_iova_mem+0x1a/0x20
> [] alloc_iova+0x2e/0x250
> [] intel_alloc_iova+0x95/0xd0
> [] intel_map_sg+0xc5/0x260
> [] srp_queuecommand+0xa11/0xc30 [ib_srp]
> [] scsi_dispatch_cmd+0xde/0x5a0 [scsi_mod]
> [] scsi_queue_rq+0x630/0x700 [scsi_mod]
> [] __blk_mq_run_hw_queue+0x1dd/0x370
> [] blk_mq_alloc_request+0xde/0x150
> [] blk_get_request+0x2e/0xe0
> [] __multipath_map.isra.15+0x1cf/0x210 [dm_multipath]
> [] multipath_clone_and_map+0x1a/0x20 [dm_multipath]
> [] map_tio_request+0x1d5/0x3a0 [dm_mod]
> [] kthread_worker_fn+0x86/0x1b0
> [] kthread+0xef/0x110
> [] ret_from_fork+0x7c/0xb0

Unfortunate. Is this still with a 16MB backing device, or is it real
hardware? Can you share the workload so that Keith and/or I can try to
reproduce it?