From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1756179AbcBBRZJ (ORCPT ); Tue, 2 Feb 2016 12:25:09 -0500
Received: from mga02.intel.com ([134.134.136.20]:33962 "EHLO mga02.intel.com"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1754713AbcBBRZH (ORCPT ); Tue, 2 Feb 2016 12:25:07 -0500
X-ExtLoop1: 1
X-IronPort-AV: E=Sophos;i="5.22,385,1449561600"; d="scan'208";a="645995682"
Date: Tue, 2 Feb 2016 17:25:03 +0000
From: Keith Busch
To: Wenbo Wang
Cc: Jens Axboe, Wenbo Wang, "linux-kernel@vger.kernel.org",
	"linux-nvme@lists.infradead.org", "Wenwei.Tao"
Subject: Re: [PATCH] NVMe: do not touch sq door bell if nvmeq has been suspended
Message-ID: <20160202172503.GC10728@localhost.localdomain>
References: <1454341324-21273-1-git-send-email-mail_weber_wang@163.com>
	<56AF8DB5.70206@fb.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Disposition: inline
In-Reply-To:
User-Agent: Mutt/1.5.20 (2009-06-14)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Tue, Feb 02, 2016 at 07:15:57AM +0000, Wenbo Wang wrote:
> Jens,
>
> I did the following test to validate the issue.
>
> 1. Modify code as below to increase the chance of races.
>    Add 10s delay after nvme_dev_unmap() in nvme_dev_disable()
>    Add 10s delay before __nvme_submit_cmd()

If running sync IO, preempt is disabled. You can't just put a 10 second
delay there. Wouldn't you hit a "scheduling while atomic" bug instead?

If blk-mq is running the h/w context from its work queue, that might be
a different issue. Maybe we can change the "cancel_delayed_work" to
"cancel_delayed_work_sync" in blk_mq_stop_hw_queues.

If there's still a window where blk-mq can insert a request after the
driver requested to stop queues, I think we should try to close it with
the block layer.
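
To illustrate the cancel_delayed_work vs. cancel_delayed_work_sync point,
here is a rough user-space sketch using pthreads. It is not kernel code and
not a proposed patch. Every name in it (hctx_model, run_work_fn,
stop_then_unmap, simulate_stop) is made up for the illustration. The only
kernel behaviour it assumes is that cancel_delayed_work does not wait for a
handler that is already executing, while cancel_delayed_work_sync does.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical user-space model; none of these names exist in the kernel. */
struct hctx_model {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool running;        /* work handler is executing */
	bool stop_requested; /* driver asked to stop the queue */
	bool unmapped;       /* device registers have been torn down */
	bool late_doorbell;  /* handler touched the doorbell after unmap */
};

/* Stands in for a queue-run work handler that ends in a doorbell write. */
static void *run_work_fn(void *arg)
{
	struct hctx_model *h = arg;

	pthread_mutex_lock(&h->lock);
	h->running = true;
	pthread_cond_broadcast(&h->cond);

	/* Stay "mid-dispatch" until the driver starts stopping the queue. */
	while (!h->stop_requested)
		pthread_cond_wait(&h->cond, &h->lock);

	/* The doorbell write: unsafe if the registers are already gone. */
	h->late_doorbell = h->unmapped;

	h->running = false;
	pthread_cond_broadcast(&h->cond);
	pthread_mutex_unlock(&h->lock);
	return NULL;
}

/*
 * Stop the queue, then unmap. With sync == false the stop does not wait
 * for a running handler (like cancel_delayed_work); with sync == true it
 * waits for the handler to finish (like cancel_delayed_work_sync).
 */
static void stop_then_unmap(struct hctx_model *h, bool sync)
{
	pthread_mutex_lock(&h->lock);
	h->stop_requested = true;
	pthread_cond_broadcast(&h->cond);
	if (sync)
		while (h->running)
			pthread_cond_wait(&h->cond, &h->lock);
	h->unmapped = true;
	pthread_mutex_unlock(&h->lock);
}

/* Returns 1 if the handler rang the doorbell after unmap, 0 if not. */
int simulate_stop(bool sync)
{
	struct hctx_model h = { 0 };
	pthread_t worker;

	pthread_mutex_init(&h.lock, NULL);
	pthread_cond_init(&h.cond, NULL);
	if (pthread_create(&worker, NULL, run_work_fn, &h) != 0)
		return -1;

	/* Wait until the handler is in flight before stopping the queue. */
	pthread_mutex_lock(&h.lock);
	while (!h.running)
		pthread_cond_wait(&h.cond, &h.lock);
	pthread_mutex_unlock(&h.lock);

	stop_then_unmap(&h, sync);
	pthread_join(worker, NULL);
	assert(!h.running); /* handler has finished once joined */

	pthread_cond_destroy(&h.cond);
	pthread_mutex_destroy(&h.lock);
	return h.late_doorbell ? 1 : 0;
}
```

In this model simulate_stop(false) reports a doorbell write after the unmap,
and simulate_stop(true) does not, because the stop path waits for the
in-flight handler first. It says nothing about requests inserted after the
stop, which is the separate window mentioned above.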