From mboxrd@z Thu Jan 1 00:00:00 1970
From: Jens Axboe
Subject: Re: [PATCH] scsi-mq: fix hw queue hang caused by timeout
Date: Fri, 19 Sep 2014 08:21:12 -0600
Message-ID: <541C3BD8.2070206@fb.com>
References: <1411055950-28657-1-git-send-email-ming.lei@canonical.com>
 <20140918163549.GB3950@lst.de>
 <541B105E.1030507@fb.com>
Mime-Version: 1.0
Content-Type: text/plain; charset="utf-8"
Content-Transfer-Encoding: 7bit
Return-path:
In-Reply-To:
Sender: linux-kernel-owner@vger.kernel.org
To: Ming Lei
Cc: Christoph Hellwig, James Bottomley, Linux SCSI List, Linux Kernel Mailing List, Douglas Gilbert
List-Id: linux-scsi@vger.kernel.org

On 09/19/2014 08:18 AM, Ming Lei wrote:
> On Fri, Sep 19, 2014 at 9:07 PM, Ming Lei wrote:
>> On Fri, Sep 19, 2014 at 1:03 AM, Jens Axboe wrote:
>>> On 2014-09-18 10:35, Christoph Hellwig wrote:
>>>>
>>>> On Thu, Sep 18, 2014 at 11:59:10PM +0800, Ming Lei wrote:
>>>>>
>>>>> If there are two requests or more timed out, the dispatch queue
>>>>> is put into stopped state and never be recoverd, and there
>>>>> is no such problem in non-mq mode.
>>>>>
>>>>> This patch trys to recover the stopped queue when the queue
>>>>> becomes unbusy, then the following retries can move on.
>>>>>
>>>>> Basically this patch maintains same behavior for this situation
>>>>> with non-mq mode.
>>>>
>>>>
>>>> This looks somewhat similar to the issues that Doug reported, and I
>>>> remember when he was last running into boot problems it was timeout
>>>> related, too.
>>>>
>>>> As far as the implementation is concerned I think the correct fix is
>>>> to clear the BLK_MQ_S_STOPPED queue flags in blk_mq_kick_requeue_list.
>>>
>>>
>>> Since that's the kick part of the requeue, auto-starting the queue for that
>>> makes a lot of sense. I say that's the way we go.
>>
>> Yeah, that looks better.
>>
>> But it doesn't work after the simple change, and I need to
>> investigate further.
>
> It is because of the timer miss, now it starts to work.

Excellent.
I think most new issues should be fixed in for-linus for inclusion in
this round. It's much bigger than I hoped for this late in the cycle,
but lots of us have run a lot of testing, so that's not a huge worry.

-- 
Jens Axboe