public inbox for linux-kernel@vger.kernel.org
From: Ming Lei <tom.leiming@gmail.com>
To: Keith Busch <keith.busch@intel.com>
Cc: Jens Axboe <axboe@fb.com>,
	linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Christoph Hellwig <hch@infradead.org>,
	Yi Zhang <yizhan@redhat.com>,
	Bart Van Assche <bart.vanassche@sandisk.com>,
	Hannes Reinecke <hare@suse.de>
Subject: Re: [PATCH] blk-mq: don't complete un-started request in timeout handler
Date: Thu, 23 Mar 2017 20:14:02 +0800
Message-ID: <20170323121401.GA17152@ming.t460p>
In-Reply-To: <20170322155817.GA18960@localhost.localdomain>

On Wed, Mar 22, 2017 at 11:58:17AM -0400, Keith Busch wrote:
> On Tue, Mar 21, 2017 at 11:03:59PM -0400, Jens Axboe wrote:
> > On 03/21/2017 10:14 PM, Ming Lei wrote:
> > > When iterating busy requests in timeout handler,
> > > if the STARTED flag of one request isn't set, that means
> > > the request is being processed in block layer or driver, and
> > > isn't submitted to hardware yet.
> > > 
> > > In current implementation of blk_mq_check_expired(),
> > > if the request queue becomes dying, un-started requests are
> > > handled as being completed/freed immediately. This way is
> > > wrong, and can cause rq corruption or double allocation[1][2],
> > > when doing I/O and removing & resetting an NVMe device at the same time.
> > 
> > I agree, completing it looks bogus. If the request is in a scheduler or
> > on a software queue, this won't end well at all. Looks like it was
> > introduced by this patch:
> > 
> > commit eb130dbfc40eabcd4e10797310bda6b9f6dd7e76
> > Author: Keith Busch <keith.busch@intel.com>
> > Date:   Thu Jan 8 08:59:53 2015 -0700
> > 
> >     blk-mq: End unstarted requests on a dying queue
> > 
> > Before that, we just ignored it. Keith?
> 
> The above was intended for a stopped hctx on a dying queue such that
> there's nothing in flight to the driver. Nvme had been relying on this
> to end unstarted requests so we may progress when a controller dies.

So the brokenness has been there from the very beginning.
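
To make the difference concrete, here is a minimal user-space sketch of the
two behaviours. It is only a model under stated assumptions, not the kernel
code: struct model_rq, its fields, and both check_expired_*() helpers are
hypothetical names invented for illustration, and the real
blk_mq_check_expired() does more than this.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a request; NOT the kernel's struct request. */
struct model_rq {
	bool started;     /* models the STARTED flag (submitted to hardware) */
	bool ended_early; /* set if the timeout path ended an un-started rq */
	bool timed_out;   /* set if normal timeout handling ran for the rq */
	long deadline;    /* models the request's timeout deadline */
};

/* Old behaviour: an un-started request on a dying queue is ended at once. */
static void check_expired_old(struct model_rq *rq, bool queue_dying, long now)
{
	if (!rq->started) {
		/* Racy: the block layer or driver may still own this rq. */
		if (queue_dying)
			rq->ended_early = true;
		return;
	}
	if (now >= rq->deadline)
		rq->timed_out = true;
}

/* Behaviour argued for in this thread: skip un-started requests entirely. */
static void check_expired_fixed(struct model_rq *rq, bool queue_dying, long now)
{
	(void)queue_dying; /* a dying queue no longer changes the outcome */
	if (!rq->started)
		return;
	if (now >= rq->deadline)
		rq->timed_out = true;
}

/* Small self-check comparing both variants on the same un-started request. */
static int run_model_demo(void)
{
	struct model_rq a = { .started = false, .deadline = 100 };
	struct model_rq b = a;

	check_expired_old(&a, true, 0);
	check_expired_fixed(&b, true, 0);
	assert(a.ended_early);   /* old path ends a request it does not own */
	assert(!b.ended_early);  /* fixed path leaves the request alone */
	return 0;
}
```

In this model, only the old variant touches a request that has not been
started, which is the case that can lead to the corruption or double
allocation described above.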

> 
> We've since obviated the need: we restart the hw queues to flush entered
> requests to failure, so we don't need that brokenness.

It looks like the following commit needs to be backported too if we backport this patch.

commit 69d9a99c258eb1d6478fd9608a2070890797eed7
Author: Keith Busch <keith.busch@intel.com>
Date:   Wed Feb 24 09:15:56 2016 -0700

    NVMe: Move error handling to failed reset handler
 

Thanks,
Ming
