From: "Steve Wise" <swise@opengridcomputing.com>
To: "'Bart Van Assche'" <bart.vanassche@sandisk.com>,
"'Jens Axboe'" <axboe@fb.com>
Cc: <linux-block@vger.kernel.org>,
"'James Bottomley'" <jejb@linux.vnet.ibm.com>,
"'Martin K. Petersen'" <martin.petersen@oracle.com>,
"'Mike Snitzer'" <snitzer@redhat.com>,
<linux-rdma@vger.kernel.org>, <linux-nvme@lists.infradead.org>,
"'Keith Busch'" <keith.busch@intel.com>,
"'Doug Ledford'" <dledford@redhat.com>,
<linux-scsi@vger.kernel.org>, "'Christoph Hellwig'" <hch@lst.de>
Subject: RE: [PATCH 9/9] [RFC] nvme: Fix a race condition
Date: Tue, 27 Sep 2016 11:56:16 -0500
Message-ID: <016a01d218e0$0fe7dc00$2fb79400$@opengridcomputing.com>
In-Reply-To: <9d8e0f32-6703-cf23-d424-bcecb65c2a26@sandisk.com>
> On 09/27/2016 09:31 AM, Steve Wise wrote:
> >> @@ -2079,11 +2075,15 @@ EXPORT_SYMBOL_GPL(nvme_kill_queues);
> >> void nvme_stop_queues(struct nvme_ctrl *ctrl)
> >> {
> >> struct nvme_ns *ns;
> >> + struct request_queue *q;
> >>
> >> mutex_lock(&ctrl->namespaces_mutex);
> >> list_for_each_entry(ns, &ctrl->namespaces, list) {
> >> - blk_mq_cancel_requeue_work(ns->queue);
> >> - blk_mq_stop_hw_queues(ns->queue);
> >> + q = ns->queue;
> >> + blk_quiesce_queue(q);
> >> + blk_mq_cancel_requeue_work(q);
> >> + blk_mq_stop_hw_queues(q);
> >> + blk_resume_queue(q);
> >> }
> >> mutex_unlock(&ctrl->namespaces_mutex);
> >
> > Hey Bart, should nvme_stop_queues() really be resuming the blk queue?
>
> Hello Steve,
>
> Would you perhaps prefer that blk_resume_queue(q) is called from
> nvme_start_queues()? I think that would make the NVMe code harder to
> review.
I'm still learning the blk code (and nvme code :)), but I would think
blk_resume_queue() would cause requests to start being submitted on the NVMe
queues, which I believe shouldn't happen while they are stopped. I'm currently
debugging a problem where requests are submitted to the nvme-rdma driver while
it has supposedly stopped all the nvme and blk mqs. I tried your series at
Christoph's request to see if it resolved my problem, but it didn't.
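
The question above is whether the quiesce / stop / resume sequence in the
quoted hunk can let requests through once the queues are stopped. A minimal
userspace sketch of that ordering follows. It is a toy model, not kernel
code: struct toy_queue and every toy_* function are hypothetical names, and
the rule that a submission is held back while either the quiesce count is
nonzero or the stopped flag is set is an assumption made for illustration.
Whether the real blk-mq code behaves this way is exactly what is being asked
in this thread.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a request queue (NOT the kernel's struct request_queue). */
struct toy_queue {
	int  quiesce_depth;	/* > 0 while new dispatches are held off */
	bool hw_stopped;	/* models "hardware queues are stopped" */
	int  dispatched;	/* requests that reached the imaginary driver */
};

static void toy_quiesce(struct toy_queue *q)
{
	q->quiesce_depth++;
}

static void toy_resume(struct toy_queue *q)
{
	/* A resume must always pair with an earlier quiesce. */
	assert(q->quiesce_depth > 0);
	q->quiesce_depth--;
}

static void toy_stop_hw(struct toy_queue *q)
{
	q->hw_stopped = true;
}

/* Assumed gate: a request is dispatched only if neither condition holds. */
static bool toy_submit(struct toy_queue *q)
{
	if (q->quiesce_depth > 0 || q->hw_stopped)
		return false;
	q->dispatched++;
	return true;
}

/* Same ordering as the quoted hunk: quiesce, stop, then resume. */
static void toy_stop_queue(struct toy_queue *q)
{
	toy_quiesce(q);
	toy_stop_hw(q);
	toy_resume(q);
}

/* Number of requests that reach the driver after the queue was stopped. */
static int toy_dispatched_after_stop(void)
{
	struct toy_queue q = { 0, false, 0 };

	toy_stop_queue(&q);
	toy_submit(&q);
	toy_submit(&q);
	return q.dispatched;
}
```

In this model the resume inside toy_stop_queue() is harmless only because
toy_submit() also checks hw_stopped. If the stopped flag were not consulted
on the submission path, the resume would reopen the queue.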
> The above code won't cause any unexpected side effects if an
> NVMe namespace is removed after nvme_stop_queues() has been called and
> before nvme_start_queues() is called. Moving the blk_resume_queue(q)
> call into nvme_start_queues() will only work as expected if no
> namespaces are added nor removed between the nvme_stop_queues() and
> nvme_start_queues() calls. I'm not familiar enough with the NVMe code to
> know whether or not this change is safe ...
>
I'll have to look and see whether namespaces can be added or deleted while an
NVMe controller is in the RECONNECTING state. In the meantime, I'm going to move
the blk_resume_queue() call to nvme_start_queues() and see if it helps my problem.
Christoph: Thoughts?
Steve.
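
The concern quoted above, that moving the resume call into
nvme_start_queues() only works if the namespace list does not change between
stop and start, can be illustrated with another small userspace sketch. It is
again a toy model under stated assumptions: struct toy_ctrl, struct toy_ns
and the toy_* functions are hypothetical, and each namespace is assumed to
carry its own quiesce count that stop increments and start decrements.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_NS 8

/* Toy namespace: only tracks how often its queue has been quiesced. */
struct toy_ns {
	int quiesce_depth;
};

/* Toy controller holding a small fixed-size namespace table. */
struct toy_ctrl {
	struct toy_ns ns[TOY_MAX_NS];
	size_t nr_ns;
};

static void toy_add_ns(struct toy_ctrl *c)
{
	assert(c->nr_ns < TOY_MAX_NS);
	c->ns[c->nr_ns++].quiesce_depth = 0;
}

/* Variant under test: stop leaves every existing queue quiesced. */
static void toy_stop_queues(struct toy_ctrl *c)
{
	for (size_t i = 0; i < c->nr_ns; i++)
		c->ns[i].quiesce_depth++;
}

/*
 * Start resumes every queue it finds. Returns how many queues were not
 * quiesced when the resume was attempted, i.e. unbalanced resume calls.
 */
static int toy_start_queues(struct toy_ctrl *c)
{
	int unbalanced = 0;

	for (size_t i = 0; i < c->nr_ns; i++) {
		if (c->ns[i].quiesce_depth == 0) {
			unbalanced++;
			continue;
		}
		c->ns[i].quiesce_depth--;
	}
	return unbalanced;
}

/* Two namespaces exist at stop time; a third appears before start. */
static int toy_unbalanced_after_hot_add(void)
{
	struct toy_ctrl c = { .nr_ns = 0 };

	toy_add_ns(&c);
	toy_add_ns(&c);
	toy_stop_queues(&c);
	toy_add_ns(&c);		/* added while the controller is stopped */
	return toy_start_queues(&c);
}
```

In this model the namespace added between the two calls sees a resume without
a matching quiesce. A namespace removed in the same window would leave with
its queue still quiesced.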
Thread overview: 82+ messages
2016-09-26 18:25 [PATCH 0/9] Introduce blk_quiesce_queue() and blk_resume_queue() Bart Van Assche
2016-09-26 18:26 ` [PATCH 1/9] blk-mq: Introduce blk_mq_queue_stopped() Bart Van Assche
2016-09-27 6:20 ` Hannes Reinecke
2016-09-27 7:38 ` Johannes Thumshirn
2016-09-26 18:26 ` [PATCH 2/9] dm: Fix a race condition related to stopping and starting queues Bart Van Assche
2016-09-27 6:21 ` Hannes Reinecke
2016-09-27 7:47 ` Johannes Thumshirn
2016-09-26 18:27 ` [PATCH 3/9] [RFC] nvme: Use BLK_MQ_S_STOPPED instead of QUEUE_FLAG_STOPPED in blk-mq code Bart Van Assche
2016-09-26 18:27 ` [PATCH 4/9] block: Move blk_freeze_queue() and blk_unfreeze_queue() code Bart Van Assche
2016-09-27 6:26 ` Hannes Reinecke
2016-09-27 7:52 ` Johannes Thumshirn
2016-09-26 18:27 ` [PATCH 5/9] block: Extend blk_freeze_queue_start() to the non-blk-mq path Bart Van Assche
2016-09-27 7:50 ` Johannes Thumshirn
2016-09-27 13:22 ` Ming Lei
2016-09-27 14:42 ` Bart Van Assche
2016-09-27 15:55 ` Bart Van Assche
2016-09-26 18:28 ` [PATCH 6/9] block: Rename mq_freeze_wq and mq_freeze_depth Bart Van Assche
2016-09-27 7:51 ` Johannes Thumshirn
2016-09-26 18:28 ` [PATCH 7/9] blk-mq: Introduce blk_quiesce_queue() and blk_resume_queue() Bart Van Assche
2016-09-26 18:28 ` [PATCH 8/9] SRP transport: Port srp_wait_for_queuecommand() to scsi-mq Bart Van Assche
2016-09-26 18:28 ` [PATCH 9/9] [RFC] nvme: Fix a race condition Bart Van Assche
2016-09-27 16:31 ` Steve Wise
2016-09-27 16:43 ` Bart Van Assche
2016-09-27 16:56 ` James Bottomley
2016-09-27 17:09 ` Bart Van Assche
2016-09-28 14:23 ` Steve Wise
2016-09-27 16:56 ` Steve Wise [this message]
2016-09-26 18:33 ` [PATCH 0/9] Introduce blk_quiesce_queue() and blk_resume_queue() Mike Snitzer
2016-09-26 18:46 ` Bart Van Assche
2016-09-26 22:26 ` Bart Van Assche
2016-10-11 16:27 ` Laurence Oberman