* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-21 8:47 ` [PATCH 1/2] " Ming Lei
@ 2022-08-24 11:15 ` Hannes Reinecke
2022-08-24 14:07 ` Keith Busch
2022-08-25 10:02 ` Chao Leng
2022-08-28 14:37 ` Sagi Grimberg
2 siblings, 1 reply; 24+ messages in thread
From: Hannes Reinecke @ 2022-08-24 11:15 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig, linux-nvme
Cc: Yi Zhang, Sagi Grimberg, Keith Busch
On 8/21/22 10:47, Ming Lei wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> In some corner cases[1], freeze wait and unfreeze API may be called on
> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> freeze APIs more reliably, then this kind of issues can be avoided.
> And similar approach has been applied on stopping/quiescing nvme queues.
>
> [1] https://lore.kernel.org/linux-nvme/20220801125753.1434024-1-ming.lei@redhat.com/
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
>
> Add comment log.
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/nvme/host/core.c | 16 +++++++++++++---
> drivers/nvme/host/nvme.h | 1 +
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..fe4ae3616ed1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5028,8 +5028,11 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_and_clear_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> blk_mq_unfreeze_queue(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_unfreeze);
> @@ -5040,6 +5043,8 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
>
> down_read(&ctrl->namespaces_rwsem);
> list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
> if (timeout <= 0)
> break;
> @@ -5054,8 +5059,11 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> blk_mq_freeze_queue_wait(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_wait_freeze);
> @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) { > + set_bit(NVME_NS_FREEZE, &ns->flags);
Why not test_and_set_bit()?
Which would have the nice effect of adding a memory barrier ...
> blk_freeze_queue_start(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_start_freeze);
Cheers,
Hannes
--
Dr. Hannes Reinecke Kernel Storage Architect
hare@suse.de +49 911 74053 688
SUSE Software Solutions Germany GmbH, Maxfeldstr. 5, 90409 Nürnberg
HRB 36809 (AG Nürnberg), GF: Felix Imendörffer
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-24 11:15 ` Hannes Reinecke
@ 2022-08-24 14:07 ` Keith Busch
0 siblings, 0 replies; 24+ messages in thread
From: Keith Busch @ 2022-08-24 14:07 UTC (permalink / raw)
To: Hannes Reinecke
Cc: Ming Lei, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg
On Wed, Aug 24, 2022 at 01:15:28PM +0200, Hannes Reinecke wrote:
> On 8/21/22 10:47, Ming Lei wrote:
> > @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
> > struct nvme_ns *ns;
> > down_read(&ctrl->namespaces_rwsem);
> > - list_for_each_entry(ns, &ctrl->namespaces, list)
> > + list_for_each_entry(ns, &ctrl->namespaces, list) {
> > + set_bit(NVME_NS_FREEZE, &ns->flags);
>
> Why not test_and_set_bit()?
> Which would have the nice effect of adding a memory barrier ...
Sure, I think you can even make a new WARN if the test fails. It should never
fail in this path, otherwise we could mess up the freeze depth.
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-21 8:47 ` [PATCH 1/2] " Ming Lei
2022-08-24 11:15 ` Hannes Reinecke
@ 2022-08-25 10:02 ` Chao Leng
2022-09-06 4:49 ` Christoph Hellwig
2022-09-06 8:45 ` Ming Lei
2022-08-28 14:37 ` Sagi Grimberg
2 siblings, 2 replies; 24+ messages in thread
From: Chao Leng @ 2022-08-25 10:02 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig, linux-nvme
Cc: Yi Zhang, Sagi Grimberg, Keith Busch
On 2022/8/21 16:47, Ming Lei wrote:
> From: Keith Busch <kbusch@kernel.org>
>
> In some corner cases[1], freeze wait and unfreeze API may be called on
> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> freeze APIs more reliably, then this kind of issues can be avoided.
> And similar approach has been applied on stopping/quiescing nvme queues.
This leads to another problem: the process that needs to be
in the frozen state is not actually frozen.
It's not safe.
There is the same risk on stopping/quiescing nvme queues.
>
> [1] https://lore.kernel.org/linux-nvme/20220801125753.1434024-1-ming.lei@redhat.com/
>
> Reported-by: Yi Zhang <yi.zhang@redhat.com>
> Link: https://lore.kernel.org/linux-block/CAHj4cs--KPTAGP=jj+7KMe=arDv=HeGeOgs1T8vbusyk=EjXow@mail.gmail.com/#r
>
> Add comment log.
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
> drivers/nvme/host/core.c | 16 +++++++++++++---
> drivers/nvme/host/nvme.h | 1 +
> 2 files changed, 14 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/nvme/host/core.c b/drivers/nvme/host/core.c
> index af367b22871b..fe4ae3616ed1 100644
> --- a/drivers/nvme/host/core.c
> +++ b/drivers/nvme/host/core.c
> @@ -5028,8 +5028,11 @@ void nvme_unfreeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_and_clear_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> blk_mq_unfreeze_queue(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_unfreeze);
> @@ -5040,6 +5043,8 @@ int nvme_wait_freeze_timeout(struct nvme_ctrl *ctrl, long timeout)
>
> down_read(&ctrl->namespaces_rwsem);
> list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> timeout = blk_mq_freeze_queue_wait_timeout(ns->queue, timeout);
> if (timeout <= 0)
> break;
> @@ -5054,8 +5059,11 @@ void nvme_wait_freeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) {
> + if (!test_bit(NVME_NS_FREEZE, &ns->flags))
> + continue;
> blk_mq_freeze_queue_wait(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_wait_freeze);
> @@ -5065,8 +5073,10 @@ void nvme_start_freeze(struct nvme_ctrl *ctrl)
> struct nvme_ns *ns;
>
> down_read(&ctrl->namespaces_rwsem);
> - list_for_each_entry(ns, &ctrl->namespaces, list)
> + list_for_each_entry(ns, &ctrl->namespaces, list) {
> + set_bit(NVME_NS_FREEZE, &ns->flags);
> blk_freeze_queue_start(ns->queue);
> + }
> up_read(&ctrl->namespaces_rwsem);
> }
> EXPORT_SYMBOL_GPL(nvme_start_freeze);
> diff --git a/drivers/nvme/host/nvme.h b/drivers/nvme/host/nvme.h
> index 1bdf714dcd9e..702ba19b5647 100644
> --- a/drivers/nvme/host/nvme.h
> +++ b/drivers/nvme/host/nvme.h
> @@ -484,6 +484,7 @@ struct nvme_ns {
> #define NVME_NS_FORCE_RO 3
> #define NVME_NS_READY 4
> #define NVME_NS_STOPPED 5
> +#define NVME_NS_FREEZE 6
>
> struct cdev cdev;
> struct device cdev_device;
>
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-25 10:02 ` Chao Leng
@ 2022-09-06 4:49 ` Christoph Hellwig
2022-09-06 8:45 ` Ming Lei
1 sibling, 0 replies; 24+ messages in thread
From: Christoph Hellwig @ 2022-09-06 4:49 UTC (permalink / raw)
To: Chao Leng
Cc: Ming Lei, Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>
>
> On 2022/8/21 16:47, Ming Lei wrote:
>> From: Keith Busch <kbusch@kernel.org>
>>
>> In some corner cases[1], freeze wait and unfreeze API may be called on
>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>> freeze APIs more reliably, then this kind of issues can be avoided.
>> And similar approach has been applied on stopping/quiescing nvme queues.
> This leads to another problem: the process that needs to be
> in the frozen state is not actually frozen.
Where do frozen processes come into play here?
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-25 10:02 ` Chao Leng
2022-09-06 4:49 ` Christoph Hellwig
@ 2022-09-06 8:45 ` Ming Lei
2022-09-06 9:32 ` Chao Leng
1 sibling, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-06 8:45 UTC (permalink / raw)
To: Chao Leng
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>
>
> On 2022/8/21 16:47, Ming Lei wrote:
> > From: Keith Busch <kbusch@kernel.org>
> >
> > In some corner cases[1], freeze wait and unfreeze API may be called on
> > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > freeze APIs more reliably, then this kind of issues can be avoided.
> > And similar approach has been applied on stopping/quiescing nvme queues.
> This leads to another problem: the process that needs to be
> in the frozen state is not actually frozen.
> It's not safe.
The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
can be done only the flag is set. Not sure how it isn't safe.
Meantime calling blk_mq_freeze_queue_wait() on queue not being started
to freeze is usually a bug, and I think WARN_ON_ONCE() can be added
in nvme_wait_freeze().
Thanks,
Ming
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-09-06 8:45 ` Ming Lei
@ 2022-09-06 9:32 ` Chao Leng
2022-09-07 0:33 ` Ming Lei
0 siblings, 1 reply; 24+ messages in thread
From: Chao Leng @ 2022-09-06 9:32 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On 2022/9/6 16:45, Ming Lei wrote:
> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>
>>
>> On 2022/8/21 16:47, Ming Lei wrote:
>>> From: Keith Busch <kbusch@kernel.org>
>>>
>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>> And similar approach has been applied on stopping/quiescing nvme queues.
>> This leads to another problem: the process that needs to be
>> in the frozen state is not actually frozen.
>> It's not safe.
>
> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> can be done only the flag is set. Not sure how it isn't safe.
I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
If just set_bit in nvme_start_freeze, it will cause another problem in
below scenario.
A: start freeze and set the bit;B:start freeze and set the bit;
and then
A:test and clear the bit, and unfreeze;B: test and skip;
The queue will be frozen for ever.
In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
No matter how to use NVME_NS_FREEZE , it may cause problems.
The freeze mechanism is perfect, and no additional protection mechanism is required.
>
> Meantime calling blk_mq_freeze_queue_wait() on queue not being started
> to freeze is usually a bug, and I think WARN_ON_ONCE() can be added
> in nvme_wait_freeze().
>
>
> Thanks,
> Ming
>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-09-06 9:32 ` Chao Leng
@ 2022-09-07 0:33 ` Ming Lei
2022-09-07 1:18 ` Chao Leng
0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-07 0:33 UTC (permalink / raw)
To: Chao Leng
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch, ming.lei
On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>
>
> On 2022/9/6 16:45, Ming Lei wrote:
> > On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> > >
> > >
> > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > From: Keith Busch <kbusch@kernel.org>
> > > >
> > > > In some corner cases[1], freeze wait and unfreeze API may be called on
> > > > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > > > freeze APIs more reliably, then this kind of issues can be avoided.
> > > > And similar approach has been applied on stopping/quiescing nvme queues.
> > > This leads to another problem: the process that needs to be
> > > in the frozen state is not actually frozen.
> > > It's not safe.
> >
> > The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> > can be done only the flag is set. Not sure how it isn't safe.
> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
> If just set_bit in nvme_start_freeze, it will cause another problem in
> below scenario.
> A: start freeze and set the bit;B:start freeze and set the bit;
> and then
> A:test and clear the bit, and unfreeze;B: test and skip;
> The queue will be frozen for ever.
One simple approach is to replace down_read(->namespaces_rwsem) with
down_write(->namespaces_rwsem) in nvme_start_freeze() and
nvme_unfreeze().
>
> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
> No matter how to use NVME_NS_FREEZE , it may cause problems.
> The freeze mechanism is perfect, and no additional protection mechanism is required.
block layer requires queue freeze and unfreeze APIs to be called in
pair strictly, that is why I add the 1st patch.
Thanks,
Ming
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-09-07 0:33 ` Ming Lei
@ 2022-09-07 1:18 ` Chao Leng
2022-09-07 2:06 ` Ming Lei
0 siblings, 1 reply; 24+ messages in thread
From: Chao Leng @ 2022-09-07 1:18 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On 2022/9/7 8:33, Ming Lei wrote:
> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/6 16:45, Ming Lei wrote:
>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>> From: Keith Busch <kbusch@kernel.org>
>>>>>
>>>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>>>> And similar approach has been applied on stopping/quiescing nvme queues.
>>>> This leads to another problem: the process that needs to be
>>>> in the frozen state is not actually frozen.
>>>> It's not safe.
>>>
>>> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
>>> can be done only the flag is set. Not sure how it isn't safe.
>> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
>> If just set_bit in nvme_start_freeze, it will cause another problem in
>> below scenario.
>> A: start freeze and set the bit;B:start freeze and set the bit;
>> and then
>> A:test and clear the bit, and unfreeze;B: test and skip;
>> The queue will be frozen for ever.
>
> One simple approach is to replace down_read(->namespaces_rwsem) with
> down_write(->namespaces_rwsem) in nvme_start_freeze() and
> nvme_unfreeze().
>
>>
>> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
>> No matter how to use NVME_NS_FREEZE , it may cause problems.
>> The freeze mechanism is perfect, and no additional protection mechanism is required.
>
> block layer requires queue freeze and unfreeze APIs to be called in
> pair strictly, that is why I add the 1st patch.
From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
patch 2/2 is already delete the nvme_wait_freeze.
If there is another bug of unmatched freeze and unfreeze,
can you describe the analysis of unmatched freeze and unfreeze?
The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
Maybe another patch is needed to fix the bug.
>
>
>
> Thanks,
> Ming
>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-09-07 1:18 ` Chao Leng
@ 2022-09-07 2:06 ` Ming Lei
2022-09-07 5:58 ` Chao Leng
0 siblings, 1 reply; 24+ messages in thread
From: Ming Lei @ 2022-09-07 2:06 UTC (permalink / raw)
To: Chao Leng
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
>
>
> On 2022/9/7 8:33, Ming Lei wrote:
> > On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
> > >
> > >
> > > On 2022/9/6 16:45, Ming Lei wrote:
> > > > On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
> > > > >
> > > > >
> > > > > On 2022/8/21 16:47, Ming Lei wrote:
> > > > > > From: Keith Busch <kbusch@kernel.org>
> > > > > >
> > > > > > In some corner cases[1], freeze wait and unfreeze API may be called on
> > > > > > unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
> > > > > > freeze APIs more reliably, then this kind of issues can be avoided.
> > > > > > And similar approach has been applied on stopping/quiescing nvme queues.
> > > > > This leads to another problem: the process that needs to be
> > > > > in the frozen state is not actually frozen.
> > > > > It's not safe.
> > > >
> > > > The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
> > > > can be done only the flag is set. Not sure how it isn't safe.
> > > I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
> > > If just set_bit in nvme_start_freeze, it will cause another problem in
> > > below scenario.
> > > A: start freeze and set the bit;B:start freeze and set the bit;
> > > and then
> > > A:test and clear the bit, and unfreeze;B: test and skip;
> > > The queue will be frozen for ever.
> >
> > One simple approach is to replace down_read(->namespaces_rwsem) with
> > down_write(->namespaces_rwsem) in nvme_start_freeze() and
> > nvme_unfreeze().
> >
> > >
> > > In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
> > > No matter how to use NVME_NS_FREEZE , it may cause problems.
> > > The freeze mechanism is perfect, and no additional protection mechanism is required.
> >
> > block layer requires queue freeze and unfreeze APIs to be called in
> > pair strictly, that is why I add the 1st patch.
> From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
> patch 2/2 is already delete the nvme_wait_freeze.
> If there is another bug of unmatched freeze and unfreeze,
> can you describe the analysis of unmatched freeze and unfreeze?
> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
> Maybe another patch is needed to fix the bug.
Please look at nvme_reset_work():
...
if (dev->online_queues < 2) {
...
} else {
nvme_start_queues(&dev->ctrl);
nvme_wait_freeze(&dev->ctrl);
if (!dev->ctrl.tagset)
nvme_pci_alloc_tag_set(dev);
else
nvme_pci_update_nr_queues(dev);
nvme_dbbuf_set(dev);
nvme_unfreeze(&dev->ctrl);
}
...
nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.
If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
__blk_mq_unfreeze_queue() will be triggered.
Thanks,
Ming
^ permalink raw reply [flat|nested] 24+ messages in thread* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-09-07 2:06 ` Ming Lei
@ 2022-09-07 5:58 ` Chao Leng
0 siblings, 0 replies; 24+ messages in thread
From: Chao Leng @ 2022-09-07 5:58 UTC (permalink / raw)
To: Ming Lei
Cc: Christoph Hellwig, linux-nvme, Yi Zhang, Sagi Grimberg,
Keith Busch
On 2022/9/7 10:06, Ming Lei wrote:
> On Wed, Sep 07, 2022 at 09:18:52AM +0800, Chao Leng wrote:
>>
>>
>> On 2022/9/7 8:33, Ming Lei wrote:
>>> On Tue, Sep 06, 2022 at 05:32:01PM +0800, Chao Leng wrote:
>>>>
>>>>
>>>> On 2022/9/6 16:45, Ming Lei wrote:
>>>>> On Thu, Aug 25, 2022 at 06:02:33PM +0800, Chao Leng wrote:
>>>>>>
>>>>>>
>>>>>> On 2022/8/21 16:47, Ming Lei wrote:
>>>>>>> From: Keith Busch <kbusch@kernel.org>
>>>>>>>
>>>>>>> In some corner cases[1], freeze wait and unfreeze API may be called on
>>>>>>> unfrozen queue, add one per-ns flag of NVME_NS_FREEZE to make these
>>>>>>> freeze APIs more reliably, then this kind of issues can be avoided.
>>>>>>> And similar approach has been applied on stopping/quiescing nvme queues.
>>>>>> This leads to another problem: the process that needs to be
>>>>>> in the frozen state is not actually frozen.
>>>>>> It's not safe.
>>>>>
>>>>> The flag is just to control if queue wait is needed, blk_mq_freeze_queue_wait
>>>>> can be done only the flag is set. Not sure how it isn't safe.
>>>> I thought that the use of NVME_NS_FREEZE was the same as NVME_NS_STOPPED.
>>>> If just set_bit in nvme_start_freeze, it will cause another problem in
>>>> below scenario.
>>>> A: start freeze and set the bit;B:start freeze and set the bit;
>>>> and then
>>>> A:test and clear the bit, and unfreeze;B: test and skip;
>>>> The queue will be frozen for ever.
>>>
>>> One simple approach is to replace down_read(->namespaces_rwsem) with
>>> down_write(->namespaces_rwsem) in nvme_start_freeze() and
>>> nvme_unfreeze().
>>>
>>>>
>>>> In addition, I think patch 2/2 can fix the bug well, patch 1/2 is not necessary.
>>>> No matter how to use NVME_NS_FREEZE , it may cause problems.
>>>> The freeze mechanism is perfect, and no additional protection mechanism is required.
>>>
>>> block layer requires queue freeze and unfreeze APIs to be called in
>>> pair strictly, that is why I add the 1st patch.
>> From your bug analysis, the reason is that nvme_wait_freeze is called without nvme_start_freeze.
>> patch 2/2 is already delete the nvme_wait_freeze.
>> If there is another bug of unmatched freeze and unfreeze,
>> can you describe the analysis of unmatched freeze and unfreeze?
>> The current patch 1/2 will introduce the unmatched freeze and unfreeze rather than solved it.
>> Maybe another patch is needed to fix the bug.
>
> Please look at nvme_reset_work():
> ...
> if (dev->online_queues < 2) {
> ...
> } else {
> nvme_start_queues(&dev->ctrl);
> nvme_wait_freeze(&dev->ctrl);
> if (!dev->ctrl.tagset)
> nvme_pci_alloc_tag_set(dev);
> else
> nvme_pci_update_nr_queues(dev);
> nvme_dbbuf_set(dev);
> nvme_unfreeze(&dev->ctrl);
> }
> ...
>
> nvme_wait_freeze() is called on unfrozen queue, so io hang is caused.
>
> If nvme_unfreeze() is called on unfrozen queue, WARN_ON_ONCE() in
> __blk_mq_unfreeze_queue() will be triggered.
The bug is specific to the PCI transport layer.
It might be a better option to fix this bug by adding flag for normal connection
and abnormal recovery like other transport layers, and then perform different processing.
Different processes are processed differently to avoid unexpected nvme_unfreeze.
>
>
>
> Thanks,
> Ming
>
> .
>
^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: [PATCH 1/2] nvme: make NVMe freeze API reliably
2022-08-21 8:47 ` [PATCH 1/2] " Ming Lei
2022-08-24 11:15 ` Hannes Reinecke
2022-08-25 10:02 ` Chao Leng
@ 2022-08-28 14:37 ` Sagi Grimberg
2 siblings, 0 replies; 24+ messages in thread
From: Sagi Grimberg @ 2022-08-28 14:37 UTC (permalink / raw)
To: Ming Lei, Christoph Hellwig, linux-nvme; +Cc: Yi Zhang, Keith Busch
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
^ permalink raw reply [flat|nested] 24+ messages in thread