From: Keith Busch <kbusch@kernel.org>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org, linux-block@vger.kernel.org
Subject: Re: nvme double __blk_mq_complete_request() bugs
Date: Mon, 25 May 2020 10:45:16 -0600 [thread overview]
Message-ID: <20200525164516.GC73686@C02WT3WMHTD6> (raw)
In-Reply-To: <49f32df9-81a9-4c15-9950-aceff8fb291e@oracle.com>
On Sun, May 24, 2020 at 07:33:02AM -0700, Dongli Zhang wrote:
> >> After code analysis, I think this is for nvme-pci as well.
> >>
> >> nvme_process_cq()
> >> -> nvme_handle_cqe()
> >> -> nvme_end_request()
> >> -> blk_mq_complete_request()
> >> nvme_reset_work()
> >> -> nvme_dev_disable()
> >> -> nvme_reap_pending_cqes()
> >> -> nvme_process_cq()
> >> -> nvme_handle_cqe()
> >> -> nvme_end_request()
> >> -> blk_mq_complete_request()
> >> -> __blk_mq_complete_request()
> >> -> __blk_mq_complete_request()
> >
> > nvme_dev_disable will first disable the queues before reaping the pending cqes so
> > it shouldn't have this issue.
> >
>
> Would you mind help explain how nvme_dev_disable() would avoid this issue?
>
> nvme_dev_disable() would:
>
> 1. freeze all the queues so that new request would not enter and submit
> 2. NOT wait for freezing during live reset so that q->q_usage_counter is not
> guaranteed to be zero.
> 3. quiesce all the queues so that new request would not dispatch
> 4. delete the queue and free irq
>
> However, I do not find a mechanism to prevent if a nvme_end_request() is already
> in progress.
>
> E.g., suppose __blk_mq_complete_request() is already triggered on cpu 3 and
> waiting for its first line "WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)" to be
> executed ... while another cpu is doing live reset. I do not see how to prevent
> such race.
The queues and their interrupts are torn and synchronized down before the reset
reclaims uncompleted requests. There's no other context that can be running
completions at that point.
WARNING: multiple messages have this Message-ID (diff)
From: Keith Busch <kbusch@kernel.org>
To: Dongli Zhang <dongli.zhang@oracle.com>
Cc: linux-block@vger.kernel.org, Sagi Grimberg <sagi@grimberg.me>,
linux-nvme@lists.infradead.org
Subject: Re: nvme double __blk_mq_complete_request() bugs
Date: Mon, 25 May 2020 10:45:16 -0600 [thread overview]
Message-ID: <20200525164516.GC73686@C02WT3WMHTD6> (raw)
In-Reply-To: <49f32df9-81a9-4c15-9950-aceff8fb291e@oracle.com>
On Sun, May 24, 2020 at 07:33:02AM -0700, Dongli Zhang wrote:
> >> After code analysis, I think this is for nvme-pci as well.
> >>
> >> nvme_process_cq()
> >> -> nvme_handle_cqe()
> >> -> nvme_end_request()
> >> -> blk_mq_complete_request()
> >> nvme_reset_work()
> >> -> nvme_dev_disable()
> >> -> nvme_reap_pending_cqes()
> >> -> nvme_process_cq()
> >> -> nvme_handle_cqe()
> >> -> nvme_end_request()
> >> -> blk_mq_complete_request()
> >> -> __blk_mq_complete_request()
> >> -> __blk_mq_complete_request()
> >
> > nvme_dev_disable will first disable the queues before reaping the pending cqes so
> > it shouldn't have this issue.
> >
>
> Would you mind help explain how nvme_dev_disable() would avoid this issue?
>
> nvme_dev_disable() would:
>
> 1. freeze all the queues so that new request would not enter and submit
> 2. NOT wait for freezing during live reset so that q->q_usage_counter is not
> guaranteed to be zero.
> 3. quiesce all the queues so that new request would not dispatch
> 4. delete the queue and free irq
>
> However, I do not find a mechanism to prevent if a nvme_end_request() is already
> in progress.
>
> E.g., suppose __blk_mq_complete_request() is already triggered on cpu 3 and
> waiting for its first line "WRITE_ONCE(rq->state, MQ_RQ_COMPLETE)" to be
> executed ... while another cpu is doing live reset. I do not see how to prevent
> such race.
The queues and their interrupts are torn and synchronized down before the reset
reclaims uncompleted requests. There's no other context that can be running
completions at that point.
_______________________________________________
linux-nvme mailing list
linux-nvme@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-nvme
next prev parent reply other threads:[~2020-05-25 16:45 UTC|newest]
Thread overview: 14+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-05-18 4:30 nvme double __blk_mq_complete_request() bugs Dongli Zhang
2020-05-18 4:30 ` Dongli Zhang
2020-05-18 7:51 ` Sagi Grimberg
2020-05-18 7:51 ` Sagi Grimberg
2020-05-24 14:33 ` Dongli Zhang
2020-05-24 14:33 ` Dongli Zhang
2020-05-25 16:45 ` Keith Busch [this message]
2020-05-25 16:45 ` Keith Busch
2020-05-27 1:04 ` Dongli Zhang
2020-05-27 1:04 ` Dongli Zhang
2020-05-27 3:36 ` Keith Busch
2020-05-27 3:36 ` Keith Busch
2020-05-27 4:14 ` Dongli Zhang
2020-05-27 4:14 ` Dongli Zhang
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20200525164516.GC73686@C02WT3WMHTD6 \
--to=kbusch@kernel.org \
--cc=dongli.zhang@oracle.com \
--cc=linux-block@vger.kernel.org \
--cc=linux-nvme@lists.infradead.org \
--cc=sagi@grimberg.me \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.