* [PATCHv2 0/2] nvme shadow doorbel buf fixes @ 2021-10-14 16:45 Keith Busch 2021-10-14 16:45 ` [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets Keith Busch 2021-10-14 16:45 ` [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets Keith Busch 0 siblings, 2 replies; 9+ messages in thread From: Keith Busch @ 2021-10-14 16:45 UTC (permalink / raw) To: linux-nvme, hch, sagi; +Cc: John Levon, Keith Busch This is v2 from an older patch series posted here: https://lore.kernel.org/linux-nvme/20200427235243.2268765-1-kbusch@kernel.org/ Changes since v1: Added comments explaining patch 1 Dropped patch 3 for now, which was just a performance optimization Merged up for current master Keith Busch (2): nvme-pci: clear shadow doorbell memory on resets nvme-pci: remove cached shadow doorbell offsets drivers/nvme/host/pci.c | 109 ++++++++++++++++++---------------------- 1 file changed, 49 insertions(+), 60 deletions(-) -- 2.25.4 ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets 2021-10-14 16:45 [PATCHv2 0/2] nvme shadow doorbel buf fixes Keith Busch @ 2021-10-14 16:45 ` Keith Busch 2021-10-14 21:21 ` John Levon 2021-10-20 17:23 ` Christoph Hellwig 2021-10-14 16:45 ` [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets Keith Busch 1 sibling, 2 replies; 9+ messages in thread From: Keith Busch @ 2021-10-14 16:45 UTC (permalink / raw) To: linux-nvme, hch, sagi; +Cc: John Levon, Keith Busch The host memory doorbell and event buffers need to be initialized on each reset so the driver doesn't observe stale values from the previous instantiation. Signed-off-by: Keith Busch <kbusch@kernel.org> --- drivers/nvme/host/pci.c | 9 ++++++++- 1 file changed, 8 insertions(+), 1 deletion(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 0dd4b44b59cd..9dd173bfa57b 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -245,8 +245,15 @@ static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev) { unsigned int mem_size = nvme_dbbuf_size(dev); - if (dev->dbbuf_dbs) + if (dev->dbbuf_dbs) { + /* + * Clear the dbbuf memory so the driver doesn't observe stale + * values from the previous instantiation. + */ + memset(dev->dbbuf_dbs, 0, mem_size); + memset(dev->dbbuf_eis, 0, mem_size); return 0; + } dev->dbbuf_dbs = dma_alloc_coherent(dev->dev, mem_size, &dev->dbbuf_dbs_dma_addr, -- 2.25.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets 2021-10-14 16:45 ` [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets Keith Busch @ 2021-10-14 21:21 ` John Levon 2021-10-20 17:23 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: John Levon @ 2021-10-14 21:21 UTC (permalink / raw) To: Keith Busch; +Cc: linux-nvme, hch, sagi On Thu, Oct 14, 2021 at 09:45:42AM -0700, Keith Busch wrote: > The host memory doorbell and event buffers need to be initialized on > each reset so the driver doesn't observe stale values from the previous > instantiation. > > Signed-off-by: Keith Busch <kbusch@kernel.org> I reverted our workaround and tried this patch with my simple test case: mkfs /dev/nvme0n1 echo 1 >/sys/devices/pci0000:00/0000:00:06.0/reset mkfs /dev/nvme0n1 Works fine where previously it would essentially permanently hang the controller. Tested-by: John Levon <john.levon@nutanix.com> thanks john > --- > drivers/nvme/host/pci.c | 9 ++++++++- > 1 file changed, 8 insertions(+), 1 deletion(-) > > diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c > index 0dd4b44b59cd..9dd173bfa57b 100644 > --- a/drivers/nvme/host/pci.c > +++ b/drivers/nvme/host/pci.c > @@ -245,8 +245,15 @@ static int nvme_dbbuf_dma_alloc(struct nvme_dev *dev) > { > unsigned int mem_size = nvme_dbbuf_size(dev); > > - if (dev->dbbuf_dbs) > + if (dev->dbbuf_dbs) { > + /* > + * Clear the dbbuf memory so the driver doesn't observe stale > + * values from the previous instantiation. > + */ > + memset(dev->dbbuf_dbs, 0, mem_size); > + memset(dev->dbbuf_eis, 0, mem_size); > return 0; > + } > > dev->dbbuf_dbs = dma_alloc_coherent(dev->dev, mem_size, > &dev->dbbuf_dbs_dma_addr, ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets 2021-10-14 16:45 ` [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets Keith Busch 2021-10-14 21:21 ` John Levon @ 2021-10-20 17:23 ` Christoph Hellwig 1 sibling, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2021-10-20 17:23 UTC (permalink / raw) To: Keith Busch; +Cc: linux-nvme, hch, sagi, John Levon Thanks, applied to nvme-5.16. ^ permalink raw reply [flat|nested] 9+ messages in thread
* [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets 2021-10-14 16:45 [PATCHv2 0/2] nvme shadow doorbel buf fixes Keith Busch 2021-10-14 16:45 ` [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets Keith Busch @ 2021-10-14 16:45 ` Keith Busch 2021-10-14 16:53 ` Christoph Hellwig 2021-10-14 17:09 ` John Levon 1 sibling, 2 replies; 9+ messages in thread From: Keith Busch @ 2021-10-14 16:45 UTC (permalink / raw) To: linux-nvme, hch, sagi; +Cc: John Levon, Keith Busch Real nvme hardware doesn't support the shadow doorbell feature. Remove the overhead of saving this special feature per-queue and instead obtain the address offsets from device providing it. And when this feature is in use, the specification requires all queue updates use this mechanism, so don't don't treat the admin queue differently. Signed-off-by: Keith Busch <kbusch@kernel.org> --- drivers/nvme/host/pci.c | 100 ++++++++++++++++------------------------ 1 file changed, 41 insertions(+), 59 deletions(-) diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c index 9dd173bfa57b..65c0e925944c 100644 --- a/drivers/nvme/host/pci.c +++ b/drivers/nvme/host/pci.c @@ -209,10 +209,6 @@ struct nvme_queue { #define NVMEQ_SQ_CMB 1 #define NVMEQ_DELETE_ERROR 2 #define NVMEQ_POLLED 3 - u32 *dbbuf_sq_db; - u32 *dbbuf_cq_db; - u32 *dbbuf_sq_ei; - u32 *dbbuf_cq_ei; struct completion delete_done; }; @@ -289,29 +285,6 @@ static void nvme_dbbuf_dma_free(struct nvme_dev *dev) } } -static void nvme_dbbuf_init(struct nvme_dev *dev, - struct nvme_queue *nvmeq, int qid) -{ - if (!dev->dbbuf_dbs || !qid) - return; - - nvmeq->dbbuf_sq_db = &dev->dbbuf_dbs[sq_idx(qid, dev->db_stride)]; - nvmeq->dbbuf_cq_db = &dev->dbbuf_dbs[cq_idx(qid, dev->db_stride)]; - nvmeq->dbbuf_sq_ei = &dev->dbbuf_eis[sq_idx(qid, dev->db_stride)]; - nvmeq->dbbuf_cq_ei = &dev->dbbuf_eis[cq_idx(qid, dev->db_stride)]; -} - -static void nvme_dbbuf_free(struct nvme_queue *nvmeq) -{ - if (!nvmeq->qid) - return; - - nvmeq->dbbuf_sq_db = NULL; - nvmeq->dbbuf_cq_db = NULL; - nvmeq->dbbuf_sq_ei = NULL; - nvmeq->dbbuf_cq_ei = NULL; -} - static void nvme_dbbuf_set(struct nvme_dev *dev) { struct nvme_command c = { }; @@ -328,13 +301,10 @@ static void nvme_dbbuf_set(struct nvme_dev *dev) dev_warn(dev->ctrl.device, "unable to set dbbuf\n"); /* Free memory and continue on */ nvme_dbbuf_dma_free(dev); - - for (i = 1; i <= dev->online_queues; i++) - nvme_dbbuf_free(&dev->queues[i]); } } -static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) +static inline bool nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) { return (u16)(new_idx - event_idx - 1) < (u16)(new_idx - old); } @@ -343,31 +313,48 @@ static inline int nvme_dbbuf_need_event(u16 event_idx, u16 new_idx, u16 old) static bool nvme_dbbuf_update_and_check_event(u16 value, u32 *dbbuf_db, volatile u32 *dbbuf_ei) { - if (dbbuf_db) { - u16 old_value; + u16 old_value; - /* - * Ensure that the queue is written before updating - * the doorbell in memory - */ - wmb(); + /* + * Ensure that the queue is written before updating the doorbell in + * memory + */ + wmb(); - old_value = *dbbuf_db; - *dbbuf_db = value; + old_value = *dbbuf_db; + *dbbuf_db = value; - /* - * Ensure that the doorbell is updated before reading the event - * index from memory. The controller needs to provide similar - * ordering to ensure the envent index is updated before reading - * the doorbell. - */ - mb(); + /* + * Ensure that the doorbell is updated before reading the event index + * from memory. The controller needs to provide similar ordering to + * ensure the envent index is updated before reading the doorbell. + */ + mb(); + return nvme_dbbuf_need_event(*dbbuf_ei, value, old_value); +} - if (!nvme_dbbuf_need_event(*dbbuf_ei, value, old_value)) - return false; - } +static bool nvme_dbbuf_update_sq(struct nvme_queue *nvmeq) +{ + struct nvme_dev *dev = nvmeq->dev; - return true; + if (!dev->dbbuf_dbs) + return true; + + return nvme_dbbuf_update_and_check_event(nvmeq->sq_tail, + &dev->dbbuf_dbs[sq_idx(nvmeq->qid, dev->db_stride)], + &dev->dbbuf_eis[sq_idx(nvmeq->qid, dev->db_stride)]); +} + +static bool nvme_dbbuf_update_cq(struct nvme_queue *nvmeq) +{ + struct nvme_dev *dev = nvmeq->dev; + + if (!dev->dbbuf_dbs) + return true; + + return nvme_dbbuf_update_and_check_event(nvmeq->cq_head, + &dev->dbbuf_dbs[cq_idx(nvmeq->qid, dev->db_stride)], + &dev->dbbuf_eis[cq_idx(nvmeq->qid, dev->db_stride)]); } /* @@ -494,8 +481,7 @@ static inline void nvme_write_sq_db(struct nvme_queue *nvmeq, bool write_sq) return; } - if (nvme_dbbuf_update_and_check_event(nvmeq->sq_tail, - nvmeq->dbbuf_sq_db, nvmeq->dbbuf_sq_ei)) + if (nvme_dbbuf_update_sq(nvmeq)) writel(nvmeq->sq_tail, nvmeq->q_db); nvmeq->last_sq_tail = nvmeq->sq_tail; } @@ -989,11 +975,8 @@ static inline bool nvme_cqe_pending(struct nvme_queue *nvmeq) static inline void nvme_ring_cq_doorbell(struct nvme_queue *nvmeq) { - u16 head = nvmeq->cq_head; - - if (nvme_dbbuf_update_and_check_event(head, nvmeq->dbbuf_cq_db, - nvmeq->dbbuf_cq_ei)) - writel(head, nvmeq->q_db + nvmeq->dev->db_stride); + if (nvme_dbbuf_update_cq(nvmeq)) + writel(nvmeq->cq_head, nvmeq->q_db + nvmeq->dev->db_stride); } static inline struct blk_mq_tags *nvme_queue_tagset(struct nvme_queue *nvmeq) @@ -1556,7 +1539,6 @@ static void nvme_init_queue(struct nvme_queue *nvmeq, u16 qid) nvmeq->cq_phase = 1; nvmeq->q_db = &dev->dbs[qid * 2 * dev->db_stride]; memset((void *)nvmeq->cqes, 0, CQ_SIZE(nvmeq)); - nvme_dbbuf_init(dev, nvmeq, qid); dev->online_queues++; wmb(); /* ensure the first interrupt sees the initialization */ } -- 2.25.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* Re: [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets 2021-10-14 16:45 ` [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets Keith Busch @ 2021-10-14 16:53 ` Christoph Hellwig 2021-10-14 17:09 ` John Levon 1 sibling, 0 replies; 9+ messages in thread From: Christoph Hellwig @ 2021-10-14 16:53 UTC (permalink / raw) To: Keith Busch; +Cc: linux-nvme, hch, sagi, John Levon On Thu, Oct 14, 2021 at 09:45:43AM -0700, Keith Busch wrote: > +static bool nvme_dbbuf_update_sq(struct nvme_queue *nvmeq) > +{ > + struct nvme_dev *dev = nvmeq->dev; > > + if (!dev->dbbuf_dbs) > + return true; > + > + return nvme_dbbuf_update_and_check_event(nvmeq->sq_tail, > + &dev->dbbuf_dbs[sq_idx(nvmeq->qid, dev->db_stride)], > + &dev->dbbuf_eis[sq_idx(nvmeq->qid, dev->db_stride)]); > +} > + > +static bool nvme_dbbuf_update_cq(struct nvme_queue *nvmeq) > +{ > + struct nvme_dev *dev = nvmeq->dev; > + > + if (!dev->dbbuf_dbs) > + return true; > + > + return nvme_dbbuf_update_and_check_event(nvmeq->cq_head, > + &dev->dbbuf_dbs[cq_idx(nvmeq->qid, dev->db_stride)], > + &dev->dbbuf_eis[cq_idx(nvmeq->qid, dev->db_stride)]); This would look much cleaner with a local variable for the offset. But otherwise it looks sensible to me. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets 2021-10-14 16:45 ` [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets Keith Busch 2021-10-14 16:53 ` Christoph Hellwig @ 2021-10-14 17:09 ` John Levon 2021-10-14 17:49 ` Keith Busch 1 sibling, 1 reply; 9+ messages in thread From: John Levon @ 2021-10-14 17:09 UTC (permalink / raw) To: Keith Busch; +Cc: linux-nvme, hch, sagi On Thu, Oct 14, 2021 at 09:45:43AM -0700, Keith Busch wrote: > And when this feature is in use, the specification requires all queue > updates use this mechanism, so don't don't treat the admin queue > differently. Hi Keith, I think you might have missed the other email from me? This would cause implementation difficulties for a controller, right? Currently we presume that the admin queue's shadow doorbells aren't valid, so always look at the BAR0 location. Given we need to support older Linux versions, we don't have any non-hacky way to handle this: we'd need something horrible that presumes those Linux versions - and any other driver implementations - always leave the admin queue shadows at zero, so if we see a non-zero value, it must be doing shadow doorbells. It's unclear to me what the advantage of fixing this is, given Linux has been out of spec for so long. regards john ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets 2021-10-14 17:09 ` John Levon @ 2021-10-14 17:49 ` Keith Busch 2021-10-14 18:33 ` John Levon 0 siblings, 1 reply; 9+ messages in thread From: Keith Busch @ 2021-10-14 17:49 UTC (permalink / raw) To: John Levon; +Cc: linux-nvme, hch, sagi On Thu, Oct 14, 2021 at 06:09:30PM +0100, John Levon wrote: > On Thu, Oct 14, 2021 at 09:45:43AM -0700, Keith Busch wrote: > > > And when this feature is in use, the specification requires all queue > > updates use this mechanism, so don't don't treat the admin queue > > differently. > > Hi Keith, I think you might have missed the other email from me? > > This would cause implementation difficulties for a controller, right? If the controller is not maintaining the admin queue shadow, then the event index will always tell the host to ring the doorbell, right? So the same host behvaior would ultimately happen. > Currently we presume that the admin queue's shadow doorbells aren't valid, so > always look at the BAR0 location. Given we need to support older Linux versions, > we don't have any non-hacky way to handle this: we'd need something horrible > that presumes those Linux versions - and any other driver implementations - > always leave the admin queue shadows at zero, so if we see a non-zero value, it > must be doing shadow doorbells. > > It's unclear to me what the advantage of fixing this is, given Linux has been > out of spec for so long. Ah, I skimmed the part that said we are not up to spec compliance, so aligning with the spec with any feature usually justifies itself. I see your point was the opposite :) I don't consider this part of the patch to all that important, though, so I can leave the admin queues alone if it's concerning. ^ permalink raw reply [flat|nested] 9+ messages in thread
* Re: [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets 2021-10-14 17:49 ` Keith Busch @ 2021-10-14 18:33 ` John Levon 0 siblings, 0 replies; 9+ messages in thread From: John Levon @ 2021-10-14 18:33 UTC (permalink / raw) To: Keith Busch; +Cc: linux-nvme, hch, sagi On Thu, Oct 14, 2021 at 10:49:38AM -0700, Keith Busch wrote: > On Thu, Oct 14, 2021 at 06:09:30PM +0100, John Levon wrote: > > On Thu, Oct 14, 2021 at 09:45:43AM -0700, Keith Busch wrote: > > > > > And when this feature is in use, the specification requires all queue > > > updates use this mechanism, so don't don't treat the admin queue > > > differently. > > > > This would cause implementation difficulties for a controller, right? > > If the controller is not maintaining the admin queue shadow, then the > event index will always tell the host to ring the doorbell, right? So > the same host behvaior would ultimately happen. The eventidx only causes the additional BAR0 writes if the doorbell update crosses it (for example, a 4->6 transition when host sees eventidx of 5). So we couldn't just leave eventidx at zero, and neither can we maintain it to force BAR0 writes, as there are races that means we'd have to look at the shadow doorbell value - and make assumptions - anyway. This would all be a lot easier if zero wasn't a valid doorbell value, but we're stuck now. > > Currently we presume that the admin queue's shadow doorbells aren't valid, so > > always look at the BAR0 location. Given we need to support older Linux versions, > > we don't have any non-hacky way to handle this: we'd need something horrible > > that presumes those Linux versions - and any other driver implementations - > > always leave the admin queue shadows at zero, so if we see a non-zero value, it > > must be doing shadow doorbells. > > > > It's unclear to me what the advantage of fixing this is, given Linux has been > > out of spec for so long. > > Ah, I skimmed the part that said we are not up to spec compliance, so > aligning with the spec with any feature usually justifies itself. I see > your point was the opposite :) Exactly. The test SPDK client is out of spec in exactly the same way. I'm trying to follow up on the spec side so it reflects reality (I can't find a driver that behaves differently here), but that's obviously a slow process. thanks john ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2021-10-20 17:24 UTC | newest] Thread overview: 9+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2021-10-14 16:45 [PATCHv2 0/2] nvme shadow doorbel buf fixes Keith Busch 2021-10-14 16:45 ` [PATCHv2 1/2] nvme-pci: clear shadow doorbell memory on resets Keith Busch 2021-10-14 21:21 ` John Levon 2021-10-20 17:23 ` Christoph Hellwig 2021-10-14 16:45 ` [PATCHv2 2/2] nvme-pci: remove cached shadow doorbell offsets Keith Busch 2021-10-14 16:53 ` Christoph Hellwig 2021-10-14 17:09 ` John Levon 2021-10-14 17:49 ` Keith Busch 2021-10-14 18:33 ` John Levon
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox