* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Marta Rybczynska @ 2016-03-05 7:23 UTC

The cqe structure read normally happens from lower to upper addresses
and the validity bit (status) is at the highest address. If the PCI
updates the memory when the cqe is read by multiple non-atomic loads,
the structure may be corrupted. Avoid this by reading the status
first and then the whole structure.

Signed-off-by: Marta Rybczynska <marta.rybczynska at kalray.eu>
---
 drivers/nvme/host/pci.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index a128672..cf74ffe 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -729,12 +729,13 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 	phase = nvmeq->cq_phase;
 
 	for (;;) {
-		struct nvme_completion cqe = nvmeq->cqes[head];
-		u16 status = le16_to_cpu(cqe.status);
+		struct nvme_completion cqe;
+		u16 status = le16_to_cpu(nvmeq->cqes[head].status);
 		struct request *req;
 
 		if ((status & 1) != phase)
 			break;
+		cqe = nvmeq->cqes[head];
 		nvmeq->sq_head = le16_to_cpu(cqe.sq_head);
 		if (++head == nvmeq->q_depth) {
 			head = 0;
-- 
2.1.4
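The ordering described in the patch above (load the status word first, copy the whole entry only once its phase bit matches) can be sketched outside the kernel roughly as follows. This is a minimal illustration, not the driver code: `struct cqe` and `cqe_read_if_valid` are hypothetical names, the fields are host-endian so `le16_to_cpu` is omitted, and the volatile load only stops the compiler from merging the two reads. On weakly ordered CPUs a read barrier between the check and the copy may also be needed, which the patch does not discuss.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the 16-byte completion entry (host-endian). */
struct cqe {
	uint32_t result;
	uint32_t rsvd;
	uint16_t sq_head;
	uint16_t sq_id;
	uint16_t command_id;
	uint16_t status;	/* bit 0 carries the phase tag */
};

static_assert(sizeof(struct cqe) == 16, "completion entry is 16 bytes");

/*
 * Copy the entry at 'head' into 'out' only if its phase bit equals 'phase'.
 * The status word is read on its own, before any other field, so a stale
 * entry is rejected without ever copying the rest of it.
 */
static bool cqe_read_if_valid(const struct cqe *ring, uint16_t head,
			      uint16_t phase, struct cqe *out)
{
	uint16_t status = *(const volatile uint16_t *)&ring[head].status;

	if ((status & 1) != phase)
		return false;	/* not posted yet for the current pass */

	*out = ring[head];	/* full copy happens only after the check */
	return true;
}
```

With a two-entry ring where only the first entry carries phase 1, a call for index 0 with phase 1 fills `out`, while a call for index 1 returns false and leaves `out` untouched.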
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Christoph Hellwig @ 2016-03-09 17:08 UTC

On Sat, Mar 05, 2016 at 08:23:15AM +0100, Marta Rybczynska wrote:
> The cqe structure read normally happens from lower to upper addresses
> and the validity bit (status) is at the highest address. If the PCI
> updates the memory when the cqe is read by multiple non-atomic loads,
> the structure may be corrupted. Avoid this by reading the status
> first and then the whole structure.

Doing the phase check separately sounds sensible to me, but how about
doing something like the version below, which ensures we only read the
whole cqe after the check, and cleans things up a bit by using a common
helper:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index 1d1c6d1..f6ea5d3 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -704,6 +704,11 @@ static void nvme_complete_rq(struct request *req)
 	blk_mq_end_request(req, error);
 }
 
+static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head)
+{
+	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) == head;
+}
+
 static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 {
 	u16 head, phase;
@@ -711,13 +716,10 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 	head = nvmeq->cq_head;
 	phase = nvmeq->cq_phase;
 
-	for (;;) {
+	while (nvme_cqe_valid(nvmeq, head)) {
 		struct nvme_completion cqe = nvmeq->cqes[head];
-		u16 status = le16_to_cpu(cqe.status);
 		struct request *req;
 
-		if ((status & 1) != phase)
-			break;
 		if (++head == nvmeq->q_depth) {
 			head = 0;
 			phase = !phase;
@@ -748,7 +750,7 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 		req = blk_mq_tag_to_rq(*nvmeq->tags, cqe.command_id);
 		if (req->cmd_type == REQ_TYPE_DRV_PRIV && req->special)
 			memcpy(req->special, &cqe, sizeof(cqe));
-		blk_mq_complete_request(req, status >> 1);
+		blk_mq_complete_request(req, le16_to_cpu(cqe.status) >> 1);
 
 	}
 
@@ -789,18 +791,17 @@ static irqreturn_t nvme_irq(int irq, void *data)
 static irqreturn_t nvme_irq_check(int irq, void *data)
 {
 	struct nvme_queue *nvmeq = data;
-	struct nvme_completion cqe = nvmeq->cqes[nvmeq->cq_head];
-	if ((le16_to_cpu(cqe.status) & 1) != nvmeq->cq_phase)
-		return IRQ_NONE;
-	return IRQ_WAKE_THREAD;
+
+	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head))
+		return IRQ_WAKE_THREAD;
+	return IRQ_NONE;
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 
-	if ((le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
-	    nvmeq->cq_phase) {
+	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head)) {
 		spin_lock_irq(&nvmeq->q_lock);
 		__nvme_process_cq(nvmeq, &tag);
 		spin_unlock_irq(&nvmeq->q_lock);
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Johannes Thumshirn @ 2016-03-10 8:31 UTC

On Wed, Mar 09, 2016 at 09:08:46AM -0800, Christoph Hellwig wrote:
> On Sat, Mar 05, 2016 at 08:23:15AM +0100, Marta Rybczynska wrote:
> > The cqe structure read normally happens from lower to upper addresses
> > and the validity bit (status) is at the highest address. If the PCI
> > updates the memory when the cqe is read by multiple non-atomic loads,
> > the structure may be corrupted. Avoid this by reading the status
> > first and then the whole structure.
>
> Doing the phase check separately sounds sensible to me, but how about
> doing something like the version below, which ensures we only read the
> whole cqe after the check, and cleans things up a bit by using a common
> helper:
>
> diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
> index 1d1c6d1..f6ea5d3 100644
> --- a/drivers/nvme/host/pci.c
> +++ b/drivers/nvme/host/pci.c
> @@ -704,6 +704,11 @@ static void nvme_complete_rq(struct request *req)
>  	blk_mq_end_request(req, error);
>  }
>
> +static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head)
> +{
> +	return (le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) == head;
> +}
> +
>  static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>  {
>  	u16 head, phase;
> @@ -711,13 +716,10 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>  	head = nvmeq->cq_head;
>  	phase = nvmeq->cq_phase;
>
> -	for (;;) {
> +	while (nvme_cqe_valid(nvmeq, head)) {
>  		struct nvme_completion cqe = nvmeq->cqes[head];
> -		u16 status = le16_to_cpu(cqe.status);
>  		struct request *req;
>
> -		if ((status & 1) != phase)
> -			break;
>  		if (++head == nvmeq->q_depth) {
>  			head = 0;
>  			phase = !phase;
> @@ -748,7 +750,7 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
>  		req = blk_mq_tag_to_rq(*nvmeq->tags, cqe.command_id);
>  		if (req->cmd_type == REQ_TYPE_DRV_PRIV && req->special)
>  			memcpy(req->special, &cqe, sizeof(cqe));
> -		blk_mq_complete_request(req, status >> 1);
> +		blk_mq_complete_request(req, le16_to_cpu(cqe.status) >> 1);
>
>  	}
>
> @@ -789,18 +791,17 @@ static irqreturn_t nvme_irq(int irq, void *data)
>  static irqreturn_t nvme_irq_check(int irq, void *data)
>  {
>  	struct nvme_queue *nvmeq = data;
> -	struct nvme_completion cqe = nvmeq->cqes[nvmeq->cq_head];
> -	if ((le16_to_cpu(cqe.status) & 1) != nvmeq->cq_phase)
> -		return IRQ_NONE;
> -	return IRQ_WAKE_THREAD;
> +
> +	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head))
> +		return IRQ_WAKE_THREAD;
> +	return IRQ_NONE;
>  }
>
>  static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
>  {
>  	struct nvme_queue *nvmeq = hctx->driver_data;
>
> -	if ((le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
> -	    nvmeq->cq_phase) {
> +	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head)) {
>  		spin_lock_irq(&nvmeq->q_lock);
>  		__nvme_process_cq(nvmeq, &tag);
>  		spin_unlock_irq(&nvmeq->q_lock);

This looks sensible,
Reviewed-by: Johannes Thumshirn <jthumshirn at suse.de>

-- 
Johannes Thumshirn                                          Storage
jthumshirn at suse.de                             +49 911 74053 689
SUSE LINUX GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Felix Imendörffer, Jane Smithard, Graham Norton
HRB 21284 (AG Nürnberg)
Key fingerprint = EC38 9CAB C2C4 F25D 8600 D0D0 0393 969D 2D76 0850
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Marta Rybczynska @ 2016-03-10 11:08 UTC

----- On 9 Mar 16, at 18:08, Christoph Hellwig hch at infradead.org wrote:

> On Sat, Mar 05, 2016 at 08:23:15AM +0100, Marta Rybczynska wrote:
>> The cqe structure read normally happens from lower to upper addresses
>> and the validity bit (status) is at the highest address. If the PCI
>> updates the memory when the cqe is read by multiple non-atomic loads,
>> the structure may be corrupted. Avoid this by reading the status
>> first and then the whole structure.
>
> Doing the phase check separately sounds sensible to me, but how about
> doing something like the version below, which ensures we only read the
> whole cqe after the check, and cleans things up a bit by using a common
> helper:
>

Looks like a good refactoring to me. However, it seems to me that in
nvme_cqe_valid we should be checking for phase, not for head.

I will test it and re-submit.

Regards,
Marta Rybczynska
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Christoph Hellwig @ 2016-03-10 14:36 UTC

On Thu, Mar 10, 2016 at 12:08:48PM +0100, Marta Rybczynska wrote:
> Looks like a good refactoring to me. However, it seems to me that in
> nvme_cqe_valid we should be checking for phase, not for head.

Yeah, this doesn't work as-is. I still think that non-obvious
check should be split out into something. Maybe just:

static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head, u16 phase)
{
	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
}
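To see why the first version of the helper "doesn't work as-is", it may help to run both variants against the same queue state. The sketch below is only an illustration under simplifying assumptions: `struct mini_cq`, `valid_vs_head` and `valid_vs_phase` are hypothetical names, and the queue is reduced to an array of host-endian status words.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down completion queue: status words only. */
struct mini_cq {
	uint16_t status[4];	/* bit 0 of each word is the phase tag */
	uint16_t cq_head;
};

/* Shape of the first proposal: the phase bit is compared with the head index. */
static bool valid_vs_head(const struct mini_cq *q, uint16_t head)
{
	return (q->status[q->cq_head] & 1) == head;
}

/* Shape of the revised proposal: entry at 'head' is compared with 'phase'. */
static bool valid_vs_phase(const struct mini_cq *q, uint16_t head,
			   uint16_t phase)
{
	return (q->status[head] & 1) == phase;
}

/* Returns true when the two variants disagree for the given state. */
static bool variants_disagree(const struct mini_cq *q, uint16_t head,
			      uint16_t phase)
{
	return valid_vs_head(q, head) != valid_vs_phase(q, head, phase);
}
```

For a queue whose head is at index 2 and whose entry 2 has been posted with phase 1, the phase-based check reports a valid entry, while the head-based check compares the bit value 1 against the index 2 and reports nothing to process.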
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Marta Rybczynska @ 2016-03-10 15:07 UTC

----- On 10 Mar 16, at 15:36, Christoph Hellwig hch at infradead.org wrote:
>
> Yeah, this doesn't work as-is. I still think that non-obvious
> check should be split out into something. Maybe just:
>
> static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head, u16 phase)
> {
>	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
> }

I did the same in my version. Here's what I have now:

diff --git a/drivers/nvme/host/pci.c b/drivers/nvme/host/pci.c
index e9f18e1..6d4a616 100644
--- a/drivers/nvme/host/pci.c
+++ b/drivers/nvme/host/pci.c
@@ -704,6 +704,13 @@ static void nvme_complete_rq(struct request *req)
 	blk_mq_end_request(req, error);
 }
 
+/* We read the CQE phase first to check if the rest of the entry is valid */
+static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head,
+		u16 phase)
+{
+	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
+}
+
 static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 {
 	u16 head, phase;
@@ -711,13 +718,10 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 	head = nvmeq->cq_head;
 	phase = nvmeq->cq_phase;
 
-	for (;;) {
+	while (nvme_cqe_valid(nvmeq, head, phase)) {
 		struct nvme_completion cqe = nvmeq->cqes[head];
-		u16 status = le16_to_cpu(cqe.status);
 		struct request *req;
 
-		if ((status & 1) != phase)
-			break;
 		if (++head == nvmeq->q_depth) {
 			head = 0;
 			phase = !phase;
@@ -748,7 +752,7 @@ static void __nvme_process_cq(struct nvme_queue *nvmeq, unsigned int *tag)
 		req = blk_mq_tag_to_rq(*nvmeq->tags, cqe.command_id);
 		if (req->cmd_type == REQ_TYPE_DRV_PRIV && req->special)
 			memcpy(req->special, &cqe, sizeof(cqe));
-		blk_mq_complete_request(req, status >> 1);
+		blk_mq_complete_request(req, le16_to_cpu(cqe.status) >> 1);
 
 	}
 
@@ -789,18 +793,16 @@ static irqreturn_t nvme_irq(int irq, void *data)
 static irqreturn_t nvme_irq_check(int irq, void *data)
 {
 	struct nvme_queue *nvmeq = data;
-	struct nvme_completion cqe = nvmeq->cqes[nvmeq->cq_head];
-	if ((le16_to_cpu(cqe.status) & 1) != nvmeq->cq_phase)
-		return IRQ_NONE;
-	return IRQ_WAKE_THREAD;
+	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase))
+		return IRQ_WAKE_THREAD;
+	return IRQ_NONE;
 }
 
 static int nvme_poll(struct blk_mq_hw_ctx *hctx, unsigned int tag)
 {
 	struct nvme_queue *nvmeq = hctx->driver_data;
 
-	if ((le16_to_cpu(nvmeq->cqes[nvmeq->cq_head].status) & 1) ==
-	    nvmeq->cq_phase) {
+	if (nvme_cqe_valid(nvmeq, nvmeq->cq_head, nvmeq->cq_phase)) {
 		spin_lock_irq(&nvmeq->q_lock);
 		__nvme_process_cq(nvmeq, &tag);
 		spin_unlock_irq(&nvmeq->q_lock);
-- 
2.1.0
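The loop in the patch above relies on the expected phase flipping each time head wraps, so entries left over from the previous pass stop matching. A rough user-space model of that bookkeeping, with a hypothetical `drain_cq` function and a bare array of host-endian status words standing in for the queue, might look like this:

```c
#include <assert.h>
#include <stdint.h>

/*
 * Walk a ring of status words starting at *head, consuming every entry
 * whose phase bit (bit 0) equals *phase. When head wraps to 0 the expected
 * phase is inverted, as in the driver loop. Returns the number of entries
 * consumed and leaves *head and *phase at the first entry not yet posted.
 */
static unsigned int drain_cq(const uint16_t *status, uint16_t depth,
			     uint16_t *head, uint16_t *phase)
{
	unsigned int consumed = 0;

	while ((status[*head] & 1) == *phase) {
		if (++*head == depth) {
			*head = 0;
			*phase = !*phase;
		}
		consumed++;
	}
	return consumed;
}
```

As an example, take a ring of depth 4 with status words {0, 1, 1, 1}, head at 1 and expected phase 1. Entries 1 to 3 belong to the current pass; entry 0 has already been rewritten by the device for the next pass, where the phase is 0. The walk consumes all four entries and stops at index 1 with expected phase 0, because entry 1 still carries the old phase.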
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Christoph Hellwig @ 2016-03-15 8:24 UTC

Can you resend the patch you've been testing with? No need to
attribute me, please keep your signoff as the original and final
version will be from you.
* [PATCH] nvme: avoid cqe corruption when update at the same time as read
From: Keith Busch @ 2016-03-10 15:07 UTC

On Thu, Mar 10, 2016 at 06:36:50AM -0800, Christoph Hellwig wrote:
> On Thu, Mar 10, 2016 at 12:08:48PM +0100, Marta Rybczynska wrote:
> > Looks like a good refactoring to me. However, it seems to me that in
> > nvme_cqe_valid we should be checking for phase, not for head.
>
> Yeah, this doesn't work as-is. I still think that non-obvious
> check should be split out into something. Maybe just:
>
> static inline bool nvme_cqe_valid(struct nvme_queue *nvmeq, u16 head, u16 phase)
> {
>	return (le16_to_cpu(nvmeq->cqes[head].status) & 1) == phase;
> }

Looks good.

The code had copied the CQE since the initial commit, so just want to
mention a torn CQE read could cause real trouble only if the Phase is
out of sync with the Command ID. These are in the same DWORD and many
(most?) archs compile to read those 4-bytes atomically. We certainly
can't rely on that, so this is a good change.
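The remark that the Phase bit and the Command ID share a DWORD can be checked against the entry layout. The struct below is an assumption: it mirrors the 16-byte completion entry as commonly laid out, using the field names that appear in the thread (sq_head, command_id, status) plus placeholder names for the rest, and `dword_index` is a hypothetical helper added for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed layout of a 16-byte completion queue entry. */
struct cqe_layout {
	uint32_t result;	/* DWORD 0 */
	uint32_t rsvd;		/* DWORD 1 */
	uint16_t sq_head;	/* DWORD 2, low half */
	uint16_t sq_id;		/* DWORD 2, high half */
	uint16_t command_id;	/* DWORD 3, low half */
	uint16_t status;	/* DWORD 3, high half; bit 0 is the phase */
};

/* Index of the 32-bit word that contains the given byte offset. */
static size_t dword_index(size_t byte_offset)
{
	return byte_offset / sizeof(uint32_t);
}

static_assert(sizeof(struct cqe_layout) == 16, "entry is 16 bytes");
static_assert(offsetof(struct cqe_layout, status) == 14,
	      "status is the highest-addressed field");
```

Under this layout both `command_id` and `status` fall in DWORD 3, so a single aligned 4-byte load would fetch them together, while `sq_head` sits in DWORD 2 and could still be observed from an older write if the entry is copied before the phase check.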