* [PATCH] NVMe: Remove superfluous cqe_seen @ 2014-05-21 18:14 Sam Bradshaw 2014-05-21 23:17 ` Matthew Wilcox 0 siblings, 1 reply; 5+ messages in thread From: Sam Bradshaw @ 2014-05-21 18:14 UTC (permalink / raw) cqe_seen is redundant with the return value from nvme_process_cq(). Remove it. Signed-off-by: Sam Bradshaw <sbradshaw at micron.com> --- diff --git a/drivers/block/nvme-core.c b/drivers/block/nvme-core.c index cd8a8bc..c2287ac 100644 --- a/drivers/block/nvme-core.c +++ b/drivers/block/nvme-core.c @@ -102,7 +102,6 @@ struct nvme_queue { u16 cq_head; u16 qid; u8 cq_phase; - u8 cqe_seen; u8 q_suspended; cpumask_var_t cpu_mask; struct async_cmd_info cmdinfo; @@ -790,7 +789,6 @@ static int nvme_process_cq(struct nvme_queue *nvmeq) nvmeq->cq_head = head; nvmeq->cq_phase = phase; - nvmeq->cqe_seen = 1; return 1; } @@ -826,8 +824,7 @@ static irqreturn_t nvme_irq(int irq, void *data) struct nvme_queue *nvmeq = data; spin_lock(&nvmeq->q_lock); nvme_process_cq(nvmeq); - result = nvmeq->cqe_seen ? IRQ_HANDLED : IRQ_NONE; - nvmeq->cqe_seen = 0; + result = nvme_process_cq(nvmeq) ? IRQ_HANDLED : IRQ_NONE; spin_unlock(&nvmeq->q_lock); return result; } ^ permalink raw reply related [flat|nested] 5+ messages in thread
* [PATCH] NVMe: Remove superfluous cqe_seen 2014-05-21 18:14 [PATCH] NVMe: Remove superfluous cqe_seen Sam Bradshaw @ 2014-05-21 23:17 ` Matthew Wilcox 2014-05-22 0:10 ` Sam Bradshaw (sbradshaw) 0 siblings, 1 reply; 5+ messages in thread From: Matthew Wilcox @ 2014-05-21 23:17 UTC (permalink / raw) On Wed, May 21, 2014@11:14:25AM -0700, Sam Bradshaw wrote: > cqe_seen is redundant with the return value from nvme_process_cq(). > Remove it. Ah, but it isn't. Look at commit e9539f47525. The purpose of cqe_seen is to be 'sticky', that we *have* indeed processed something on this CQ since the last interrupt, and so if we get an interrupt, the device may have legitimately interrupted us. A more interesting patch might be to implement some kind of autotuning of the interrupt coalescing -- if we're seeing too many times tha the queue has no work on it, then we should increase the time or queue depth used. I'm not sure what event would cause us to *reduce* the time or queue depth, so I fear we'd end up set to a suboptimal value over time. You could argue that the return value from nvme_process_cq() is now never used, and that would be correct, but I have a patch in the works that will use it again, so I don't think it's worth taking out. Was there a performance problem that this solves, or did you just notice it, and send a cleanup? ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] NVMe: Remove superfluous cqe_seen 2014-05-21 23:17 ` Matthew Wilcox @ 2014-05-22 0:10 ` Sam Bradshaw (sbradshaw) 2014-05-23 14:55 ` Matthew Wilcox 2014-06-19 16:59 ` Matthew Wilcox 0 siblings, 2 replies; 5+ messages in thread From: Sam Bradshaw (sbradshaw) @ 2014-05-22 0:10 UTC (permalink / raw) > -----Original Message----- > From: Matthew Wilcox [mailto:willy at linux.intel.com] > Sent: Wednesday, May 21, 2014 4:17 PM > To: Sam Bradshaw (sbradshaw) > Cc: linux-nvme at lists.infradead.org > Subject: Re: [PATCH] NVMe: Remove superfluous cqe_seen > > On Wed, May 21, 2014@11:14:25AM -0700, Sam Bradshaw wrote: > > cqe_seen is redundant with the return value from nvme_process_cq(). > > Remove it. > > Ah, but it isn't. Look at commit e9539f47525. The purpose of cqe_seen > is to be 'sticky', that we *have* indeed processed something on this CQ > since the last interrupt, and so if we get an interrupt, the device may > have legitimately interrupted us. > > A more interesting patch might be to implement some kind of autotuning > of > the interrupt coalescing -- if we're seeing too many times tha the > queue > has no work on it, then we should increase the time or queue depth > used. > I'm not sure what event would cause us to *reduce* the time or queue > depth, so I fear we'd end up set to a suboptimal value over time. > > You could argue that the return value from nvme_process_cq() is now > never used, and that would be correct, but I have a patch in the works > that will use it again, so I don't think it's worth taking out. > > Was there a performance problem that this solves, or did you just > notice > it, and send a cleanup? Performance problem, though not very easily measured. At very high iops rates, most if not all cqe's are processed via nvme_process_cq() in make_request(), leaving nvme_irq() with no work to do. Nevertheless, it always writes cqe_seen, which invalidates a very hot cacheline. This is somewhat exacerbated when IO submissions originate on a remote node relative to the cpu handling the irq. Perhaps a simpler patch where cqe_seen is only written if set? -Sam ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] NVMe: Remove superfluous cqe_seen 2014-05-22 0:10 ` Sam Bradshaw (sbradshaw) @ 2014-05-23 14:55 ` Matthew Wilcox 2014-06-19 16:59 ` Matthew Wilcox 1 sibling, 0 replies; 5+ messages in thread From: Matthew Wilcox @ 2014-05-23 14:55 UTC (permalink / raw) On Thu, May 22, 2014@12:10:19AM +0000, Sam Bradshaw (sbradshaw) wrote: > Performance problem, though not very easily measured. At very high iops > rates, most if not all cqe's are processed via nvme_process_cq() in > make_request(), leaving nvme_irq() with no work to do. Nevertheless, it > always writes cqe_seen, which invalidates a very hot cacheline. This > is somewhat exacerbated when IO submissions originate on a remote node > relative to the cpu handling the irq. > > Perhaps a simpler patch where cqe_seen is only written if set? Sure, we could do that, although in the scenario you're talking about, cqe_seen would almost always get written. I think your workload would perhaps benefit from tuning the interrupt coalescing settings. ^ permalink raw reply [flat|nested] 5+ messages in thread
* [PATCH] NVMe: Remove superfluous cqe_seen 2014-05-22 0:10 ` Sam Bradshaw (sbradshaw) 2014-05-23 14:55 ` Matthew Wilcox @ 2014-06-19 16:59 ` Matthew Wilcox 1 sibling, 0 replies; 5+ messages in thread From: Matthew Wilcox @ 2014-06-19 16:59 UTC (permalink / raw) On Thu, May 22, 2014@12:10:19AM +0000, Sam Bradshaw (sbradshaw) wrote: > Performance problem, though not very easily measured. At very high iops > rates, most if not all cqe's are processed via nvme_process_cq() in > make_request(), leaving nvme_irq() with no work to do. Nevertheless, it > always writes cqe_seen, which invalidates a very hot cacheline. This > is somewhat exacerbated when IO submissions originate on a remote node > relative to the cpu handling the irq. I was thinking "Hey, we should move cqe_seen to a different cacheline". So I looked at the cacheline assignments for the different variables, and cqe_seen is on the same cacheline as cq_head and cq_phase, so that cacheline is already being dirtied. Indeed, it's in the same Dword as cq_phase, so I'd be amazed if the CPU didn't coalesce the two writes. That might be a more fruitful patch ... rearrange nvme_queue to put cq_head, cq_phase and cqe_seen in the same Dword, and expect the CPU to optimise the three assignments into a single Dword store. I'll let you try it out since you have the setup to benchmark it. Right now, this is the layout I see: /* --- cacheline 3 boundary (192 bytes) --- */ u32 * q_db; /* 192 8 */ u16 q_depth; /* 200 2 */ u16 cq_vector; /* 202 2 */ u16 sq_head; /* 204 2 */ u16 sq_tail; /* 206 2 */ u16 cq_head; /* 208 2 */ u16 qid; /* 210 2 */ u8 cq_phase; /* 212 1 */ u8 cqe_seen; /* 213 1 */ u8 q_suspended; /* 214 1 */ I notice a 4-byte hole after q_lock, so moving cq_head, cq_phase and cqe_seen into that space would probably be a good idea (since that cacheline is definitely dirty). I really haven't tried to optimise the frequently-updated parts of the data structure into the same cacheline, and it should really help your bizarre setup :-). ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2014-06-19 16:59 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2014-05-21 18:14 [PATCH] NVMe: Remove superfluous cqe_seen Sam Bradshaw 2014-05-21 23:17 ` Matthew Wilcox 2014-05-22 0:10 ` Sam Bradshaw (sbradshaw) 2014-05-23 14:55 ` Matthew Wilcox 2014-06-19 16:59 ` Matthew Wilcox
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox