* A question regarding to MSIX interrupts for NVME
From: Xuehua Chen @ 2013-08-27 20:56 UTC

Hi all,

I have a question about the MSI-X interrupts used by the Linux nvme driver.

It seems to me that the admin queue currently shares vector 0 with IOCQ 1. Is there a good reason why the admin queue should not have its own vector? Sharing seems to make NVMe interrupt coalescing a bit strange, for the following reason.

The NVMe 1.1 spec, section 5.12.1.9 (Interrupt Vector Configuration), says:

"By default, coalescing settings are enabled for each interrupt vector. Interrupt coalescing is not supported for the Admin Completion Queue."

If a user wants to enable coalescing for IOCQ 1, this also enables coalescing for the admin queue, since the interrupt vector is shared, and that violates the spec. So this makes IOCQ 1 different from the other IOCQs.

Also, does the spec not support interrupt coalescing for the ACQ because it wants the ACQ to be processed as soon as possible? In the current implementation, admin queue commands could be delayed when there are lots of entries in IOCQ 1 being processed. Maybe a separate vector for the admin queue would be better in such a situation?

These are my concerns about how the Linux nvme driver currently handles MSI-X interrupts. I would really like to know your thoughts and opinions on this.

Thanks a lot!

Best regards,
Xuehua
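For reference, the Interrupt Vector Configuration feature cited in this message is programmed through Command Dword 11 of a Set Features command. The sketch below packs and unpacks that dword, assuming the NVMe 1.1 layout (Interrupt Vector in bits 15:0, Coalescing Disable in bit 16, Feature Identifier 09h). The function and constant names are illustrative and are not taken from the Linux driver.

```python
# Assumed NVMe 1.1 layout for Interrupt Vector Configuration (Feature 09h).
FEAT_IRQ_CONFIG = 0x09   # Feature Identifier (illustrative constant name)
CD_BIT = 1 << 16         # Coalescing Disable (CD), bit 16 of Dword 11
IV_MASK = 0xFFFF         # Interrupt Vector (IV), bits 15:0 of Dword 11


def irq_config_dword11(vector, coalescing_disable):
    """Pack Command Dword 11 for one interrupt vector."""
    if not 0 <= vector <= IV_MASK:
        raise ValueError("interrupt vector must fit in 16 bits")
    return vector | (CD_BIT if coalescing_disable else 0)


def decode_irq_config(dword11):
    """Unpack Dword 11 into (vector, coalescing_disable)."""
    return dword11 & IV_MASK, bool(dword11 & CD_BIT)
```

With this encoding, disabling coalescing on vector 0 is `irq_config_dword11(0, True)`. Because the setting is per vector and not per queue, it cannot by itself distinguish the admin CQ from an IOCQ that shares that vector, which is the ambiguity the question is about.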
* A question regarding to MSIX interrupts for NVME
From: Keith Busch @ 2013-08-27 21:31 UTC

On Tue, 27 Aug 2013, Xuehua Chen wrote:
> It seems to me that admin queue is sharing vector 0 with IOCQ 1 now.
> Is there any good reason that admin queue should not have its own
> vector? It seems to me this could makes interrupt coalescing of nvme
> a bit strange due to the following reason.
>
> In NVMe 1.1 spec, 5.12.1.9 Interrupt Vector Configuration, it is
> mentioned that
>
> "By default, coalescing settings are enabled for each interrupt
> vector. Interrupt coalescing is not supported for the Admin
> Completion Queue."
>
> If a user want to enable coalescing for IOCQ 1, this will enable the
> coalescing for admin queue well since the interrupt vector are shared
> and violate the spec. So this somehow makes IOCQ 1 different from
> other IOCQs.

If your device enables coalescing on the admin queue when the host enables it for IOQ 1's interrupt vector, I think your device violates the spec rather than the host.

> Also does the spec don't support interrupt for ACQ because it want
> ACQ be processed as soon as possible? In the current implementation,
> admin queue commands could be delayed when there are lots of entries
> in IOCQ1 being processed. Maybe a separate vector for admin queue
> could be better for such a situation?

The admin queue does not get the kind of activity an IO queue does, so sharing the interrupt with an IO queue seems like a good way to reduce resource requirements without a performance loss. You can also find yourself in a situation where you have no choice but to share the interrupt vector.
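The "no choice but to share" case can be made concrete with a small model of vector assignment. This is only a sketch of one possible policy: `assign_vectors` and its `dedicated_admin` switch are hypothetical and do not reproduce the driver's allocation code. Queue 0 stands for the admin CQ, and IO queues wrap around and share once the available vectors run out.

```python
def assign_vectors(num_vectors, num_io_queues, dedicated_admin=True):
    """Return {queue_id: vector}; queue 0 is the admin CQ (hypothetical policy)."""
    if num_vectors < 1:
        raise ValueError("need at least one interrupt vector")
    mapping = {0: 0}  # the admin CQ always uses vector 0
    # Keep vector 0 for the admin CQ alone only if a vector is left for IO.
    first_io = 1 if dedicated_admin and num_vectors > 1 else 0
    io_vectors = num_vectors - first_io
    for qid in range(1, num_io_queues + 1):
        # IO queues take vectors round-robin and share when they run out.
        mapping[qid] = first_io + (qid - 1) % io_vectors
    return mapping
```

With `dedicated_admin=False` and enough vectors, IOCQ 1 lands on vector 0 together with the admin CQ, which matches the sharing described in the thread. With a single vector, every queue shares vector 0 regardless of the policy.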
* A question regarding to MSIX interrupts for NVME
From: Xuehua Chen @ 2013-08-27 22:13 UTC

Hi Keith,

Thank you very much for your response.

On Tue, Aug 27, 2013 at 2:31 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, 27 Aug 2013, Xuehua Chen wrote:
>> It seems to me that admin queue is sharing vector 0 with IOCQ 1 now.
>> Is there any good reason that admin queue should not have its own
>> vector? It seems to me this could makes interrupt coalescing of nvme
>> a bit strange due to the following reason.
>>
>> In NVMe 1.1 spec, 5.12.1.9 Interrupt Vector Configuration, it is
>> mentioned that
>>
>> "By default, coalescing settings are enabled for each interrupt
>> vector. Interrupt coalescing is not supported for the Admin
>> Completion Queue."
>>
>> If a user want to enable coalescing for IOCQ 1, this will enable the
>> coalescing for admin queue well since the interrupt vector are shared
>> and violate the spec. So this somehow makes IOCQ 1 different from
>> other IOCQs.
>
> If your device enables coalescing on the admin queue when the host
> enables it for IOQ 1's interrupt vector, I think your device violates
> the spec rather than the host.

This seems to make the hardware somewhat harder to implement, which is part of the reason I am asking the question here :) We would have a vector that sometimes coalesces and sometimes does not, which seems a bit strange. I am not sure I understand the spec well enough. It would be good if the spec elaborated a bit more on this corner case.

>> Also does the spec don't support interrupt for ACQ because it want
>> ACQ be processed as soon as possible? In the current implementation,
>> admin queue commands could be delayed when there are lots of entries
>> in IOCQ1 being processed. Maybe a separate vector for admin queue
>> could be better for such a situation?
>
> The admin queue does not get the kind of activity an IO queue does,
> so sharing the interrupt with an IO queue seems like a good way to
> reduce resource requirements without a performance loss. You can also
> find yourself in a situation where you have no choice but to share the
> interrupt vector.

Let's say a bunch of CQ entries are posted to IOCQ 1, quickly followed by a new admin CQ entry. Will the admin CQ entry be processed right away, or will it wait until some of the existing IOCQ entries are processed? I am not concerned about IO performance here, just the response time of admin commands. Since the admin queue does not support coalescing, I assume it needs to be processed ASAP. I think IOCQs sharing interrupts is fine; I just think the admin CQ had better not share an interrupt with any IOCQ. An alternative could be a separate vector for the admin queue, with an affinity hint covering all online CPUs, for example.

Thanks a lot!

Best regards,
Xuehua
* A question regarding to MSIX interrupts for NVME
From: Keith Busch @ 2013-08-27 22:35 UTC

On Tue, 27 Aug 2013, Xuehua Chen wrote:
>> The admin queue does not get the kind of activity an IO queue does,
>> so sharing the interrupt with an IO queue seems like a good way to
>> reduce resource requirements without a performance loss. You can also
>> find yourself in a situation where you have no choice but to share the
>> interrupt vector.
>
> Let's say there are a bunch of cq entries posted to IOCQ1, quickly
> followed a new admin cq entry, will the admin cq entry be processed
> right away or wait until the some existing iocqs are processed? I do
> not have concern with io performance here, just the response of admin
> command. Since admin queue does not support coalescing, I assume it
> needs to be processed asap. I think iocq sharing interrupts is fine.
> Just think admin cq better not share interrupt with any IOCQs. An
> alternative could be using a separate vector for admin queue with
> affinity hint to all cpus online for example.

I hadn't thought much about it, but I always assumed coalescing isn't an option for the admin command because you wouldn't expect a workload on there that even comes close to realizing the benefits of coalescing.

If the device raises an interrupt for completions on the IOQ or Admin Queue (or both), the driver's interrupt routine will be called twice: once for each queue. The interrupt service routine will process all the completed requests for the first queue it is called with, then it will do so for the other queue. Are you saying that draining the completions from the IO queue takes an unacceptable amount of time if there is a completion on the admin queue? That doesn't seem likely.
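The behaviour Keith describes, one interrupt on a shared vector leading to one handler call per queue, each draining its own completion queue, can be sketched with a small user-space model. This is an illustration under stated assumptions, not driver code: the class names are made up, and the handler order (IO queue first, admin queue second) is chosen here only to show the worst case for the admin completion.

```python
from collections import deque


class CompletionQueue:
    """Toy completion queue: the device posts entries, the handler drains them."""

    def __init__(self, name):
        self.name = name
        self.entries = deque()

    def post(self, entry):
        self.entries.append(entry)

    def drain(self, log):
        # Handle every completion that is currently pending on this queue.
        while self.entries:
            log.append((self.name, self.entries.popleft()))


class SharedVector:
    """One interrupt vector with several queues registered on it."""

    def __init__(self):
        self.queues = []

    def register(self, cq):
        self.queues.append(cq)

    def fire(self):
        # A single interrupt invokes the handler once per registered queue,
        # in registration order (an assumption of this model).
        log = []
        for cq in self.queues:
            cq.drain(log)
        return log


def demo():
    vec0 = SharedVector()
    ioq1 = CompletionQueue("ioq1")
    admin = CompletionQueue("admin")
    vec0.register(ioq1)
    vec0.register(admin)
    for tag in range(3):
        ioq1.post(tag)
    admin.post("identify")
    return vec0.fire()
```

In this model the admin completion is handled only after the IO queue has been drained, so its extra latency is bounded by the time needed to process the IO entries pending at that moment.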
* A question regarding to MSIX interrupts for NVME
From: Xuehua Chen @ 2013-08-28 1:04 UTC

On Tue, Aug 27, 2013 at 3:35 PM, Keith Busch <keith.busch@intel.com> wrote:
> On Tue, 27 Aug 2013, Xuehua Chen wrote:
>>> The admin queue does not get the kind of activity an IO queue does,
>>> so sharing the interrupt with an IO queue seems like a good way to
>>> reduce resource requirements without a performance loss. You can also
>>> find yourself in a situation where you have no choice but to share the
>>> interrupt vector.
>>
>> Let's say there are a bunch of cq entries posted to IOCQ1, quickly
>> followed a new admin cq entry, will the admin cq entry be processed
>> right away or wait until the some existing iocqs are processed? I do
>> not have concern with io performance here, just the response of admin
>> command. Since admin queue does not support coalescing, I assume it
>> needs to be processed asap. I think iocq sharing interrupts is fine.
>> Just think admin cq better not share interrupt with any IOCQs. An
>> alternative could be using a separate vector for admin queue with
>> affinity hint to all cpus online for example.
>
> I hadn't thought much about it, but I always assumed coalescing isn't an
> option for the admin command because you wouldn't expect a workload on
> there that even comes close to realizing the benefits of coalescing.
>
> If the device raises an interrupt for completions on the IOQ or Admin
> Queue (or both), the driver's interrupt routine will be called twice:
> once for each queue. The interrupt service routine will process all the
> completed requests for the first queue it is called with, then it will
> do so for the other queue. Are you saying that draining the completions
> from the IO queue takes an unacceptable amount of time if there is a
> completion on the admin queue? That doesn't seem likely.

If it is not for quick response time, I don't understand why the spec specifically mentions that "interrupt coalescing is not supported for the admin completion queue", because I don't see that enabling interrupt coalescing for the ACQ would cause problems most of the time either. Please correct me if this is not right. And if it is right, then the spec just made the hardware implementation more complicated: hardware needs to treat this vector differently from any other vector shared purely by IOCQs. So I tend to think this statement is there for the sake of short response time.

I don't have any timing data here. But the NVMe spec supports IOCQs with up to 2**16 entries, so maybe very intensive IO could cause a non-negligible delay for admin commands on some fast platforms? Also, under weighted round robin with urgent priority class arbitration, the ASQ has higher priority than all other SQs. This also suggests to me that the AQ occasionally needs a very short response time.

Thanks,

Best regards,
Xuehua
* A question regarding to MSIX interrupts for NVME
From: Xuehua Chen @ 2013-08-28 16:58 UTC

Hi all,

My previous thoughts on short response time and timing were just guesses and may not be solid. Here are some thoughts on how the device and driver could fit the spec well.

"By default, coalescing settings are enabled for each interrupt vector. Interrupt coalescing is not supported for the Admin Completion Queue."

Approach 1:
1. The device enables coalescing settings for each interrupt vector by default at reset.
2. When the admin queue is configured, the device disables coalescing for vector 0, which is assigned to the ACQ.
3. Other vectors are assigned to IOCQs. An interrupt vector can be shared between IOCQs.

Approach 2:
1. The device enables coalescing settings for each interrupt vector by default at reset.
2. When the admin queue is configured, the device disables coalescing for vector 0, which is assigned to the ACQ.
3. IOCQs can share an interrupt with the ACQ. But when the user tries to enable coalescing for the vector associated with the ACQ, an error is returned.

Approach 3:
1. The device enables coalescing settings for each interrupt vector by default at reset, including vector 0, but applies no interrupt coalescing to the ACQ.
2. IOCQs can share an interrupt with the ACQ, and the user can enable coalescing for the vector associated with the ACQ.

Approach 3 seems to be the most flexible, but it comes with a couple of questions:
1. It is weird to say that interrupt coalescing is enabled for vector 0 while, at the same time, the ACQ uses that vector and interrupt coalescing is disabled for it. Is this what the spec really wanted?
2. The hardware implementation is more complex. Does this approach really have much advantage over approach 1?

If approach 3 is not what the spec actually means, then which one is better, approach 1 or approach 2? It seems to be a trade-off between one extra interrupt vector and the ability to enable interrupt coalescing for some IOCQs. Would approach 1 cause a noticeable performance loss? Is one extra interrupt vector too much?

Thanks a lot!

Best regards,
Xuehua
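To make the difference between the approaches easier to see, here is a minimal model of Approach 2 as described in this message. Everything in it is hypothetical: `Controller`, `set_coalescing` and the exception type are illustrative names, and a real device would report the failure through a command status code, not a Python exception.

```python
class CoalescingNotSupported(Exception):
    """Raised when coalescing is requested on the vector used by the ACQ."""


class Controller:
    ADMIN_VECTOR = 0  # the ACQ is assumed to use vector 0, as in the thread

    def __init__(self, num_vectors):
        # Step 1: coalescing settings are enabled for every vector at reset.
        self.coalescing = [True] * num_vectors
        self.admin_configured = False

    def configure_admin_queue(self):
        # Step 2: configuring the admin queue disables coalescing on vector 0.
        self.coalescing[self.ADMIN_VECTOR] = False
        self.admin_configured = True

    def set_coalescing(self, vector, enable):
        # Step 3: IOCQs may share vector 0 with the ACQ, but enabling
        # coalescing on that vector is rejected.
        if enable and self.admin_configured and vector == self.ADMIN_VECTOR:
            raise CoalescingNotSupported("vector is used by the ACQ")
        self.coalescing[vector] = enable
```

Approach 1 would drop the error path by never placing an IOCQ on vector 0. Approach 3 would accept the request and exempt only admin completions from coalescing, which pushes the distinction from the vector down to the individual completion.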
* A question regarding to MSIX interrupts for NVME
From: Matthew Wilcox @ 2013-08-28 17:14 UTC

On Wed, Aug 28, 2013 at 09:58:35AM -0700, Xuehua Chen wrote:
> "By default, coalescing settings are enabled for each interrupt
> vector. Interrupt coalescing is not supported for the Admin Completion
> Queue."
>
> Approach 1:
> 1. Device enables coalescing settings each interrupt vector by default
> at reset.
> 2. When configuring admin queue, device disabled coalescing for the
> vector 0 which is assigned to ACQs.
> 3. Assigning other vectors to IOCQs. Interrupt vector can be shared
> between IOCQs.
>
> Approach 2.
> 1. Device enables coalescing settings for each interrupt vector by
> default at reset.
> 2. When configuring admin queue, device disabled coalescing for the
> vector 0 which is assigned to ACQs.
> 3. IOCQs can share interrupt with ACQ. But when user try to enable
> coalescing for the vector associated with ACQ, return error.
>
> Approach 3
> 1. Device enables coalescing settings for each interrupt vector by
> default at reset and also for vector 0, no interrupt coalescing for
> ACQ.
> 2. IOCQs can share interrupt with ACQ. And user can enable coalescing
> for the vector associated with ACQ.

The spec also says: "It is recommended that interrupts for commands that complete in error are not coalesced." So your design needs a way to defeat the coalescing and send the interrupt if an error completion is sent to a completion queue. You can use the same mechanism to defeat the coalescing if any completion is sent to the admin completion queue.

> It seems approach 3 can be most flexible. But it comes with a couple
> of questions.
> 1. It is weird that when we say the interrupt coalescing is enabled
> for vector 0 while in the mean time ACQ use the vector and interrupt
> coalescing is disabled for it. Is this what the spec really wanted?
> 2. HW implementation is more complex and will this approach really
> have much advantage than approach 1?
>
> If approach 3 is not the spec actually means, then which one is
> better, approach 1 or approach 2. It seems that this is a trade-off
> between one extra interrupt vector and the capability of enabling
> interrupt coalescing for some IOCQs. Will approach 1 cause noticeable
> performance loss? One extra interrupt is too much?
>
> Thanks a lot!
>
> Best regards,
>
> Xuehua
* A question regarding to MSIX interrupts for NVME
From: Xuehua Chen @ 2013-08-28 18:49 UTC

> The spec also says: "It is recommended that interrupts for commands that
> complete in error are not coalesced." So your design needs a way to
> defeat the coalescing and send the interrupt if an error completion is
> sent to a completion queue. You can use the same mechanism to defeat
> the coalescing if any completion is sent to the admin completion queue.

This clarified some of my thoughts. What I said about it being strange for hardware to support interrupt coalescing when an IOCQ and the ACQ share a vector is not correct: we can enable interrupt coalescing even when the ACQ and an IOCQ share a vector. Thanks a lot for this!

The quoted text suggests that errors had better be handled ASAP, so it is more likely that the same applies to the admin CQ. My question is how the response time differs between allocating a separate vector for the ACQ and sharing the interrupt vector with IOCQ 1. Under intensive IO on IOQ 1, a separate vector that can be triggered on a different CPU could give a faster response. Right now I don't see that disabling coalescing for the admin CQ has much effect, since its entries could be processed only after many IOCQ entries have been processed.

In the meantime, other than the extra interrupt vector, do you see any other disadvantages if we allocate a separate vector for the ACQ?

Thanks a lot!
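The mechanism quoted at the top of this message, one bypass path that defeats coalescing for error completions and for any admin completion, can be sketched as device-side logic. This is a simplified model under stated assumptions: only a completion-count threshold is modelled (the time-based flush of real interrupt coalescing is omitted), and `CoalescingVector` with its fields is a made-up name.

```python
class CoalescingVector:
    """Toy model of one interrupt vector with count-based coalescing."""

    def __init__(self, threshold):
        self.threshold = threshold  # completions to aggregate per interrupt
        self.pending = 0            # completions posted since the last interrupt
        self.interrupts = 0         # interrupts sent so far

    def complete(self, is_admin=False, is_error=False):
        """Record one completion; return True if an interrupt is sent now."""
        self.pending += 1
        # Admin completions and error completions defeat the coalescing.
        bypass = is_admin or is_error
        if bypass or self.pending >= self.threshold:
            self.pending = 0
            self.interrupts += 1
            return True
        return False
```

An admin completion on a shared vector triggers the interrupt immediately and also flushes any IO completions that were still being aggregated. How soon the host then processes the admin entry still depends on the handler order on that vector, which is the remaining concern raised in this message.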