On Thu, 2017-02-16 at 16:35 +0000, Minturn, Dave B wrote:
> I think you guys are talking about two different types of completion
> queues; NVMe CQs and RDMA CQs.   John is correct regarding the NVMe
> SQ/CQ pairing and that there is a 1x1 mapping of the NVMe SQ/CQ pair
> and the RDMA QP.

Dave is right (for context, Dave is one of the primary authors of the
NVMe-oF specification). There are at least three different completion
queues you could be talking about: the completion queues on the NVMe
device, the completion queues as defined by NVMe-oF (which is by
definition the receive side of an RDMA queue pair), and the RDMA
completion queue, which is a side-channel notification of events on the
RDMA queue pair. I'll address all three just for posterity.

1) We explicitly choose to allocate NVMe submission and completion
queues in pairs inside SPDK's NVMe driver. Our queues are lockless, and
a completion must always look up the original request. Unless the
completion is processed on the same thread as the submission, that
lookup would require some sort of thread-safe data structure. Every
known NVMe driver makes this same choice, and this was
always the expected primary use case in NVMe when the spec was
designed, so I don't think this is controversial.
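
To make point 1 concrete, here is a minimal sketch of why pairing the
queues on one thread keeps the request lookup lock-free. This is a toy
model, not SPDK's actual driver code: every name in it (toy_qpair,
toy_submit, toy_device_complete, toy_poll, toy_store_status) is
hypothetical, and the "device" is simulated by a plain function call.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_QDEPTH 4

typedef void (*toy_cb)(void *ctx, int status);

struct toy_request {
    toy_cb cb;
    void *ctx;
    int in_use;
};

struct toy_completion {
    uint16_t cid;   /* command ID echoed back by the (simulated) device */
    int status;
};

/* Hypothetical queue pair: request table plus completion ring, owned by one thread. */
struct toy_qpair {
    struct toy_request reqs[TOY_QDEPTH];   /* indexed by command ID */
    struct toy_completion cq[TOY_QDEPTH];  /* completion ring */
    size_t cq_head;                        /* next entry to consume */
    size_t cq_tail;                        /* next entry to fill */
};

/* Submit: claim a free command ID and remember the request. Returns the cid or -1. */
int toy_submit(struct toy_qpair *qp, toy_cb cb, void *ctx)
{
    for (int cid = 0; cid < TOY_QDEPTH; cid++) {
        if (!qp->reqs[cid].in_use) {
            qp->reqs[cid] = (struct toy_request){ .cb = cb, .ctx = ctx, .in_use = 1 };
            return cid;
        }
    }
    return -1; /* queue full */
}

/* Stand-in for the device posting a completion entry for an outstanding cid. */
void toy_device_complete(struct toy_qpair *qp, uint16_t cid, int status)
{
    qp->cq[qp->cq_tail % TOY_QDEPTH] = (struct toy_completion){ .cid = cid, .status = status };
    qp->cq_tail++;
}

/* Poll: a plain array lookup by cid. No lock is taken because this is assumed
 * to run on the same thread that called toy_submit(). Returns completions reaped. */
int toy_poll(struct toy_qpair *qp)
{
    int reaped = 0;
    while (qp->cq_head != qp->cq_tail) {
        struct toy_completion c = qp->cq[qp->cq_head % TOY_QDEPTH];
        struct toy_request *req = &qp->reqs[c.cid];
        assert(req->in_use);
        req->in_use = 0;            /* the cid can be reused from here on */
        req->cb(req->ctx, c.status);
        qp->cq_head++;
        reaped++;
    }
    return reaped;
}

/* Example callback: store the status in the int that ctx points to. */
void toy_store_status(void *ctx, int status)
{
    *(int *)ctx = status;
}
```

If toy_poll() ran on a different thread than toy_submit(), the reqs
table and the in_use flags would need atomics or a lock, which is
exactly the cost that allocating the queues in pairs avoids.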

2) The NVMe-oF specification requires that NVMe-oF submission and
completion queues are allocated in pairs. This is a stronger
requirement than the base NVMe specification, but in reality everyone
was doing this at the NVMe level anyway, so again this was not a
controversial choice in the NVMe-oF specification. Specifically, an
RDMA queue pair maps 1:1 to an NVMe-oF queue pair.

3) The SPDK NVMe-oF target also chooses to create one RDMA completion
queue per RDMA queue pair. I think this is the completion queue you are
talking about. In order to be fully lockless, SPDK must lay out its
data structures very carefully. Specifically, we choose to process all
connections that belong to the same subsystem on a single core. In
practice, a single core is fast enough to saturate any device backing a
subsystem with plenty of headroom to spare. In fact, a single core can
often handle many subsystems before it saturates. An NVMe-oF subsystem
is the largest unit of shared state, so doing all of its processing on
a single core means that we need no locks. In an ideal
world, we'd create 1 RDMA completion queue per subsystem (or really,
per NIC per subsystem). That would enable our code to poll a single
completion queue to be notified of all events on all RDMA queue pairs
for a given subsystem. RDMA requires us to select the completion queue
to be used when the RDMA queue pair is created - prior to receiving the
initial CONNECT message. Unfortunately, at that point we cannot deduce
which subsystem that RDMA queue pair belongs to. There just isn't
enough information. The only options today are to create an independent
completion queue for each RDMA queue pair (what we do), or make a set
of global ones but take locks to protect shared state when completions
occur on disparate subsystems (what the Linux kernel does). The only way
to fix this is to make a change to the NVMe-oF specification to provide
additional information upon establishment of a new connection.
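
The ordering problem in point 3 can be sketched with a toy model. This
is an illustration under stated assumptions, not the SPDK target's
code: toy_cq, toy_qp, toy_subsystem and the toy_* functions are all
hypothetical names, and no real verbs calls are made. It models the
option we use today, a private completion queue fixed at queue pair
creation, with the subsystem only learned later at CONNECT.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_MAX_QPS 8

struct toy_cq {
    int pending;                 /* completions waiting to be reaped */
};

struct toy_qp {
    struct toy_cq cq;            /* private CQ, chosen when the QP is created */
    const char *subnqn;          /* unknown (NULL) until CONNECT arrives */
};

struct toy_subsystem {
    const char *subnqn;
    struct toy_qp *qps[TOY_MAX_QPS];
    size_t num_qps;
};

/* QP creation: the CQ has to exist now, but the subsystem is still unknown. */
void toy_qp_create(struct toy_qp *qp)
{
    qp->cq.pending = 0;
    qp->subnqn = NULL;
}

/* CONNECT: only at this point can the QP be handed to the subsystem's core. */
int toy_qp_connect(struct toy_qp *qp, struct toy_subsystem *ss)
{
    assert(qp->subnqn == NULL);  /* a QP connects to exactly one subsystem */
    if (ss->num_qps == TOY_MAX_QPS) {
        return -1;
    }
    qp->subnqn = ss->subnqn;
    ss->qps[ss->num_qps++] = qp;
    return 0;
}

/* Poller run by the single core that owns the subsystem. It needs no locks,
 * but it pays one CQ poll per connected QP instead of one per subsystem. */
int toy_subsystem_poll(struct toy_subsystem *ss, int *cq_polls)
{
    int reaped = 0;
    for (size_t i = 0; i < ss->num_qps; i++) {
        (*cq_polls)++;
        reaped += ss->qps[i]->cq.pending;
        ss->qps[i]->cq.pending = 0;
    }
    return reaped;
}
```

With three connections to one subsystem, each pass of
toy_subsystem_poll() touches three completion queues. A per-subsystem
completion queue would cut that to one, but it would have to be picked
inside toy_qp_create(), where subnqn is still NULL.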

I hope that makes things crystal clear. As the specification evolves
we'll of course change our model to always be the most efficient one
possible.

> 
> RDMA CQ's and their associated mappings to the RDMA QP's is
> implementation specific.   Think of RDMA CQ's as the mechanism used
> to signal RDMA completion events.
> ..Dave
> 
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > > Kariuki, John K
> > > Sent: Thursday, February 16, 2017 8:26 AM
> > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > Subject: Re: [SPDK] NVMf Target
> > > 
> > > Param
> > > Per the NVM Express over Fabrics 1.0 spec section 1.2 "There is a
> > > 1:1 mapping of a single Submission Queue to a single Completion
> > > Queue. NVMe over Fabrics does not support the mapping of Multiple
> > > Submission Queues to a single Completion Queue"
> > > 
> > > -----Original Message-----
> > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of
> > > Kumaraparameshwaran Rathnavel
> > > Sent: Wednesday, February 15, 2017 8:27 PM
> > > To: spdk(a)lists.01.org
> > > Subject: [SPDK] NVMf Target
> > > 
> > > 
> > > Hi All,
> > > 
> > > Why are we not using the same completion queue for multiple queue
> > > pairs. Whenever we create a queue pair , I see that a completion
> > > queue is also created. But completion queue can be shared between
> > > queue pairs. Will Using shared completion queue impact the
> > > performance?
> > > 
> > > Thanking you,
> > > Param.
> > > _______________________________________________
> > > SPDK mailing list
> > > SPDK(a)lists.01.org
> > > https://lists.01.org/mailman/listinfo/spdk