From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============5717213452786270665=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: Re: [SPDK] SPDK aio examples
Date: Fri, 17 Jun 2016 21:57:00 +0000
Message-ID: <1466200619.26925.125.camel@intel.com>
In-Reply-To: 700345AA-DD95-4ADC-AF38-34451A3F29FF@playstation.sony.com
List-ID:
To: spdk@lists.01.org

--===============5717213452786270665==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Fri, 2016-06-17 at 20:52 +0000, Bhadauria, Varun wrote:
> Thanks Ben
>
> Can you also possibly shed some light on the expected behavior when more than one I/O is
> erroneously submitted on the same qpair? Do the spdk_nvme_ns_cmd_read/write*() functions return a
> specific error value in this case?

You can submit many I/Os per queue pair at the same time as long as you do it from a single thread,
and you can submit I/O to different queue pairs on different threads simultaneously with no locks.

Are you asking what happens when I/O is submitted simultaneously from different threads to the same
queue pair? In that case, you run the risk of corrupting the memory state of the queue. The queue is
implemented as an array in memory with a head and a tail pointer. Submitting an I/O to the queue
places a command into the next slot, advances the tail pointer, and rings a doorbell register to
tell the device new commands are present. If you do this from two threads simultaneously, they'd
both be copying into the same spot and ringing the doorbell, meaning the device may receive part of
one command and part of another. The code is in lib/nvme/nvme_qpair.c:nvme_qpair_submit_tracker if
you want to look.

There is no expected error value for this case - the behavior is simply undefined.
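To make the race concrete, here is a minimal sketch of that submission path. It is a toy model,
not the SPDK code: toy_qpair, toy_cmd, toy_submit, and the 4-entry ring are all hypothetical
names and sizes, and the doorbell is modeled as a plain integer rather than a device register.
The index the host advances is called tail in this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_QUEUE_ENTRIES 4u

/* Hypothetical 64-byte command, standing in for a submission queue entry. */
struct toy_cmd {
    uint8_t bytes[64];
};

/* Hypothetical queue state; none of these names come from SPDK. */
struct toy_qpair {
    struct toy_cmd ring[TOY_QUEUE_ENTRIES];
    uint32_t tail;      /* next slot the host will fill */
    uint32_t head;      /* next slot the device will consume */
    uint32_t doorbell;  /* stands in for the doorbell register */
};

static void toy_qpair_init(struct toy_qpair *q)
{
    assert(q != NULL);
    memset(q, 0, sizeof(*q));
}

/*
 * Submit one command. Nothing here is atomic: two threads calling this at
 * the same time can read the same tail, copy into the same slot, and ring
 * the doorbell while the other thread's copy is only half done.
 */
static bool toy_submit(struct toy_qpair *q, const struct toy_cmd *cmd)
{
    uint32_t slot = q->tail;
    uint32_t next = (slot + 1) % TOY_QUEUE_ENTRIES;

    if (next == q->head) {
        return false;                           /* ring is full */
    }
    memcpy(&q->ring[slot], cmd, sizeof(*cmd));  /* 1: copy into the slot */
    q->tail = next;                             /* 2: advance the index  */
    q->doorbell = next;                         /* 3: ring the doorbell  */
    return true;
}
```

Called from one thread, the three steps always happen in order for each command. Called from two
threads, steps 1 to 3 can interleave, which is the corruption described above, and there is no
point in this path where an error could be detected without adding shared state.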
In order to catch a user doing this, we'd have to look at some shared state (which means a lock),
and the whole purpose of queue pairs is to avoid locking.

> Also, does spdk_nvme_qpair_process_completions() for a qpair need to be invoked from the same
> thread that is responsible for issuing I/O on the qpair?

Yes - you need to call that function from the same thread that you submitted the I/O on. It's fairly
obvious that you can only call spdk_nvme_qpair_process_completions on a particular queue pair from
one thread at a time, but it isn't as obvious why you can't reap your completions on a different
thread than your submissions, so let me try to explain that.

We define two objects, a request and a tracker, that are placed on lists. A request represents a
single user call to submit an I/O. A tracker is an entry on the hardware queue. We allow more
requests outstanding than available trackers. Submissions and completions manipulate the lists of
free requests and trackers using a simple linked list, which is not thread safe. Further, each time
a completion happens and frees up a tracker, we check if there are any pending requests and submit
them. If we find any on the completion side but we're on a different thread than the submission
path, this would be equivalent to doing submissions from two threads simultaneously.

I'm not sure this technical challenge couldn't be overcome, but I am fairly confident that you
don't actually want to do this in your software anyway. Not only is it more complicated, but you
end up thrashing your CPU cache. The request objects are sitting nicely in your L1 or L2 CPU cache
from submission, so when you complete on the same core it is ideal.

>
> When any outstanding completions are processed as a result of calling
> spdk_nvme_qpair_process_completions(), is a request's callback called on the same core?
Yes - whatever thread you call spdk_nvme_qpair_process_completions on, for each completion it finds
it will call that callback immediately inside of the current thread. So all of the callbacks for
completions found will have been called by the time spdk_nvme_qpair_process_completions returns.
The code is in lib/nvme/nvme_qpair.c:spdk_nvme_qpair_process_completions() - you can see it just
loop over the completion entries and call nvme_qpair_complete_tracker for each one. Inside of
nvme_qpair_complete_tracker, it calls the callback function.

>
> Is it always necessary to call spdk_nvme_qpair_process_completions() to process completions?

Yes - there are no interrupts or background threads, so the driver will only execute in response to
calls from the user.

>
> Regards,
> Varun Bhadauria
>
> On 6/17/16, 10:24 AM, "SPDK on behalf of Walker, Benjamin" benjamin.walker(a)intel.com> wrote:
>
> > On Wed, 2016-06-15 at 23:56 +0000, Bhadauria, Varun wrote:
> > > Hello Ben
> > >
> > > Thank you for the clarification. I was under the false impression that Linux AIO can be made
> > > to use SPDK under the hood, which is clearly not the case since they will have to go through
> > > the filesystem.
> > I'm sure someone could wrap the AIO interface around the SPDK driver for the specific case where
> > the user is opening a block device directly with O_DIRECT. It's nearly a 1:1 translation for
> > that case. Unfortunately, most people use Linux AIO on files instead of block devices.
> >
> > > BTW, are there any known early filesystem implementations besides Ceph's rocksdb-based
> > > bluestore FS which use SPDK?
> > The only publicly announced one that I'm aware of is Bluestore inside of Ceph. As long as SPDK
> > continues to be valuable, I fully expect many filesystems with different designs to appear over
> > time.
> > If you have a particular use case where you'd like some sort of filesystem-like layer on top
> > of SPDK, I'd love to hear about it. At a minimum, it's useful to collect requirements from a
> > number of sources.
> >
> > > Regards,
> > > Varun Bhadauria
> > >
> > > On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" benjamin.walker(a)intel.com> wrote:
> > >
> > > > Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO or
> > > > POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe driver
> > > > then the perf tool is your best bet.
> > > >
> > > > You can run the perf tool against a block device using Linux AIO by binding your NVMe device
> > > > to the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel) and then
> > > > doing something like:
> > > >
> > > > ./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
> > > >
> > > > -----Original Message-----
> > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
> > > > Sent: Wednesday, June 15, 2016 4:30 PM
> > > > To: Storage Performance Development Kit
> > > > Subject: [SPDK] SPDK air examples
> > > >
> > > > Hello
> > > >
> > > > Are there any SPDK examples which use AIO? Perf.c has very little documentation in the usage
> > > > for AIO.
> > > >
> > > > Regards,
> > > > Varun Bhadauria
> > > >
> > > > _______________________________________________
> > > > SPDK mailing list
> > > > SPDK(a)lists.01.org
> > > > https://lists.01.org/mailman/listinfo/spdk
--===============5717213452786270665==--