From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============3186708450900795814=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: Re: [SPDK] SPDK aio examples
Date: Wed, 22 Jun 2016 17:35:49 +0000
Message-ID: <1466616949.26925.170.camel@intel.com>
In-Reply-To: 3A3AEA95-16B6-44BA-B5A7-691CBED02A9B@playstation.sony.com
List-ID:
To: spdk@lists.01.org

--===============3186708450900795814==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Wed, 2016-06-22 at 16:50 +0000, Bhadauria, Varun wrote:
> Hi Ben
>
> Thank you for the reply.
>
> To keep application I/Os from being submitted from different threads to the same queue pair, one
> can allocate a queue pair per logical core (given the H/W supports creating that many queue
> pairs). However, getting the current CPU from application code involves system call overhead
> (some OSes may not even support this).
>
> The other approach is to have some worker threads, each with its own queue pair, that feed off
> the application-maintained pending I/O queues. However, this approach introduces various locking
> overheads (to establish this producer-consumer model) which may introduce contention and prevent
> getting the maximum performance.
>
> How do you think this problem can be avoided?

I think you are assuming that the layer doing the I/O submission is not designed with knowledge of
the application logic above it. That's true for something like the Linux kernel's block-mq layer -
it doesn't know what threading model the application(s) running on it use, so it just allocates 1
queue pair per core (sharing if necessary) and then has to ask which core a thread is on to choose
the right queue pair.
One of the major advantages of SPDK, however, is that the I/O submission layer is part of the
application and can therefore take advantage of additional knowledge. Most applications using SPDK
will be designed to have 1 thread per CPU core, where the thread runs in a tight event loop,
polling a queue. I/O coming in off the network will be immediately routed to a particular CPU core
and processed there until the I/O is submitted to the disk. In that model, you never have to look
up what core you are on - you just have to associate network connections with particular threads
one time, when the connection is established. We provide a basic framework for applications to use
this model inside of SPDK (header is at include/spdk/event.h). The framework isn't required to use
our drivers, but all of our example applications and our NVMf target use it.

>
> Also, I don't see any API for issuing a trim command. Is that being implemented as well?

Every specification uses a different word for trim for some reason. TRIM is the term used by the
ATA command set, SCSI calls it UNMAP, and NVMe calls it deallocate. See
http://www.spdk.io/spdk/doc/nvme_8h.html#ae275923b7e982b115483e425c2972ec5.

>
> Also
> Regards,
> Varun Bhadauria
>
> On 6/17/16, 2:57 PM, "SPDK on behalf of Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
>
> > On Fri, 2016-06-17 at 20:52 +0000, Bhadauria, Varun wrote:
> > > Thanks Ben
> > >
> > > Can you also possibly shed some light on the expected behavior when more than one I/O is
> > > erroneously submitted on the same qpair? Do the spdk_nvme_ns_cmd_read/write*() functions
> > > return a specific error value in this case?
> >
> > You can submit many I/Os per queue pair at the same time as long as you do it from a single
> > thread, and you can submit I/O to different queue pairs on different threads simultaneously
> > with no locks.
> > Are you asking what happens when I/O is submitted simultaneously from different threads to the
> > same queue pair? In that case, you run the risk of corrupting the memory state of the queue.
> > The queue is implemented as an array in memory with a head and a tail pointer. Submitting an
> > I/O to the queue places a command into the next slot, increments the tail pointer, and rings a
> > doorbell register to tell the device new commands are present. If you do this from two threads
> > simultaneously, they'd both be copying into the same spot and ringing the doorbell, meaning the
> > device may receive part of one command and part of another. The code is in
> > lib/nvme/nvme_qpair.c:nvme_qpair_submit_tracker if you want to look.
> >
> > There is no expected error value for this case - the behavior is simply undefined. In order to
> > catch a user doing this, we'd have to look at some shared state (which means a lock), and the
> > whole purpose of queue pairs is to avoid locking.
> >
> > > Also, does spdk_nvme_qpair_process_completions() for a qpair need to be invoked from the
> > > same thread that is responsible for issuing I/O on the qpair?
> >
> > Yes - you need to call that function from the same thread that you submitted the I/O on. It's
> > fairly obvious that you can only call spdk_nvme_qpair_process_completions on a particular queue
> > pair from 1 thread at a time, but it isn't as obvious why you can't reap your completions on a
> > different thread than your submissions, so let me try and explain that.
> >
> > We define two objects, a request and a tracker, that are placed on lists. A request represents
> > a single user call to submit an I/O. A tracker is an entry on the hardware queue. We allow more
> > requests outstanding than available trackers.
> > Submissions and completions manipulate the lists of free requests and trackers using a simple
> > linked list, which is not thread safe. Further, each time a completion happens and frees up a
> > tracker, we check if there are any pending requests and submit them. If we find any on the
> > completion side but we're on a different thread than the submission path, this would be
> > equivalent to doing submissions from two threads simultaneously.
> >
> > I'm not sure this technical challenge couldn't be overcome, but I am fairly confident that you
> > don't actually want to do this in your software anyway. Not only is it more complicated, but
> > you end up thrashing your CPU cache. The request objects are sitting nicely in your L1 or L2
> > CPU cache from submission, so when you complete on the same core it is ideal.
> >
> > > When any outstanding completions are processed as a result of calling
> > > spdk_nvme_qpair_process_completions(), is a request's callback called on the same core?
> >
> > Yes - whatever thread you call spdk_nvme_qpair_process_completions on, for each completion it
> > finds it will call that callback immediately inside of the current thread. So all of the
> > callbacks for completions found will have been called by the time
> > spdk_nvme_qpair_process_completions returns. The code is in
> > lib/nvme/nvme_qpair.c:spdk_nvme_qpair_process_completions() - you can see it just loops over
> > the completion entries and calls nvme_qpair_complete_tracker for each one. Inside of
> > nvme_qpair_complete_tracker, it calls the callback function.
> >
> > > Is it always necessary to call spdk_nvme_qpair_process_completions() to process completions?
> > Yes - there are no interrupts or background threads, so the driver will only execute in
> > response to calls from the user.
> >
> > > Regards,
> > > Varun Bhadauria
> > >
> > > On 6/17/16, 10:24 AM, "SPDK on behalf of Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
> > >
> > > > On Wed, 2016-06-15 at 23:56 +0000, Bhadauria, Varun wrote:
> > > > > Hello Ben
> > > > >
> > > > > Thank you for the clarification. I was under the false impression that Linux AIO can be
> > > > > made to use SPDK under the hood, which is clearly not the case since they will have to go
> > > > > through the filesystem.
> > > >
> > > > I'm sure someone could wrap the AIO interface around the SPDK driver for the specific case
> > > > where the user is opening a block device directly with O_DIRECT. It's nearly a 1:1
> > > > translation for that case. Unfortunately, most people use Linux AIO on files instead of
> > > > block devices.
> > > >
> > > > > BTW, are there any known early filesystem implementations besides Ceph's RocksDB-based
> > > > > BlueStore FS which use SPDK?
> > > >
> > > > The only publicly announced one that I'm aware of is Bluestore inside of Ceph. As long as
> > > > SPDK continues to be valuable, I fully expect many filesystems with different designs to
> > > > appear over time. If you have a particular use case where you'd like some sort of
> > > > filesystem-like layer on top of SPDK, I'd love to hear about it. At a minimum, it's useful
> > > > to collect requirements from a number of sources.
> > > >
> > > > > Regards,
> > > > > Varun Bhadauria
> > > > >
> > > > > On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" <benjamin.walker(a)intel.com> wrote:
> > > > >
> > > > > > Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO
> > > > > > or POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe
> > > > > > driver then the perf tool is your best bet.
> > > > > >
> > > > > > You can run the perf tool against a block device using Linux AIO by binding your NVMe
> > > > > > device to the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel)
> > > > > > and then doing something like:
> > > > > >
> > > > > > ./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
> > > > > > Sent: Wednesday, June 15, 2016 4:30 PM
> > > > > > To: Storage Performance Development Kit
> > > > > > Subject: [SPDK] SPDK air examples
> > > > > >
> > > > > > Hello
> > > > > >
> > > > > > Are there any SPDK examples which use AIO? Perf.c has very little documentation in the
> > > > > > usage for AIO.
> > > > > >
> > > > > > Regards,
> > > > > > Varun Bhadauria
> > > > > >
> > > > > > _______________________________________________
> > > > > > SPDK mailing list
> > > > > > SPDK(a)lists.01.org
> > > > > > https://lists.01.org/mailman/listinfo/spdk

--===============3186708450900795814==--