From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] SPDK aio examples
Date: Wed, 22 Jun 2016 17:35:49 +0000
Message-ID: <1466616949.26925.170.camel@intel.com>
In-Reply-To: 3A3AEA95-16B6-44BA-B5A7-691CBED02A9B@playstation.sony.com
On Wed, 2016-06-22 at 16:50 +0000, Bhadauria, Varun wrote:
> Hi Ben
>
> Thank you for the reply.
>
> To keep application I/Os from different threads from being submitted to the same queue pair, one
> can allocate a queue pair per logical core (provided the hardware supports creating that many
> queue pairs). However, getting the current CPU from application code involves system call
> overhead (and some OSes may not support it at all).
>
> The other approach is to have worker threads, each with its own queue pair, that feed off
> application-maintained pending I/O queues. However, this approach introduces locking overhead
> (to implement the producer-consumer model), which may cause contention and prevent reaching
> maximum performance.
>
> How do you think this problem can be avoided?
I think you are assuming that the layer doing the I/O submission is not designed with knowledge of
the application logic above it. That's true for something like the Linux kernel's blk-mq layer -
it doesn't know what threading model the application(s) running on it use, so it just allocates 1
queue pair per core (sharing if necessary) and then has to ask which core a thread is on to choose
the right queue pair. One of the major advantages of SPDK, however, is that the I/O submission layer
is part of the application and can therefore take advantage of additional knowledge. Most
applications using SPDK will be designed to have 1 thread per CPU core where the thread is running
in a tight event loop, polling a queue. I/O coming in off of the network will be immediately routed
to a particular CPU core and it will be processed there until I/O is submitted to the disk. In that
model, you never have to look up what core you are on - you just have to associate network
connections with particular threads one time when the connection is established. We provide a basic
framework for applications to use this model inside of SPDK (header is at include/spdk/event.h). The
framework isn't required to use our drivers, but all of our example applications and our NVMf target
use it.
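A minimal sketch of the "associate connections with threads once" idea, assuming a fixed number of cores and round-robin placement. All names here (struct core_ctx, struct connection, assign_connection, submit_on_conn) are hypothetical and are not part of SPDK or include/spdk/event.h; the per-core counter stands in for the queue pair that core's thread would own.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_CORES 4

/* Hypothetical per-core state. In a real SPDK application each core's
 * polling thread would own one NVMe queue pair; here we only count I/O. */
struct core_ctx {
    unsigned core_id;
    uint64_t ios_submitted;   /* touched only by the owning core */
};

/* Hypothetical connection: its owning core is chosen once, at accept time. */
struct connection {
    uint64_t conn_id;
    struct core_ctx *core;
};

static struct core_ctx g_cores[NUM_CORES];
static unsigned g_next_core;  /* round-robin cursor, used only when accepting */

void init_cores(void)
{
    for (unsigned i = 0; i < NUM_CORES; i++) {
        g_cores[i].core_id = i;
        g_cores[i].ios_submitted = 0;
    }
    g_next_core = 0;
}

/* Called once per connection: pick a core round-robin and remember it. */
void assign_connection(struct connection *conn, uint64_t conn_id)
{
    conn->conn_id = conn_id;
    conn->core = &g_cores[g_next_core];
    g_next_core = (g_next_core + 1) % NUM_CORES;
}

/* Per-I/O path: no CPU lookup and no lock, because the connection already
 * knows its core. In a real application this runs on conn->core's thread. */
void submit_on_conn(struct connection *conn)
{
    assert(conn->core != NULL);
    conn->core->ios_submitted++;
}
```

Round-robin is only the simplest placement policy; the point is that the core is chosen once on the connection path, so the per-I/O path needs neither a CPU lookup nor a lock.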
>
> Also, I don’t see any API for issuing a trim command. Is that being implemented as well?
Every specification uses a different word for trim for some reason. TRIM is the term used by the ATA
command set, SCSI calls it UNMAP, and NVMe calls it deallocate. See
http://www.spdk.io/spdk/doc/nvme_8h.html#ae275923b7e982b115483e425c2972ec5.
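For reference, the NVMe specification defines deallocate as an attribute (AD, bit 2 of command dword 11) of the Dataset Management command. Its data buffer holds up to 256 range descriptors of 16 bytes each, and the range count goes in bits 7:0 of command dword 10, encoded 0's based. The sketch below only fills such a buffer; struct dsm_range and the helper functions are our own illustrative definitions, not SPDK's types, and actually submitting the command would go through the SPDK driver call documented at the link above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Our own mirror of the 16-byte Dataset Management range descriptor
 * from the NVMe specification (not SPDK's struct definition). */
struct dsm_range {
    uint32_t context_attributes; /* bytes 03:00, left 0 for a plain deallocate */
    uint32_t length;             /* bytes 07:04, in logical blocks */
    uint64_t starting_lba;       /* bytes 15:08 */
};
_Static_assert(sizeof(struct dsm_range) == 16, "range descriptor must be 16 bytes");

#define DSM_MAX_RANGES 256u           /* NR is an 8-bit, 0's-based field */
#define DSM_ATTR_DEALLOCATE (1u << 2) /* AD bit in command dword 11 */

/* Fill 'ranges' to deallocate 'num_blocks' blocks starting at 'lba',
 * splitting into several descriptors when one 32-bit length is not enough.
 * Returns the number of descriptors used, or 0 if num_blocks is 0 or the
 * descriptors do not fit. */
size_t build_dealloc_ranges(uint64_t lba, uint64_t num_blocks,
                            struct dsm_range *ranges, size_t max_ranges)
{
    size_t n = 0;
    while (num_blocks > 0) {
        if (n == max_ranges || n == DSM_MAX_RANGES) {
            return 0;
        }
        uint32_t chunk = num_blocks > UINT32_MAX ? UINT32_MAX : (uint32_t)num_blocks;
        ranges[n].context_attributes = 0;
        ranges[n].length = chunk;
        ranges[n].starting_lba = lba;
        lba += chunk;
        num_blocks -= chunk;
        n++;
    }
    return n;
}

/* Command dwords 10 and 11 for deallocating 'nr' ranges (1 <= nr <= 256). */
void dsm_dealloc_cdws(size_t nr, uint32_t *cdw10, uint32_t *cdw11)
{
    assert(nr >= 1 && nr <= DSM_MAX_RANGES);
    *cdw10 = (uint32_t)(nr - 1);   /* Number of Ranges is 0's based */
    *cdw11 = DSM_ATTR_DEALLOCATE;
}
```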
>
> Also
> Regards,
> Varun Bhadauria
>
> On 6/17/16, 2:57 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of
> benjamin.walker(a)intel.com> wrote:
>
> >
> > On Fri, 2016-06-17 at 20:52 +0000, Bhadauria, Varun wrote:
> > >
> > > Thanks Ben
> > >
> > > Can you also shed some light on the expected behavior when more than one I/O is erroneously
> > > submitted on the same qpair? Do the spdk_nvme_ns_cmd_read/write*() functions return a specific
> > > error value in this case?
> > >
> > You can submit many I/Os per queue pair at the same time as long as you do it from a single
> > thread, and you can submit I/O to different queue pairs on different threads simultaneously with
> > no locks. Are you asking what happens when I/O is submitted simultaneously from different threads
> > to the same queue pair? In that case, you run the risk of corrupting the memory state of the
> > queue. The queue is implemented as an array in memory with a head and a tail pointer. Submitting
> > an I/O to the queue places a command into the next slot, increments the tail pointer, and rings a
> > doorbell register to tell the device new commands are present. If you do this from two threads
> > simultaneously, they'd both be copying into the same spot and ringing the doorbell, meaning the
> > device may receive part of one command and part of another. The code is in
> > lib/nvme/nvme_qpair.c:nvme_qpair_submit_tracker if you want to look.
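A simplified, single-threaded model of that submission step, assuming a small ring and a plain variable standing in for the doorbell register. In NVMe the host advances the submission queue tail and writes it to the tail doorbell, while the device advances the head as it fetches commands. struct sub_queue and sq_submit are hypothetical names; the real code is nvme_qpair_submit_tracker. The comment on sq_submit marks where two unsynchronized callers would collide.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQ_ENTRIES 8

/* Hypothetical 64-byte command, standing in for an NVMe submission entry. */
struct sq_cmd {
    uint8_t bytes[64];
};

/* Simplified submission queue: a ring in memory plus head/tail indexes.
 * 'doorbell' stands in for the memory-mapped tail doorbell register. */
struct sub_queue {
    struct sq_cmd slots[SQ_ENTRIES];
    uint32_t tail;              /* next free slot, advanced by the host */
    uint32_t head;              /* advanced as the device consumes entries */
    volatile uint32_t doorbell; /* last tail value "written" to the device */
};

/* Returns 0 on success, -1 if the ring is full.
 * NOT thread safe: two threads running this at once can read the same
 * 'tail', copy into the same slot, and ring the doorbell twice. */
int sq_submit(struct sub_queue *sq, const struct sq_cmd *cmd)
{
    assert(sq->tail < SQ_ENTRIES && sq->head < SQ_ENTRIES);
    uint32_t next = (sq->tail + 1) % SQ_ENTRIES;
    if (next == sq->head) {
        return -1;   /* one slot stays empty so "full" differs from "empty" */
    }
    memcpy(&sq->slots[sq->tail], cmd, sizeof(*cmd)); /* 1. copy into next slot */
    sq->tail = next;                                 /* 2. advance the tail */
    sq->doorbell = sq->tail;                         /* 3. ring the doorbell */
    return 0;
}
```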
> >
> > There is no expected error value for this case - the behavior is simply undefined. In order to
> > catch a user doing this, we'd have to look at some shared state (which means a lock), and the
> > whole purpose of queue pairs is to avoid locking.
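If an application wants a cheap debug-time guard anyway, one option (an application-side convention, not something SPDK provides) is to record the owning thread when each queue pair is created and compare it on every call. Because the owner never changes after creation, the check reads no mutable shared state and needs no lock; it catches calls from the wrong thread, not every possible race. struct owned_qpair, owned_qpair_init, owned_qpair_check and QPAIR_ASSERT_OWNER are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical application-side wrapper around a driver queue pair. */
struct owned_qpair {
    void *qpair;      /* the real driver queue pair handle would go here */
    pthread_t owner;  /* set once at creation, never modified afterwards */
};

void owned_qpair_init(struct owned_qpair *oq, void *qpair)
{
    oq->qpair = qpair;
    oq->owner = pthread_self();
}

/* True if the calling thread is the owner. Reads only immutable data,
 * so it needs no lock; it flags wrong-thread use, not every race. */
bool owned_qpair_check(const struct owned_qpair *oq)
{
    return pthread_equal(oq->owner, pthread_self()) != 0;
}

/* Use before each submit or completion poll; compiled out with NDEBUG. */
#define QPAIR_ASSERT_OWNER(oq) assert(owned_qpair_check(oq))
```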
> >
> > >
> > > Also, does spdk_nvme_qpair_process_completions() for a qpair need to be invoked from the same
> > > thread that is responsible for issuing I/O on the qpair?
> > Yes - you need to call that function from the same thread that you submitted the I/O on. It's
> > fairly obvious that you can only call spdk_nvme_qpair_process_completions on a particular queue
> > pair from 1 thread at a time, but it isn't as obvious why you can't reap your completions on a
> > different thread than your submissions, so let me try to explain that.
> >
> > We define two objects, a request and a tracker, that are placed on lists. A request represents a
> > single user call to submit an I/O. A tracker is an entry on the hardware queue. We allow more
> > requests outstanding than available trackers. Submissions and completions manipulate the lists of
> > free requests and trackers, which are simple linked lists and not thread safe. Further, each time
> > a completion happens and frees up a tracker, we check if there are any pending requests and submit
> > them. If we find any on the completion side but we're on a different thread than the submission
> > path, this would be equivalent to doing submissions from two threads simultaneously.
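The request/tracker interaction can be sketched as a toy model, assuming two trackers and a FIFO of pending requests. struct request, struct qpair_model, submit_request and complete_tracker are hypothetical and far simpler than SPDK's lib/nvme/nvme_qpair.c; the point is only that complete_tracker modifies the same unlocked lists as submit_request.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TRACKERS 2

/* Hypothetical request: one user call to submit an I/O. */
struct request {
    int id;
    struct request *next;   /* link in the pending list */
};

/* Hypothetical queue pair bookkeeping: a fixed pool of trackers (hardware
 * queue slots) and an unbounded FIFO of requests waiting for a tracker. */
struct qpair_model {
    struct request *active[NUM_TRACKERS]; /* NULL means the tracker is free */
    struct request *pending_head;
    struct request *pending_tail;
};

static int take_free_tracker(struct qpair_model *q, struct request *req)
{
    for (int i = 0; i < NUM_TRACKERS; i++) {
        if (q->active[i] == NULL) {
            q->active[i] = req;   /* "submit" to the hardware queue */
            return i;
        }
    }
    return -1;
}

/* Submission path: use a free tracker, or queue the request as pending. */
void submit_request(struct qpair_model *q, struct request *req)
{
    req->next = NULL;
    if (take_free_tracker(q, req) >= 0) {
        return;
    }
    if (q->pending_tail != NULL) {
        q->pending_tail->next = req;
    } else {
        q->pending_head = req;
    }
    q->pending_tail = req;
}

/* Completion path: free the tracker, then submit the oldest pending request.
 * This mutates the same lists as submit_request() with no lock, which is
 * why both paths must run on the same thread. Returns the finished request. */
struct request *complete_tracker(struct qpair_model *q, int tracker)
{
    assert(tracker >= 0 && tracker < NUM_TRACKERS && q->active[tracker] != NULL);
    struct request *done = q->active[tracker];
    q->active[tracker] = NULL;
    struct request *req = q->pending_head;
    if (req != NULL) {
        q->pending_head = req->next;
        if (q->pending_head == NULL) {
            q->pending_tail = NULL;
        }
        (void)take_free_tracker(q, req);  /* cannot fail: a tracker was just freed */
    }
    return done;
}
```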
> >
> > I'm not sure this technical challenge couldn't be overcome, but I am fairly confident that you
> > don't actually want to do this in your software anyway. Not only is it more complicated, but you
> > end up thrashing your CPU cache. The request objects are sitting nicely in your L1 or L2 CPU cache
> > from submission, so completing on the same core is ideal.
> >
> > >
> > >
> > > When outstanding completions are processed as a result of calling
> > > spdk_nvme_qpair_process_completions(), is each request's callback called on the same core?
> > Yes - whichever thread you call spdk_nvme_qpair_process_completions on, it calls the callback for
> > each completion it finds immediately, inside that same thread. So all of the callbacks for the
> > completions found will have been called by the time spdk_nvme_qpair_process_completions returns.
> > The code is in lib/nvme/nvme_qpair.c:spdk_nvme_qpair_process_completions() - you can see it just
> > loops over the completion entries and calls nvme_qpair_complete_tracker for each one. Inside of
> > nvme_qpair_complete_tracker, it calls the callback function.
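A minimal model of a polled completion loop, assuming an NVMe-style ring in which each entry carries a phase bit that the device flips on every pass through the ring, so the host can tell new entries from stale ones without any interrupt. Storing the callback directly in the entry is a simplification (real entries carry a command identifier that is mapped back to a tracker), and struct comp_queue, poll_completions and count_cb are hypothetical names. It shows the behavior described above: every callback runs inline on the calling thread before the function returns.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_ENTRIES 4

typedef void (*completion_cb)(void *cb_arg, uint16_t status);

/* Hypothetical completion entry (simplified: callback stored directly). */
struct cq_entry {
    completion_cb cb;
    void *cb_arg;
    uint16_t status;
    uint8_t phase;   /* written by the device; flips on each pass */
};

struct comp_queue {
    struct cq_entry entries[CQ_ENTRIES];
    uint32_t head;
    uint8_t expected_phase;  /* starts at 1; flips every time head wraps */
};

/* Trivial callback used in the example: counts completions. */
void count_cb(void *cb_arg, uint16_t status)
{
    (void)status;
    (*(int *)cb_arg)++;
}

/* Polls the ring and runs every callback inline, on the caller's thread.
 * Returns the number of completions processed (max == 0 means no limit).
 * A real driver would also write the new head to the CQ head doorbell. */
int32_t poll_completions(struct comp_queue *cq, uint32_t max)
{
    assert(cq->head < CQ_ENTRIES);
    int32_t n = 0;
    while (cq->entries[cq->head].phase == cq->expected_phase) {
        struct cq_entry *e = &cq->entries[cq->head];
        if (++cq->head == CQ_ENTRIES) {   /* wrap and flip the expected phase */
            cq->head = 0;
            cq->expected_phase ^= 1;
        }
        e->cb(e->cb_arg, e->status);      /* the callback runs right here */
        n++;
        if (max != 0 && (uint32_t)n == max) {
            break;
        }
    }
    return n;
}
```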
> >
> > >
> > >
> > > Is it always necessary to call spdk_nvme_qpair_process_completions() to process completions?
> > Yes - there are no interrupts or background threads, so the driver only executes in response to
> > calls from the user.
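Because nothing runs in the background, a caller that needs to wait for one I/O simply polls until its callback fires. The sketch below assumes a hypothetical poll_fn standing in for a call such as spdk_nvme_qpair_process_completions (whose real signature differs) and a fake queue pair that completes the I/O on the third poll; struct sync_io, wait_for_io, struct fake_qpair and fake_poll are all illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a "process completions" call: it returns how
 * many completions it reaped and runs their callbacks before returning. */
typedef int32_t (*poll_fn)(void *qpair_ctx);

struct sync_io {
    bool done;
};

/* Completion callback for a single synchronous I/O. */
void sync_io_cb(void *cb_arg)
{
    ((struct sync_io *)cb_arg)->done = true;
}

/* Busy-poll until the I/O completes. Nothing else will ever invoke the
 * callback: with no interrupts or background threads, progress happens
 * only inside poll_cb(). Returns how many times poll_cb() was called. */
unsigned wait_for_io(struct sync_io *io, poll_fn poll_cb, void *qpair_ctx)
{
    unsigned polls = 0;
    while (!io->done) {
        int32_t rc = poll_cb(qpair_ctx);
        assert(rc >= 0);
        polls++;
    }
    return polls;
}

/* Fake queue pair for the example: completes the I/O on the Nth poll. */
struct fake_qpair {
    unsigned polls_until_done;
    struct sync_io *io;
};

int32_t fake_poll(void *ctx)
{
    struct fake_qpair *q = ctx;
    if (q->polls_until_done > 0 && --q->polls_until_done == 0) {
        sync_io_cb(q->io);
        return 1;
    }
    return 0;
}
```

In an event-loop application you would not spin on a single I/O like this; the same poll call would instead be one of the things each core's loop does on every iteration.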
> >
> > >
> > >
> > > Regards,
> > > Varun Bhadauria
> > >
> > > On 6/17/16, 10:24 AM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on
> > > behalf of benjamin.walker(a)intel.com> wrote:
> > >
> > > >
> > > >
> > > > On Wed, 2016-06-15 at 23:56 +0000, Bhadauria, Varun wrote:
> > > > >
> > > > >
> > > > > Hello Ben
> > > > >
> > > > > Thank you for the clarification. I was under the false impression that Linux AIO can be made
> > > > > to use SPDK under the hood, which is clearly not the case since it has to go through the
> > > > > filesystem.
> > > > I'm sure someone could wrap the AIO interface around the SPDK driver for the specific case
> > > > where the user is opening a block device directly with O_DIRECT. It's nearly a 1:1 translation
> > > > for that case. Unfortunately, most people use Linux AIO on files instead of block devices.
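To illustrate why the O_DIRECT block-device case is close to a 1:1 mapping: O_DIRECT generally requires offsets and lengths aligned to the device's logical block size, so a byte-addressed AIO request converts directly into a starting LBA and a block count. struct aio_req and struct nvme_io are hypothetical, trimmed-down stand-ins (the real Linux struct iocb has more fields), and aio_to_nvme is an illustrative helper, not an existing API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, trimmed-down view of one AIO request on a raw block device. */
struct aio_req {
    bool is_write;
    uint64_t offset;   /* bytes from the start of the device */
    uint64_t nbytes;
};

/* Hypothetical block-addressed request: whole logical blocks only. */
struct nvme_io {
    bool is_write;
    uint64_t lba;
    uint32_t lba_count;
};

/* Translate a block-aligned AIO request. Returns false for requests that
 * would need extra handling (empty, misaligned, or a block count that does
 * not fit in 32 bits). A driver may still split the result further to
 * respect the device's transfer limits. */
bool aio_to_nvme(const struct aio_req *in, uint32_t block_size, struct nvme_io *out)
{
    assert(block_size != 0);
    if (in->nbytes == 0 || in->offset % block_size != 0 || in->nbytes % block_size != 0) {
        return false;
    }
    uint64_t blocks = in->nbytes / block_size;
    if (blocks > UINT32_MAX) {
        return false;
    }
    out->is_write = in->is_write;
    out->lba = in->offset / block_size;
    out->lba_count = (uint32_t)blocks;
    return true;
}
```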
> > > >
> > > > >
> > > > >
> > > > > BTW, are there any known early filesystem implementations besides Ceph's RocksDB-based
> > > > > BlueStore FS that use SPDK?
> > > > The only publicly announced one that I'm aware of is Bluestore inside of Ceph. As long as SPDK
> > > > continues to be valuable, I fully expect many filesystems with different designs to appear
> > > > over time. If you have a particular use case where you'd like some sort of filesystem-like
> > > > layer on top of SPDK, I'd love to hear about it. At a minimum, it's useful to collect
> > > > requirements from a number of sources.
> > > >
> > > > >
> > > > >
> > > > >
> > > > > Regards,
> > > > > Varun Bhadauria
> > > > >
> > > > >
> > > > > On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on
> > > > > behalf of benjamin.walker(a)intel.com> wrote:
> > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO
> > > > > > or POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe
> > > > > > driver then the perf tool is your best bet.
> > > > > >
> > > > > > You can run the perf tool against a block device using Linux AIO by binding your NVMe
> > > > > > device to the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel)
> > > > > > and then doing something like:
> > > > > >
> > > > > > ./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
> > > > > >
> > > > > > -----Original Message-----
> > > > > > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
> > > > > > Sent: Wednesday, June 15, 2016 4:30 PM
> > > > > > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > > > > > Subject: [SPDK] SPDK aio examples
> > > > > >
> > > > > > Hello
> > > > > >
> > > > > > Are there any SPDK examples which use AIO? The usage text in perf.c has very little
> > > > > > documentation for AIO.
> > > > > >
> > > > > > Regards,
> > > > > > Varun Bhadauria
> > > > > >
> > > > > >
> > > > > > _______________________________________________
> > > > > > SPDK mailing list
> > > > > > SPDK(a)lists.01.org
> > > > > > https://lists.01.org/mailman/listinfo/spdk