* Re: [SPDK] SPDK aio examples
@ 2016-06-17 17:24 Walker, Benjamin
0 siblings, 0 replies; 6+ messages in thread
From: Walker, Benjamin @ 2016-06-17 17:24 UTC (permalink / raw)
To: spdk
On Wed, 2016-06-15 at 23:56 +0000, Bhadauria, Varun wrote:
> Hello Ben
>
> Thank you for the clarification. I was under the false impression that Linux AIO can be made to
> use SPDK under the hood which is clearly not the case since they will have to go through the
> filesystem.
I'm sure someone could wrap the AIO interface around the SPDK driver for the specific case where the
user is opening a block device directly with O_DIRECT. It's nearly a 1:1 translation for that case.
Unfortunately, most people use Linux AIO on files instead of block devices.
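To illustrate why the O_DIRECT block-device case is close to a 1:1 translation, here is a minimal sketch of the mapping from a byte-addressed AIO-style request to the block-addressed form an NVMe read or write takes. The struct and function names (aio_style_req, nvme_style_io, aio_to_nvme) are hypothetical and are not part of SPDK or libaio.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical request shapes, for illustration only. */
struct aio_style_req {          /* what an AIO caller supplies */
	uint64_t offset;        /* byte offset into the block device */
	uint64_t nbytes;        /* transfer length in bytes */
};

struct nvme_style_io {          /* what an NVMe read/write needs */
	uint64_t lba;           /* starting logical block address */
	uint32_t lba_count;     /* number of logical blocks */
};

/*
 * Translate a byte-addressed request into block units. Returns false when
 * the request is not sector-aligned; O_DIRECT I/O typically has to meet a
 * similar alignment requirement.
 */
static bool
aio_to_nvme(const struct aio_style_req *in, uint32_t sector_size,
	    struct nvme_style_io *out)
{
	assert(sector_size != 0);

	if (in->nbytes == 0 || in->offset % sector_size != 0 ||
	    in->nbytes % sector_size != 0) {
		return false;
	}

	uint64_t blocks = in->nbytes / sector_size;
	if (blocks > UINT32_MAX) {
		return false;
	}

	out->lba = in->offset / sector_size;
	out->lba_count = (uint32_t)blocks;
	return true;
}
```

A file-backed AIO request has no such direct mapping, because the filesystem decides which blocks a file offset refers to.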
> BTW, are there any known early filesystem implementations besides Ceph’s RocksDB-based BlueStore FS
> that use SPDK?
The only publicly announced one that I'm aware of is Bluestore inside of Ceph. As long as SPDK
continues to be valuable, I fully expect many filesystems with different designs to appear over
time. If you have a particular use case where you'd like some sort of filesystem-like layer on top
of SPDK, I'd love to hear about it. At a minimum, it's useful to collect requirements from a number
of sources.
>
> Regards,
> Varun Bhadauria
>
>
> On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of
> benjamin.walker(a)intel.com> wrote:
>
> >
> > Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO or
> > POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe driver then
> > the perf tool is your best bet.
> >
> > You can run the perf tool against a block device using Linux AIO by binding your NVMe device to
> > the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel) and then doing
> > something like:
> >
> > ./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
> >
> > -----Original Message-----
> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
> > Sent: Wednesday, June 15, 2016 4:30 PM
> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
> > Subject: [SPDK] SPDK air examples
> >
> > Hello
> >
> > Are there any SPDK examples which use AIO? Perf.c has very little documentation in the usage
> > for AIO.
> >
> > Regards,
> > Varun Bhadauria
> >
> >
> > _______________________________________________
> > SPDK mailing list
> > SPDK(a)lists.01.org
> > https://lists.01.org/mailman/listinfo/spdk
* Re: [SPDK] SPDK aio examples
@ 2016-06-22 17:35 Walker, Benjamin
0 siblings, 0 replies; 6+ messages in thread
From: Walker, Benjamin @ 2016-06-22 17:35 UTC (permalink / raw)
To: spdk
On Wed, 2016-06-22 at 16:50 +0000, Bhadauria, Varun wrote:
> Hi Ben
>
> Thank you for the reply.
>
> To prevent application I/Os from being submitted from different threads to the same queue pair, one
> can allocate a queue pair per logical core (provided the hardware supports creating that many
> queue pairs). However, getting the current CPU from application code involves system call
> overhead (some operating systems may not even support this).
>
> The other approach is to have some worker threads, each with its own queue pair, that feed off
> application-maintained pending I/O queues. However, this approach introduces locking
> overhead (to establish the producer-consumer model), which may cause contention and prevent
> the application from reaching maximum performance.
>
> How do you think this problem can be avoided?
I think you are assuming that the layer doing the I/O submission is not designed with knowledge of
the application logic above it. That's true for something like the Linux kernel's block-mq layer -
it doesn't know what threading model the application(s) running on it use so it just allocates 1
queue pair per core (sharing if necessary) and then has to ask which core a thread is on to choose
the right queue pair. One of the major advantages of SPDK, however, is that the I/O submission layer
is part of the application and can therefore take advantage of additional knowledge. Most
applications using SPDK will be designed to have 1 thread per CPU core where the thread is running
in a tight event loop, polling a queue. I/O coming in off of the network will be immediately routed
to a particular CPU core and it will be processed there until I/O is submitted to the disk. In that
model, you never have to look up what core you are on - you just have to associate network
connections with particular threads one time when the connection is established. We provide a basic
framework for applications to use this model inside of SPDK (header is at include/spdk/event.h). The
framework isn't required to use our drivers, but all of our example applications and our NVMf target
use it.
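The model described above can be sketched as follows. This is a simplified, single-file illustration and not the SPDK event framework: the names core_ctx, connection, connection_accept and connection_submit_io are hypothetical, and the round-robin placement policy is an assumption chosen only to keep the example short.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_CORES 4   /* assumed core count for the sketch */

struct qpair {
	uint32_t id;                    /* stand-in for an NVMe queue pair */
};

struct core_ctx {                       /* state owned by exactly one thread */
	struct qpair qp;
	uint32_t submitted;
};

struct connection {
	struct core_ctx *owner;         /* fixed when the connection is accepted */
};

static struct core_ctx g_cores[NUM_CORES];
static uint32_t g_next_core;

static void
cores_init(void)
{
	for (uint32_t i = 0; i < NUM_CORES; i++) {
		g_cores[i].qp.id = i;
		g_cores[i].submitted = 0;
	}
	g_next_core = 0;
}

/* Called once per connection, at accept time: round-robin placement. */
static void
connection_accept(struct connection *c)
{
	c->owner = &g_cores[g_next_core];
	g_next_core = (g_next_core + 1) % NUM_CORES;
}

/*
 * Called from the owning core's event loop. The queue pair is reached
 * through the connection, so there is no "which CPU am I on?" lookup
 * and no lock.
 */
static uint32_t
connection_submit_io(struct connection *c)
{
	c->owner->submitted++;
	return c->owner->qp.id;         /* the queue pair the I/O would use */
}
```

The only cross-thread decision is made once, in connection_accept(); everything after that touches state owned by a single core.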
>
> Also I don’t see any api for issuing a trim command. Is that being implemented as well?
Every specification uses a different word for trim for some reason. TRIM is the term used by the ATA
command set, SCSI calls it UNMAP, and NVMe calls it deallocate. See
http://www.spdk.io/spdk/doc/nvme_8h.html#ae275923b7e982b115483e425c2972ec5.
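For readers unfamiliar with the NVMe form of this command: deallocate is carried by the Dataset Management command, whose payload is a list of up to 256 ranges of 16 bytes each. The sketch below only builds such a range list; it does not call SPDK, and the names dsm_range and dsm_build_ranges, as well as the max_per_range splitting parameter, are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* One Dataset Management range as laid out by the NVMe specification. */
struct dsm_range {
	uint32_t attributes;     /* context attributes; 0 in this sketch */
	uint32_t length;         /* number of logical blocks */
	uint64_t starting_lba;
};

#define DSM_MAX_RANGES 256       /* one command carries at most 256 ranges */

/*
 * Fill ranges[] so that it covers [lba, lba + blocks), splitting the span
 * into pieces of at most max_per_range blocks. Returns the number of
 * ranges used, or -1 if the span does not fit in a single command.
 */
static int
dsm_build_ranges(struct dsm_range *ranges, uint64_t lba, uint64_t blocks,
		 uint32_t max_per_range)
{
	int n = 0;

	assert(max_per_range != 0);

	while (blocks > 0) {
		if (n == DSM_MAX_RANGES) {
			return -1;
		}
		uint32_t len = blocks > max_per_range ?
			       max_per_range : (uint32_t)blocks;
		ranges[n].attributes = 0;
		ranges[n].length = len;
		ranges[n].starting_lba = lba;
		lba += len;
		blocks -= len;
		n++;
	}
	return n;
}
```

The resulting array is what would be handed to the driver's deallocate call as its payload, together with the range count.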
>
> Also
> Regards,
> Varun Bhadauria
* Re: [SPDK] SPDK aio examples
@ 2016-06-22 16:50 Bhadauria, Varun
0 siblings, 0 replies; 6+ messages in thread
From: Bhadauria, Varun @ 2016-06-22 16:50 UTC (permalink / raw)
To: spdk
Hi Ben
Thank you for the reply.
To prevent application I/Os from being submitted from different threads to the same queue pair, one can allocate a queue pair per logical core (provided the hardware supports creating that many queue pairs). However, getting the current CPU from application code involves system call overhead (some operating systems may not even support this).
The other approach is to have some worker threads, each with its own queue pair, that feed off application-maintained pending I/O queues. However, this approach introduces locking overhead (to establish the producer-consumer model), which may cause contention and prevent the application from reaching maximum performance.
How do you think this problem can be avoided?
Also, I don’t see any API for issuing a trim command. Is that being implemented as well?
Also
Regards,
Varun Bhadauria
* Re: [SPDK] SPDK aio examples
@ 2016-06-17 21:57 Walker, Benjamin
0 siblings, 0 replies; 6+ messages in thread
From: Walker, Benjamin @ 2016-06-17 21:57 UTC (permalink / raw)
To: spdk
On Fri, 2016-06-17 at 20:52 +0000, Bhadauria, Varun wrote:
> Thanks Ben
>
> Can you also possibly shed some light on the expected behavior when more than one I/O is
> erroneously submitted on the same qpair? Do the spdk_nvme_ns_cmd_read/write*() functions return a
> specific error value in this case?
>
You can submit many I/O per queue pair at the same time as long as you do it from a single thread,
and you can submit I/O to different queue pairs on different threads simultaneously with no locks.
Are you asking what happens when I/O is submitted simultaneously from different threads to the same
queue pair? In that case, you run the risk of corrupting the memory state of the queue. The queue is
implemented as an array in memory with a head and a tail pointer. Submitting an I/O to the queue
places a command into the next slot, advances the tail pointer, and rings a doorbell register to
tell the device new commands are present. If you do this from two threads simultaneously, they'd
both be copying into the same spot and ringing the doorbell, meaning the device may receive part of
one command and part of another. The code is in lib/nvme/nvme_qpair.c:nvme_qpair_submit_tracker if
you want to look.
There is no expected error value for this case - the behavior is simply undefined. In order to catch
a user doing this, we'd have to look at some shared state (which means a lock) and the whole purpose
of queue pairs is to avoid locking.
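The three submission steps described above can be sketched like this. It is a simplified model and not the code in nvme_qpair_submit_tracker: the names sub_queue and sq_submit are hypothetical, the doorbell is an ordinary variable standing in for a device register, and the full-queue check is omitted. The index the host advances is the tail, in NVMe terminology.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SQ_ENTRIES 8

struct nvme_cmd {
	uint8_t bytes[64];             /* NVMe commands are 64 bytes */
};

struct sub_queue {
	struct nvme_cmd slots[SQ_ENTRIES];
	uint32_t tail;                 /* next free slot, advanced by the host */
	volatile uint32_t doorbell;    /* stand-in for the tail doorbell register */
};

/* Must only be called from one thread: none of these steps is atomic. */
static void
sq_submit(struct sub_queue *sq, const struct nvme_cmd *cmd)
{
	/* 1. Copy the command into the next slot. */
	memcpy(&sq->slots[sq->tail], cmd, sizeof(*cmd));
	/* 2. Advance the index, wrapping at the end of the array. */
	sq->tail = (sq->tail + 1) % SQ_ENTRIES;
	/* 3. Tell the device that new commands are present. */
	sq->doorbell = sq->tail;
}
```

If two threads enter sq_submit() at once, both can read the same tail value in step 1 and copy into the same slot, which is the interleaving that can leave the device with part of one command and part of another.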
> Also, does spdk_nvme_qpair_process_completions() for a qpair need to be invoked from the same
> thread that is responsible for issuing I/O on the qpair?
Yes - you need to call that function from the same thread that you submitted the I/O on. It's fairly
obvious that you can only call spdk_nvme_qpair_process_completions on a particular queue pair from 1
thread at a time, but it isn't as obvious why you can't reap your completions on a different thread
than your submissions, so let me try and explain that.
We define two objects, a request and a tracker, that are placed on lists. A request represents a
single user call to submit an I/O. A tracker is an entry on the hardware queue. We allow more
requests outstanding than available trackers. Submissions and completions manipulate the lists of
free requests and trackers using a simple linked list, which is not thread safe. Further, each time
a completion happens and frees up a tracker, we check if there are any pending requests and submit
them. If we find any on the completion side but we're on a different thread than the submission path,
this would be equivalent to doing submissions from two threads simultaneously.
I'm not sure this technical challenge couldn't be overcome, but I am fairly confident that you don't
actually want to do this in your software anyway. Not only is it more complicated, but you end up
thrashing your CPU cache. The request objects are sitting nicely in your L1 or L2 CPU cache from
submission, so when you complete on the same core it is ideal.
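A rough model of the request/tracker relationship may make the hazard easier to see. This is a sketch with hypothetical names (qpair_state, qpair_submit, qpair_complete_one), not the driver's data structures; trackers are reduced to a counter and the pending list is a plain, unlocked linked list.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_TRACKERS 2

struct request {
	int id;
	struct request *next;          /* link for the pending list */
};

struct qpair_state {
	int free_trackers;             /* hardware queue slots still available */
	struct request *pending_head;  /* requests waiting for a tracker */
	struct request *pending_tail;
	int submitted_to_hw;           /* requests handed to hardware so far */
};

static void
qpair_init(struct qpair_state *q)
{
	q->free_trackers = NUM_TRACKERS;
	q->pending_head = NULL;
	q->pending_tail = NULL;
	q->submitted_to_hw = 0;
}

static void
submit_to_hw(struct qpair_state *q, struct request *r)
{
	(void)r;
	q->free_trackers--;
	q->submitted_to_hw++;
}

/* Submission path: take a tracker if one is free, otherwise queue. */
static void
qpair_submit(struct qpair_state *q, struct request *r)
{
	if (q->free_trackers > 0) {
		submit_to_hw(q, r);
		return;
	}
	r->next = NULL;
	if (q->pending_tail != NULL) {
		q->pending_tail->next = r;
	} else {
		q->pending_head = r;
	}
	q->pending_tail = r;
}

/* Completion path: free a tracker, then submit a pending request. */
static void
qpair_complete_one(struct qpair_state *q)
{
	struct request *r = q->pending_head;

	q->free_trackers++;
	if (r != NULL) {
		q->pending_head = r->next;
		if (q->pending_head == NULL) {
			q->pending_tail = NULL;
		}
		submit_to_hw(q, r);
	}
}
```

Note that qpair_complete_one() ends by calling the same submit_to_hw() that qpair_submit() uses, and both paths modify the same unlocked list, so running them on different threads amounts to submitting from two threads.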
>
> When outstanding completions are processed as a result of calling
> spdk_nvme_qpair_process_completions(), is a request’s callback called on the same core?
Yes - whatever thread you call spdk_nvme_qpair_process_completions on, for each completion it finds
it will call that callback immediately inside of the current thread. So all of the callbacks for
completions found will have been called by the time spdk_nvme_qpair_process_completions returns. The
code is in lib/nvme/nvme_qpair.c:spdk_nvme_qpair_process_completions() - you can see it just loop
over the completion entries and call nvme_qpair_complete_tracker for each one. Inside of
nvme_qpair_complete_tracker, it calls the callback function.
>
> Is it always necessary to call spdk_nvme_qpair_process_completions() to process completions?
Yes - there are no interrupts or background threads so the driver will only execute in response to
calls from the user.
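The polled, callback-on-the-calling-thread behavior can be modeled in a few lines. This is a sketch only: comp_queue, process_completions and count_done are hypothetical stand-ins, and the real function takes a queue pair and reads completion entries written by the device.

```c
#include <assert.h>
#include <stdint.h>

#define CQ_ENTRIES 4

typedef void (*io_cb)(void *arg);

struct completion {
	io_cb cb;                      /* user callback given at submit time */
	void *arg;
};

struct comp_queue {
	struct completion ready[CQ_ENTRIES];  /* completions not yet reaped */
	uint32_t count;
};

/*
 * Run the callback for every completion currently available, on the
 * calling thread, and return how many were processed. Nothing happens
 * to a completion until the application calls this.
 */
static uint32_t
process_completions(struct comp_queue *cq)
{
	uint32_t n = cq->count;

	for (uint32_t i = 0; i < n; i++) {
		cq->ready[i].cb(cq->ready[i].arg);
	}
	cq->count = 0;
	return n;
}

/* Example callback: count finished I/Os. */
static void
count_done(void *arg)
{
	(*(int *)arg)++;
}
```

An application thread would typically call the polling function in its event loop until the number of outstanding I/Os it is tracking drops to zero; by the time each call returns, every callback for the completions it found has already run.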
>
> Regards,
> Varun Bhadauria
* Re: [SPDK] SPDK aio examples
@ 2016-06-17 20:52 Bhadauria, Varun
0 siblings, 0 replies; 6+ messages in thread
From: Bhadauria, Varun @ 2016-06-17 20:52 UTC (permalink / raw)
To: spdk
Thanks Ben
Can you also possibly shed some light on the expected behavior when more than one I/O is erroneously submitted on the same qpair? Do the spdk_nvme_ns_cmd_read/write*() functions return a specific error value in this case?
Also, does spdk_nvme_qpair_process_completions() for a qpair need to be invoked from the same thread that is responsible for issuing I/O on the qpair?
When outstanding completions are processed as a result of calling spdk_nvme_qpair_process_completions(), is a request’s callback called on the same core?
Is it always necessary to call spdk_nvme_qpair_process_completions() to process completions?
Regards,
Varun Bhadauria
On 6/17/16, 10:24 AM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:
>On Wed, 2016-06-15 at 23:56 +0000, Bhadauria, Varun wrote:
>> Hello Ben
>>
>> Thank you for the clarification. I was under the false impression that Linux AIO could be made to
>> use SPDK under the hood, which is clearly not the case, since AIO requests have to go through the
>> filesystem.
>
>I'm sure someone could wrap the AIO interface around the SPDK driver for the specific case where the
>user is opening a block device directly with O_DIRECT. It's nearly a 1:1 translation for that case.
>Unfortunately, most people use Linux AIO on files instead of block devices.
>
>> BTW, are there any known early filesystem implementations, besides Ceph's RocksDB-based Bluestore
>> FS, that use SPDK?
>
>The only publicly announced one that I'm aware of is Bluestore inside of Ceph. As long as SPDK
>continues to be valuable, I fully expect many filesystems with different designs to appear over
>time. If you have a particular use case where you'd like some sort of filesystem-like layer on top
>of SPDK, I'd love to hear about it. At a minimum, it's useful to collect requirements from a number
>of sources.
>
>>
>> Regards,
>> Varun Bhadauria
>>
>>
>> On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of
>> benjamin.walker(a)intel.com> wrote:
>>
>> >
>> > Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO or
>> > POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe driver then
>> > the perf tool is your best bet.
>> >
>> > You can run the perf tool against a block device using Linux AIO by binding your NVMe device to
>> > the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel) and then doing
>> > something like:
>> >
>> > ./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
>> >
>> > -----Original Message-----
>> > From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
>> > Sent: Wednesday, June 15, 2016 4:30 PM
>> > To: Storage Performance Development Kit <spdk(a)lists.01.org>
>> > Subject: [SPDK] SPDK aio examples
>> >
>> > Hello
>> >
>> > Are there any SPDK examples which use AIO? Perf.c has very little documentation in the usage
>> > for AIO.
>> >
>> > Regards,
>> > Varun Bhadauria
>> >
>> >
* Re: [SPDK] SPDK aio examples
@ 2016-06-15 23:56 Bhadauria, Varun
0 siblings, 0 replies; 6+ messages in thread
From: Bhadauria, Varun @ 2016-06-15 23:56 UTC (permalink / raw)
To: spdk
Hello Ben,

Thank you for the clarification. I was under the false impression that Linux AIO could be made to use SPDK under the hood, which is clearly not the case, since AIO requests have to go through the filesystem. BTW, are there any known early filesystem implementations, besides Ceph's RocksDB-based Bluestore FS, that use SPDK?

Regards,
Varun Bhadauria

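For readers looking for a concrete starting point, the kernel path discussed in this thread (Linux AIO, as opposed to the SPDK NVMe driver) can be sketched at queue depth 1 roughly as follows. This is a minimal sketch under stated assumptions, not code from SPDK or its perf tool: the helper name aio_read_once is hypothetical, and it issues the raw io_setup/io_submit/io_getevents system calls instead of going through libaio. To target a block device such as /dev/nvme0n1, the caller would pass O_DIRECT in open_flags and supply a buffer, length, and offset aligned to the device's logical block size.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <linux/aio_abi.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/*
 * Hypothetical helper (not part of SPDK): read `len` bytes at `offset`
 * from `path` using a single Linux AIO request (queue depth 1).
 * `open_flags` is OR-ed into O_RDONLY, e.g. O_DIRECT for a block device.
 * Returns the number of bytes read, or a negative value on error.
 */
static long aio_read_once(const char *path, int open_flags,
                          void *buf, size_t len, long long offset)
{
    aio_context_t ctx = 0;          /* must be zero before io_setup() */
    struct iocb cb;
    struct iocb *list[1] = { &cb };
    struct io_event ev;
    long res = -1;

    int fd = open(path, O_RDONLY | open_flags);
    if (fd < 0)
        return -1;

    /* Create an AIO context able to hold one in-flight request. */
    if (syscall(SYS_io_setup, 1, &ctx) < 0) {
        close(fd);
        return -1;
    }

    /* Describe the read: file descriptor, buffer, length, offset. */
    memset(&cb, 0, sizeof(cb));
    cb.aio_lio_opcode = IOCB_CMD_PREAD;
    cb.aio_fildes = (uint32_t)fd;
    cb.aio_buf = (uint64_t)(uintptr_t)buf;
    cb.aio_nbytes = len;
    cb.aio_offset = offset;

    /* Submit the request, then block until its completion event arrives. */
    if (syscall(SYS_io_submit, ctx, 1, list) == 1 &&
        syscall(SYS_io_getevents, ctx, 1, 1, &ev, NULL) == 1) {
        res = (long)ev.res;         /* bytes read, or negative errno */
    }

    syscall(SYS_io_destroy, ctx);
    close(fd);
    return res;
}
```

With O_DIRECT, the buffer would typically come from posix_memalign(). The submit-then-reap structure is the part that lines up with the SPDK model, where a read is submitted on a qpair and its completion is collected later by polling.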
On 6/15/16, 4:37 PM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:
>Can you explain a bit more about why you want to use AIO? Are you referring to Linux AIO or POSIX AIO? If you want to do a performance comparison of Linux AIO and the SPDK NVMe driver then the perf tool is your best bet.
>
>You can run the perf tool against a block device using Linux AIO by binding your NVMe device to the kernel ("./scripts/setup.sh reset" will hand them all back to the kernel) and then doing something like:
>
>./perf -q 1 -s 4096 -w read -t 10 /dev/nvme0n1 /dev/nvme1n1
>
>-----Original Message-----
>From: SPDK [mailto:spdk-bounces(a)lists.01.org] On Behalf Of Bhadauria, Varun
>Sent: Wednesday, June 15, 2016 4:30 PM
>To: Storage Performance Development Kit <spdk(a)lists.01.org>
>Subject: [SPDK] SPDK aio examples
>
>Hello
>
>Are there any SPDK examples which use AIO? Perf.c has very little documentation in the usage for AIO.
>
>Regards,
>Varun Bhadauria
>
>
end of thread, other threads:[~2016-06-22 17:35 UTC | newest]
Thread overview: 6+ messages
2016-06-17 17:24 [SPDK] SPDK aio examples Walker, Benjamin
-- strict thread matches above, loose matches on Subject: below --
2016-06-22 17:35 Walker, Benjamin
2016-06-22 16:50 Bhadauria, Varun
2016-06-17 21:57 Walker, Benjamin
2016-06-17 20:52 Bhadauria, Varun
2016-06-15 23:56 Bhadauria, Varun