* [SPDK] SCST Usermode iSCSI Storage Server now handles Intel SPDK backing storage
@ 2017-09-05 7:25 David Butterfield
From: David Butterfield @ 2017-09-05 7:25 UTC (permalink / raw)
To: spdk
The SCST Usermode iSCSI Storage Server can now utilize backing storage through
the Intel Storage Performance Development Kit (SPDK) API.
The SCST Usermode Server is a port of about 80 KLOC of the SCST Linux kernel
software to run entirely in usermode on an unmodified kernel, with virtually
no change to the existing SCST source code.
The diagram on the left side of this PDF page compares the usual kernel-based
SCST configuration [blue box] with the configuration adapted for usermode
[purple box]:
https://github.com/DavidButterfield/SCST-Usermode-Adaptation/blob/usermode/usermode/scstu_tcmur.pdf
The diagram on the right side of that page illustrates the datapath from
Initiator to backing storage API -- showing paths through LIO (in-kernel), and
through Usermode SCST [purple box]. The Usermode SCST server can access
backing storage through any of these interfaces: preadv(2) and pwritev(2),
aio(7), or the tcmu-runner backstorage API [red arrow].
The tcmu-runner backstorage API is a usermode interface point between the
kernel-based LIO facility and usermode backstore-specific handlers. The
tcmu-runner project implements backstore handlers for Ceph/rbd, Gluster/glfs,
and QEMU/qcow [green box]. I have re-used that same API for Usermode SCST so
that it can make use of those same backstore handlers [red arrow].
I have also implemented two additional backstore handlers: a "ram" driver that
uses mmap(2) either anonymously or with a persistent backing file; and most
recently, an interface module to the Intel Storage Performance Development Kit
(SPDK) [red circle -- note that the new SPDK module is a prototype, presently
functional with Usermode SCST, but not yet through the LIO datapath].
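
As an illustration of the two mmap(2) modes mentioned above, here is a minimal
sketch. It is not taken from the project's ram handler: the helper name
ram_backstore_map and the choice to size the file with ftruncate(2) are
assumptions made only for this example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not the project's code).
 * Maps `size` bytes of backing store:
 *   - path == NULL: anonymous memory, contents lost when unmapped;
 *   - path != NULL: shared mapping of that file, so writes persist.
 * The file is created if missing and set to exactly `size` bytes
 * (an existing larger file would be truncated).
 * Returns the mapping, or NULL on failure. */
static void *ram_backstore_map(const char *path, size_t size)
{
    assert(size > 0);

    if (path == NULL) {
        void *p = mmap(NULL, size, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        return p == MAP_FAILED ? NULL : p;
    }

    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return NULL;
    if (ftruncate(fd, (off_t)size) != 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

A caller would release the region with munmap(2) using the same size. With a
file path, data written through the mapping is visible to a later mapping of
the same file.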
The project is at https://github.com/DavidButterfield/SCST-Usermode-Adaptation --
the README there has a few diagrams and a link to a technical paper. The new
SPDK backstore handler is in usermode/spdk.c.
The paper starts by describing the port of SCST from the Linux kernel to
usermode, including diagrams showing how this was done without changing the
SCST source code. Next I specify the configuration used for performance
measurements, followed by plots and analysis interpreting the results. I
introduce an experimental "Adaptive Nagle" algorithm to improve performance of
small Read operations. An appendix develops a performance model that attempts
to maintain some intuition in a fairly complicated analysis.
David Butterfield
* Re: [SPDK] SCST Usermode iSCSI Storage Server now handles Intel SPDK backing storage
@ 2017-09-06 3:25 Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2017-09-06 3:25 UTC (permalink / raw)
To: spdk
Wonderful!
My only note is that, as I have mentioned before, TCMU copies data between the
usermode module and the kernel, so using SCST's zero-copy scst_user instead would be
more efficient.
Thanks,
Vlad
* Re: [SPDK] SCST Usermode iSCSI Storage Server now handles Intel SPDK backing storage
@ 2017-09-06 22:48 David Butterfield
From: David Butterfield @ 2017-09-06 22:48 UTC (permalink / raw)
To: spdk
On Tue, Sep 5, 2017 at 9:25 PM, Vladislav Bolkhovitin <vst(a)vlnb.net> wrote:
> The only note would be that, as I have already mentioned before, tcmu does data copy
> between user mode module and kernel, so usage with SCST zero-copy scst_user instead
> would be more performance efficient.
Yes, it's better not to have to copy the data; but I'm not sure that's
the limiting factor for TCMU performance.
A ring buffer mediates communication in the TCMU datapath between
tcm_user (in the kernel) and libtcmu (in usermode). One fairly
fundamental characteristic of the TCMU model is that the granularity
of transaction through the ring buffer is the CDB. There is overhead
cost to access and maintain the ring four times per SCSI command:
(Request + Response) * (Sender + Receiver).
What concerns me more is the problem of timely scheduling of
the threads on each side of the ring. One might expect at least one
wakeup per SCSI command, because whichever side of the ring is faster
to process a command must inevitably sleep waiting for the slower
side.
In practice it averages fewer than one wakeup per command (with
sufficient queue-depth) because multiple commands can accumulate in
the ring during the scheduling delay for the first command, and the
entire backlog can be processed in one wakeup. But you only get such
batching in return for enduring thread scheduling latency on the
datapath (with its own issues).
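
The batching effect described above can be sketched with a toy model. This is
not a measurement and not TCMU code: the function name count_wakeups, the
strictly periodic arrivals, and the fixed wake latency and service time are all
assumptions chosen only to make the arithmetic visible.

```c
#include <assert.h>

/* Toy model (hypothetical, not TCMU code).
 * Command i is placed in the ring at time i * arrival_us.
 * The consumer sleeps whenever the ring is empty; once a command arrives,
 * it takes wake_us before the consumer thread actually runs. The consumer
 * then drains every command that has arrived so far, spending service_us
 * on each, including commands that arrive while it is draining.
 * Returns the number of consumer wakeups needed for ncmds commands. */
static unsigned long count_wakeups(unsigned long ncmds,
                                   unsigned long arrival_us,
                                   unsigned long wake_us,
                                   unsigned long service_us)
{
    assert(arrival_us > 0);

    unsigned long wakeups = 0;
    unsigned long done = 0; /* commands consumed so far */
    unsigned long now = 0;  /* consumer's clock, in microseconds */

    while (done < ncmds) {
        unsigned long arrive = done * arrival_us;
        if (now < arrive)
            now = arrive;  /* asleep until the next command arrives */
        now += wake_us;    /* scheduling delay before the consumer runs */
        wakeups++;

        /* Drain the backlog that accumulated during the delay. */
        while (done < ncmds && done * arrival_us <= now) {
            now += service_us;
            done++;
        }
    }
    return wakeups;
}
```

In this model, count_wakeups(100, 10, 0, 1) returns 100 (one wakeup per
command when there is no scheduling delay), while count_wakeups(100, 10, 35, 1)
returns 25: each wakeup finds four commands waiting, at the price of 35 us of
added latency for the first of them.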
It is too complicated to determine from analysis alone how all the
factors combine into overall performance behavior under various
loading conditions -- the only way to really know is to observe and
measure it. How many IOPS can get through that ring, and what happens
if the load is not quite 100%, or the load is light at queue-depth 2
or even 1? Or when the required protocol work is heavier on the
kernel side versus heavier on the usermode side?
TCMU has had some time to gain usermode clients. Finding even *one*
such client -- that has been well-measured under a variety of
conditions and demonstrated to work reliably with high performance --
would prove that it is possible to do through the TCMU API,
substantially reducing the concern. There may be an example out
there, but I looked around a couple of months ago and did not find
anything except "we haven't done performance tuning yet". But the
concern is toward factors that are inherent in the TCMU model, not
amenable to simple "performance tuning at the end". Given
CDB-granularity, I expect the TCMU IOPS bottleneck is going to be
around that ring.
In contrast to the CDB-ring model, Usermode SCST uses socket(2) and
related system calls for communication with the iSCSI initiator --
these socket calls are where the datapath crosses between the kernel
and usermode. Here the granularity of transaction between the two can
theoretically be as large as the socket buffer size -- much larger
than one SCSI command.
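
To make the granularity point concrete, here is a minimal sketch that counts how
many complete iSCSI PDUs are sitting in a buffer filled by a single socket read.
It is not taken from iSCSI-SCST: the function name count_complete_pdus is
hypothetical, and the sketch assumes header and data digests are not in use. The
layout it relies on is the standard one: a 48-byte Basic Header Segment with
TotalAHSLength (in 4-byte words) in byte 4 and DataSegmentLength (in bytes,
big-endian) in bytes 5 to 7, with the data segment padded to a 4-byte boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ISCSI_BHS_LEN 48u /* Basic Header Segment size in bytes */

/* Hypothetical helper (not iSCSI-SCST code); assumes no digests.
 * Scans `len` bytes received from the socket and returns the number of
 * complete PDUs at the front of the buffer. If `consumed` is non-NULL it
 * receives the number of bytes those PDUs occupy; any remaining bytes are
 * the start of a PDU that has not fully arrived yet. */
static size_t count_complete_pdus(const uint8_t *buf, size_t len,
                                  size_t *consumed)
{
    assert(buf != NULL);

    size_t off = 0;
    size_t npdus = 0;

    while (len - off >= ISCSI_BHS_LEN) {
        const uint8_t *bhs = buf + off;
        size_t ahs_len = (size_t)bhs[4] * 4;
        size_t data_len = ((size_t)bhs[5] << 16) |
                          ((size_t)bhs[6] << 8) |
                          (size_t)bhs[7];
        size_t padded = (data_len + 3) & ~(size_t)3;
        size_t pdu_len = ISCSI_BHS_LEN + ahs_len + padded;

        if (len - off < pdu_len)
            break; /* header present, but the PDU is still incomplete */
        off += pdu_len;
        npdus++;
    }
    if (consumed != NULL)
        *consumed = off;
    return npdus;
}
```

One kernel crossing can therefore deliver several commands at once; the caller
processes the PDUs found and keeps the unconsumed tail for the next read.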
Especially when using SPDK for backing storage, another step is to
re-implement the network I/O using DPDK calls, eliminating the socket
I/O calls altogether (I expect that to be straightforward in
iscsi-scst/kernel/nthread.c). Then the entire datapath would be in
usermode (down to the I/O instructions, I think).
(Caveat: this analysis is based only on considering the TCMU model,
not any actual performance experimentation with TCMU)
Regards
David Butterfield
* Re: [SPDK] SCST Usermode iSCSI Storage Server now handles Intel SPDK backing storage
@ 2017-09-06 23:32 Vladislav Bolkhovitin
From: Vladislav Bolkhovitin @ 2017-09-06 23:32 UTC (permalink / raw)
To: spdk
I see, interesting analysis. Just one correction: in iSCSI-SCST, netlink sockets are
used for kernel-usermode communication, and only to establish the connection; after
that, everything is done entirely inside the kernel (in user space in your port).
scst_user uses an ioctl-based interface, with 2 calls per CDB that could be batched too.
Everything runs within a single thread context, with no extra inter-thread switches. In
your user space port it could be translated to just a regular function call, leading to
a very interesting marriage between an SPDK frontend and existing usermode SCST backends :)
Vlad