Re: [SPDK] SPDK Dynamic Threading Model

From: Walker, Benjamin @ 2018-05-25 19:03 UTC
  To: spdk

I've been doing my best to think this through over the last few days, as have a
number of other community members, and some things are beginning to look a bit
clearer now.

SPDK was always intended to be a composable set of libraries as opposed to a
framework. By that, I mean that SPDK is intended to be integrated into other
applications as opposed to existing code being integrated into SPDK. The
community has done a lot of work to attempt to make that happen, with varying
degrees of success. The challenges are primarily centered on two things. First,
SPDK requires special memory management operations to allocate DMA-safe memory.
This stems from the strict requirement to avoid data copies. The problem would
essentially go away if SPDK instead internally allocated DMA-safe memory and
copied user data into those buffers, but the performance would take a big hit.
Second, SPDK avoids locks by instead passing messages between threads. That
means that many components (although not all) within SPDK imply that the
application is using a certain threading model. Specifically, the threading
model needs to look like cooperative multi-tasking, or futures and promises, or
event loops, etc. So far the consensus seems to be that it is acceptable to
assume there is some threading model that is conducive to message passing, but
we don't want to specifically pick a single model or framework.
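
To make the "messages instead of locks" point concrete, here is a minimal
sketch. All names in it (thread_inbox, inbox_send, inbox_poll) are hypothetical
and are not SPDK APIs. It assumes each thread owns an inbox that only that
thread drains, and it is written single-threaded to stay short; a real inbox
would be a multi-producer ring.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the SPDK API. */
typedef void (*msg_fn)(void *arg);

struct msg {
	msg_fn fn;
	void *arg;
};

#define MSG_RING_SIZE 16

/* One inbox per thread. Only the owning thread ever drains it. */
struct thread_inbox {
	struct msg ring[MSG_RING_SIZE];
	size_t head; /* next slot to read */
	size_t tail; /* next slot to write */
};

/* Ask the owning thread to run fn(arg). Returns 0 on success, -1 if full. */
static int
inbox_send(struct thread_inbox *inbox, msg_fn fn, void *arg)
{
	if (inbox->tail - inbox->head == MSG_RING_SIZE) {
		return -1;
	}
	inbox->ring[inbox->tail % MSG_RING_SIZE] = (struct msg){ fn, arg };
	inbox->tail++;
	return 0;
}

/* Called by the owning thread from its event loop. Returns messages run. */
static int
inbox_poll(struct thread_inbox *inbox)
{
	int count = 0;

	while (inbox->head != inbox->tail) {
		struct msg m = inbox->ring[inbox->head % MSG_RING_SIZE];

		inbox->head++;
		m.fn(m.arg);
		count++;
	}
	return count;
}

/* Example message: the counter is only touched on its owning thread,
 * so no lock is needed around it. */
static void
increment_counter(void *arg)
{
	int *counter = arg;

	(*counter)++;
}
```

Other threads would call inbox_send() with increment_counter and a pointer to
the counter; nothing changes until the owner calls inbox_poll() from whatever
loop, future, or coroutine scheduler the application happens to use.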

The problem that John, Madhu, and the others at NetApp have identified is that
SPDK currently makes entirely too many assumptions about and places too many
strict requirements on the mechanics of the threading model in an application. I
think there is a strong consensus that fixing this is important and should be
high priority. The fix, ultimately, will be better abstractions around the
underlying application's threading model. I hope we can design something that
will enable people to plug SPDK into all sorts of frameworks - green threading
frameworks, DPDK lthreads, Seastar, coroutine frameworks, etc. The more people
we can get participating in this work, the better the abstractions will be, so
please everyone chime in with requirements and ideas.
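
One possible shape for such an abstraction, offered only as a sketch: the
library calls through a small table of callbacks that the application fills
in. The names below (thread_ops, lib_complete_io, and the two adapters) are
invented for illustration and do not describe the current SPDK interface.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*task_fn)(void *arg);

/* Callbacks the application would supply (hypothetical interface). */
struct thread_ops {
	/* Schedule fn(arg) on the application thread identified by ctx. */
	void (*send_msg)(void *ctx, task_fn fn, void *arg);
	void *ctx;
};

/* Library-side code only ever goes through the ops table. */
static void
lib_complete_io(const struct thread_ops *ops, task_fn cb, void *cb_arg)
{
	ops->send_msg(ops->ctx, cb, cb_arg);
}

/* Adapter 1: a run-to-completion application calls the function inline. */
static void
inline_send_msg(void *ctx, task_fn fn, void *arg)
{
	(void)ctx;
	fn(arg);
}

/* Adapter 2: an event-loop application defers the call to its own queue. */
#define APP_QUEUE_DEPTH 8

struct app_queue {
	task_fn fn[APP_QUEUE_DEPTH];
	void *arg[APP_QUEUE_DEPTH];
	size_t count;
};

static void
queued_send_msg(void *ctx, task_fn fn, void *arg)
{
	struct app_queue *q = ctx;

	assert(q->count < APP_QUEUE_DEPTH);
	q->fn[q->count] = fn;
	q->arg[q->count] = arg;
	q->count++;
}

/* Run by the application whenever its own framework decides to. */
static void
app_queue_drain(struct app_queue *q)
{
	for (size_t i = 0; i < q->count; i++) {
		q->fn[i](q->arg[i]);
	}
	q->count = 0;
}

/* Example completion callback. */
static void
mark_done(void *arg)
{
	*(int *)arg = 1;
}
```

The library code in lib_complete_io() is identical in both cases; only the
adapter differs. A green-thread or coroutine framework would presumably supply
a third adapter that wakes the right task instead of queueing a function.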

The current set of patches breaks the 1:1 mapping between reactors and cores.
Instead, reactors are stored on a global list. Each core iterates over this
global list: it pulls the next reactor, processes any waiting events, executes
its pollers, and then places the reactor back on the list. I'm concerned about
three things with this design:

* Since the reactors now potentially execute on a different core each time
through their loop, the CPU cache is going to be badly thrashed. I suspect the
performance hit here is very large and continues to grow as additional threads
are added. SPDK is designed to scale linearly with the addition of CPU cores as
much as possible, and I think it would be a mistake to move away from that. 
* All NUMA-awareness has been lost. Placing the processing of I/O on the same
NUMA node as the NIC or SSD is critical to achieving high performance, so the
code needs to remain NUMA-aware.
* All threads are polling a single queue of reactors, so the atomic variables
controlling the head and tail of that queue are going to be highly contended and
become more contended as the number of threads increases.
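
The first concern can be illustrated with a toy model. This is not the code
from the patches; it is a single-threaded simulation with a made-up function
name, and it assumes the cores take strict turns pulling from the shared list
(real cores would race for the head, so actual placement depends on timing).
It counts how often a reactor runs on a different core than the one it last
ran on.

```c
#include <assert.h>

#define MAX_REACTORS 64

/*
 * Count reactor migrations when `cores` cores take turns pulling from one
 * shared FIFO holding `reactors` reactors, over `steps` pulls in total.
 */
static int
count_migrations_shared_list(int reactors, int cores, int steps)
{
	int queue[MAX_REACTORS];
	int last_core[MAX_REACTORS];
	int head = 0;
	int migrations = 0;

	assert(reactors > 0 && reactors <= MAX_REACTORS && cores > 0);

	for (int i = 0; i < reactors; i++) {
		queue[i] = i;
		last_core[i] = -1; /* has not run anywhere yet */
	}

	for (int step = 0; step < steps; step++) {
		int core = step % cores; /* whose turn it is */
		int r = queue[head];     /* pull the next reactor */

		/*
		 * The reactor goes straight back on the tail. The list always
		 * holds every reactor, so that is the same as advancing head
		 * around a circular buffer.
		 */
		head = (head + 1) % reactors;

		if (last_core[r] != -1 && last_core[r] != core) {
			migrations++;
		}
		last_core[r] = core;
	}
	return migrations;
}
```

In this model, 3 reactors shared by 2 cores migrate on every pull after the
first round, while 2 reactors on 2 cores never migrate, which is effectively
the pinned 1:1 case. Each migration means the reactor's working set has to be
pulled into a different core's cache.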

I hope this is just the beginning of a larger discussion. I'll let the patch
review settle into next week and see if solutions begin to emerge.

Thanks,
Ben

On Thu, 2018-05-24 at 02:24 +0000, Meneghini, John wrote:
> Hi Frank.
>  
> Thanks for your suggestion.
>  
> In our implementation/application, we don’t use DPDK.  This is why the first
> set of changes we proposed last year were to abstract out the dependencies on
> DPDK. I think I still have a copy of the old pull request around for reference.
>  
> https://github.com/spdk/spdk/pull/152
>  
> We are actually running SPDK in a completely different execution environment,
> and we need a “native” SPDK dynamic threading model that can be supported on
> any platform, without DPDK.
>  
> A second RFC patch has been pushed up to GerritHub for review.  Please see
> the commit messages of these two patches for a complete description of the
> proposed change.
>  
> https://review.gerrithub.io/#/c/spdk/spdk/+/412277/
>  
> https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
>  
> /John
>  
> 40.5. The L-thread subsystem
> The L-thread subsystem resides in the examples/performance-thread/common
> directory and is built and linked automatically when building the l3fwd-
> thread example.
> 
> The subsystem provides a simple cooperative scheduler to enable arbitrary
> functions to run as cooperative threads within a single EAL thread. The
> subsystem provides a pthread like API that is intended to assist in reuse of
> legacy code written for POSIX pthreads.
> 
> The following sections provide some detail on the features, constraints,
> performance and porting considerations when using L-threads.
> 
>  
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Huang Frank
> <kinzent(a)hotmail.com>
> Reply-To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Date: Wednesday, May 23, 2018 at 9:46 PM
> To: Storage Performance Development Kit <spdk(a)lists.01.org>
> Subject: [SPDK] Re: SPDK Dynamic Threading Model
>  
> Hi,
>  
> Why not consider using the lthread subsystem provided by DPDK?
> http://dpdk.org/doc/guides-16.04/sample_app_ug/performance_thread.html#lthread-subsystem
>  
>  
> 
> Frank Huang
> 
>  
> From: SPDK <spdk-bounces(a)lists.01.org> on behalf of Meneghini, John
> <John.Meneghini(a)netapp.com>
> Sent: May 23, 2018 4:12
> To: Storage Performance Development Kit
> Subject: [SPDK] RFC: SPDK Dynamic Threading Model
>  
> As discussed during the Summit last week, we believe SPDK needs support for a
> dynamic threading model.  An RFC patch has been pushed upstream for review.
>  
> https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
>  
> This patch is a beginning point for our proposed changes. Improvements will be
> made with subsequent patches.
>  
> The description below is taken from https://github.com/spdk/spdk/issues/308
>
> SPDK needs to support a dynamic threading model where reactors are NOT bound
> to lcores.
>
> Many applications need SPDK to support a threading model that:
> * Does not assume a static number of threads
> * Does not bind threads to cores (this burns up cores)
> * Does not assume all threads use the same polling model
>
> Removing these assumptions from the SPDK libraries will allow:
> * Different applications to share the SPDK libraries on the same platform
>   (e.g. FC-NVMe, RDMA-NVMe, and NVMe)
> * Different platforms to support the same applications with the same libraries
>   (e.g. a 4 core platform and a 128 core platform, a PowerPC and NFS traffic)
> * Different workloads at different scales
>   (e.g. 1 NVMF Host with 1 Subsystem and 1 Namespace, or 16 NVMF Hosts with 100
>   Subsystems and 1,000 namespaces)
>
> In particular, in SPDK, NVMF threads need to come and go depending upon the
> “NVMF load”.
>
> More Dynamic Use Cases Coming
>
> With the advent of FC-NVMe (which uses NPIV to virtualize FC ports), NVMF
> Subsystem Ports and Host Ports are not static. Different Hosts and Subsystems
> can have a different number of Ports, and Ports can be dynamically added and
> removed from the configuration. This means:
> * The same platform may end up having a different number of Subsystem ports at
>   various points in its lifecycle
> * The SPDK FC-NVMe application does NOT know up front how many ports it will
>   have
>
> Expected Behavior
> * SPDK libraries should not assume a static number of threads
> * SPDK libraries should bind threads to cores only optionally, supporting both
>   static and dynamic threading models
> * SPDK libraries should support a hybrid polling model (modified run to
>   completion)
>
> Current Behavior
> * SPDK libraries assume a static number of threads
> * SPDK libraries bind threads to cores
> * SPDK libraries assume all threads use the same polling model
>
> Possible Solution
>
> Proposal to solve the above use cases: use the spdk_nvmf_poll_group (PG) as
> the unit of threading abstraction.
> * Use PG as the fundamental unit on which a thread operates
> * The spdk_thread will be a “virtual” thread that gets tied into a PG (1-1
>   relationship)
> * Create PGs as and when hardware ports (and associated queue-pairs) come to
>   life
> * No dependency between a PG and a “real” thread
> * A PG can be picked up by any “real” thread and worked upon. The PG contains
>   everything needed for IO handling.
> * PG continues to contain spdk_thread. spdk_thread continues to use the same
>   mechanisms for IO channels to different NS, etc.
> * PG contains vendor data, e.g. a “ring” for depositing asynchronous callback
>   events from the backend OR management events that come from external modules
> * spdk_thread contains thread_context that points to a PG instead of a
>   reactor, so messages from the library get routed to the PG “ring” instead of
>   a thread/reactor event ring
>
> Understanding the intent of the event library, it is believed this is the
> place for customization. However, the current event library assumes a
> threading model that's a part of the util library. Moreover, many of the other
> SPDK core libraries assume the same threading model as the util library. If
> the SPDK util library can be modified to support these dynamic threading
> use cases, all applications would be able to use the SPDK framework more
> effectively.
>
> Steps to Reproduce
> This is an enhancement. There is no bug.
>
> Context (Environment including OS version, SPDK version, etc.)
> Would like to provide these enhancements in V18.07.
>  
>  
>  
>  
