From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============2700031065150415651=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: Re: [SPDK] SPDK Dynamic Threading Model
Date: Fri, 25 May 2018 19:03:04 +0000
Message-ID: <1527274981.55770.61.camel@intel.com>
In-Reply-To: C3FC2C3C-BADC-461F-BD88-5946B0E0E0B7@netapp.com
List-ID:
To: spdk@lists.01.org

--===============2700031065150415651==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

I've been doing my best to think this through over the last few days, as have a number of other community members, and some things are beginning to look a bit clearer now.

SPDK was always intended to be a composable set of libraries as opposed to a framework. By that, I mean that SPDK is intended to be integrated into other applications as opposed to existing code being integrated into SPDK. The community has done a lot of work to attempt to make that happen, with varying degrees of success. The challenges are primarily centered on two things. First, SPDK requires special memory management operations to allocate DMA-safe memory. This stems from the strict requirement to avoid data copies. The problem would essentially go away if SPDK instead internally allocated DMA-safe memory and copied user data into those buffers, but the performance would take a big hit. Second, SPDK avoids locks by instead passing messages between threads. That means that many components (although not all) within SPDK imply that the application is using a certain threading model. Specifically, the threading model needs to look like cooperative multi-tasking, or futures and promises, or event loops, etc. So far the consensus seems to be that it is acceptable to assume there is some threading model that is conducive to message passing, but we don't want to specifically pick a single model or framework.
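
To make the message-passing point concrete, here is a minimal sketch in Python (SPDK itself is C). All names in it (`Worker`, `send_msg`, `bump`, `run_demo`) are hypothetical and are not SPDK APIs. Python's `queue.Queue` takes a lock internally; it only stands in for whatever message ring a real implementation would use. What the sketch shows is the ownership rule: the counter is touched by exactly one thread, and every other thread asks for a change by sending a message.

```python
import queue
import threading


class Worker:
    """Hypothetical worker thread that exclusively owns `counter`."""

    def __init__(self):
        self.counter = 0              # only ever touched by the worker thread
        self._inbox = queue.Queue()   # stand-in for a message ring
        self._thread = threading.Thread(target=self._run)

    def start(self):
        self._thread.start()

    def send_msg(self, fn, *args):
        """Called from any thread: ask the worker to run fn(*args) itself."""
        self._inbox.put((fn, args))

    def stop(self):
        """Queue a stop marker behind all pending messages, then wait."""
        self._inbox.put((None, ()))
        self._thread.join()

    def _run(self):
        # Event loop: drain messages in order until the stop marker arrives.
        while True:
            fn, args = self._inbox.get()
            if fn is None:
                return
            fn(*args)


def bump(worker, amount):
    # Runs on the worker thread, so no lock is needed around the update.
    worker.counter += amount


def run_demo(senders=4, msgs_per_sender=1000):
    worker = Worker()
    worker.start()

    def sender():
        for _ in range(msgs_per_sender):
            worker.send_msg(bump, worker, 1)

    threads = [threading.Thread(target=sender) for _ in range(senders)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    worker.stop()
    return worker.counter


if __name__ == "__main__":
    print(run_demo())
```

Any threading model that can deliver a function call to the owning thread (an event loop, a coroutine scheduler, a futures framework) could play the role of `Worker` here, which is why the assumption is about message passing rather than about one particular framework.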

The problem that John, Madhu, and the others at NetApp have identified is that SPDK currently makes entirely too many assumptions about and places too many strict requirements on the mechanics of the threading model in an application. I think there is a strong consensus that fixing this is important and should be high priority. The fix, ultimately, will be better abstractions around the underlying application's threading model. I hope we can design something that will enable people to plug SPDK into all sorts of frameworks - green threading frameworks, DPDK lthreads, Seastar, coroutine frameworks, etc. The more people we can get participating in this work, the better the abstractions will be, so please everyone chime in with requirements and ideas.

The current set of patches breaks the 1:1 mapping between reactors and cores. Instead, reactors are stored on a global list. Each core iterates on this global list, pulls the next reactor, processes any waiting events and executes pollers, then places the reactor back on the list. I'm concerned about three things with this design:

* Since the reactors now potentially execute on a different core each time through their loop, the CPU cache is going to be badly thrashed. I suspect the performance hit here is very large and continues to grow as additional threads are added. SPDK is designed to scale linearly with the addition of CPU cores as much as possible, and I think it would be a mistake to move away from that.
* All NUMA-awareness has been lost. Placing the processing of I/O on the same NUMA node as the NIC or SSD is critical to achieving high performance, so the code needs to remain NUMA-aware.
* All threads are polling a single queue of reactors, so the atomic variables controlling the head and tail of that queue are going to be highly contended and become more contended as the number of threads increases.

I hope this is just the beginning of a larger discussion.
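
The global-list scheme and the first concern above can be illustrated with a small, deterministic Python simulation. This is a sketch under stated assumptions, not the code in the patches: the function names are hypothetical, the cores take turns in strict lockstep (real cores run concurrently), and nothing here measures cache or NUMA cost. It only counts how often a reactor runs on a different core than it did the previous time.

```python
from collections import deque


def shared_list_migrations(num_reactors, num_cores, rounds):
    """Count how often a reactor runs on a different core than last time."""
    reactors = deque(range(num_reactors))   # the single global list
    last_core = {}                           # reactor id -> core it last ran on
    migrations = 0
    for _ in range(rounds):
        for core in range(num_cores):
            reactor = reactors.popleft()     # pull the next reactor
            if last_core.get(reactor, core) != core:
                migrations += 1
            last_core[reactor] = core        # events and pollers would run here
            reactors.append(reactor)         # place it back on the list
    return migrations


def pinned_migrations(num_reactors, num_cores, rounds):
    """Same count when reactor i is always run by core i % num_cores."""
    last_core = {}
    migrations = 0
    for _ in range(rounds):
        for reactor in range(num_reactors):
            core = reactor % num_cores
            if last_core.get(reactor, core) != core:
                migrations += 1
            last_core[reactor] = core
    return migrations


if __name__ == "__main__":
    print(shared_list_migrations(3, 2, 4), pinned_migrations(3, 2, 4))
```

With 3 reactors and 2 cores the shared list moves reactors between cores on most turns, while the pinned mapping never does. In this lockstep model the migrations disappear when the reactor count is an exact multiple of the core count, but that is an artifact of the fixed turn order and should not be expected once cores run at different speeds.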

I'll let the patch review settle into next week and see if solutions begin to emerge.

Thanks,
Ben

On Thu, 2018-05-24 at 02:24 +0000, Meneghini, John wrote:
> Hi Frank.
>
> Thanks for your suggestion.
>
> In our implementation/application, we don’t use DPDK. This is why the first
> set of changes we proposed last year was to abstract out the dependencies on
> DPDK. I think I still have a copy of the old pull request around for reference.
>
> https://github.com/spdk/spdk/pull/152
>
> We are actually running SPDK in a completely different execution environment,
> and we need a “native” SPDK dynamic threading model that can be supported on
> any platform, without DPDK.
>
> A second RFC patch has been pushed up to GerritHub for review. Please see
> the commit message of these two patches for a complete description of the
> proposed change.
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/412277/
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
>
> /John
>
> 40.5. The L-thread subsystem
> The L-thread subsystem resides in the examples/performance-thread/common
> directory and is built and linked automatically when building the
> l3fwd-thread example.
>
> The subsystem provides a simple cooperative scheduler to enable arbitrary
> functions to run as cooperative threads within a single EAL thread. The
> subsystem provides a pthread-like API that is intended to assist in reuse of
> legacy code written for POSIX pthreads.
>
> The following sections provide some detail on the features, constraints,
> performance and porting considerations when using L-threads.
>
> From: SPDK on behalf of Huang Frank
> Reply-To: Storage Performance Development Kit
> Date: Wednesday, May 23, 2018 at 9:46 PM
> To: Storage Performance Development Kit
> Subject: [SPDK] Re: SPDK Dynamic Threading Model
>
> Hi,
>
> Why not consider using the lthread subsystem provided by DPDK?
> http://dpdk.org/doc/guides-16.04/sample_app_ug/performance_thread.html#lthread-subsystem
>
> Frank Huang
>
> From: SPDK on behalf of Meneghini, John
> Sent: May 23, 2018, 4:12
> To: Storage Performance Development Kit
> Subject: [SPDK] RFC: SPDK Dynamic Threading Model
>
> As discussed during the Summit last week, we believe SPDK needs support for a
> dynamic threading model. An RFC patch has been pushed upstream for review.
>
> https://review.gerrithub.io/#/c/spdk/spdk/+/412093/
>
> This patch is a beginning point for our proposed changes. Improvements will be
> made with subsequent patches.
>
> The description below is taken from https://github.com/spdk/spdk/issues/308
> SPDK needs to support a dynamic threading model where reactors are NOT bound
> to lcores.
> Many applications need SPDK to support a threading model that:
> * Does not assume a static number of threads
> * Does not bind threads to cores (this burns up cores)
> * Does not assume all threads use the same polling model
> Removing these assumptions from the SPDK libraries will allow:
> * Different applications to share the SPDK libraries on the same platform
>   E.g. FC-NVMe, RDMA-NVMe, and NVMe
> * Different platforms to support the same applications with the same libraries
>   E.g. a 4 core platform and a 128 core platform, a PowerPC and NFS traffic
> * Different workloads at different scales
>   E.g. 1 NVMF Host with 1 Subsystem and 1 Namespace, or 16 NVMF Hosts with 100
>   Subsystems and 1,000 namespaces.
> In particular, in SPDK, NVMF threads need to come and go depending upon the
> “NVMF load”.
>
> More Dynamic Use Cases Coming
> With the advent of FC-NVMe (which uses NPIV to virtualize FC ports) NVMF
> Subsystem Ports and Host Ports are not static.
Different Hosts and Subsystems
> can have a different number of Ports, and Ports can be dynamically added and
> removed from the configuration. This means:
> * The same platform may end up having a different number of Subsystem ports at
>   various points in its lifecycle
> * The SPDK FC-NVMe application does NOT know up front how many ports it will
>   have.
>
> Expected Behavior
> * SPDK libraries should not assume a static number of threads
> * SPDK libraries should bind threads to cores only optionally - supporting both
>   static and dynamic threading models
> * SPDK libraries should support a Hybrid polling model (modified run to
>   completion)
>
> Current Behavior
> * SPDK libraries assume a static number of threads
> * SPDK libraries bind threads to cores
> * SPDK libraries assume all threads use the same polling model
>
> Possible Solution
> Proposal to solve above Use Cases:
> * Use the spdk_nvmf_poll_group (PG) as the unit of threading abstraction
> * Use PG as the fundamental unit on which a thread operates
> * The spdk_thread will be a “virtual” thread that gets tied into a PG (1-1
>   relationship)
> * Create PGs as and when hardware ports (and associated queue-pairs) come to
>   life.
> * No dependency between a PG and a “real” thread.
> * A PG can be picked up by any “real” thread and worked upon. The PG contains
>   everything needed for IO handling.
> * PG continues to contain spdk_thread. spdk_thread continues same mechanisms for
>   IO channels to different NS etc. etc.
> * PG contains vendor data. E.g. a “ring” for depositing asynchronous callback
>   events from the backend OR management events that come from external modules.
> * spdk_thread contains thread_context that points to a PG instead of a reactor.
> So messages from the library get routed to the PG “ring” instead of a
> thread/reactor event ring.spdk_bdev_get_io
> Understanding the intent of the event library, it is believed this is the
> place for customization. However, the current event library assumes a
> threading model that's a part of the util library. Moreover, many of the other
> SPDK core libraries assume the same threading model as the util library. If
> the SPDK util library can be modified to support these dynamic threading
> use cases, all applications would be able to use the SPDK framework more
> effectively.
>
> Steps to Reproduce
> This is an enhancement. There is no bug.
>
> Context (Environment including OS version, SPDK version, etc.)
> Would like to provide these enhancements in V18.07.
>
> _______________________________________________
> SPDK mailing list
> SPDK(a)lists.01.org
> https://lists.01.org/mailman/listinfo/spdk

--===============2700031065150415651==--