From mboxrd@z Thu Jan 1 00:00:00 1970
Content-Type: multipart/mixed; boundary="===============7150228660987838627=="
MIME-Version: 1.0
From: Walker, Benjamin
Subject: [SPDK] Re: SPDK socket abstraction layer
Date: Wed, 30 Oct 2019 23:20:56 +0000
Message-ID:
In-Reply-To: d63f4372-3309-45c3-3e41-222cefca68aa@dev.mellanox.co.il
List-ID:
To: spdk@lists.01.org

--===============7150228660987838627==
Content-Type: text/plain; charset="utf-8"
MIME-Version: 1.0
Content-Transfer-Encoding: quoted-printable

On Wed, 2019-10-30 at 23:47 +0200, Sasha Kotchubievsky wrote:
> We started from the following use case: an initiator running on ARM on top
> of a user-space stack (VMA), plus the TSO (TCP segmentation offload)
> optimization. This optimization should improve sending large packets (the
> storage case). In this case, we see a benefit from zero-copy at any block
> size bigger than 512B.
>
> Obviously, you are testing x86 and the target side. We haven't tested that
> yet, so I can't say what the bottleneck is in the target (x86) case. I'd
> suggest reducing the overhead of sending big buffers to the network card by
> configuring TSO or jumbo frames, and then retesting the zero-copy solution.
> Maybe without TSO, memcpy is not the real bottleneck.
>
> memcpy on the target side (on ARM) after applying TSO takes about 18% (4K
> I/O). So I believe zero-copy should improve performance.

The copy is taking ~25% of CPU in our traces normally. I agree that it should
be a huge improvement.

> We can test your configuration too. It would be very interesting to
> understand what the real bottleneck is.
>
> What configuration do you use?
> - What's the network card?
> - What's the OS and kernel?
> - NULL devices, or real NVMe disks?
> - Queue depth and number of cores?

It's a Mellanox CX-5 100GbE card with Ubuntu 18.10 and kernel 5.4.0-rc4 that
we compiled ourselves. TSO and jumbo frames are enabled. The I/O is going to
real NVMe devices on the backend (there are 12 P4500 Intel NVMe SSDs attached).
We're running 8 cores on the initiator side, each sending queue depth 64 worth
of 4k I/O. The target is running just one core.

However, I just figured out what's wrong, so maybe you can help me fix it.
The zero copy is all working mechanically, except when I get the zero copy
completion notification, ee_code is set to SO_EE_CODE_ZEROCOPY_COPIED. So
something is not configured correctly in my networking stack and the kernel is
doing deferred copies instead. Any ideas what I'd need to do in order to enable
this? I'm fairly certain I have it working at my desk on Fedora 30 in loopback,
based on the CPU traces I'm seeing, so maybe it's just a matter of installing
Fedora 30 on the benchmark system instead.

> Is it enough to apply those two patches for zero-copy on the target?
>
> Asynchronous writev
> https://review.gerrithub.io/c/spdk/spdk/+/470523
>
> MSG_ZEROCOPY use in the posix implementation
> https://review.gerrithub.io/c/spdk/spdk/+/471752

Let me get the patches sorted out in a nice series and then you can grab the
top of the series for testing.

> BR,
>
> Sasha
>
> On 30-Oct-19 10:28 PM, Walker, Benjamin wrote:
> > On Wed, 2019-10-30 at 21:55 +0200, Sasha Kotchubievsky wrote:
> > > Hi Ben,
> > >
> > > Great list of patches.
> > > This work will definitely take NVMe-oF TCP to the next level.
> > >
> > > We are also working on zero-copy (TX) on the initiator side, based on
> > > MSG_ZEROCOPY. Preliminary results are great. On the target side, we are
> > > still investigating the solution.
> > I'm very interested to hear about your work here. I've been doing it
> > primarily on the target side and the preliminary results we're seeing are
> > that it's slower for 4K I/O. That's not really what I expected at all, and
> > I feel like I must be missing something with getting the page pinning to
> > hit the fast path consistently.
> > It's early days with all of these things, so it's a safe assumption that I
> > either coded something incorrectly or the system isn't configured right.
> >
> > > We run a lot of tests and see great potential for zero-copy. It looks
> > > like a real bottleneck. On the send side, it can be removed with the
> > > POSIX interface (MSG_ZEROCOPY); on the receive side, it needs deep
> > > integration between the TCP stack and SPDK. Next week we will have an
> > > internal brainstorming session and, I hope, I'll be ready for
> > > discussions at the dev meetup.
> > >
> > > We will invest in both directions: the user-space path and the Linux
> > > kernel path. But in the user-space area, at this stage, VPP is outside
> > > our interest.
> > >
> > > Best regards
> > > Sasha
> > >
> > > -----Original Message-----
> > > From: Walker, Benjamin
> > > Sent: Wednesday, October 30, 2019 8:47 PM
> > > To: spdk(a)lists.01.org
> > > Subject: [SPDK] Re: SPDK socket abstraction layer
> > >
> > > On Wed, 2019-10-30 at 17:54 +0000, Harris, James R wrote:
> > > > Hi Sasha,
> > > >
> > > > Tomek is only talking about the VPP implementation. There are no
> > > > plans to remove the socket abstraction layer. If anything, the
> > > > project needs to look at extending it in ways as you suggested.
> > > To expand on this, there's a lot of activity right now in the SPDK sock
> > > abstraction layer to begin to implement asynchronous operations, zero
> > > copy operations, etc.
> > > For example, see:
> > >
> > > Asynchronous writev
> > > https://review.gerrithub.io/c/spdk/spdk/+/470523
> > >
> > > MSG_ZEROCOPY use in the posix implementation
> > > https://review.gerrithub.io/c/spdk/spdk/+/471752
> > >
> > > A new sock implementation based on io_uring/libaio:
> > > https://review.gerrithub.io/c/spdk/spdk/+/471314
> > >
> > > And a new sock implementation based on Seastar:
> > > https://review.gerrithub.io/c/spdk/spdk/+/466629
> > >
> > > So not only is the sock abstraction layer sticking around, but it's
> > > getting a lot of focus going forward. There is a lot of innovation
> > > happening in the Linux kernel around networking at all layers that we
> > > need to keep up with.
> > >
> > > One thing I would like community feedback on is what to do about the
> > > current VPP implementation. As we make improvements and additions to the
> > > sock abstraction, it will necessarily require updates to the VPP
> > > implementation. We can of course continue to make those, but does the
> > > community see value in maintaining support here? I'd really love to see
> > > someone take up the mantle on VPP if they believe there is value that we
> > > just haven't been able to unlock yet, but absent that it's just a
> > > maintenance burden.
> > >
> > > Personally speaking, it would be easier for me, as someone trying to
> > > evolve the sock abstraction layer, to drop VPP. That's one less
> > > implementation that I then have to go update and test each time. But I'm
> > > very open to opinions and feedback here if anyone has something to say.
> > > SPDK obviously can't just drop support without strong consensus and a
> > > considerable amount of forewarning.
> > >
> > > Thanks,
> > > Ben
> > >
> > > > -Jim
> > > >
> > > >
> > > > On 10/30/19, 10:50 AM, "Sasha Kotchubievsky" wrote:
> > > >
> > > > Hi Tomek,
> > > >
> > > > Are you looking for community feedback regarding the VPP
> > > > implementation of the TCP stack, or about having a socket abstraction
> > > > layer in SPDK?
> > > > I think the socket abstraction layer is critical for future
> > > > integration between SPDK and user-space stacks. At Mellanox, we're
> > > > evaluating integration between VMA
> > > > (https://github.com/Mellanox/libvma) and SPDK. Although VMA can be
> > > > used as a replacement for the kernel implementation of the POSIX
> > > > socket interface, we see great potential in "deep" integration, which
> > > > definitely needs to keep the existing abstraction layer. For example,
> > > > one potential improvement is zero-copy in the RX (receive) flow. I
> > > > don't see how that can be implemented on top of the Linux kernel
> > > > stack.
> > > >
> > > > Best regards
> > > > Sasha
> > > >
> > > > -----Original Message-----
> > > > From: Zawadzki, Tomasz
> > > > Sent: Monday, October 21, 2019 3:01 PM
> > > > To: Storage Performance Development Kit
> > > > Subject: [SPDK] SPDK socket abstraction layer
> > > >
> > > > Hello everyone,
> > > >
> > > > Summary:
> > > >
> > > > With this message I wanted to update the SPDK community on the state
> > > > of the VPP socket abstraction as of the SPDK 19.07 release.
> > > > At this time there does not seem to be a clear efficiency improvement
> > > > with VPP. There is no further work planned on SPDK and VPP
> > > > integration.
> > > >
> > > > Details:
> > > >
> > > > As some of you may remember, the SPDK 18.04 release introduced support
> > > > for alternative socket types.
> > > > Along with that release, Vector Packet Processing (VPP) 18.01 was
> > > > integrated with SPDK by expanding the socket abstraction to use the
> > > > VPP Communications Library (VCL). The TCP/IP stack in VPP was in its
> > > > early stages back then and has seen improvements throughout the last
> > > > year.
> > > >
> > > > To better use VPP capabilities, following fruitful collaboration with
> > > > the VPP team, in SPDK 19.07 this implementation was changed from VCL
> > > > to the VPP Session API from VPP 19.04.2.
> > > >
> > > > The VPP socket abstraction has met some challenges due to the inherent
> > > > design of both projects, in particular related to running separate
> > > > processes and memory copies.
> > > > Seeing improvements over the original implementation was encouraging,
> > > > yet when measured against the posix socket abstraction (taking into
> > > > consideration the entire system, i.e. both processes), results are
> > > > comparable. In other words, at this time there does not seem to be a
> > > > clear benefit of either socket abstraction from the standpoint of CPU
> > > > efficiency or IOPS.
> > > >
> > > > With this message I just wanted to update the SPDK community on the
> > > > state of the socket abstraction layers as of the SPDK 19.07 release.
> > > > Each SPDK release brings improvements to the abstraction and its
> > > > implementations, with exciting work on more efficient use of the
> > > > kernel TCP stack - changes in SPDK 19.10 and SPDK 20.01.
> > > >
> > > > However, there is no active involvement at this point around the VPP
> > > > implementation of the socket abstraction in SPDK. Contributions in
> > > > this area are always welcome.
> > > > In case you're interested in implementing further enhancements of the
> > > > VPP and SPDK integration, feel free to reply, or to use one of the
> > > > many SPDK community communications channels.
> > > >
> > > > Thanks,
> > > > Tomek
> > > >
> > > > _______________________________________________
> > > > SPDK mailing list -- spdk(a)lists.01.org
> > > > To unsubscribe send an email to spdk-leave(a)lists.01.org

--===============7150228660987838627==--