From: Walker, Benjamin
Subject: Re: [SPDK] Problem with Blobstore when write 65MB continously
Date: Wed, 10 Jan 2018 20:53:39 +0000
Message-ID: <1515617617.6063.79.camel@intel.com>
In-Reply-To: CANvN+ek=pdYkE0JSf5eycgyeFzpPy=sOEjT=1xD0+1yTkTbBqg@mail.gmail.com
To: spdk@lists.01.org

On Wed, 2018-01-10 at 19:28 +0000, Andrey Kuzmin wrote:
> On Wed, Jan 10, 2018, 20:17 Walker, Benjamin wrote:
> > On Wed, 2018-01-10 at 17:00 +0000, Andrey Kuzmin wrote:
> > > It appears quite logical to start submission with a check for pending
> > > completions, doesn't it? Or check for completions if downstream bdev
> > > returns busy status. That would definitely meet app expectations whatever
> > > the request pool size is.
> >
> > We've considered checking for completions inside the submission path if we
> > would otherwise return ENOMEM. So far, we've decided not to go that
> > direction for two reasons.
> >
> > 1) Even if we do this, there are still cases where we'll return ENOMEM. For
> > instance, if there are no completions to reap yet.
>
> While theoretically possible, such a case is problematic to imagine in
> practice.

The user has 512 queue depth available and is submitting I/O in a tight loop.
The submission path through the blobstore and into the NVMe driver probably
takes on the order of 500ns to run. That means you can submit your full queue
depth worth in 256us. On many NAND SSDs that's well within P99 latency
expectations for 4KiB I/O, and it gets increasingly likely with larger I/O to
the point where it is almost guaranteed to happen with 128KiB requests. The user
is free to reduce the available queue depth to save memory as well.

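To make the timing argument concrete, here is a minimal sketch in Python. It is a toy model, not SPDK code: `ToyDevice`, `submit_with_inline_poll`, and the fixed per-request latency are assumptions made up for illustration. Only the 512 queue depth and the roughly 500ns submission cost come from the message above.

```python
import errno
from collections import deque

SUBMIT_COST_NS = 500   # per-I/O submission cost quoted in the message
QUEUE_DEPTH = 512      # queue depth quoted in the message


def time_to_fill_queue_us(queue_depth=QUEUE_DEPTH, submit_cost_ns=SUBMIT_COST_NS):
    """Microseconds a tight submit loop needs to consume the whole queue depth."""
    return queue_depth * submit_cost_ns / 1000.0


class ToyDevice:
    """Hypothetical device with a fixed request pool (not an SPDK API)."""

    def __init__(self, queue_depth, latency_ns):
        self.queue_depth = queue_depth
        self.latency_ns = latency_ns   # assumed constant per-request latency
        self.inflight = deque()        # completion deadlines, oldest first
        self.now_ns = 0

    def submit(self):
        """Return 0 on success, -ENOMEM when the request pool is exhausted."""
        self.now_ns += SUBMIT_COST_NS  # every attempt costs submission time
        if len(self.inflight) >= self.queue_depth:
            return -errno.ENOMEM
        self.inflight.append(self.now_ns + self.latency_ns)
        return 0

    def poll(self):
        """Reap every request whose deadline has passed; return the count."""
        reaped = 0
        while self.inflight and self.inflight[0] <= self.now_ns:
            self.inflight.popleft()
            reaped += 1
        return reaped


def submit_with_inline_poll(dev):
    """The variant discussed in the thread: poll once before giving up."""
    rc = dev.submit()
    if rc == -errno.ENOMEM and dev.poll() > 0:
        rc = dev.submit()
    return rc
```

In this model, a device with 300us latency still returns -ENOMEM on the 513th submission even with the inline poll, because the oldest request is not yet complete when the pool fills at 256us. With 100us latency, the inline poll frees slots and the retry succeeds.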
> > 2) This would result in completion callbacks in response to a submit call.
> > Today, the expectations are set that completions are called in response to a
> > poll call only.
>
> Feel free to correct me if I'm wrong, but my recollection is that completion
> callback may be called on submission path in case of error.

I just checked and for the nvme and bdev libraries an error code will be given
to the user as the return code for the function. The callback will not be called
because the failure is known immediately. For the blobstore library it works the
opposite way - the functions have no return code and instead always call the
user callback. I think this is probably a design mistake on my part. For these
ENOMEM cases, we need to return that to the user as a return code. That makes it
much easier to handle the situation and makes it consistent with the other
libraries.

> The case in question is, apparently, a corner one as application must check
> for completions if bdev returns busy status. One cannot run an unlimited rate
> client atop a rate-limited server w/o a poll enforced at some point.
>
> It might also be helpful to add a parameter to the poll call specifying the
> minimum number of completions to reap before returning control to the app, to
> deal with deadlocks like this one.

There already is a parameter that limits the number of completions reaped in a
single poll call. Even if you don't specify a limit, the drivers enforce
sensible limits by default.

>
> Regards,
> Andrey
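
The two error-reporting conventions and the poll limit described in this message can be sketched as a toy model in Python. Everything named here (`ToyQueue`, `submit_rc`, `submit_cb`, `DEFAULT_POLL_LIMIT` and its value) is hypothetical and chosen only for illustration. None of it is an SPDK function or an SPDK default.

```python
import errno
from collections import deque

DEFAULT_POLL_LIMIT = 4  # hypothetical driver default, chosen only for this sketch


class ToyQueue:
    """Hypothetical request queue contrasting two error-reporting styles."""

    def __init__(self, pool_size):
        self.pool_size = pool_size
        self.pending = deque()  # callbacks of requests awaiting completion

    def submit_rc(self, cb):
        """Return-code style: an immediate failure is returned, cb is never called."""
        if len(self.pending) >= self.pool_size:
            return -errno.ENOMEM
        self.pending.append(cb)
        return 0

    def submit_cb(self, cb):
        """Callback-only style: no return value, so even ENOMEM arrives through cb."""
        if len(self.pending) >= self.pool_size:
            cb(-errno.ENOMEM)  # callback fires inside the submit call
            return
        self.pending.append(cb)

    def poll(self, max_completions=0):
        """Complete at most max_completions requests; 0 selects the default limit."""
        limit = max_completions or DEFAULT_POLL_LIMIT
        done = 0
        while self.pending and done < limit:
            self.pending.popleft()(0)  # status 0 means success
            done += 1
        return done
```

With the return-code style the caller can see -ENOMEM at the call site and decide to poll before retrying. With the callback-only style the same condition surfaces inside the submit call, which is the behavior the message proposes to change for the blobstore.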