From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============5381323358316578172==" MIME-Version: 1.0 From: Walker, Benjamin Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes Date: Wed, 06 Jul 2016 21:40:03 +0000 Message-ID: <1467841201.5999.222.camel@intel.com> In-Reply-To: 9F03029A-8630-4F27-BD92-1D7D02743FDB@xeograph.com List-ID: To: spdk@lists.01.org --===============5381323358316578172== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote: > Ben, > = > Thanks, you explained that very well.=C2=A0=C2=A0I=E2=80=99m working with= a random 4k read only workload of queue > depth 256.=C2=A0 Can you try capping your queue depth at 128? That's the maximum I/O we allo= w outstanding at the hardware. The NVMe driver should be doing software queueing beyond that aut= omatically, but this data point will help narrow down the problem. > =C2=A0I=E2=80=99m using 4 drives and one thread per drive.=C2=A0=C2=A0If = it=E2=80=99s true that there are just no more > completions to handle, then I will recheck the code I wrote to keep track= of the number of > outstanding read requests. > = > Here is the qpair: > (gdb) p *qpair > $1 =3D {sq_tdbl =3D 0x7ffff7ff2008, cq_hdbl =3D 0x7ffff7ff200c, cmd =3D 0= x7fffdaae0000, cpl =3D > 0x7fffdaade000, free_tr =3D {lh_first =3D 0x0}, outstanding_tr =3D {lh_fi= rst =3D 0x7fffdaad8000}, tr =3D > 0x7fffdaa5d000, > =C2=A0 queued_req =3D {stqh_first =3D 0x7fffea9a1780, stqh_last =3D 0x7ff= fea9cdf40}, id =3D 1, num_entries =3D > 256, sq_tail =3D 249, cq_head =3D 121, phase =3D 0 '\000', is_enabled =3D= true, sq_in_cmb =3D false, Note how sq_tail - cq_head is 128, meaning the driver believes there to be = 128 commands outstanding. The driver's view of the world (commands outstanding) doesn't line up with = us not getting any NVMe completions - there is definitely a problem here. > =C2=A0 qprio =3D 0 '\000', ctrlr =3D 0x7fffddca6740, tailq =3D {tqe_next = =3D 0x0, tqe_prev =3D 0x7fffddca7b70}, > cmd_bus_addr =3D 35430203392, cpl_bus_addr =3D 35430195200} > (gdb) p qpair->phase > $2 =3D 0 '\000' > (gdb) p qpair->cpl[qpair->cq_head] > $3 =3D {cdw0 =3D 0, rsvd1 =3D 0, sqhd =3D 132, sqid =3D 1, cid =3D 112, s= tatus =3D {p =3D 1, sc =3D 0, sct =3D 0, > rsvd2 =3D 0, m =3D 0, dnr =3D 0}} Can you print out the following 3 things: - qpair->cpl[qpair->cq_head - 1] - qpair->sq_tdbl - qpair->cq_hdbl > = > -Will > = > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" enjamin.walker(a)intel.com> wrote: > = > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote: > > Andrey, > > =C2=A0 > > I was able to step into the spdk_nvme_qpair_process_completions() funct= ion with gdb and found > > the > > reason is isn=E2=80=99t returning any completions is because this check= is failing at line 469: if (cpl- > > > status.p !=3D qpair->phase) > > =C2=A0 > > Relevant gdb info here: > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_comple= tions > > (qpair=3D0x7ffff0a5baa0, max_completions=3D0) at nvme_qpair.c:463 > > 463=C2=A0=C2=A0=C2=A0in nvme_qpair.c > > (gdb) p qpair->cpl[qpair->cq_head] > > $11 =3D {cdw0 =3D 0, rsvd1 =3D 0, sqhd =3D 51, sqid =3D 1, cid =3D 27, = status =3D {p =3D 0, sc =3D 0, sct =3D 0, > > rsvd2 > > =3D 0, m =3D 0, dnr =3D 0}} > > (gdb) p qpair->phase > > $12 =3D 1 '\001' > > =C2=A0 > > What does this mean?=C2=A0=C2=A0Does this information help at all? > = > The NVMe hardware queue pairs consist of two arrays - one of commands and= the other of responses - > and a set of head and tail indices. The arrays are circular, so you can l= oop back around to the > beginning. Each entry in the array contains a phase bit, which is either = 1 or 0. On the first pass > through the array, new entries in the queue are marked by setting their p= hase to 1. On the next > pass > through the array, new elements are marked by setting their phase bit to = 0, etc. The current > iteration's expected phase value is stored in qpair->phase. So the code y= ou are looking at on > lines > 467-470 is basically saying: > = > 1) Grab the completion entry at the head of the queue > 2) Check its phase bit. If it hasn't toggled, there is no new completion,= so exit > = > All that means is that there are no completions outstanding for any comma= nds according to the SSD, > which doesn't narrow much down. At the point where you are broken in, can= you just dump out the > whole qpair structure? Something like "p *qpair" should do it. That way I= can see if there are any > commands actually pending at the device and what state the device is in. = My expectation is that > there aren't any commands outstanding. > = > Can you also provide some background as to what kind of I/O you're submit= ting (read or write, > size, > queue depth, etc.) when this occurs? > = > > -Will > > =C2=A0 > > From: SPDK on behalf of Andrey Kuzmin > > Reply-To: Storage Performance Development Kit > > Date: Wednesday, July 6, 2016 at 2:35 PM > > To: Storage Performance Development Kit > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being = called sometimes > > = > > =C2=A0 > > = > > =C2=A0 > > = > > On Wed, Jul 6, 2016, 20:56 Will Del Genio wr= ote: > > = > > Andrey, > > That sounds exactly like what we are experiencing, however we=E2=80=99r= e working off the spdk codebase > > that was current as of last week and are still experiencing the issue.= =C2=A0=C2=A0Do you know what the > > resource allocation fault was and how we might be able to determine if = that is still occurring? > > = > > = > > = > > I'll take a look at commit log, both SPDK and mine, and will get back t= o you. > > = > > =C2=A0 > > = > > Regards, > > = > > Andrey > > = > > Ben, > > We=E2=80=99re ASSERTing that the result of spdk_nvme_ns_cmd_read() =3D= =3D 0.=C2=A0=C2=A0If I set our queue depth high > > enough it will fail that assertion, as would be expected.=C2=A0=C2=A0Wh= atever other failure we=E2=80=99re > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to ret= urn an error code. > > =C2=A0 > > Also I performed some tests with the spdk perf tool and was not able to= replicate our > > problem.=C2=A0=C2=A0It > > ran fine at various queue depths and core masks.=C2=A0=C2=A0When the qd= was set too high, it failed > > gracefully with an error message.=C2=A0=C2=A0This is all as expected. > > =C2=A0 > > I=E2=80=99d like to continue down the path of investigating if some res= ource allocation or something > > else > > is failing silently for us.=C2=A0=C2=A0Any specific ideas? > > =C2=A0 > > Thanks! > > Will > > =C2=A0 > > From: SPDK on behalf of Andrey Kuzmin > > Reply-To: Storage Performance Development Kit > > Date: Wednesday, July 6, 2016 at 12:01 PM > > To: Storage Performance Development Kit > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being = called sometimes > > = > > = > > = > > =C2=A0 > > = > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin wrote: > > = > > Hi Will, > > = > > =C2=A0 > > = > > Since I can't see the code for your application I'd like to try and rep= roduce the problem with > > code that I have some visibility into. Are you able to reproduce the pr= oblem using our perf tool > > (examples/nvme/perf)? If you aren't, this is likely a problem with your= test application and not > > SPDK.=C2=A0 > > = > > = > > =C2=A0 > > = > > I had been witnessing a similar issue with an earlier SPDK release, bac= k around Feb, where the > > submit call was failing due to the resource allocation fault and neithe= r returning an error nor > > invoking the callback, but my issue has been fixed in the recent releas= e (I can't recall the > > actual commit, but there definitely was one dealing exactly with the ca= use). > > =C2=A0 > > = > > =C2=A0 > > = > > Based on the symptoms, my best guess is that your memory pool ran out o= f request objects. The > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fai= ls, it won't call the > > callback. You can check for failure by looking at the return value - se= e the documentation here. > > Your application allocates this memory pool up front - all of our examp= les allocate 8k requests > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a lar= ge enough pool to handle > > the maximum number of outstanding requests you plan to have. We recentl= y added a "hello_world" > > style example for the NVMe driver at https://github.com/spdk/spdk/tree/= master/examples/nvme/hell > > o_ > > world with tons of comments. One of the comments explains this memory p= ool in detail. > > = > > =C2=A0 > > = > > That memory pool allocation is a bit of a wart on our otherwise clean A= PI. We're looking at > > different strategies to clean that up. Let me know what the result of t= he debugging is and I'll > > shoot you some more ideas to try if necessary. > > = > > = > > =C2=A0 > > = > > Are there any plans regarding the global request pool rework? > > = > > =C2=A0 > > = > > Regards, > > = > > Andrey > > = > > =C2=A0 > > = > > =C2=A0 > > = > > Thanks, > > = > > Ben > > = > > =C2=A0 > > = > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote: > > = > > Hello, > > We have written a test application that is utilizing the spdk library t= o benchmark a set of 3 > > Intel P3700 drives and a single 750 drive (concurrently).=C2=A0=C2=A0We= =E2=80=99ve done some testing using fio and > > the kernel nvme drivers and have had no problem achieving the claimed I= OPs (4k random read) of > > all > > drives on our system. > > =C2=A0 > > What we have found during our testing is that spdk will sometimes start= to silently fail to call > > the callback passed to spdk_nvme_ns_cmd_read in the following situation= s: > > 1.=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0Testing a single drive and = passing in 0 for max_completions to > > spdk_nvme_qpair_process_completions().=C2=A0=C2=A0We haven=E2=80=99t se= en any issues with single drive testing > > when > > max_completions was > 0. > > 2.=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0Testing all four drives at = once will result in one drive failing to receive callbacks, > > seemingly regardless of what number we pass for max_completions (1 thro= ugh 128). > > =C2=A0 > > Here are other observations we=E2=80=99ve made > > -When the callbacks fail to be called for a drive, they fail to be call= ed for the remaining > > duration of the test. > > -The drive that =E2=80=98fails=E2=80=99 when testing 4 drives concurren= tly varies from test to test. > > -=E2=80=98failure=E2=80=99 of a drive seems to be correlated with the n= umber of outstanding read operations, > > though it is not a strict correlation. > > =C2=A0 > > Our system is a dual socket=C2=A0=C2=A0E5-2630 v3.=C2=A0=C2=A0One drive= is on a PCI slot for CPU 0 and the other 3 > > are > > on PCI slots on CPU 1.=C2=A0=C2=A0The master/slave threads are on the t= he same cpu socket as the nvme > > device > > they are talking to. > > =C2=A0 > > We=E2=80=99d like to know what is causing this issue and what we can do= to help investigate the > > problem.=C2=A0 > > What other information can we provide?=C2=A0=C2=A0Is there some part of= the spdk code that we can look at > > to > > help determine the cause? > > =C2=A0 > > Thanks, > > Will > > =C2=A0 > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > = > > =C2=A0 > > = > > = > > = > > = > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > -- > > = > > Regards, > > Andrey > > = > > = > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk > = > = > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk --===============5381323358316578172==--