From mboxrd@z Thu Jan 1 00:00:00 1970 Content-Type: multipart/mixed; boundary="===============3654086528248758686==" MIME-Version: 1.0 From: Walker, Benjamin Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes Date: Wed, 06 Jul 2016 23:23:25 +0000 Message-ID: <1467847403.5999.227.camel@intel.com> In-Reply-To: D9560002-21D1-4417-84FD-0753568C8564@xeograph.com List-ID: To: spdk@lists.01.org --===============3654086528248758686== Content-Type: text/plain; charset="utf-8" MIME-Version: 1.0 Content-Transfer-Encoding: quoted-printable On Wed, 2016-07-06 at 22:56 +0000, Will Del Genio wrote: > Ben, > = > I tried capping my queue depth to 128 and it significantly improved the p= roblem (I=E2=80=99m not entirely > sure if it was completely eliminated). Can you try reverting commit=C2=A0ff7e2122c74b09e5961cbcb2622fda9c0087f48f = =C2=A0and see if that fixes the problem? I believe we see the problem and it will only occur with 256 or gr= eater queue depth. If reverting that commit solves the problem, we'll submit a patch to fix it th= e right way and I'll explain what happened. > = > To get the vars I had to put the qd back to 256: > = > (gdb) p *qpair > $1 =3D {sq_tdbl =3D 0x7ffff7ff2008, cq_hdbl =3D 0x7ffff7ff200c, cmd =3D 0= x7fffdaae0000, cpl =3D > 0x7fffdaade000, free_tr =3D {lh_first =3D 0x0}, outstanding_tr =3D {lh_fi= rst =3D 0x7fffdaabf000}, tr =3D > 0x7fffdaa5d000, > =C2=A0 queued_req =3D {stqh_first =3D 0x7fffea91a500, stqh_last =3D 0x7ff= fea8efec0}, id =3D 1, num_entries =3D > 256, sq_tail =3D 121, cq_head =3D 249, phase =3D 0 '\000', is_enabled =3D= true, sq_in_cmb =3D false, > =C2=A0 qprio =3D 0 '\000', ctrlr =3D 0x7fffddca6740, tailq =3D {tqe_next = =3D 0x0, tqe_prev =3D 0x7fffddca7b70}, > cmd_bus_addr =3D 35430203392, cpl_bus_addr =3D 35430195200} > (gdb) p qpair->cpl[qpair->cq_head -1] > $2 =3D {cdw0 =3D 0, rsvd1 =3D 0, sqhd =3D 250, sqid =3D 1, cid =3D 97, st= atus =3D {p =3D 0, sc =3D 0, sct =3D 0, rsvd2 > =3D 0, m =3D 0, dnr =3D 0}} > =C2=A0(gdb) p *(qpair->sq_tdbl) > $4 =3D 121 > =C2=A0(gdb) p *(qpair->cq_hdbl) > $5 =3D 249 > (gdb) > = > On 7/6/16, 4:40 PM, "SPDK on behalf of Walker, Benjamin" enjamin.walker(a)intel.com> wrote: > = > On Wed, 2016-07-06 at 21:00 +0000, Will Del Genio wrote: > > Ben, > > = > > Thanks, you explained that very well.=C2=A0=C2=A0I=E2=80=99m working wi= th a random 4k read only workload of queue > > depth 256.=C2=A0 > = > Can you try capping your queue depth at 128? That's the maximum I/O we al= low outstanding at the > hardware. The NVMe driver should be doing software queueing beyond that a= utomatically, but this > data > point will help narrow down the problem. > = > > =C2=A0I=E2=80=99m using 4 drives and one thread per drive.=C2=A0=C2=A0I= f it=E2=80=99s true that there are just no more > > completions to handle, then I will recheck the code I wrote to keep tra= ck of the number of > > outstanding read requests. > > = > > Here is the qpair: > > (gdb) p *qpair > > $1 =3D {sq_tdbl =3D 0x7ffff7ff2008, cq_hdbl =3D 0x7ffff7ff200c, cmd =3D= 0x7fffdaae0000, cpl =3D > > 0x7fffdaade000, free_tr =3D {lh_first =3D 0x0}, outstanding_tr =3D {lh_= first =3D 0x7fffdaad8000}, tr =3D > > 0x7fffdaa5d000, > > =C2=A0 queued_req =3D {stqh_first =3D 0x7fffea9a1780, stqh_last =3D 0x7= fffea9cdf40}, id =3D 1, num_entries =3D > > 256, sq_tail =3D 249, cq_head =3D 121, phase =3D 0 '\000', is_enabled = =3D true, sq_in_cmb =3D false, > = > Note how sq_tail - cq_head is 128, meaning the driver believes there to b= e 128 commands > outstanding. > The driver's view of the world (commands outstanding) doesn't line up wit= h us not getting any NVMe > completions - there is definitely a problem here. > = > > =C2=A0 qprio =3D 0 '\000', ctrlr =3D 0x7fffddca6740, tailq =3D {tqe_nex= t =3D 0x0, tqe_prev =3D 0x7fffddca7b70}, > > cmd_bus_addr =3D 35430203392, cpl_bus_addr =3D 35430195200} > > (gdb) p qpair->phase > > $2 =3D 0 '\000' > > (gdb) p qpair->cpl[qpair->cq_head] > > $3 =3D {cdw0 =3D 0, rsvd1 =3D 0, sqhd =3D 132, sqid =3D 1, cid =3D 112,= status =3D {p =3D 1, sc =3D 0, sct =3D 0, > > rsvd2 =3D 0, m =3D 0, dnr =3D 0}} > = > Can you print out the following 3 things: > - qpair->cpl[qpair->cq_head - 1] > - qpair->sq_tdbl > - qpair->cq_hdbl > = > > = > > -Will > > = > > On 7/6/16, 3:50 PM, "SPDK on behalf of Walker, Benjamin" > b > > enjamin.walker(a)intel.com> wrote: > > = > > On Wed, 2016-07-06 at 20:33 +0000, Will Del Genio wrote: > > > Andrey, > > > =C2=A0 > > > I was able to step into the spdk_nvme_qpair_process_completions() fun= ction with gdb and found > > > the > > > reason is isn=E2=80=99t returning any completions is because this che= ck is failing at line 469: if > > > (cpl- > > > > status.p !=3D qpair->phase) > > > =C2=A0 > > > Relevant gdb info here: > > > Thread 4 "xg:nvmeIo:9" hit Breakpoint 5, spdk_nvme_qpair_process_comp= letions > > > (qpair=3D0x7ffff0a5baa0, max_completions=3D0) at nvme_qpair.c:463 > > > 463=C2=A0=C2=A0=C2=A0in nvme_qpair.c > > > (gdb) p qpair->cpl[qpair->cq_head] > > > $11 =3D {cdw0 =3D 0, rsvd1 =3D 0, sqhd =3D 51, sqid =3D 1, cid =3D 27= , status =3D {p =3D 0, sc =3D 0, sct =3D 0, > > > rsvd2 > > > =3D 0, m =3D 0, dnr =3D 0}} > > > (gdb) p qpair->phase > > > $12 =3D 1 '\001' > > > =C2=A0 > > > What does this mean?=C2=A0=C2=A0Does this information help at all? > > = > > The NVMe hardware queue pairs consist of two arrays - one of commands a= nd the other of responses > > - > > and a set of head and tail indices. The arrays are circular, so you can= loop back around to the > > beginning. Each entry in the array contains a phase bit, which is eithe= r 1 or 0. On the first > > pass > > through the array, new entries in the queue are marked by setting their= phase to 1. On the next > > pass > > through the array, new elements are marked by setting their phase bit t= o 0, etc. The current > > iteration's expected phase value is stored in qpair->phase. So the code= you are looking at on > > lines > > 467-470 is basically saying: > > = > > 1) Grab the completion entry at the head of the queue > > 2) Check its phase bit. If it hasn't toggled, there is no new completio= n, so exit > > = > > All that means is that there are no completions outstanding for any com= mands according to the > > SSD, > > which doesn't narrow much down. At the point where you are broken in, c= an you just dump out the > > whole qpair structure? Something like "p *qpair" should do it. That way= I can see if there are > > any > > commands actually pending at the device and what state the device is in= . My expectation is that > > there aren't any commands outstanding. > > = > > Can you also provide some background as to what kind of I/O you're subm= itting (read or write, > > size, > > queue depth, etc.) when this occurs? > > = > > > -Will > > > =C2=A0 > > > From: SPDK on behalf of Andrey Kuzmin <= andrey.v.kuzmin(a)gmail.com> > > > Reply-To: Storage Performance Development Kit > > > Date: Wednesday, July 6, 2016 at 2:35 PM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not bein= g called sometimes > > > = > > > =C2=A0 > > > = > > > =C2=A0 > > > = > > > On Wed, Jul 6, 2016, 20:56 Will Del Genio = wrote: > > > = > > > Andrey, > > > That sounds exactly like what we are experiencing, however we=E2=80= =99re working off the spdk codebase > > > that was current as of last week and are still experiencing the issue= .=C2=A0=C2=A0Do you know what the > > > resource allocation fault was and how we might be able to determine i= f that is still > > > occurring? > > > = > > > = > > > = > > > I'll take a look at commit log, both SPDK and mine, and will get back= to you. > > > = > > > =C2=A0 > > > = > > > Regards, > > > = > > > Andrey > > > = > > > Ben, > > > We=E2=80=99re ASSERTing that the result of spdk_nvme_ns_cmd_read() = =3D=3D 0.=C2=A0=C2=A0If I set our queue depth > > > high > > > enough it will fail that assertion, as would be expected.=C2=A0=C2=A0= Whatever other failure we=E2=80=99re > > > experiencing does not seem to be causing spdk_nvme_ns_cmd_read() to r= eturn an error code. > > > =C2=A0 > > > Also I performed some tests with the spdk perf tool and was not able = to replicate our > > > problem.=C2=A0=C2=A0It > > > ran fine at various queue depths and core masks.=C2=A0=C2=A0When the = qd was set too high, it failed > > > gracefully with an error message.=C2=A0=C2=A0This is all as expected. > > > =C2=A0 > > > I=E2=80=99d like to continue down the path of investigating if some r= esource allocation or something > > > else > > > is failing silently for us.=C2=A0=C2=A0Any specific ideas? > > > =C2=A0 > > > Thanks! > > > Will > > > =C2=A0 > > > From: SPDK on behalf of Andrey Kuzmin <= andrey.v.kuzmin(a)gmail.com> > > > Reply-To: Storage Performance Development Kit > > > Date: Wednesday, July 6, 2016 at 12:01 PM > > > To: Storage Performance Development Kit > > > Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not bein= g called sometimes > > > = > > > = > > > = > > > =C2=A0 > > > = > > > On Wed, Jul 6, 2016 at 6:35 PM, Walker, Benjamin wrote: > > > = > > > Hi Will, > > > = > > > =C2=A0 > > > = > > > Since I can't see the code for your application I'd like to try and r= eproduce the problem with > > > code that I have some visibility into. Are you able to reproduce the = problem using our perf > > > tool > > > (examples/nvme/perf)? If you aren't, this is likely a problem with yo= ur test application and > > > not > > > SPDK.=C2=A0 > > > = > > > = > > > =C2=A0 > > > = > > > I had been witnessing a similar issue with an earlier SPDK release, b= ack around Feb, where the > > > submit call was failing due to the resource allocation fault and neit= her returning an error > > > nor > > > invoking the callback, but my issue has been fixed in the recent rele= ase (I can't recall the > > > actual commit, but there definitely was one dealing exactly with the = cause). > > > =C2=A0 > > > = > > > =C2=A0 > > > = > > > Based on the symptoms, my best guess is that your memory pool ran out= of request objects. The > > > first thing to check is whether spdk_nvme_ns_cmd_read failed. If it f= ails, it won't call the > > > callback. You can check for failure by looking at the return value - = see the documentation > > > here. > > > Your application allocates this memory pool up front - all of our exa= mples allocate 8k > > > requests > > > (see line 1097 in examples/nvme/perf/perf.c) You need to allocate a l= arge enough pool to > > > handle > > > the maximum number of outstanding requests you plan to have. We recen= tly added a "hello_world" > > > style example for the NVMe driver at https://github.com/spdk/spdk/tre= e/master/examples/nvme/he > > > ll > > > o_ > > > world with tons of comments. One of the comments explains this memory= pool in detail. > > > = > > > =C2=A0 > > > = > > > That memory pool allocation is a bit of a wart on our otherwise clean= API. We're looking at > > > different strategies to clean that up. Let me know what the result of= the debugging is and > > > I'll > > > shoot you some more ideas to try if necessary. > > > = > > > = > > > =C2=A0 > > > = > > > Are there any plans regarding the global request pool rework? > > > = > > > =C2=A0 > > > = > > > Regards, > > > = > > > Andrey > > > = > > > =C2=A0 > > > = > > > =C2=A0 > > > = > > > Thanks, > > > = > > > Ben > > > = > > > =C2=A0 > > > = > > > On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote: > > > = > > > Hello, > > > We have written a test application that is utilizing the spdk library= to benchmark a set of 3 > > > Intel P3700 drives and a single 750 drive (concurrently).=C2=A0=C2=A0= We=E2=80=99ve done some testing using fio > > > and > > > the kernel nvme drivers and have had no problem achieving the claimed= IOPs (4k random read) of > > > all > > > drives on our system. > > > =C2=A0 > > > What we have found during our testing is that spdk will sometimes sta= rt to silently fail to > > > call > > > the callback passed to spdk_nvme_ns_cmd_read in the following situati= ons: > > > 1.=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0Testing a single drive an= d passing in 0 for max_completions to > > > spdk_nvme_qpair_process_completions().=C2=A0=C2=A0We haven=E2=80=99t = seen any issues with single drive testing > > > when > > > max_completions was > 0. > > > 2.=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0=C2=A0Testing all four drives a= t once will result in one drive failing to receive > > > callbacks, > > > seemingly regardless of what number we pass for max_completions (1 th= rough 128). > > > =C2=A0 > > > Here are other observations we=E2=80=99ve made > > > -When the callbacks fail to be called for a drive, they fail to be ca= lled for the remaining > > > duration of the test. > > > -The drive that =E2=80=98fails=E2=80=99 when testing 4 drives concurr= ently varies from test to test. > > > -=E2=80=98failure=E2=80=99 of a drive seems to be correlated with the= number of outstanding read operations, > > > though it is not a strict correlation. > > > =C2=A0 > > > Our system is a dual socket=C2=A0=C2=A0E5-2630 v3.=C2=A0=C2=A0One dri= ve is on a PCI slot for CPU 0 and the other 3 > > > are > > > on PCI slots on CPU 1.=C2=A0=C2=A0The master/slave threads are on the= the same cpu socket as the nvme > > > device > > > they are talking to. > > > =C2=A0 > > > We=E2=80=99d like to know what is causing this issue and what we can = do to help investigate the > > > problem.=C2=A0 > > > What other information can we provide?=C2=A0=C2=A0Is there some part = of the spdk code that we can look > > > at > > > to > > > help determine the cause? > > > =C2=A0 > > > Thanks, > > > Will > > > =C2=A0 > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > > = > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > > = > > > = > > > =C2=A0 > > > = > > > = > > > = > > > = > > > = > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > > = > > > -- > > > = > > > Regards, > > > Andrey > > > = > > > = > > > = > > > = > > > _______________________________________________ > > > SPDK mailing list > > > SPDK(a)lists.01.org > > > https://lists.01.org/mailman/listinfo/spdk > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > > = > > = > > _______________________________________________ > > SPDK mailing list > > SPDK(a)lists.01.org > > https://lists.01.org/mailman/listinfo/spdk > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk > = > = > _______________________________________________ > SPDK mailing list > SPDK(a)lists.01.org > https://lists.01.org/mailman/listinfo/spdk --===============3654086528248758686==--