From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Callback passed to spdk_nvme_ns_cmd_read not being called sometimes
Date: Wed, 06 Jul 2016 15:35:09 +0000	[thread overview]
Message-ID: <1467819307.5999.181.camel@intel.com> (raw)
In-Reply-To: 370981E9-A83E-44B9-B3B7-E9AAEE3E2184@xeograph.com

Hi Will,

Since I can't see the code for your application, I'd like to try to reproduce the problem with code that I have some visibility into. Are you able to reproduce the problem using our perf tool (examples/nvme/perf)? If you aren't, this is likely a problem with your test application and not SPDK.

Based on the symptoms, my best guess is that your memory pool ran out of request objects. The first thing to check is whether spdk_nvme_ns_cmd_read failed. If it fails, it won't call the callback. You can check for failure by looking at the return value; see the documentation at <http://www.spdk.io/spdk/doc/nvme_8h.html#a084c6ecb53bd810fbb5051100b79bec5>. Your application allocates this memory pool up front. All of our examples allocate 8k requests (see line 1097 in examples/nvme/perf/perf.c). You need to allocate a pool large enough to handle the maximum number of outstanding requests you plan to have. We recently added a "hello_world" style example for the NVMe driver at https://github.com/spdk/spdk/tree/master/examples/nvme/hello_world with tons of comments. One of the comments explains this memory pool in detail.
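
To illustrate the failure mode, here is a minimal toy model in plain C. It is not SPDK code and does not reflect SPDK's actual implementation: POOL_SIZE, submit_read, process_completions, and run_demo are all hypothetical names made up for this sketch. It only shows the general pattern, which is that a submission that cannot get a request object returns an error and its callback never runs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a fixed-size request pool (hypothetical, not SPDK). */
#define POOL_SIZE 4

typedef void (*io_cb)(void *ctx);

struct request {
    io_cb cb;
    void *ctx;
    bool in_use;
};

static struct request pool[POOL_SIZE];

/* Hypothetical submit call: returns 0 on success, -1 if no request
 * object is free. On failure the callback is never invoked. */
int submit_read(io_cb cb, void *ctx)
{
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i].cb = cb;
            pool[i].ctx = ctx;
            pool[i].in_use = true;
            return 0;
        }
    }
    return -1;
}

/* Hypothetical completion poll: completes every outstanding request,
 * returns it to the pool, and invokes its callback.
 * Returns the number of requests completed. */
int process_completions(void)
{
    int done = 0;
    for (size_t i = 0; i < POOL_SIZE; i++) {
        if (pool[i].in_use) {
            pool[i].in_use = false;
            pool[i].cb(pool[i].ctx);
            done++;
        }
    }
    return done;
}

/* Callback that counts how many times it was called. */
static void count_cb(void *ctx)
{
    (*(int *)ctx)++;
}

/* Submit n reads without polling in between, then poll once.
 * Returns the number of rejected submissions and stores the number
 * of callbacks that actually ran in *callbacks. */
int run_demo(int n, int *callbacks)
{
    assert(callbacks != NULL);
    int rejected = 0;
    *callbacks = 0;
    for (int i = 0; i < n; i++) {
        if (submit_read(count_cb, callbacks) != 0) {
            rejected++; /* this read's callback will never fire */
        }
    }
    process_completions();
    return rejected;
}
```

In this model, submitting 6 reads against a pool of 4 rejects 2 of them, and only 4 callbacks ever run. If the caller ignores the return value, the 2 missing callbacks look like silent failures. The analogous check in a real application would be to test the return value of spdk_nvme_ns_cmd_read on every call and, on failure, poll for completions before retrying.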

That memory pool allocation is a bit of a wart on our otherwise clean API. We're looking at different strategies to clean that up. Let me know what the result of the debugging is and I'll shoot you some more ideas to try if necessary.

Thanks,
Ben

On Tue, 2016-07-05 at 21:03 +0000, Will Del Genio wrote:
Hello,
We have written a test application that uses the spdk library to benchmark a set of 3 Intel P3700 drives and a single 750 drive (concurrently). We’ve done some testing using fio and the kernel nvme drivers and have had no problem achieving the claimed IOPS (4k random read) of all drives on our system.

What we have found during our testing is that spdk will sometimes start to silently fail to call the callback passed to spdk_nvme_ns_cmd_read in the following situations:
1. Testing a single drive and passing in 0 for max_completions to spdk_nvme_qpair_process_completions(). We haven’t seen any issues with single-drive testing when max_completions was > 0.
2. Testing all four drives at once will result in one drive failing to receive callbacks, seemingly regardless of what number we pass for max_completions (1 through 128).

Here are other observations we’ve made:
- When the callbacks fail to be called for a drive, they fail to be called for the remaining duration of the test.
- The drive that ‘fails’ when testing 4 drives concurrently varies from test to test.
- ‘Failure’ of a drive seems to be correlated with the number of outstanding read operations, though it is not a strict correlation.

Our system is a dual-socket E5-2630 v3. One drive is on a PCI slot for CPU 0 and the other 3 are on PCI slots on CPU 1. The master/slave threads are on the same CPU socket as the nvme device they are talking to.

We’d like to know what is causing this issue and what we can do to help investigate the problem.  What other information can we provide?  Is there some part of the spdk code that we can look at to help determine the cause?

Thanks,
Will


