From: Walker, Benjamin <benjamin.walker at intel.com>
To: spdk@lists.01.org
Subject: Re: [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
Date: Tue, 10 Apr 2018 17:50:12 +0000
Message-ID: <1523382610.2684.51.camel@intel.com>
In-Reply-To: F009CE4E1CB4E047B169243B6A3189273155FA32@SHSMSX101.ccr.corp.intel.com


On Tue, 2018-04-10 at 13:43 +0000, Cao, Gang wrote:
> Hi all,
>  
> This topic concerns the use of the SPDK NVMe driver in multi-process mode.
> In some cases a process (primary or secondary) may exit unexpectedly, leaving
> allocated memory unreleased, a held lock unreleased, or, more severely, a
> memory allocation interrupted midway. Some of these issues may be solvable;
> others may be difficult to solve.
>
> I would like to start a discussion on this topic to get more input,
> suggestions and comments.

I'm very interested to hear from SPDK users who plan to deploy SPDK using its
multi-process capabilities. Specifically, I want to know what their strategy is
for handling a process crash. As far as I'm aware, DPDK (which provides the
low-level multi-process handling) recommends that all processes be restarted
fresh after any process crashes.

To recap, SPDK's NVMe driver allows multiple separate processes to start up and
map some shared memory regions, after which each process can allocate an NVMe
I/O queue pair of its own. From that point on, the processes can submit commands
to the NVMe device without any further coordination between them. See
http://www.spdk.io/doc/nvme.html#nvme_multi_process for the full docs.

If one of these processes crashes, there are several potential issues that must
be dealt with:

1) The DPDK-allocated memory assigned to the crashed process will never be
released. This is more than just memory allocated by spdk_dma_malloc and
similar calls; it also includes objects that need to be put back into memory
pools.
2) The process may have been holding a cross-process lock and/or modifying some
of the data structures in shared memory at the time of the crash. None of the
code in SPDK or DPDK, in the critical areas, is designed to guarantee that the
in-memory data structures are always in a valid or consistent state such that a
process could crash in the middle of a modification and another process could
continue using them. Moving to atomic data structures everywhere, assuming it is
even possible, would be both a huge amount of effort and probably a huge
performance hit.
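
As a rough illustration of issue 2 (this is not SPDK or DPDK code), the Python
sketch below forks a child that dies between two stores to a shared mapping.
The two-counter layout and the names `update` and `run_demo` are made up for
the example; real memory pool and free list invariants are far more involved.
It assumes a Unix system, since it relies on os.fork.

```python
import mmap
import os
import struct

# Hypothetical shared layout: two 8-byte counters that must always be equal.
LAYOUT = struct.Struct("<QQ")


def update(shm, value, crash_midway=False):
    """Set both counters to `value`, optionally dying between the two stores."""
    struct.pack_into("<Q", shm, 0, value)
    if crash_midway:
        os._exit(1)  # simulated crash: no cleanup, second store never happens
    struct.pack_into("<Q", shm, 8, value)


def run_demo():
    # An anonymous mmap is MAP_SHARED on Unix, so it stays shared across fork().
    shm = mmap.mmap(-1, LAYOUT.size)
    update(shm, 1)  # consistent starting state: (1, 1)

    pid = os.fork()
    if pid == 0:
        update(shm, 2, crash_midway=True)
        os._exit(0)  # not reached
    _, status = os.waitpid(pid, 0)

    first, second = LAYOUT.unpack_from(shm, 0)
    shm.close()
    return os.WEXITSTATUS(status), first, second


if __name__ == "__main__":
    exit_code, first, second = run_demo()
    print(f"child exit code {exit_code}; counters = ({first}, {second})")
```

The surviving process ends up reading counters of (2, 1): a half-applied
update, with nothing in the shared region to say what the crashed process
intended or how to repair it.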

So the above are the facts. Now my opinion, which can easily be swayed based on
the feedback we receive here:

It's not clear that it is even possible to rewrite all of the critical DPDK and
SPDK data structures to be atomic. Even if it were, the effort would be enormous
and the end result would probably perform significantly worse. So the only
reasonable way to guarantee correct recovery from a process crash is to restart
all processes involved. That, in my opinion, makes the NVMe multi-process
features a brittle architectural choice for anything beyond simple management
tools with short lifetimes.

In my opinion, a more robust design would be to use something like SPDK's vhost
target as a dispatcher, where one process owns the storage devices and exposes
shared memory queues to the other processes on which they can submit requests.
The vhost model does use process-shared memory, but in a limited way that is
protected against both issues I outlined above. To be fair, there are a few
drawbacks to using vhost as a dispatcher. First, vhost currently restricts the
memory layouts that can be described for the shared memory regions, and those
restrictions need to be lifted (some work is in flight). Second, the vhost
target polls both the incoming queues and the storage devices, so it may consume
additional CPU compared to the multi-process model. There are probably ways we
can mitigate this over time, and even right now the cost can be amortized across
a large number of client processes because the vhost target is so efficient.
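
To make the dispatcher idea a bit more concrete, here is a toy Python sketch of
the ownership pattern only. It is not vhost: vhost uses shared memory queues,
whereas this uses one private pipe per client, and the "device" is faked by
returning a string. The names `client` and `run_dispatcher`, the request tuple,
and the exit codes are all assumptions made for the example. Unix only, since
it relies on os.fork.

```python
import os
from multiprocessing import Pipe
from multiprocessing.connection import wait


def client(conn, requests, crash_after=None):
    """Submit requests over a private channel; optionally die partway through."""
    status = 2  # reported if anything unexpected happens
    try:
        for i, key in enumerate(requests):
            if crash_after is not None and i == crash_after:
                os._exit(1)  # simulated crash: no goodbye, no cleanup
            conn.send(("read", key))
            if conn.recv() != f"data-{key}":
                os._exit(2)
        conn.close()
        status = 0
    finally:
        os._exit(status)


def run_dispatcher():
    """The owner: the only process that would ever touch the storage device."""
    plan = {"A": ([1, 2, 3], 1), "B": ([4, 5, 6], None)}  # A dies after 1 request
    channels, pids = {}, {}
    for name, (requests, crash_after) in plan.items():
        owner_end, client_end = Pipe()
        pid = os.fork()
        if pid == 0:
            owner_end.close()
            client(client_end, requests, crash_after)  # never returns
        client_end.close()
        channels[owner_end] = name
        pids[name] = pid

    served = []
    while channels:
        for conn in wait(list(channels)):
            try:
                _op, key = conn.recv()
                conn.send(f"data-{key}")  # stand-in for real device I/O
                served.append((channels[conn], key))
            except (EOFError, OSError):
                # Peer exited or crashed: drop only this client's channel.
                channels.pop(conn)
                conn.close()

    exit_codes = {n: os.WEXITSTATUS(os.waitpid(p, 0)[1]) for n, p in pids.items()}
    return served, exit_codes


if __name__ == "__main__":
    print(run_dispatcher())
```

Client A crashing costs the owner one closed channel. Client B's requests are
still served, because no state owned by A was ever shared with B or left for
the owner to repair.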

Thanks,
Ben
