* [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
From: Cao, Gang @ 2018-04-10 13:43 UTC
To: spdk

Hi all,

This topic concerns the use of the SPDK NVMe driver in multi-process mode. In some cases a process (primary or secondary) may exit unexpectedly, leaving allocated memory unreleased, held locks unreleased, or, worse, a memory allocation interrupted midway. Some of these issues may be solvable; others may be difficult to solve.

I would like to start a discussion on this topic to gather input, suggestions, and comments. For example, we may need a mechanism to track per-process memory allocations made through the memory-related APIs (which call into DPDK's) so that the memory can be released once its associated process exits. Some of this work would span both SPDK and DPDK memory management.
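
To make the tracking idea concrete, here is a minimal sketch in Python. It is a toy model, not SPDK or DPDK code: the `AllocationTracker` class and its methods are hypothetical names invented for illustration, and "buffers" are just integer handles with a size. The point is only that tagging every allocation with its owner PID gives a supervisor something to release when that process dies.

```python
class AllocationTracker:
    """Toy registry: owner PID -> {handle: size} (hypothetical, not a DPDK API)."""

    def __init__(self):
        self._by_pid = {}
        self._next_handle = 0

    def alloc(self, pid, size):
        # Record the allocation under its owner before handing it out.
        handle = self._next_handle
        self._next_handle += 1
        self._by_pid.setdefault(pid, {})[handle] = size
        return handle

    def free(self, pid, handle):
        # Normal path: the owner releases its own buffer.
        return self._by_pid.get(pid, {}).pop(handle, None) is not None

    def reclaim(self, pid):
        # Crash path: release everything a dead process still owned.
        leaked = self._by_pid.pop(pid, {})
        return sum(leaked.values())

    def outstanding(self, pid):
        # Number of buffers currently charged to this process.
        return len(self._by_pid.get(pid, {}))


tracker = AllocationTracker()
first = tracker.alloc(pid=100, size=4096)
tracker.alloc(pid=100, size=8192)
tracker.alloc(pid=200, size=512)
tracker.free(100, first)
reclaimed_bytes = tracker.reclaim(100)  # process 100 died holding one buffer
```

This only covers buffers that stay with the process that allocated them. A buffer handed to another process would need its ownership record updated at the hand-off, which is where most of the real complexity would sit.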

Any feedback is welcome.

Thanks,
Gang

* Re: [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
From: Luse, Paul E @ 2018-04-10 15:03 UTC
To: spdk

Hi Gang,

Can you summarize any discussion that's happened already w/the DPDK community? For example, have you brought it up there, are we aware of anyone else bringing it up there, etc., and if so what's their position?

Thx
Paul

* Re: [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
From: Walker, Benjamin @ 2018-04-10 17:50 UTC
To: spdk

On Tue, 2018-04-10 at 13:43 +0000, Cao, Gang wrote:
> Hi all,
>
> This topic concerns the use of the SPDK NVMe driver in multi-process mode. In
> some cases a process (primary or secondary) may exit unexpectedly, leaving
> allocated memory unreleased, held locks unreleased, or, worse, a memory
> allocation interrupted midway. Some of these issues may be solvable; others
> may be difficult to solve.
>
> I would like to start a discussion on this topic to gather input,
> suggestions, and comments.

I'm very interested to hear from SPDK users who plan to deploy SPDK using its
multi-process capabilities. Specifically, I want to know what their strategy is
for handling a process crash. As far as I'm aware, DPDK (which provides the
low-level multi-process handling) recommends that all processes be restarted
fresh after any process crashes.

To recap, SPDK's NVMe driver allows multiple separate processes to start up and
map some shared memory regions, after which each process can allocate an NVMe
I/O queue pair of its own. Each process can then submit commands to the NVMe
device without any further coordination with the other processes. See
http://www.spdk.io/doc/nvme.html#nvme_multi_process for the full docs.

If one of these processes crashes, there are several potential issues that must
be dealt with:

1) The DPDK-allocated memory assigned to the crashed process will never be
released. This is more than just memory allocated by spdk_dma_malloc and such -
it's also memory that needs to be put back into memory pools.
2) The process may have been holding a cross-process lock and/or modifying some
of the data structures in shared memory at the time of the crash. None of the
code in SPDK or DPDK, in the critical areas, is designed to guarantee that the
in-memory data structures are always in a valid or consistent state such that a
process could crash in the middle of a modification and another process could
continue using them. Moving to atomic data structures everywhere, assuming it is
even possible, would be both a huge amount of effort and probably a huge
performance hit.
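
The second issue can be sketched with a toy model. This is Python standing in for a shared-memory structure, not SPDK or DPDK code; `SharedPool`, `SimulatedCrash`, and the `crash_midway` flag are hypothetical names used only for illustration. The pool keeps a free list plus a redundant counter, and a "crash" between the two update steps leaves both the lock and the invariant broken for every surviving process.

```python
class SimulatedCrash(Exception):
    """Raised to model a process dying partway through an update."""


class SharedPool:
    """Toy stand-in for a pool living in process-shared memory."""

    def __init__(self, buffers):
        self.free = list(buffers)
        self.free_count = len(self.free)
        self.lock_owner = None  # PID holding the cross-process lock

    def take(self, pid, crash_midway=False):
        self.lock_owner = pid          # acquire the lock
        buf = self.free.pop()          # step 1: unlink a buffer
        if crash_midway:
            raise SimulatedCrash(pid)  # process dies between the steps
        self.free_count -= 1           # step 2: update the counter
        self.lock_owner = None         # release the lock
        return buf

    def is_consistent(self):
        # Survivors can only trust the pool if both conditions hold.
        return self.lock_owner is None and self.free_count == len(self.free)


pool = SharedPool(["buf0", "buf1", "buf2"])
pool.take(pid=1)
try:
    pool.take(pid=2, crash_midway=True)
except SimulatedCrash:
    pass
print(pool.is_consistent(), pool.lock_owner)  # prints: False 2
```

A robust lock could tell a survivor that the holder died, but it cannot say which of the two steps had already happened, so the survivor would still need structure-specific repair logic.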

So the above are the facts. Now my opinion, which can easily be swayed based on
the feedback we receive here:

It's not clear that it is even possible to rewrite all of the critical DPDK and
SPDK data structures to be atomic, the effort would be enormous, and the end
result would probably have significantly degraded performance. So the only
reasonable way to guarantee correct recovery from a process crash is to restart
all processes involved. That, in my opinion, makes the use of NVMe multi-process 
features for more than simple management tools with short lifetimes a brittle
architectural choice.

In my opinion, a more robust design would be to use something like SPDK's vhost
target as a dispatcher, where one process owns the storage devices and exposes
shared memory queues to the other processes on which they can submit requests.
The vhost model does use process-shared memory, but in a limited way that is
protected against both issues I outlined above. To be fair, there are a few
drawbacks to using vhost as a dispatcher. First, there are currently
restrictions on the memory layout that can be described to vhost for the shared
memory regions that need to be lifted (some work in flight). Second, the vhost
target is polling the incoming queues and the storage devices, so it may consume
additional CPU compared to the multi-process model. There are probably ways that
we can work to mitigate this over time, and even right now the cost can be
amortized across a large number of client processes due to the vhost target
being so efficient.
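
The isolation property of the dispatcher model can be sketched as follows. This is a toy in Python, not the vhost protocol and not SPDK code: real vhost uses virtqueues in shared memory, while this sketch uses in-process deques, and `Dispatcher` with its methods is a hypothetical name. What it is meant to show is that clients only ever touch their own queue, so a dead client costs the dispatcher one queue and nothing else.

```python
from collections import deque


class Dispatcher:
    """Toy dispatcher: sole owner of the 'device' (a dict of blocks)."""

    def __init__(self):
        self._device = {}
        self._queues = {}  # client id -> deque of pending requests

    def connect(self, client):
        self._queues[client] = deque()

    def submit(self, client, op, lba, data=None):
        # Clients write only to their own queue, never to device state.
        self._queues[client].append((op, lba, data))

    def disconnect(self, client):
        # Called when a client exits or crashes; returns dropped requests.
        return len(self._queues.pop(client, ()))

    def poll(self):
        # Drain every queue and execute requests on the clients' behalf.
        results = []
        for client, queue in self._queues.items():
            while queue:
                op, lba, data = queue.popleft()
                if op == "write":
                    self._device[lba] = data
                    results.append((client, lba, None))
                elif op == "read":
                    results.append((client, lba, self._device.get(lba)))
        return results


d = Dispatcher()
d.connect("a")
d.connect("b")
d.submit("a", "write", 0, b"x")
d.poll()
d.submit("b", "read", 5)
dropped = d.disconnect("b")      # client b crashes with one request pending
d.submit("a", "read", 0)
survivor_results = d.poll()      # client a is unaffected
```

The polling loop in `poll` is also where the extra CPU cost mentioned above comes from: the dispatcher has to visit every client queue whether or not it holds work.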

Thanks,
Ben

* Re: [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
From: Harris, James R @ 2018-04-10 18:31 UTC
To: spdk

On 4/10/18, 10:50 AM, "SPDK on behalf of Walker, Benjamin" <spdk-bounces(a)lists.01.org on behalf of benjamin.walker(a)intel.com> wrote:

    On Tue, 2018-04-10 at 13:43 +0000, Cao, Gang wrote:

<snip>

    If one of these processes crashes, there are several potential issues that must
    be dealt with:
    
    1) The DPDK-allocated memory assigned to the crashed process will never be
    released. This is more than just memory allocated by spdk_dma_malloc and such -
    it's also memory that needs to be put back into memory pools.
    2) The process may have been holding a cross-process lock and/or modifying some
    of the data structures in shared memory at the time of the crash. None of the
    code in SPDK or DPDK, in the critical areas, is designed to guarantee that the
    in-memory data structures are always in a valid or consistent state such that a
    process could crash in the middle of a modification and another process could
    continue using them. Moving to atomic data structures everywhere, assuming it is
    even possible, would be both a huge amount of effort and probably a huge
    performance hit.

[Jim] Buffers (allocated via malloc or mempools) can also be passed between processes.  For example, in the SPDK NVMe driver, process A may allocate an admin request, then process B may poll the admin queue and get the completion for process A’s request.  We put this completion into a software queue so that the completion routine can be called later when process A polls the admin queue.  In this case the buffer cannot be reclaimed automatically and there’s a lot of code complexity required to make this work.  I suspect that DPDK has many more cases where buffers are passed between processes and would each require special handling for recovery.
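
A rough sketch of that admin-queue hand-off, as a toy Python model rather than the actual driver code (`AdminQueue` and its method names are hypothetical, and the "device" completes every command immediately): whichever process polls drains the shared completion queue, and completions that belong to someone else are parked on a per-process software queue until their owner polls.

```python
from collections import deque


class AdminQueue:
    """Toy shared admin queue with per-process parked completions."""

    def __init__(self):
        self._hw_completions = deque()  # stands in for the hardware CQ
        self._parked = {}               # owner pid -> deque of completions

    def submit(self, pid, command):
        # Pretend the device completes at once; remember who asked.
        self._hw_completions.append((pid, command))

    def poll(self, pid):
        # The poller drains the hardware queue, including others' entries.
        own = []
        while self._hw_completions:
            owner, command = self._hw_completions.popleft()
            if owner == pid:
                own.append(command)
            else:
                self._parked.setdefault(owner, deque()).append(command)
        # Completions parked earlier by other pollers are delivered first.
        earlier = self._parked.pop(pid, deque())
        return list(earlier) + own

    def parked_for(self, pid):
        # Completions waiting for a process that has not polled yet.
        return len(self._parked.get(pid, ()))


q = AdminQueue()
q.submit(1, "identify")   # process A (pid 1) issues an admin command
q.poll(2)                 # process B (pid 2) polls and parks A's completion
stranded = q.parked_for(1)
```

If process A dies at this point, its parked completion stays in shared state with no owner left to collect it, which is the kind of case that would need special handling during recovery.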

[Jim] There’s also a more insidious problem – if one process crashed unexpectedly, it may have corrupted shared memory in some way which silently corrupted one of the other processes.

    So the above are the facts. Now my opinion, which can easily be swayed based on
    the feedback we receive here:
    
    It's not clear that it is even possible to rewrite all of the critical DPDK and
    SPDK data structures to be atomic, the effort would be enormous, and the end
    result would probably have significantly degraded performance. So the only
    reasonable way to guarantee correct recovery from a process crash is to restart
    all processes involved. That, in my opinion, makes the use of NVMe multi-process 
    features for more than simple management tools with short lifetimes a brittle
    architectural choice.

[Jim] I wouldn’t necessarily say it’s a brittle architectural choice by itself.  I would say it’s brittle if NVMe multi-process is used with the assumption that the set of shared processes can remain if one process dies unexpectedly.

    In my opinion, a more robust design would be to use something like SPDK's vhost
    target as a dispatcher, where one process owns the storage devices and exposes
    shared memory queues to the other processes on which they can submit requests.
    The vhost model does use process-shared memory, but in a limited way that is
    protected against both issues I outlined above. To be fair, there are a few
    drawbacks to using vhost as a dispatcher. First, there are currently
    restrictions on the memory layout that can be described to vhost for the shared
    memory regions that need to be lifted (some work in flight). Second, the vhost
    target is polling the incoming queues and the storage devices, so it may consume
    additional CPU compared to the multi-process model. There are probably ways that
    we can work to mitigate this over time, and even right now the cost can be
    amortized across a large number of client processes due to the vhost target
    being so efficient.

[Jim] I agree – vhost as a dispatcher is the better architectural choice if isolation is desired between different SPDK processes so that one crashed process doesn’t affect other processes.  A lot of the changes that Darek has been submitting (especially around mapping hugepage memory as a single segment) make this much easier to implement.  Previously this was somewhat limited by two things: the vhost protocol currently allows only a small number (8) of shared file descriptors, and DPDK mapped each huge page as a separate file descriptor.
    
    Thanks,
    Ben

* Re: [SPDK] Trying to recover from one SPDK process crashing in a multi-process environment
From: Cao, Gang @ 2018-04-11  1:34 UTC
To: spdk

Hi Paul,

I believe there have been some earlier discussions on a similar topic in the DPDK community. With SPDK, we have seen this same process model developed and deployed. One fairly simple use case is an out-of-band tool such as nvme_cli that needs to access controllers for management purposes while the same controllers are in use by other applications. There are also other use cases that rely on SPDK multi-process support.

We will need to bring it up again in the DPDK community to get their feedback and comments.

Thanks,
Gang
