Re: [Virtio-fs] [PATCH] vhost-user-fs: add capability to allow migration

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Stefan Hajnoczi <stefanha@redhat.com>
To: Anton Kuchin <antonkuchin@yandex-team.ru>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, virtio-fs@redhat.com,
	Markus Armbruster <armbru@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	yc-core@yandex-team.ru
Subject: Re: [Virtio-fs] [PATCH] vhost-user-fs: add capability to allow migration
Date: Mon, 23 Jan 2023 14:49:39 -0500	[thread overview]
Message-ID: <Y87k0wBnHuf5Oktp@fedora> (raw)
In-Reply-To: <21b87a0d-99b1-2755-00de-d1201d85a63e@yandex-team.ru>

[-- Attachment #1: Type: text/plain, Size: 5770 bytes --]

On Mon, Jan 23, 2023 at 05:52:17PM +0200, Anton Kuchin wrote:
> 
> On 23/01/2023 16:09, Stefan Hajnoczi wrote:
> > On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > > state will be present at the destination to restore full context and
> > > > > > > > continue running.
> > > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > > the special "migrateable" case.
> > > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > > tracks whether device is unplugged for us.
> > > > > > > 
> > > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > > where not to migrate.
> > > > > > > 
> > > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > > already says orchestrator decided to migrate.
> > > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > > updating qemu executable via local migration. So this flag is
> > > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > > 
> > > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > > to request migration only when the migration can be performed in a
> > > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > > misinterpreted as full support for migration in all cases.
> > > > > It's not really different from block is it? orchestrator has to arrange
> > > > > for backend migration. I think we just assumed there's no use-case where
> > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > But in any case it's orchestrator's responsibility.
> > > > Yes, you are right. So do you think we should just drop the blocker
> > > > without adding a new flag?
> > > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> > If the migration blocker is removed, what happens when a user attempts
> > to migrate with a management tool and/or vhost-user-fs server
> > implementation that don't support migration?
> 
> There will be no matching fuse-session in destination endpoint so all
> requests to this fs will fail until it is remounted from guest to
> send new FUSE_INIT message that does session setup.

The point of the migration blocker is to prevent breaking running
guests. Situations where a migration completes but results in a broken
guest are problematic for users (especially when they are not logged in
to guests and able to fix them interactively).

If a command-line option is set to override the blocker, that's fine.
But there needs to be a blocker by default if external knowledge is
required to decide whether or not it's safe to migrate.

> > 
> > Anton: Can you explain how stateless migration will work on the
> > vhost-user-fs back-end side? Is it reusing vhost-user reconnect
> > functionality or introducing a new mode for stateless migration? I
> > guess the vhost-user-fs back-end implementation is required to
> > implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
> > in-flight requests when vrings are stopped?
> 
> It reuses existing vhost-user reconnect code to resubmit inflight
> requests.
> Sure, backend needs to support this feature - presence of required
> features is checked by generic vhost and vhost-user code during init
> and if something is missing migration blocker is assigned to the
> device (not a static one in vmstate that I remove in this patch, but
> other per-device kind of blocker).

This is not enough detail. Please post the QEMU patches before we commit
to a user-visible vhost-user-fs command-line parameter.

I think what you're trying is a new approach that can be made to work.
However, both vhost-user and migration are fragile and you have not
explained how it will work. I don't have confidence in merging this
incrementally because I'm afraid of committing to user-visible or
vhost-user protocol behavior that turns out to be broken just a little
while later.

The kind of detail I was hoping to hear was, for example, how
vhost_user_blk_device_realize() blocks and tries to reconnect 3 times.
Does this approach work for stateless migration? The destination QEMU is
launched before the source QEMU disconnects from the vhost-user UNIX
domain socket, so I guess the destination QEMU cannot connect in the
current version of vhost-user reconnect as implemented by QEMU's
vhost-user-blk device. Have you come up with a new handover protocol?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

WARNING: multiple messages have this Message-ID (diff)

From: Stefan Hajnoczi <stefanha@redhat.com>
To: Anton Kuchin <antonkuchin@yandex-team.ru>
Cc: Stefan Hajnoczi <stefanha@gmail.com>,
	"Michael S. Tsirkin" <mst@redhat.com>,
	qemu-devel@nongnu.org, virtio-fs@redhat.com,
	Markus Armbruster <armbru@redhat.com>,
	Eric Blake <eblake@redhat.com>,
	"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
	Juan Quintela <quintela@redhat.com>,
	yc-core@yandex-team.ru
Subject: Re: [PATCH] vhost-user-fs: add capability to allow migration
Date: Mon, 23 Jan 2023 14:49:39 -0500	[thread overview]
Message-ID: <Y87k0wBnHuf5Oktp@fedora> (raw)
In-Reply-To: <21b87a0d-99b1-2755-00de-d1201d85a63e@yandex-team.ru>

[-- Attachment #1: Type: text/plain, Size: 5770 bytes --]

On Mon, Jan 23, 2023 at 05:52:17PM +0200, Anton Kuchin wrote:
> 
> On 23/01/2023 16:09, Stefan Hajnoczi wrote:
> > On Sun, 22 Jan 2023 at 11:18, Michael S. Tsirkin <mst@redhat.com> wrote:
> > > On Sun, Jan 22, 2023 at 06:09:40PM +0200, Anton Kuchin wrote:
> > > > On 22/01/2023 16:46, Michael S. Tsirkin wrote:
> > > > > On Sun, Jan 22, 2023 at 02:36:04PM +0200, Anton Kuchin wrote:
> > > > > > > > This flag should be set when qemu don't need to worry about any
> > > > > > > > external state stored in vhost-user daemons during migration:
> > > > > > > > don't fail migration, just pack generic virtio device states to
> > > > > > > > migration stream and orchestrator guarantees that the rest of the
> > > > > > > > state will be present at the destination to restore full context and
> > > > > > > > continue running.
> > > > > > > Sorry  I still do not get it.  So fundamentally, why do we need this property?
> > > > > > > vhost-user-fs is not created by default that we'd then need opt-in to
> > > > > > > the special "migrateable" case.
> > > > > > > That's why I said it might make some sense as a device property as qemu
> > > > > > > tracks whether device is unplugged for us.
> > > > > > > 
> > > > > > > But as written, if you are going to teach the orchestrator about
> > > > > > > vhost-user-fs and its special needs, just teach it when to migrate and
> > > > > > > where not to migrate.
> > > > > > > 
> > > > > > > Either we describe the special situation to qemu and let qemu
> > > > > > > make an intelligent decision whether to allow migration,
> > > > > > > or we trust the orchestrator. And if it's the latter, then 'migrate'
> > > > > > > already says orchestrator decided to migrate.
> > > > > > The problem I'm trying to solve is that most of vhost-user devices
> > > > > > now block migration in qemu. And this makes sense since qemu can't
> > > > > > extract and transfer backend daemon state. But this prevents us from
> > > > > > updating qemu executable via local migration. So this flag is
> > > > > > intended more as a safety check that says "I know what I'm doing".
> > > > > > 
> > > > > > I agree that it is not really necessary if we trust the orchestrator
> > > > > > to request migration only when the migration can be performed in a
> > > > > > safe way. But changing the current behavior of vhost-user-fs from
> > > > > > "always blocks migration" to "migrates partial state whenever
> > > > > > orchestrator requests it" seems a little  dangerous and can be
> > > > > > misinterpreted as full support for migration in all cases.
> > > > > It's not really different from block is it? orchestrator has to arrange
> > > > > for backend migration. I think we just assumed there's no use-case where
> > > > > this is practical for vhost-user-fs so we blocked it.
> > > > > But in any case it's orchestrator's responsibility.
> > > > Yes, you are right. So do you think we should just drop the blocker
> > > > without adding a new flag?
> > > I'd be inclined to. I am curious what do dgilbert and stefanha think though.
> > If the migration blocker is removed, what happens when a user attempts
> > to migrate with a management tool and/or vhost-user-fs server
> > implementation that don't support migration?
> 
> There will be no matching fuse-session in destination endpoint so all
> requests to this fs will fail until it is remounted from guest to
> send new FUSE_INIT message that does session setup.

The point of the migration blocker is to prevent breaking running
guests. Situations where a migration completes but results in a broken
guest are problematic for users (especially when they are not logged in
to guests and able to fix them interactively).

If a command-line option is set to override the blocker, that's fine.
But there needs to be a blocker by default if external knowledge is
required to decide whether or not it's safe to migrate.

> > 
> > Anton: Can you explain how stateless migration will work on the
> > vhost-user-fs back-end side? Is it reusing vhost-user reconnect
> > functionality or introducing a new mode for stateless migration? I
> > guess the vhost-user-fs back-end implementation is required to
> > implement VHOST_F_LOG_ALL so dirty memory can be tracked and drain all
> > in-flight requests when vrings are stopped?
> 
> It reuses existing vhost-user reconnect code to resubmit inflight
> requests.
> Sure, backend needs to support this feature - presence of required
> features is checked by generic vhost and vhost-user code during init
> and if something is missing migration blocker is assigned to the
> device (not a static one in vmstate that I remove in this patch, but
> other per-device kind of blocker).

This is not enough detail. Please post the QEMU patches before we commit
to a user-visible vhost-user-fs command-line parameter.

I think what you're trying is a new approach that can be made to work.
However, both vhost-user and migration are fragile and you have not
explained how it will work. I don't have confidence in merging this
incrementally because I'm afraid of committing to user-visible or
vhost-user protocol behavior that turns out to be broken just a little
while later.

The kind of detail I was hoping to hear was, for example, how
vhost_user_blk_device_realize() blocks and tries to reconnect 3 times.
Does this approach work for stateless migration? The destination QEMU is
launched before the source QEMU disconnects from the vhost-user UNIX
domain socket, so I guess the destination QEMU cannot connect in the
current version of vhost-user reconnect as implemented by QEMU's
vhost-user-blk device. Have you come up with a new handover protocol?

Stefan

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

next prev parent reply	other threads:[~2023-01-23 19:49 UTC|newest]

Thread overview: 80+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-15 17:09 [Virtio-fs] [PATCH] vhost-user-fs: add capability to allow migration Anton Kuchin
2023-01-15 17:09 ` Anton Kuchin
2023-01-18 15:52 ` [Virtio-fs] " Stefan Hajnoczi
2023-01-18 15:52   ` Stefan Hajnoczi
2023-01-19 12:43   ` [Virtio-fs] " Anton Kuchin
2023-01-19 12:43     ` Anton Kuchin
2023-01-19 14:30     ` [Virtio-fs] " Stefan Hajnoczi
2023-01-19 14:30       ` Stefan Hajnoczi
2023-01-19 15:29       ` [Virtio-fs] " Anton Kuchin
2023-01-19 15:29         ` Anton Kuchin
2023-01-19 16:02         ` [Virtio-fs] " Stefan Hajnoczi
2023-01-19 16:02           ` Stefan Hajnoczi
2023-01-19 16:58           ` [Virtio-fs] " Anton Kuchin
2023-01-19 16:58             ` Anton Kuchin
2023-01-19 20:40             ` [Virtio-fs] " Stefan Hajnoczi
2023-01-19 20:40               ` Stefan Hajnoczi
2023-02-01 14:26             ` [Virtio-fs] " Juan Quintela
2023-02-01 14:26               ` Juan Quintela
2023-02-02  0:54               ` [Virtio-fs] " Anton Kuchin
2023-02-02  0:54                 ` Anton Kuchin
2023-02-02  9:59                 ` [Virtio-fs] " Juan Quintela
2023-02-02  9:59                   ` Juan Quintela
2023-02-10 14:09                   ` [Virtio-fs] " Anton Kuchin
2023-02-10 14:09                     ` Anton Kuchin
2023-02-10 16:08                     ` [Virtio-fs] " Juan Quintela
2023-02-10 16:08                       ` Juan Quintela
2023-02-16 21:00                       ` [Virtio-fs] " Stefan Hajnoczi
2023-02-16 21:00                         ` Stefan Hajnoczi
2023-01-19 12:51 ` [Virtio-fs] " Michael S. Tsirkin
2023-01-19 12:51   ` Michael S. Tsirkin
2023-01-19 13:45   ` [Virtio-fs] " Anton Kuchin
2023-01-19 13:45     ` Anton Kuchin
2023-01-19 19:00     ` [Virtio-fs] " Dr. David Alan Gilbert
2023-01-19 19:00       ` Dr. David Alan Gilbert
2023-01-19 20:47       ` [Virtio-fs] " Anton Kuchin
2023-01-19 20:47         ` Anton Kuchin
2023-01-20 13:58     ` [Virtio-fs] " Michael S. Tsirkin
2023-01-20 13:58       ` Michael S. Tsirkin
2023-01-20 17:37       ` [Virtio-fs] " Anton Kuchin
2023-01-20 17:37         ` Anton Kuchin
2023-01-22  8:16         ` [Virtio-fs] " Michael S. Tsirkin
2023-01-22  8:16           ` Michael S. Tsirkin
2023-01-22 12:36           ` [Virtio-fs] " Anton Kuchin
2023-01-22 12:36             ` Anton Kuchin
2023-01-22 14:46             ` [Virtio-fs] " Michael S. Tsirkin
2023-01-22 14:46               ` Michael S. Tsirkin
2023-01-22 16:09               ` [Virtio-fs] " Anton Kuchin
2023-01-22 16:09                 ` Anton Kuchin
2023-01-22 16:17                 ` [Virtio-fs] " Michael S. Tsirkin
2023-01-22 16:17                   ` Michael S. Tsirkin
2023-01-23 14:09                   ` [Virtio-fs] " Stefan Hajnoczi
2023-01-23 14:09                     ` Stefan Hajnoczi
2023-01-23 15:52                     ` [Virtio-fs] " Anton Kuchin
2023-01-23 15:52                       ` Anton Kuchin
2023-01-23 19:49                       ` Stefan Hajnoczi [this message]
2023-01-23 19:49                         ` Stefan Hajnoczi
2023-01-23 21:00                         ` [Virtio-fs] " Michael S. Tsirkin
2023-01-23 21:00                           ` Michael S. Tsirkin
2023-01-23 21:56                           ` [Virtio-fs] " Stefan Hajnoczi
2023-01-23 21:56                             ` Stefan Hajnoczi
2023-01-23 18:27                   ` [Virtio-fs] " Dr. David Alan Gilbert
2023-01-23 18:27                     ` Dr. David Alan Gilbert
2023-01-23 19:53                     ` [Virtio-fs] " Stefan Hajnoczi
2023-01-23 19:53                       ` Stefan Hajnoczi
2023-01-24  1:46                       ` [Virtio-fs] " Stefan Hajnoczi
2023-01-24  1:46                         ` Stefan Hajnoczi
2023-01-24  9:50                         ` [Virtio-fs] " Dr. David Alan Gilbert
2023-01-24  9:50                           ` Dr. David Alan Gilbert
2023-01-24 12:48                           ` [Virtio-fs] " Stefan Hajnoczi
2023-01-24 12:48                             ` Stefan Hajnoczi
2023-02-01 14:37                             ` [Virtio-fs] " Juan Quintela
2023-02-01 14:37                               ` Juan Quintela
2023-01-25 19:46 ` [Virtio-fs] " Stefan Hajnoczi
2023-01-25 19:46   ` Stefan Hajnoczi
2023-01-26 14:20   ` [Virtio-fs] " Anton Kuchin
2023-01-26 14:20     ` Anton Kuchin
2023-01-26 15:13     ` [Virtio-fs] " Stefan Hajnoczi
2023-01-26 15:13       ` Stefan Hajnoczi
2023-01-26 15:21       ` [Virtio-fs] " Anton Kuchin
2023-01-26 15:21         ` Anton Kuchin

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=Y87k0wBnHuf5Oktp@fedora \
    --to=stefanha@redhat.com \
    --cc=antonkuchin@yandex-team.ru \
    --cc=armbru@redhat.com \
    --cc=dgilbert@redhat.com \
    --cc=eblake@redhat.com \
    --cc=mst@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    --cc=stefanha@gmail.com \
    --cc=virtio-fs@redhat.com \
    --cc=yc-core@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.