From: Juan Quintela <quintela@redhat.com>
To: Joao Martins <joao.m.martins@oracle.com>
Cc: Alex Williamson <alex.williamson@redhat.com>,
Eric Blake <eblake@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Fam Zheng <fam@euphon.net>,
qemu-s390x@nongnu.org, Cornelia Huck <cohuck@redhat.com>,
Thomas Huth <thuth@redhat.com>,
Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
Laurent Vivier <lvivier@redhat.com>,
John Snow <jsnow@redhat.com>,
"Dr. David Alan Gilbert" <dgilbert@redhat.com>,
Christian Borntraeger <borntraeger@linux.ibm.com>,
Halil Pasic <pasic@linux.ibm.com>,
Paolo Bonzini <pbonzini@redhat.com>,
qemu-block@nongnu.org, Eric Farman <farman@linux.ibm.com>,
Richard Henderson <richard.henderson@linaro.org>,
David Hildenbrand <david@redhat.com>,
Avihai Horon <avihaih@nvidia.com>,
Jason Gunthorpe <jgg@nvidia.com>,
qemu-devel@nongnu.org
Subject: Re: [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped
Date: Thu, 13 Oct 2022 18:08:56 +0200 [thread overview]
Message-ID: <87ilknemef.fsf@secure.mitica> (raw)
In-Reply-To: <bf945182-5c73-b1cc-13b2-5594bc21823f@oracle.com> (Joao Martins's message of "Thu, 13 Oct 2022 13:25:10 +0100")
Joao Martins <joao.m.martins@oracle.com> wrote:
> +Avihai, +Jason
>
> On 03/10/2022 04:16, Juan Quintela wrote:
>> HACK ahead.
>>
>> There are devices that require the guest to be stopped to tell us what
>> is the size of its state.
>
> "... and have transferred said device state" if we are talking current vfio.
Yeap.
> We can't query size of the data_fd right now
Oops. My understanding was that once the guest is stopped you can say
how big is it. That is a deal breaker. We can't continue if we don't
know the size. Copying the whole state to know the size looks too much.
> It would need a @data_size in addition to @data_fd in
> vfio_device_feature_mig_state, or getting fseek supported over the fd
Ok, we can decided how to do it, but I think that putting fseek into it
will only make things more complicated.
>> So we need to stop the vm "before" we
>> cal the functions.
>>
>> It is a hack because:
>> - we are "starting" the guest again to stop it in migration_complete()
>> I know, I know, but it is not trivial to get all the information
>> easily to migration_complete(), so this hack.
>>
>
> Could you expand on that? Naively, I was assuming that by 'all information' the
> locally stored counters in migration_iteration_run() that aren't present in
> MigrateState?
This is not related to vfio at all.
The problem is that current code assumes that the guest is still
running, and then do qemu_mutex_lock_iothread() and unlock() inside the
pending handlers. To stop the guest, we need to lock the iothread
first. So this was going to hang. I fixed it doing the lock/unlock
twice. That is the hack. I will change the callers once that we decide
what is the right path ahead. This is not a problem related to vfio. it
is a problem related about how we can stop/resume guests programatically
in the middle of qemu code.
>> - auto_converge test fails with this hack. I think that it is related
>> to previous problem. We start the guest when it is supposed to be
>> stopped for convergence reasons.
>>
>> - All experiments that I did to do the proper thing failed with having
>> the iothread_locked() or try to unlock() it when not locked.
>>
>> - several of the pending functions are using the iothread lock
>> themselves, so I need to split it to have two versions (one for the
>> _estimate() case with the iothread lock), and another for the
>> _exact() case without the iothread_lock(). I want comments about
>> this approach before I try to continue on this direction.
>>
>> Signed-off-by: Juan Quintela <quintela@redhat.com>
>> ---
>> migration/migration.c | 13 +++++++++++++
>> tests/qtest/migration-test.c | 3 ++-
>> 2 files changed, 15 insertions(+), 1 deletion(-)
>>
>> diff --git a/migration/migration.c b/migration/migration.c
>> index 35e512887a..7374884818 100644
>> --- a/migration/migration.c
>> +++ b/migration/migration.c
>> @@ -3742,7 +3742,20 @@ static MigIterateState migration_iteration_run(MigrationState *s)
>> trace_migrate_pending_estimate(pending_size, s->threshold_size, pend_pre, pend_post);
>>
>> if (pend_pre <= s->threshold_size) {
>> + int old_state = s->state;
>> + qemu_mutex_lock_iothread();
>> + // is this really necessary? it works for me both ways.
>> + qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
>> + s->vm_was_running = runstate_is_running();
>> + vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
>> + qemu_mutex_unlock_iothread();
>> qemu_savevm_state_pending_exact(&pend_pre, &pend_post);
>> + qemu_mutex_lock_iothread();
>> + runstate_set(old_state);
>> + if (s->vm_was_running) {
>> + vm_start();
>> + }
>> + qemu_mutex_unlock_iothread();
>
> Couldn't we just have an extra patch that just stores pend_pre and pending_size
> in MigrateState, which would allow all this check to be moved into
> migration_completion(). Or maybe that wasn't an option for some other reason?
This is not an option, because we don't have a way to get back from
migration_completion() to migrate_iteration_run() if there is not enough
space to send all the state.
> Additionally what about having a migration helper function that
> vfio_save_complete_precopy() callback needs to use into to check if the
> expected-device state size meets the threshold/downtime as it is saving the
> device state and otherwise fail earlier accordingly when saving beyond the
> threshold?
That is what I tried to do with the code.
See patch 4. ram.c
How I have two functions:
- ram_state_pending_estimate()
- ram_state_pending_exact()
1st should work without any locking, i.e. just best estimation without
too much work, second should give the real value.
What we had discussed before for vfio is that the device will "cache"
the value returned from previous ram_state_pending_exact().
> It would allow supporting both the (current UAPI) case where you need to
> transfer the state to get device state size (so checking against threshold_size
> pending_pre constantly would allow to not violate the SLA) as well as any other
> UAPI improvement to fseek()/data_size that lets you fail even earlier.
>
> Seems like at least it keeps some of the rules (both under iothread lock) as
> this patch
Coming to worse thing, you need to read the state into a local buffer
and then calculate the size in exact? Looks overkill, but in your case,
I can't see other way to do it.
My understanding was that for new hardware you have an easy way to
calculate this value "if the guest was stopped".
My next series are a way to do in parallel the read/send of the state
with respect to the migration_thread(), but that is a different
problem. I need a way to calculate sizes to start with, otherwise I
have no way to decide if I can enter the completion stage (or for old
devices one can lie and assume than size is zero, but then you are going
to have bigger downtimes).
Later, Juan.
next prev parent reply other threads:[~2022-10-13 16:18 UTC|newest]
Thread overview: 23+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-10-03 3:15 [RFC 0/7] migration patches for VFIO Juan Quintela
2022-10-03 3:15 ` [RFC 1/7] migration: Remove res_compatible parameter Juan Quintela
2022-11-22 13:54 ` Dr. David Alan Gilbert
2022-10-03 3:15 ` [RFC 2/7] migration: No save_live_pending() method uses the QEMUFile parameter Juan Quintela
2022-11-22 17:48 ` Dr. David Alan Gilbert
2022-10-03 3:15 ` [RFC 3/7] migration: Block migration comment or code is wrong Juan Quintela
2022-10-03 19:12 ` Stefan Hajnoczi
2022-10-03 3:15 ` [RFC 4/7] migration: Split save_live_pending() into state_pending_* Juan Quintela
2022-11-22 20:08 ` Dr. David Alan Gilbert
2022-10-03 3:15 ` [RFC 5/7] migration: Remove unused threshold_size parameter Juan Quintela
2022-11-23 16:38 ` Dr. David Alan Gilbert
2022-10-03 3:15 ` [RFC 6/7] migration: simplify migration_iteration_run() Juan Quintela
2022-11-23 17:39 ` Dr. David Alan Gilbert
2022-10-03 3:16 ` [RFC 7/7] migration: call qemu_savevm_state_pending_exact() with the guest stopped Juan Quintela
2022-10-13 12:25 ` Joao Martins
2022-10-13 16:08 ` Juan Quintela [this message]
2022-10-14 10:36 ` Joao Martins
2022-10-14 11:28 ` Juan Quintela
2022-10-14 12:29 ` Joao Martins
2022-10-18 12:22 ` Jason Gunthorpe via
2022-10-19 15:51 ` Yishai Hadas
2022-10-14 19:49 ` Jason Gunthorpe
2022-10-20 14:56 ` [RFC 0/7] migration patches for VFIO Yishai Hadas
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87ilknemef.fsf@secure.mitica \
--to=quintela@redhat.com \
--cc=alex.williamson@redhat.com \
--cc=avihaih@nvidia.com \
--cc=borntraeger@linux.ibm.com \
--cc=cohuck@redhat.com \
--cc=david@redhat.com \
--cc=dgilbert@redhat.com \
--cc=eblake@redhat.com \
--cc=fam@euphon.net \
--cc=farman@linux.ibm.com \
--cc=jgg@nvidia.com \
--cc=joao.m.martins@oracle.com \
--cc=jsnow@redhat.com \
--cc=lvivier@redhat.com \
--cc=pasic@linux.ibm.com \
--cc=pbonzini@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
--cc=qemu-s390x@nongnu.org \
--cc=richard.henderson@linaro.org \
--cc=stefanha@redhat.com \
--cc=thuth@redhat.com \
--cc=vsementsov@yandex-team.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.