From: Steven Sistare <steven.sistare@oracle.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org, "Juan Quintela" <quintela@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Daniel P. Berrangé" <berrange@redhat.com>
Subject: Re: [PATCH V1 2/3] migration: fix suspended runstate
Date: Thu, 24 Aug 2023 16:53:13 -0400	[thread overview]
Message-ID: <b64bae85-1665-2ea5-3690-12cd5e67620b@oracle.com> (raw)
In-Reply-To: <ZN5kmZdvc1Q1446r@x1n>

On 8/17/2023 2:19 PM, Peter Xu wrote:
> On Wed, Aug 16, 2023 at 01:48:13PM -0400, Steven Sistare wrote:
>> On 8/14/2023 3:37 PM, Peter Xu wrote:
>>> On Mon, Aug 14, 2023 at 02:53:56PM -0400, Steven Sistare wrote:
>>>>> Can we just call vm_state_notify() earlier?
>>>>
>>>> We cannot.  The guest is not running yet, and will not be until later.
>>>> We cannot call notifiers that perform actions that complete, or react to, 
>>>> the guest entering a running state.
>>>
>>> I tried to look at a few examples of the notifees and most of them I read
>>> do not react to "vcpu running" but "vm running" (in which case I think
>>> "suspended" mode falls into "vm running" case); most of them won't care on
>>> the RunState parameter passed in, but only the bool "running".
>>>
>>> In reality, when running=true, it must be RUNNING so far.
>>>
>>> In that case does it mean we should notify right after the switchover,
>>> since after migration the vm is indeed running only if the vcpus are not
>>> during suspend?
>>
>> I cannot parse your question, but maybe this answers it.
>> If the outgoing VM is running and not suspended, then the incoming side
>> tests for autostart==true and calls vm_start, which calls the notifiers,
>> right after the switchover.
> 
> I meant IMHO SUSPENDED should be seen as "vm running" case to me, just like
> RUNNING.  Then, we should invoke vm_prepare_start(), just need some touch
> ups.
> 
>>
>>> One example (of possible issue) is vfio_vmstate_change(), where iiuc if we
>>> try to suspend a VM it should keep to be VFIO_DEVICE_STATE_RUNNING for that
> device; this kind of proves to me that SUSPEND is actually one of
>>> running=true states.
>>>
>>> If we postpone all notifiers here even after we switched over to dest qemu
>>> to the next upcoming suspend wakeup, I think it means these devices will
>>> not be in VFIO_DEVICE_STATE_RUNNING after switchover but perhaps
>>> VFIO_DEVICE_STATE_STOP.
>>
>> or VFIO_DEVICE_STATE_RESUMING, which is set in vfio_load_setup.
>> AFAIK it is OK to remain in that state until wakeup is called later.
> 
> So let me provide another clue of why I think we should call
> vm_prepare_start()..
> 
> Firstly, I think RESUME event should always be there right after we
> switched over, no matter suspended or not.  I just noticed that your test
> case would work because you put "wakeup" to be before RESUME.  I'm not sure
> whether that's correct.  I'd bet people could rely on that RESUME to
> identify the switchover.

Yes, possibly.

> More importantly, I'm wondering whether RTC should still be running during
> the suspended mode?  Sorry again if my knowledge over there is just
> limited, so correct me otherwise - but my understanding is during suspend
> mode (s1 or s3, frankly I can't tell which one this belongs..), rtc should
> still be running along with the system clock.  It means we _should_ at
> least call cpu_enable_ticks() to enable rtc:

Agreed.  The comment above cpu_get_ticks says:
  * return the time elapsed in VM between vm_start and vm_stop
Suspending a guest does not call vm_stop, so ticks keeps running.
I also verified that experimentally.
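
For the archives, here is a toy model of that behavior (this is not the real
cputick code; the *_model names and the fake_host_clock variable are invented
for illustration): the tick clock only stops when vm_stop disables ticks, and
a guest suspend never calls vm_stop, so elapsed ticks keep accumulating.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

static bool ticks_enabled;
static int64_t ticks_offset;     /* accumulated tick value while disabled */
static int64_t fake_host_clock;  /* stand-in for the host time source */

/* Model of cpu_enable_ticks(): start counting from the current host clock. */
static void cpu_enable_ticks_model(void)
{
    if (!ticks_enabled) {
        ticks_offset -= fake_host_clock;
        ticks_enabled = true;
    }
}

/* Model of cpu_disable_ticks(): freeze the accumulated value. */
static void cpu_disable_ticks_model(void)
{
    if (ticks_enabled) {
        ticks_offset += fake_host_clock;
        ticks_enabled = false;
    }
}

/* Model of cpu_get_ticks(): time elapsed between vm_start and vm_stop. */
static int64_t cpu_get_ticks_model(void)
{
    return ticks_enabled ? fake_host_clock + ticks_offset : ticks_offset;
}
```

A guest suspend in this model simply does not call cpu_disable_ticks_model,
so the returned value keeps advancing, matching the experiment above.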

> /*
>  * enable cpu_get_ticks()
>  * Caller must hold BQL which serves as mutex for vm_clock_seqlock.
>  */
> void cpu_enable_ticks(void)
> 
> I think that'll enable cpu_get_tsc() and make it start to work right.
> 
>>
>>> Ideally I think we should here call vm_state_notify() with running=true and
>>> state=SUSPEND, but since I do see some hooks are not well prepared for
>>> SUSPEND over running=true, I'd think we should on the safe side call
>>> vm_state_notify(running=true, state=RUNNING) even for SUSPEND at switch
>>> over phase.  With that IIUC it'll naturally work (e.g. when wakeup again
>>> later we just need to call no notifiers).
>>
>> Notifiers are just one piece, all the code in vm_prepare_start must be called.
>> Is it correct to call all of that long before we actually resume the CPUs in
>> wakeup?  I don't know, but what is the point?
> 
> The point is not only for cleanness (again, I really, really don't like that
> new global.. sorry), but also now I think we should make the vm running.

I do believe it is safe to call vm_prepare_start immediately, since the vcpus
are never running when it is called.

>> The wakeup code still needs
>> modification to conditionally resume the vcpus.  The scheme would be roughly:
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstate = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING) {
>>             vm_start();
>>         } else if (runstate == RUN_STATE_SUSPENDED) {
>>             vm_prepare_start();   // the start of vm_start()
>>         }
>>
>>     qemu_system_wakeup_request()
>>         if (some condition)
>>             resume_all_vcpus();   // the remainder of vm_start()
>>         else
>>             runstate_set(RUN_STATE_RUNNING)
> 
> No it doesn't.  wakeup_reason is set there, main loop does the resuming.
> See:
> 
>     if (qemu_wakeup_requested()) {
>         pause_all_vcpus();
>         qemu_system_wakeup();
>         notifier_list_notify(&wakeup_notifiers, &wakeup_reason);
>         wakeup_reason = QEMU_WAKEUP_REASON_NONE;
>         resume_all_vcpus();
>         qapi_event_send_wakeup();
>     }

Agreed, we can rely on that main loop code to call resume_all_vcpus, and not
modify qemu_system_wakeup_request.  That is cleaner. 
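
To make the agreed flow concrete, here is a self-contained sketch (the *_model
names are invented; this is not actual QEMU code, just a model of the scheme):
the run bh calls vm_prepare_start() when the incoming state is suspended and
leaves the vcpus paused; qemu_system_wakeup_request() only records a reason;
and the main loop does the resuming.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RUN_STATE_RUNNING, RUN_STATE_SUSPENDED } RunState;
typedef enum { WAKEUP_NONE, WAKEUP_OTHER } WakeupReason;

static RunState runstate;
static bool vm_prepared;      /* devices notified, ticks enabled */
static bool vcpus_running;
static WakeupReason wakeup_reason;

/* Model of vm_prepare_start(): everything in vm_start() except resuming vcpus. */
static void vm_prepare_start_model(void) { vm_prepared = true; }

static void vm_start_model(void)
{
    vm_prepare_start_model();
    vcpus_running = true;
    runstate = RUN_STATE_RUNNING;
}

/* Incoming-side switchover: suspended guests get prepared but not resumed. */
static void loadvm_postcopy_handle_run_bh_model(RunState incoming)
{
    if (incoming == RUN_STATE_RUNNING) {
        vm_start_model();
    } else {
        vm_prepare_start_model();   /* vcpus stay paused */
        runstate = RUN_STATE_SUSPENDED;
    }
}

/* Wakeup request only records a reason; no vcpu manipulation here. */
static void qemu_system_wakeup_request_model(WakeupReason r)
{
    wakeup_reason = r;
}

/* The main loop notices the pending reason and resumes the vcpus. */
static void main_loop_iteration_model(void)
{
    if (wakeup_reason != WAKEUP_NONE) {
        wakeup_reason = WAKEUP_NONE;
        vcpus_running = true;
        runstate = RUN_STATE_RUNNING;
    }
}
```

The point of the split is visible in the model: after switchover a suspended
guest has vm_prepared true (so devices and ticks behave as "vm running") while
vcpus_running stays false until a wakeup flows through the main loop.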

>> How is that better than my patches
>>     [PATCH V3 01/10] vl: start on wakeup request
>>     [PATCH V3 02/10] migration: preserve suspended runstate
>>
>>     loadvm_postcopy_handle_run_bh()
>>         runstate = global_state_get_runstate();
>>         if (runstate == RUN_STATE_RUNNING)
>>             vm_start()
>>         else
>>             runstate_set(runstate);    // eg RUN_STATE_SUSPENDED
>>
>>     qemu_system_wakeup_request()
>>         if (!vm_started)
>>             vm_start();
>>         else
>>             runstate_set(RUN_STATE_RUNNING);
>>
>> Recall this thread started with your comment "It then can avoid touching the 
>> system wakeup code which seems cleaner".  We still need to touch the wakeup
>> code.
> 
> Let me provide some code and reply to your new patchset inlines.

Thank you, I have more comments there.

- Steve


