From: "manish.mishra" <manish.mishra@nutanix.com>
To: Peter Xu <peterx@redhat.com>
Cc: qemu-devel@nongnu.org,
Leonardo Bras Soares Passos <lsoaresp@redhat.com>,
"Daniel P . Berrange" <berrange@redhat.com>,
"Dr . David Alan Gilbert" <dgilbert@redhat.com>,
Juan Quintela <quintela@redhat.com>
Subject: Re: [PATCH v5 13/21] migration: Postcopy recover with preempt enabled
Date: Mon, 16 May 2022 21:02:16 +0530 [thread overview]
Message-ID: <a2853510-bf69-a415-31f5-fd54c68fbf57@nutanix.com> (raw)
In-Reply-To: <85b85303-e4e8-77be-c65d-76018ac7704c@nutanix.com>
[-- Attachment #1: Type: text/plain, Size: 4993 bytes --]
On 16/05/22 8:21 pm, manish.mishra wrote:
>
>
> On 16/05/22 7:41 pm, Peter Xu wrote:
>> Hi, Manish,
>>
>> On Mon, May 16, 2022 at 07:01:35PM +0530, manish.mishra wrote:
>>> On 26/04/22 5:08 am, Peter Xu wrote:
>>> LGTM,
>>> Peter, I wanted to give review-tag for this and ealier patch too. I am new
>>> to qemu
>>> review process so not sure how give review-tag, did not find any reference
>>> on
>>> google too. So if you please let me know how to do it.
>> It's here:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__git.qemu.org_-3Fp-3Dqemu.git-3Ba-3Dblob-3Bf-3Ddocs_devel_submitting-2Da-2Dpatch.rst-3Bhb-3DHEAD-23l492&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=c4KON2DiMd-szjwjggQcuUvTsPWblztAL0gVzaHnNmc&m=8LU6rphEJ5GMSXEpSxe8JZ_hpn6TQDUXfjWM6Vt7DdShxnU3X5zYXbAMBLPYchdK&s=TUNUCtdl7LWhrdlfnIx1F08kC0d9IMvArl6cNMpfXkc&e=
>>
>> Since afaict QEMU is mostly following what Linux does, you can also
>> reference to this one with more context:
>>
>> https://urldefense.proofpoint.com/v2/url?u=https-3A__www.kernel.org_doc_html_v4.17_process_submitting-2Dpatches.html-23using-2Dreported-2Dby-2Dtested-2Dby-2Dreviewed-2Dby-2Dsuggested-2Dby-2Dand-2Dfixes&d=DwIBaQ&c=s883GpUCOChKOHiocYtGcg&r=c4KON2DiMd-szjwjggQcuUvTsPWblztAL0gVzaHnNmc&m=8LU6rphEJ5GMSXEpSxe8JZ_hpn6TQDUXfjWM6Vt7DdShxnU3X5zYXbAMBLPYchdK&s=TJmr_eC4LAccVY1EqgkLleXfJhUgtIjTJmLc3cedYr0&e=
>>
>> But since you're still having question regarding this patch, no rush on
>> providing your R-bs; let's finish the discussion first.
>>
>> [...]
>>
>>>> +static void postcopy_pause_ram_fast_load(MigrationIncomingState *mis)
>>>> +{
>>>> + trace_postcopy_pause_fast_load();
>>>> + qemu_mutex_unlock(&mis->postcopy_prio_thread_mutex);
>>> I may have misunderstood synchronisation here but very very rare chance,
>>>
>>> as both threads are working independently is it possible qemu_sem_post on
>>>
>>> postcopy_pause_sem_fast_load is called by main thread even before we go
>>>
>>> to qemu_sem_wait in next line, causing some kind of deadlock. That's should
>>>
>>> not be possible as i understand it requires manually calling
>>> qmp_migration_recover
>>>
>>> so chances are almost impossible. Just wanted to confirm it.
>> Sorry I don't quite get the question, could you elaborate? E.g., when the
>> described deadlock happened, what is both main thread and preempt load
>> thread doing? What are they waiting at?
>>
>> Note: we have already released mutex before waiting on sem.
>
> What i meant here is deadlock could be due the reason that we infinately wait
>
> on qemu_sem_wait(&mis->postcopy_pause_sem_fast_load), because main
>
> thread already called post on postcopy_pause_sem_fast_load after recovery
>
> even before we moved to qemu_sem_wait(&mis->postcopy_pause_sem_fast_load)
>
> in next line. Basically if we miss a post on postcopy_pause_sem_fast_load.
>
> This is nearly impossibily case becuase it requires full recovery path to be completed
>
> before this thread executes just next line. Also as recovery needs to be called manually,
>
> So please ignore this.
>
> Basically i wanted to check if we should use something like
>
> int pthread_cond_wait(pthread_cond_t *restrict cond,
> pthread_mutex_t *restrict mutex);
>
> so that there is no race between releasing mutex and calling wait.
>
Really sorry, please ignore this it is sem_post and sem_wait so that is not possible.
>
>>>> + qemu_sem_wait(&mis->postcopy_pause_sem_fast_load);
>>> Just wanted to confirm why postcopy_pause_incoming is not called here
>>> itself.
>> postcopy_pause_incoming() is only used in the main ram load thread, while
>> this function (postcopy_pause_ram_fast_load) is only called by the preempt
>> load thread.
>>
> ok got it, thanks Peter, i meant if we should close both the channels as soon
>
> as we relise there is some failure instead of main thread waiting for error event
>
> and then closing and pausing post-copy. But agree current approach is good.
>
>>> Is it based on assumption that if there is error in any of the channel it
>>> will
>>>
>>> eventually be paused on source side, closing both channels, resulting
>>>
>>> postcopy_pause_incoming will be called from main thread on destination?
>> Yes.
>>
>>> Usually it should be good to call as early as possible. It is left to main
>>>
>>> thread default path so that we do not have any synchronisation overhead?
>> What's the sync overhead you mentioned? What we want to do here is simply
>> to put all the dest QEMU migration threads into a halted state rather than
>> quitting, so that they can be continued when necessary.
>>
>>> Also Peter, i was trying to understand postcopy recovery model so is use
>>> case
>>>
>>> of qmp_migrate_pause just for debugging purpose?
>> Yes. It's also a way to cleanly stop using the network (comparing to force
>> unplug the nic ports?) for whatever reason with a shutdown() syscall upon
>> the socket. I just don't know whether there's any real use case of that in
>> reality.
>>
>> Thanks,
>>
[-- Attachment #2: Type: text/html, Size: 8611 bytes --]
next prev parent reply other threads:[~2022-05-16 16:40 UTC|newest]
Thread overview: 40+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-25 23:38 [PATCH v5 00/21] migration: Postcopy Preemption Peter Xu
2022-04-25 23:38 ` [PATCH v5 01/21] tests: fix encoding of IP addresses in x509 certs Peter Xu
2022-04-25 23:38 ` [PATCH v5 02/21] tests: add more helper macros for creating TLS " Peter Xu
2022-04-25 23:38 ` [PATCH v5 03/21] tests: add migration tests of TLS with PSK credentials Peter Xu
2022-04-25 23:38 ` [PATCH v5 04/21] tests: add migration tests of TLS with x509 credentials Peter Xu
2022-04-25 23:38 ` [PATCH v5 05/21] tests: convert XBZRLE migration test to use common helper Peter Xu
2022-04-25 23:38 ` [PATCH v5 06/21] tests: convert multifd migration tests " Peter Xu
2022-04-25 23:38 ` [PATCH v5 07/21] tests: add multifd migration tests of TLS with PSK credentials Peter Xu
2022-04-25 23:38 ` [PATCH v5 08/21] tests: add multifd migration tests of TLS with x509 credentials Peter Xu
2022-04-25 23:38 ` [PATCH v5 09/21] tests: ensure migration status isn't reported as failed Peter Xu
2022-04-25 23:38 ` [PATCH v5 10/21] migration: Add postcopy-preempt capability Peter Xu
2022-04-25 23:38 ` [PATCH v5 11/21] migration: Postcopy preemption preparation on channel creation Peter Xu
2022-05-11 17:08 ` manish.mishra
2022-05-12 16:36 ` Peter Xu
2022-04-25 23:38 ` [PATCH v5 12/21] migration: Postcopy preemption enablement Peter Xu
2022-04-25 23:38 ` [PATCH v5 13/21] migration: Postcopy recover with preempt enabled Peter Xu
2022-05-16 13:31 ` manish.mishra
2022-05-16 14:11 ` Peter Xu
2022-05-16 14:51 ` manish.mishra
2022-05-16 15:32 ` manish.mishra [this message]
2022-05-16 15:58 ` Peter Xu
2022-04-25 23:38 ` [PATCH v5 14/21] migration: Create the postcopy preempt channel asynchronously Peter Xu
2022-05-12 17:35 ` Dr. David Alan Gilbert
2022-05-16 15:02 ` manish.mishra
2022-05-16 16:21 ` Peter Xu
2022-04-25 23:38 ` [PATCH v5 15/21] migration: Parameter x-postcopy-preempt-break-huge Peter Xu
2022-05-12 17:42 ` Dr. David Alan Gilbert
2022-05-16 15:20 ` manish.mishra
2022-04-25 23:38 ` [PATCH v5 16/21] migration: Add helpers to detect TLS capability Peter Xu
2022-05-12 17:46 ` Dr. David Alan Gilbert
2022-04-25 23:38 ` [PATCH v5 17/21] migration: Export tls-[creds|hostname|authz] params to cmdline too Peter Xu
2022-05-12 18:03 ` Dr. David Alan Gilbert
2022-05-12 19:05 ` Daniel P. Berrangé
2022-05-12 20:07 ` Peter Xu
2022-04-25 23:38 ` [PATCH v5 18/21] migration: Enable TLS for preempt channel Peter Xu
2022-04-25 23:38 ` [PATCH v5 19/21] tests: Add postcopy tls migration test Peter Xu
2022-04-25 23:38 ` [PATCH v5 20/21] tests: Add postcopy tls recovery " Peter Xu
2022-04-25 23:38 ` [PATCH v5 21/21] tests: Add postcopy preempt tests Peter Xu
2022-05-16 15:25 ` [PATCH v5 00/21] migration: Postcopy Preemption manish.mishra
2022-05-16 16:06 ` Peter Xu
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=a2853510-bf69-a415-31f5-fd54c68fbf57@nutanix.com \
--to=manish.mishra@nutanix.com \
--cc=berrange@redhat.com \
--cc=dgilbert@redhat.com \
--cc=lsoaresp@redhat.com \
--cc=peterx@redhat.com \
--cc=qemu-devel@nongnu.org \
--cc=quintela@redhat.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).