qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Joao Martins <joao.m.martins@oracle.com>
To: Peter Xu <peterx@redhat.com>,
	Elena Ufimtseva <elena.ufimtseva@oracle.com>
Cc: qemu-devel@nongnu.org, eduardo@habkost.net,
	marcel.apfelbaum@gmail.com, philmd@linaro.org,
	wangyanan55@huawei.com, farosas@suse.de, eblake@redhat.com,
	armbru@redhat.com
Subject: Re: [PATCH RFC 2/2] migration: abort on destination if switchover limit exceeded
Date: Tue, 25 Jun 2024 12:38:50 +0100	[thread overview]
Message-ID: <9eeea2a9-b3ef-4791-94de-fb06ad2bd9b4@oracle.com> (raw)
In-Reply-To: <ZnnL42_iDip3hfUh@x1n>

On 24/06/2024 20:41, Peter Xu wrote:
> On Fri, Jun 21, 2024 at 07:32:21AM -0700, Elena Ufimtseva wrote:
>> @@ -2659,6 +2698,18 @@ qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis,
 >>      if (!check_section_footer(f, se)) {
>>          return -EINVAL;
>> @@ -2714,6 +2765,19 @@ qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis,
>>                                      se->instance_id, end_ts - start_ts);
>>      }
>>  
>> +    if (migrate_switchover_abort() && type == QEMU_VM_SECTION_END &&
>> +        mis->downtime_start) {
>> +        mis->downtime_now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>> +        uint64_t dst_downtime = mis->downtime_now - mis->downtime_start;
>> +        if (mis->src_downtime + dst_downtime >= mis->abort_limit) {
>> +            error_report("Shutdown destination migration, migration abort_limit (%lu ms)"
>> +                          "was reached.", mis->abort_limit);
>> +            trace_qemu_loadvm_downtime_abort(mis->abort_limit, dst_downtime,
>> +                                             mis->src_downtime);
>> +            return -EINVAL;
>> +        }
>> +    }
> 
> So this traps both iteration and non-iteration phase.  What if the downtime
> was caused by after these, or unluckily at the last non-iterator device
> state?
> 
> After trying to think about it, I figured this is not easy to do right.
> Also, I start to doubt whether it's definitely a good idea on having this
> in the first place.
> 
> Firstly, I'm wondering how we should treat this new feature
> v.s. downtime_limit.  It's about how the user should set both.
> 
> If this is about "cancel migration if downtime more than xxx",
> then.. AFAICT that's exactly what downtime_limit was "designed" to be..
> It's just that downtime_limit says the other way round, as: "don't
> switchover if the downtime will be more than xxx".
> 
> Then, if the user has option to set both these parameters, what would be
> the right thing to do?  Shouldn't they simply always set both parameters to
> be the same value already?  But if so, what's the point on having two?
> 
> This caused me to think whether the 2nd parameter is needed at all, instead
> whether we should simply make downtime_limit more accurate, so that it will
> start to account more things than before.  It won't be accurate, but the
> same case here: making this new feature "accurate" can also be very hard.
> 

The way I think about it is that current downtime-limit captures nicely the data
part as the only calculations it cares about is how much outstanding data it
sends to the destination (be it VF device state or RAM). This second parameter
captures what is *not* related to data, i.e. costs of hypervisor quiescing the
VM or added latencies in hypervisor *in addition* to sending outstanding data to
destination.

If we were to merge this all into a single parameter (aka downtime-limit) we are
possibility artificially increasing the downtime thanks to relaxing the
oustanding data part, as opposed to trying to capture the switchover cost (hence
the name the parameter) that can't be reflected in the former equation.

So I feel like this needs two parameters whereby this second new parameter
covers 'switchover cost' (hence the name) with current migration algorithm.

	Joao


  reply	other threads:[~2024-06-25 11:39 UTC|newest]

Thread overview: 20+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-21 14:32 [PATCH RFC 0/2] migration: introduce strict SLA Elena Ufimtseva
2024-06-21 14:32 ` [PATCH RFC 1/2] migration: abort when switchover limit exceeded Elena Ufimtseva
2024-06-24 19:19   ` Peter Xu
2024-06-25 19:05   ` Daniel P. Berrangé
2024-06-21 14:32 ` [PATCH RFC 2/2] migration: abort on destination if " Elena Ufimtseva
2024-06-24 19:41   ` Peter Xu
2024-06-25 11:38     ` Joao Martins [this message]
2024-06-25 14:53       ` Peter Xu
2024-06-25 16:31         ` Joao Martins
2024-06-25 19:03           ` Peter Xu
2024-06-26 11:04             ` Joao Martins
2024-06-26 18:41               ` Peter Xu
2024-07-26  7:41                 ` Elena Ufimtseva
2024-08-21 19:45                   ` Peter Xu
2024-06-25 18:37         ` Daniel P. Berrangé
2024-06-26 11:29           ` Joao Martins
2024-06-26 11:34             ` Daniel P. Berrangé
2024-06-26 12:12               ` Joao Martins
2024-06-26 12:20                 ` Daniel P. Berrangé
2024-06-25 19:12   ` Daniel P. Berrangé

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9eeea2a9-b3ef-4791-94de-fb06ad2bd9b4@oracle.com \
    --to=joao.m.martins@oracle.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=eduardo@habkost.net \
    --cc=elena.ufimtseva@oracle.com \
    --cc=farosas@suse.de \
    --cc=marcel.apfelbaum@gmail.com \
    --cc=peterx@redhat.com \
    --cc=philmd@linaro.org \
    --cc=qemu-devel@nongnu.org \
    --cc=wangyanan55@huawei.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).