Date: Thu, 6 Jul 2023 15:23:43 -0400
From: Peter Xu <peterx@redhat.com>
To: Andrei Gudkov
Cc: qemu-devel@nongnu.org, quintela@redhat.com, leobras@redhat.com, eblake@redhat.com, armbru@redhat.com
Subject: Re: [PATCH] migration/calc-dirty-rate: millisecond precision period
References: <8571da37847f9bb39b84e62ef4998e68ef3c10d1.1688028297.git.gudkov.andrei@huawei.com>
In-Reply-To: <8571da37847f9bb39b84e62ef4998e68ef3c10d1.1688028297.git.gudkov.andrei@huawei.com>

On Thu, Jun 29, 2023 at 11:59:03AM +0300, Andrei Gudkov wrote:
> Introduces alternative argument calc-time-ms, which is the same as
> calc-time but accepts a millisecond value.
> Millisecond precision allows to make predictions whether
> migration will succeed or not. To do this, calculate dirty
> rate with calc-time-ms set to max allowed downtime, convert
> measured rate into volume of dirtied memory, and divide by
> network throughput. If the value is lower than max allowed
> downtime, then migration will converge.
>
> Measurement results for single thread randomly writing to
> a 24GiB region:
> +--------------+--------------------+
> | calc-time-ms | dirty-rate (MiB/s) |
> +--------------+--------------------+
> |          100 |               1880 |
> |          200 |               1340 |
> |          300 |               1120 |
> |          400 |               1030 |
> |          500 |                868 |
> |          750 |                720 |
> |         1000 |                636 |
> |         1500 |                498 |
> |         2000 |                423 |
> +--------------+--------------------+

Do you mean the dirty workload is constant?  Why does the measured rate
differ so much with different calc-time-ms?

I also doubt the usefulness for huge VMs: there the millisecond accuracy
won't matter much, because the overhead of enabling/disabling dirty
logging can already be at the millisecond level.
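For illustration, the prediction recipe in the commit message boils down
to a small calculation.  Here is a self-contained sketch with made-up
numbers (the variable names and the 10 GiB/s link speed are hypothetical,
not taken from the patch):

    /* Illustrative only: the convergence check described in the commit
     * message, with made-up numbers. */
    #include <stdio.h>

    int main(void)
    {
        long long max_downtime_ms = 300;        /* allowed downtime */
        long long dirty_rate_mib_s = 1120;      /* calc-dirty-rate, calc-time-ms=300 */
        long long bandwidth_mib_s = 10 * 1024;  /* assumed ~10 GiB/s migration link */

        /* memory dirtied during one downtime-sized window */
        long long dirtied_mib = dirty_rate_mib_s * max_downtime_ms / 1000;
        /* time needed to send that much data */
        long long transfer_ms = dirtied_mib * 1000 / bandwidth_mib_s;

        printf("dirtied %lld MiB, transfer %lld ms -> %s\n",
               dirtied_mib, transfer_ms,
               transfer_ms <= max_downtime_ms ? "should converge" : "will not converge");
        return 0;
    }

With these numbers the guest dirties 336 MiB during the 300 ms window and
sending it takes about 32 ms, well under the downtime budget, so the final
switchover should converge.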
>
> Signed-off-by: Andrei Gudkov
> ---
>  qapi/migration.json   | 15 ++++++--
>  migration/dirtyrate.h | 12 ++++---
>  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
>  3 files changed, 68 insertions(+), 40 deletions(-)
>
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 5bb5ab82a0..dd1afe1982 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1778,7 +1778,12 @@
>  #
>  # @start-time: start time in units of second for calculation
>  #
> -# @calc-time: time in units of second for sample dirty pages
> +# @calc-time: time period for which dirty page rate was measured
> +#     (rounded down to seconds).
> +#
> +# @calc-time-ms: actual time period for which dirty page rate was
> +#     measured (in milliseconds). Value may be larger than requested
> +#     time period due to measurement overhead.
>  #
>  # @sample-pages: page count per GB for sample dirty pages the default
>  #     value is 512 (since 6.1)
> @@ -1796,6 +1801,7 @@
>      'status': 'DirtyRateStatus',
>      'start-time': 'int64',
>      'calc-time': 'int64',
> +    'calc-time-ms': 'int64',
>      'sample-pages': 'uint64',
>      'mode': 'DirtyRateMeasureMode',
>      '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> @@ -1807,6 +1813,10 @@
>  #
>  # @calc-time: time in units of second for sample dirty pages
>  #
> +# @calc-time-ms: the same as @calc-time but in milliseconds. These
> +#     two arguments are mutually exclusive. Exactly one of them must
> +#     be specified. (Since 8.1)
> +#
>  # @sample-pages: page count per GB for sample dirty pages the default
>  #     value is 512 (since 6.1)
>  #
> @@ -1821,7 +1831,8 @@
>  #          'sample-pages': 512} }
>  # <- { "return": {} }
>  ##
> -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> +                                         '*calc-time-ms': 'int64',
>                                           '*sample-pages': 'int',
>                                           '*mode': 'DirtyRateMeasureMode'} }
>
> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> index 594a5c0bb6..869c060941 100644
> --- a/migration/dirtyrate.h
> +++ b/migration/dirtyrate.h
> @@ -31,10 +31,12 @@
>  #define MIN_RAMBLOCK_SIZE         128
>
>  /*
> - * Take 1s as minimum time for calculation duration
> + * Allowed range for dirty page rate calculation (in milliseconds).
> + * Lower limit relates to the smallest realistic downtime it
> + * makes sense to impose on migration.
>   */
> -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> +#define MIN_CALC_TIME_MS                          50
> +#define MAX_CALC_TIME_MS                          60000
>
>  /*
>   * Take 1/16 pages in 1G as the maxmum sample page count
> @@ -44,7 +46,7 @@
>
>  struct DirtyRateConfig {
>      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> -    int64_t sample_period_seconds; /* time duration between two sampling */
> +    int64_t calc_time_ms; /* desired calculation time (in milliseconds) */
>      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
>  };
>
> @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
>  struct DirtyRateStat {
>      int64_t dirty_rate; /* dirty rate in MB/s */
>      int64_t start_time; /* calculation start time in units of second */
> -    int64_t calc_time; /* time duration of two sampling in units of second */
> +    int64_t calc_time_ms; /* actual calculation time (in milliseconds) */
>      uint64_t sample_pages; /* sample pages per GB */
>      union {
>          SampleVMStat page_sampling;
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index 84f1b0fb20..90fb336329 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t initial_time)
>          msec = current_time - initial_time;
>      } else {
>          g_usleep((msec + initial_time - current_time) * 1000);
> +        /* g_usleep may overshoot */
> +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;

This removes the g_usleep(); is that intended?  I thought it was the code
doing the delay.  What is the replacement for g_usleep(), then?
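For context, the pattern this hunk adds is sleep-then-remeasure: after the
sleep returns, the clock is read again so that the reported period is the
time that actually elapsed rather than the time that was requested.  A
simplified sketch of the resulting helper (the surrounding condition is
paraphrased; this is not the exact QEMU code):

    /* Simplified sketch of the post-patch wait helper (not verbatim QEMU code). */
    static int64_t dirty_stat_wait_sketch(int64_t msec, int64_t initial_time)
    {
        int64_t current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);

        if (current_time - initial_time >= msec) {
            /* the requested period has already passed; report what elapsed */
            msec = current_time - initial_time;
        } else {
            /* sleep for the remainder of the requested period */
            g_usleep((msec + initial_time - current_time) * 1000);
            /* g_usleep() may overshoot, so measure the actual elapsed time */
            msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
        }
        return msec;
    }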
>      }
>
>      return msec;
>  }
> @@ -77,9 +79,12 @@ static int64_t do_calculate_dirtyrate(DirtyPageRecord dirty_pages,
>  {
>      uint64_t increased_dirty_pages =
>          dirty_pages.end_pages - dirty_pages.start_pages;
> -    uint64_t memory_size_MiB = qemu_target_pages_to_MiB(increased_dirty_pages);
> -
> -    return memory_size_MiB * 1000 / calc_time_ms;
> +    /*
> +     * multiply by 1000ms/s _before_ converting down to megabytes
> +     * to avoid losing precision
> +     */
> +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> +           calc_time_ms;
>  }
>
>  void global_dirty_log_change(unsigned int flag, bool start)
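To make the precision comment above concrete, here is a tiny standalone
example.  The 4 KiB page size and the local pages_to_mib() helper are
stand-ins for illustration; they are not QEMU's qemu_target_pages_to_MiB():

    /* Illustrative only: why multiplying by 1000 before the pages->MiB
     * conversion matters for short measurement periods. */
    #include <stdio.h>
    #include <inttypes.h>

    static uint64_t pages_to_mib(uint64_t pages)
    {
        return pages * 4096 / (1024 * 1024);   /* integer division, truncates */
    }

    int main(void)
    {
        uint64_t dirty_pages = 200;    /* ~800 KiB dirtied */
        uint64_t calc_time_ms = 100;   /* short measurement window */

        uint64_t old_rate = pages_to_mib(dirty_pages) * 1000 / calc_time_ms;
        uint64_t new_rate = pages_to_mib(dirty_pages * 1000) / calc_time_ms;

        /* old order truncates to 0 MiB/s, new order reports 7 MiB/s */
        printf("old: %" PRIu64 " MiB/s, new: %" PRIu64 " MiB/s\n",
               old_rate, new_rate);
        return 0;
    }

The true rate here is about 7.8 MiB/s, so converting to MiB only after the
multiplication loses just the sub-MiB remainder instead of the whole value.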
> @@ -183,10 +188,9 @@ retry:
>      return duration;
>  }
>
> -static bool is_sample_period_valid(int64_t sec)
> +static bool is_calc_time_valid(int64_t msec)
>  {
> -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
>          return false;
>      }
>
> @@ -219,7 +223,8 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
>
>      info->status = CalculatingState;
>      info->start_time = DirtyStat.start_time;
> -    info->calc_time = DirtyStat.calc_time;
> +    info->calc_time_ms = DirtyStat.calc_time_ms;
> +    info->calc_time = DirtyStat.calc_time_ms / 1000;
>      info->sample_pages = DirtyStat.sample_pages;
>      info->mode = dirtyrate_mode;
>
> @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
>  {
>      DirtyStat.dirty_rate = -1;
>      DirtyStat.start_time = start_time;
> -    DirtyStat.calc_time = config.sample_period_seconds;
> +    DirtyStat.calc_time_ms = config.calc_time_ms;
>      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
>
>      switch (config.mode) {
> @@ -568,7 +573,6 @@ static inline void dirtyrate_manual_reset_protect(void)
>
>  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>  {
> -    int64_t msec = 0;
>      int64_t start_time;
>      DirtyPageRecord dirty_pages;
>
> @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>      DirtyStat.start_time = start_time / 1000;
>
> -    msec = config.sample_period_seconds * 1000;
> -    msec = dirty_stat_wait(msec, start_time);
> -    DirtyStat.calc_time = msec / 1000;
> +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms, start_time);
>
>      /*
>       * do two things.
> @@ -609,12 +611,12 @@ static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig config)
>
>      record_dirtypages_bitmap(&dirty_pages, false);
>
> -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> +                                                  DirtyStat.calc_time_ms);
>  }
>
>  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
>  {
> -    int64_t duration;
>      uint64_t dirtyrate = 0;
>      uint64_t dirtyrate_sum = 0;
>      int i = 0;
> @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
>      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) / 1000;
>
>      /* calculate vcpu dirtyrate */
> -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds * 1000,
> -                                        &DirtyStat.dirty_ring,
> -                                        GLOBAL_DIRTY_DIRTY_RATE,
> -                                        true);
> -
> -    DirtyStat.calc_time = duration / 1000;
> +    DirtyStat.calc_time_ms = vcpu_calculate_dirtyrate(config.calc_time_ms,
> +                                                      &DirtyStat.dirty_ring,
> +                                                      GLOBAL_DIRTY_DIRTY_RATE,
> +                                                      true);
>
>      /* calculate vm dirtyrate */
>      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
>  {
>      struct RamblockDirtyInfo *block_dinfo = NULL;
>      int block_count = 0;
> -    int64_t msec = 0;
>      int64_t initial_time;
>
>      rcu_read_lock();
> @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
>      }
>      rcu_read_unlock();
>
> -    msec = config.sample_period_seconds * 1000;
> -    msec = dirty_stat_wait(msec, initial_time);
> +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> +                                             initial_time);
>      DirtyStat.start_time = initial_time / 1000;
> -    DirtyStat.calc_time = msec / 1000;
>
>      rcu_read_lock();
>      if (!compare_page_hash_info(block_dinfo, block_count)) {
>          goto out;
>      }
>
> -    update_dirtyrate(msec);
> +    update_dirtyrate(DirtyStat.calc_time_ms);
>
>  out:
>      rcu_read_unlock();
> @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
>      return NULL;
>  }
>
> -void qmp_calc_dirty_rate(int64_t calc_time,
> +void qmp_calc_dirty_rate(bool has_calc_time,
> +                         int64_t calc_time,
> +                         bool has_calc_time_ms,
> +                         int64_t calc_time_ms,
>                           bool has_sample_pages,
>                           int64_t sample_pages,
>                           bool has_mode,
> @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
>          return;
>      }
>
> -    if (!is_sample_period_valid(calc_time)) {
> -        error_setg(errp, "calc-time is out of range[%d, %d].",
> -                   MIN_FETCH_DIRTYRATE_TIME_SEC,
> -                   MAX_FETCH_DIRTYRATE_TIME_SEC);
> +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> +        error_setg(errp, "Exactly one of calc-time and calc-time-ms must"
> +                   " be specified");
> +        return;
> +    }
> +    if (has_calc_time) {
> +        /*
> +         * The worst thing that can happen due to overflow is that
> +         * invalid value will become valid.
> +         */
> +        calc_time_ms = calc_time * 1000;
> +    }
> +    if (!is_calc_time_valid(calc_time_ms)) {
> +        error_setg(errp, "Calculation time is out of range[%dms, %dms].",
> +                   MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
>          return;
>      }
>
> @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
>          return;
>      }
>
> -    config.sample_period_seconds = calc_time;
> +    config.calc_time_ms = calc_time_ms;
>      config.sample_pages_per_gigabytes = sample_pages;
>      config.mode = mode;
>
> @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict *qdict)
>          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
>      }
>
> -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> -                        mode, &err);
> +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> +                        false, 0, /* calc_time_ms */
> +                        has_sample_pages, sample_pages,
> +                        true, mode,
> +                        &err);
>      if (err) {
>          hmp_handle_error(mon, err);
>          return;
> --
> 2.30.2
>

-- 
Peter Xu