Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Yong Huang <yong.huang@smartx.com>
To: gudkov.andrei@huawei.com
Cc: qemu-devel@nongnu.org, quintela@redhat.com, peterx@redhat.com,
	 leobras@redhat.com, eblake@redhat.com, armbru@redhat.com
Subject: Re: [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period
Date: Thu, 10 Aug 2023 14:39:37 +0800	[thread overview]
Message-ID: <CAK9dgmY6WKr67UM1gNaMKey+cMC-pG=PTafBerDuoMqWqbyBkA@mail.gmail.com> (raw)
In-Reply-To: <ZNOqnLFaD/At4poY@DESKTOP-0LHM7NF.china.huawei.com>

[-- Attachment #1: Type: text/plain, Size: 18848 bytes --]

On Wed, Aug 9, 2023 at 11:03 PM <gudkov.andrei@huawei.com> wrote:

> On Sun, Aug 06, 2023 at 02:16:34PM +0800, Yong Huang wrote:
> > On Fri, Aug 4, 2023 at 11:03 PM Andrei Gudkov <gudkov.andrei@huawei.com>
> > wrote:
> >
> > > Introduces alternative argument calc-time-ms, which is the
> > > the same as calc-time but accepts millisecond value.
> > > Millisecond granularity allows to make predictions whether
> > > migration will succeed or not. To do this, calculate dirty
> > > rate with calc-time-ms set to max allowed downtime, convert
> > > measured rate into volume of dirtied memory, and divide by
> > > network throughput. If the value is lower than max allowed
> > >
> > Not for the patch, I'm just curious about how the predication
> > decides the network throughput, I mean QEMU predicts
> > if migration will converge based on how fast it sends the data,
> > not the actual bandwidth of the interface, which one the
> > prediction uses?
> >
> Currently I use network nominal bandwidth, e.g. 1gbps. It would
> be nice, of course, to use measured throughput since it can take
> into account network headers overhead (as Wang Lei previously
> mentioned), compression, etc., but it looks too complicated to
> implement outside of migration process.
>
> > > downtime, then migration will converge.
> > >
> > > Measurement results for single thread randomly writing to
> > > a 1/4/24GiB memory region:
> > >
> > > +--------------+-----------------------------------------------+
> > > | calc-time-ms |                dirty rate MiB/s               |
> > > |              +----------------+---------------+--------------+
> > > |              | theoretical    | page-sampling | dirty-bitmap |
> > > |              | (at 3M wr/sec) |               |              |
> > > +--------------+----------------+---------------+--------------+
> > > |                             1GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |           6996 |          7100 |         3192 |
> > > |          200 |           4606 |          4660 |         2655 |
> > > |          300 |           3305 |          3280 |         2371 |
> > > |          400 |           2534 |          2525 |         2154 |
> > > |          500 |           2041 |          2044 |         1871 |
> > > |          750 |           1365 |          1341 |         1358 |
> > > |         1000 |           1024 |          1052 |         1025 |
> > > |         1500 |            683 |           678 |          684 |
> > > |         2000 |            512 |           507 |          513 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             4GiB                             |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          10232 |          8880 |         4070 |
> > > |          200 |           8954 |          8049 |         3195 |
> > > |          300 |           7889 |          7193 |         2881 |
> > > |          400 |           6996 |          6530 |         2700 |
> > > |          500 |           6245 |          5772 |         2312 |
> > > |          750 |           4829 |          4586 |         2465 |
> > > |         1000 |           3865 |          3780 |         2178 |
> > > |         1500 |           2694 |          2633 |         2004 |
> > > |         2000 |           2041 |          2031 |         1789 |
> > > +--------------+----------------+---------------+--------------+
> > > |                             24GiB                            |
> > > +--------------+----------------+---------------+--------------+
> > > |          100 |          11495 |          8640 |         5597 |
> > > |          200 |          11226 |          8616 |         3527 |
> > > |          300 |          10965 |          8386 |         2355 |
> > > |          400 |          10713 |          8370 |         2179 |
> > > |          500 |          10469 |          8196 |         2098 |
> > > |          750 |           9890 |          7885 |         2556 |
> > > |         1000 |           9354 |          7506 |         2084 |
> > > |         1500 |           8397 |          6944 |         2075 |
> > > |         2000 |           7574 |          6402 |         2062 |
> > > +--------------+----------------+---------------+--------------+
> > >
> > > Theoretical values are computed according to the following formula:
> > > size * (1 - (1-(4096/size))^(time*wps)) / (time * 2^20),
> > > where size is in bytes, time is in seconds, and wps is number of
> > > writes per second.
> > >
> > > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > > ---
> > >  qapi/migration.json   | 14 ++++++--
> > >  migration/dirtyrate.h | 12 ++++---
> > >  migration/dirtyrate.c | 81 +++++++++++++++++++++++++------------------
> > >  3 files changed, 67 insertions(+), 40 deletions(-)
> > >
> > > [...]
> >
> > > diff --git a/qapi/migration.json b/qapi/migration.json
> > > index 8843e74b59..82493d6a57 100644
> > > --- a/qapi/migration.json
> > > +++ b/qapi/migration.json
> > > @@ -1849,7 +1849,11 @@
> > >  # @start-time: start time in units of second for calculation
> > >  #
> > >  # @calc-time: time period for which dirty page rate was measured
> > > -#     (in seconds)
> > > +#     (rounded down to seconds).
> > >
> > Does there need an extra comment to emphasize that calc-time shows
> > zero if the calc-time-ms is lower than 1000?
> >
> > > +#
> > > +# @calc-time-ms: actual time period for which dirty page rate was
> > > +#     measured (in milliseconds).  Value may be larger than requested
> > > +#     time period due to measurement overhead.
> > >  #
> > >  # @sample-pages: number of sampled pages per GiB of guest memory.
> > >  #     Valid only in page-sampling mode (Since 6.1)
> > > @@ -1866,6 +1870,7 @@
> > >             'status': 'DirtyRateStatus',
> > >             'start-time': 'int64',
> > >             'calc-time': 'int64',
> > > +           'calc-time-ms': 'int64',
> > >             'sample-pages': 'uint64',
> > >             'mode': 'DirtyRateMeasureMode',
> > >             '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > > @@ -1908,6 +1913,10 @@
> > >  #     dirty during @calc-time period, further writes to this page will
> > >  #     not increase dirty page rate anymore.
> > >  #
> > > +# @calc-time-ms: the same as @calc-time but in milliseconds.  These
> > > +#    two arguments are mutually exclusive.  Exactly one of them must
> > > +#    be specified. (Since 8.1)
> > > +#
> > >  # @sample-pages: number of sampled pages per each GiB of guest memory.
> > >  #     Default value is 512.  For 4KiB guest pages this corresponds to
> > >  #     sampling ratio of 0.2%.  This argument is used only in page
> > > @@ -1925,7 +1934,8 @@
> > >  #                                                 'sample-pages':
> 512} }
> > >  # <- { "return": {} }
> > >  ##
> > > -{ 'command': 'calc-dirty-rate', 'data': {'calc-time': 'int64',
> > > +{ 'command': 'calc-dirty-rate', 'data': {'*calc-time': 'int64',
> > > +                                         '*calc-time-ms': 'int64',
> > >                                           '*sample-pages': 'int',
> > >                                           '*mode':
> 'DirtyRateMeasureMode'}
> > > }
> > >
> > > diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> > > index 594a5c0bb6..869c060941 100644
> > > --- a/migration/dirtyrate.h
> > > +++ b/migration/dirtyrate.h
> > > @@ -31,10 +31,12 @@
> > >  #define MIN_RAMBLOCK_SIZE                         128
> > >
> > >  /*
> > > - * Take 1s as minimum time for calculation duration
> > > + * Allowed range for dirty page rate calculation (in milliseconds).
> > > + * Lower limit relates to the smallest realistic downtime it
> > > + * makes sense to impose on migration.
> > >   */
> > > -#define MIN_FETCH_DIRTYRATE_TIME_SEC              1
> > > -#define MAX_FETCH_DIRTYRATE_TIME_SEC              60
> > > +#define MIN_CALC_TIME_MS                          50
> > > +#define MAX_CALC_TIME_MS                       60000
> > >
> > >  /*
> > >   * Take 1/16 pages in 1G as the maxmum sample page count
> > > @@ -44,7 +46,7 @@
> > >
> > >  struct DirtyRateConfig {
> > >      uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> > > -    int64_t sample_period_seconds; /* time duration between two
> sampling
> > > */
> > > +    int64_t calc_time_ms; /* desired calculation time (in
> milliseconds) */
> > >      DirtyRateMeasureMode mode; /* mode of dirtyrate measurement */
> > >  };
> > >
> > > @@ -73,7 +75,7 @@ typedef struct SampleVMStat {
> > >  struct DirtyRateStat {
> > >      int64_t dirty_rate; /* dirty rate in MB/s */
> > >      int64_t start_time; /* calculation start time in units of second
> */
> > > -    int64_t calc_time; /* time duration of two sampling in units of
> > > second */
> > > +    int64_t calc_time_ms; /* actual calculation time (in
> milliseconds) */
> > >      uint64_t sample_pages; /* sample pages per GB */
> > >      union {
> > >          SampleVMStat page_sampling;
> >
> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > > index 84f1b0fb20..90fb336329 100644
> > > --- a/migration/dirtyrate.c
> > > +++ b/migration/dirtyrate.c
> > > @@ -57,6 +57,8 @@ static int64_t dirty_stat_wait(int64_t msec, int64_t
> > > initial_time)
> > >          msec = current_time - initial_time;
> > >      } else {
> > >          g_usleep((msec + initial_time - current_time) * 1000);
> > > +        /* g_usleep may overshoot */
> > > +        msec = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - initial_time;
> > >
> > The optimization could be a standalone commit along with the content
> > below(see the following comment)?
> >
> OK, I will move it into separate commit.
>
> > >      }
> > >
> > >      return msec;
> > > @@ -77,9 +79,12 @@ static int64_t
> do_calculate_dirtyrate(DirtyPageRecord
> > > dirty_pages,
> > >  {
> > >      uint64_t increased_dirty_pages =
> > >          dirty_pages.end_pages - dirty_pages.start_pages;
> > > -    uint64_t memory_size_MiB =
> > > qemu_target_pages_to_MiB(increased_dirty_pages);
> > > -
> > > -    return memory_size_MiB * 1000 / calc_time_ms;
> > > +    /*
> > > +     * multiply by 1000ms/s _before_ converting down to megabytes
> > > +     * to avoid losing precision
> > > +     */
> > > +    return qemu_target_pages_to_MiB(increased_dirty_pages * 1000) /
> > > +        calc_time_ms;
> > >
> > Code optimization, could be in a standalone commit.
> >
> OK, but note that it is an important optimization. Imagine that
> calc_time_ms=100 and increased_dirty_pages=1000. If we compute without
> this optimization, we will get only 1000/(2^8)*1000/100=30. With
> optmization: 1000*1000/(2^8)/100=39.
>
> Good work, and thanks for this patch. :)


> > >  }
> > >
> > >  void global_dirty_log_change(unsigned int flag, bool start)
> > > @@ -183,10 +188,9 @@ retry:
> > >      return duration;
> > >  }
> > >
> > > -static bool is_sample_period_valid(int64_t sec)
> > > +static bool is_calc_time_valid(int64_t msec)
> > >  {
> > > -    if (sec < MIN_FETCH_DIRTYRATE_TIME_SEC ||
> > > -        sec > MAX_FETCH_DIRTYRATE_TIME_SEC) {
> > > +    if ((msec < MIN_CALC_TIME_MS) || (msec > MAX_CALC_TIME_MS)) {
> > >          return false;
> > >      }
> > >
> > > @@ -219,7 +223,8 @@ static struct DirtyRateInfo
> > > *query_dirty_rate_info(void)
> > >
> > >      info->status = CalculatingState;
> > >      info->start_time = DirtyStat.start_time;
> > > -    info->calc_time = DirtyStat.calc_time;
> > > +    info->calc_time_ms = DirtyStat.calc_time_ms;
> > > +    info->calc_time = DirtyStat.calc_time_ms / 1000;
> > >      info->sample_pages = DirtyStat.sample_pages;
> > >      info->mode = dirtyrate_mode;
> > >
> > > @@ -258,7 +263,7 @@ static void init_dirtyrate_stat(int64_t start_time,
> > >  {
> > >      DirtyStat.dirty_rate = -1;
> > >      DirtyStat.start_time = start_time;
> > > -    DirtyStat.calc_time = config.sample_period_seconds;
> > > +    DirtyStat.calc_time_ms = config.calc_time_ms;
> > >      DirtyStat.sample_pages = config.sample_pages_per_gigabytes;
> > >
> > >      switch (config.mode) {
> > > @@ -568,7 +573,6 @@ static inline void
> dirtyrate_manual_reset_protect(void)
> > >
> > >  static void calculate_dirtyrate_dirty_bitmap(struct DirtyRateConfig
> > > config)
> > >  {
> > > -    int64_t msec = 0;
> > >      int64_t start_time;
> > >      DirtyPageRecord dirty_pages;
> > >
> > > @@ -596,9 +600,7 @@ static void calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >      start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> > >      DirtyStat.start_time = start_time / 1000;
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, start_time);
> > > -    DirtyStat.calc_time = msec / 1000;
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > start_time);
> > >
> > >      /*
> > >       * do two things.
> > > @@ -609,12 +611,12 @@ static void
> calculate_dirtyrate_dirty_bitmap(struct
> > > DirtyRateConfig config)
> > >
> > >      record_dirtypages_bitmap(&dirty_pages, false);
> > >
> > > -    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages, msec);
> > > +    DirtyStat.dirty_rate = do_calculate_dirtyrate(dirty_pages,
> > > +
> DirtyStat.calc_time_ms);
> > >  }
> > >
> > >  static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig
> config)
> > >  {
> > > -    int64_t duration;
> > >      uint64_t dirtyrate = 0;
> > >      uint64_t dirtyrate_sum = 0;
> > >      int i = 0;
> > > @@ -625,12 +627,10 @@ static void calculate_dirtyrate_dirty_ring(struct
> > > DirtyRateConfig config)
> > >      DirtyStat.start_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) /
> 1000;
> > >
> > >      /* calculate vcpu dirtyrate */
> > > -    duration = vcpu_calculate_dirtyrate(config.sample_period_seconds *
> > > 1000,
> > > -                                        &DirtyStat.dirty_ring,
> > > -                                        GLOBAL_DIRTY_DIRTY_RATE,
> > > -                                        true);
> > > -
> > > -    DirtyStat.calc_time = duration / 1000;
> > > +    DirtyStat.calc_time_ms =
> vcpu_calculate_dirtyrate(config.calc_time_ms,
> > > +
> > > &DirtyStat.dirty_ring,
> > > +
> > > GLOBAL_DIRTY_DIRTY_RATE,
> > > +                                                      true);
> > >
> > >      /* calculate vm dirtyrate */
> > >      for (i = 0; i < DirtyStat.dirty_ring.nvcpu; i++) {
> > > @@ -646,7 +646,6 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >  {
> > >      struct RamblockDirtyInfo *block_dinfo = NULL;
> > >      int block_count = 0;
> > > -    int64_t msec = 0;
> > >      int64_t initial_time;
> > >
> > >      rcu_read_lock();
> > > @@ -656,17 +655,16 @@ static void calculate_dirtyrate_sample_vm(struct
> > > DirtyRateConfig config)
> > >      }
> > >      rcu_read_unlock();
> > >
> > > -    msec = config.sample_period_seconds * 1000;
> > > -    msec = dirty_stat_wait(msec, initial_time);
> > > +    DirtyStat.calc_time_ms = dirty_stat_wait(config.calc_time_ms,
> > > +                                             initial_time);
> > >      DirtyStat.start_time = initial_time / 1000;
> > > -    DirtyStat.calc_time = msec / 1000;
> > >
> > >      rcu_read_lock();
> > >      if (!compare_page_hash_info(block_dinfo, block_count)) {
> > >          goto out;
> > >      }
> > >
> > > -    update_dirtyrate(msec);
> > > +    update_dirtyrate(DirtyStat.calc_time_ms);
> > >
> > >  out:
> > >      rcu_read_unlock();
> > > @@ -711,7 +709,10 @@ void *get_dirtyrate_thread(void *arg)
> > >      return NULL;
> > >  }
> > >
> > > -void qmp_calc_dirty_rate(int64_t calc_time,
> > > +void qmp_calc_dirty_rate(bool has_calc_time,
> > > +                         int64_t calc_time,
> > > +                         bool has_calc_time_ms,
> > > +                         int64_t calc_time_ms,
> > >                           bool has_sample_pages,
> > >                           int64_t sample_pages,
> > >                           bool has_mode,
> > > @@ -731,10 +732,21 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    if (!is_sample_period_valid(calc_time)) {
> > > -        error_setg(errp, "calc-time is out of range[%d, %d].",
> > > -                         MIN_FETCH_DIRTYRATE_TIME_SEC,
> > > -                         MAX_FETCH_DIRTYRATE_TIME_SEC);
> > > +    if ((int)has_calc_time + (int)has_calc_time_ms != 1) {
> > > +        error_setg(errp, "Exactly one of calc-time and calc-time-ms
> must"
> > > +                         " be specified");
> > > +        return;
> > > +    }
> > > +    if (has_calc_time) {
> > > +        /*
> > > +         * The worst thing that can happen due to overflow is that
> > > +         * invalid value will become valid.
> > > +         */
> > > +        calc_time_ms = calc_time * 1000;
> > > +    }
> > > +    if (!is_calc_time_valid(calc_time_ms)) {
> > > +        error_setg(errp, "Calculation time is out of range[%dms,
> %dms].",
> > > +                         MIN_CALC_TIME_MS, MAX_CALC_TIME_MS);
> > >          return;
> > >      }
> > >
> > > @@ -781,7 +793,7 @@ void qmp_calc_dirty_rate(int64_t calc_time,
> > >          return;
> > >      }
> > >
> > > -    config.sample_period_seconds = calc_time;
> > > +    config.calc_time_ms = calc_time_ms;
> > >      config.sample_pages_per_gigabytes = sample_pages;
> > >      config.mode = mode;
> > >
> > > @@ -867,8 +879,11 @@ void hmp_calc_dirty_rate(Monitor *mon, const QDict
> > > *qdict)
> > >          mode = DIRTY_RATE_MEASURE_MODE_DIRTY_RING;
> > >      }
> > >
> > > -    qmp_calc_dirty_rate(sec, has_sample_pages, sample_pages, true,
> > > -                        mode, &err);
> > > +    qmp_calc_dirty_rate(true, sec, /* calc_time */
> > > +                        false, 0, /* calc_time_ms */
> > > +                        has_sample_pages, sample_pages,
> > > +                        true, mode,
> > > +                        &err);
> > >      if (err) {
> > >          hmp_handle_error(mon, err);
> > >          return;
> > > --
> > > 2.30.2
> > >
> > > The patch set works for me, and I'm inclined to split it into two
> commits
> > as I point out above.
> >
> > Thanks
> >
> > Yong
> >
> > --
> > Best regards
>


-- 
Best regards

[-- Attachment #2: Type: text/html, Size: 25142 bytes --]

next prev parent reply	other threads:[~2023-08-10  6:42 UTC|newest]

Thread overview: 12+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-04 15:03 [PATCH v2] migration/calc-dirty-rate: millisecond-granularity period Andrei Gudkov via
2023-08-04 16:14 ` Peter Xu
2023-08-04 18:04 ` Markus Armbruster
2023-08-06  6:31   ` Yong Huang
2023-08-09 13:59     ` gudkov.andrei--- via
2023-08-10  6:37       ` Yong Huang
2023-08-06  6:16 ` Yong Huang
2023-08-09 15:02   ` gudkov.andrei--- via
2023-08-09 15:26     ` Peter Xu
2023-08-10  6:39     ` Yong Huang [this message]
2023-08-26  8:32     ` Yong Huang
2023-08-28  6:23       ` gudkov.andrei--- via

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAK9dgmY6WKr67UM1gNaMKey+cMC-pG=PTafBerDuoMqWqbyBkA@mail.gmail.com' \
    --to=yong.huang@smartx.com \
    --cc=armbru@redhat.com \
    --cc=eblake@redhat.com \
    --cc=gudkov.andrei@huawei.com \
    --cc=leobras@redhat.com \
    --cc=peterx@redhat.com \
    --cc=qemu-devel@nongnu.org \
    --cc=quintela@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).