* [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
From: Andrei Gudkov via @ 2023-04-27 12:42 UTC (permalink / raw)
To: qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan, Andrei Gudkov
V1 -> V2:
- Extracted CRC32->xxHash into separate commit
- Extracted @n-zero-pages metric into separate commit
- Added description to QAPI docs about the connection between the
  @n-dirty-pages and @periods arrays
- Added (Since ...) tag to new metrics
---
The overall goal of this patch series is to be able to predict the time
it would take to migrate a VM in pre-copy mode based on the max allowed
downtime, network bandwidth, and metrics collected with "calc-dirty-rate".
The predictor itself is a simple Python script that closely follows the
iterations of the migration algorithm: compute how long it would take to
copy the currently dirty pages, estimate the number of pages the VM dirties
from the beginning of that iteration, and repeat until the estimated
iteration time fits into the max allowed downtime. However, to get
reasonable accuracy, the predictor requires more metrics, which have been
added to "calc-dirty-rate".
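A condensed sketch of this loop, adapted from predict_migration_time()
in scripts/predict_migration.py (patch 4); "model" wraps the
calc-dirty-rate output, and num_dirty_pages() is the piece-wise
interpolation described below:

    def predict_migration_time(model, bandwidth, downtime, deadline=3600*1000):
        left_zero = model.num_zero_pages()
        left_normal = model.num_total_pages() - model.num_zero_pages()
        header_size = 8  # only an 8-byte header is sent for an all-zero page
        total_millis = 0.0
        while True:
            # bytes to put on the wire during this iteration
            iter_bytes = (left_normal * (model.page_size() + header_size)
                          + left_zero * header_size)
            iter_millis = iter_bytes * 1000.0 / bandwidth
            total_millis += iter_millis
            if iter_millis <= downtime:
                return total_millis  # final iteration fits into downtime
            if total_millis > deadline:
                return None          # migration does not converge
            left_zero = 0            # zero pages are transferred only once
            left_normal = model.num_dirty_pages(iter_millis)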
Summary of calc-dirty-rate changes:
1. The most important change is that calc-dirty-rate now produces
a *vector* of dirty page measurements for progressively increasing time
periods: 125ms, 250ms, 500ms, 750ms, 1000ms, 1500ms, ..., up to the
specified calc-time. The motivation behind this change is that the number
of dirtied pages as a function of time, starting from a "clean" state
(a new migration iteration), is far from linear. The shape of this function
depends on the workload type and intensity. Measuring the number of dirty
pages at progressively increasing periods makes it possible to reconstruct
this function using piece-wise linear interpolation (see the sketch after
the example output below).
2. A new metric was added: the number of all-zero pages.
The predictor needs to distinguish between zero and non-zero pages
because during migration only an 8-byte header is placed on the wire
for an all-zero page.
3. The hashing function was changed from CRC32 to xxHash.
This reduces the overhead of sampling by ~10 times, which is important
now that some of the measurement periods are sub-second.
4. Other trivial metrics were added for convenience: the total number
of VM pages, the number of sampled pages, and the page size.
After these changes, the output from calc-dirty-rate looks like this:
{
    "page-size": 4096,
    "periods": [125, 250, 375, 500, 750, 1000, 1500,
                2000, 3000, 4001, 6000, 8000, 10000,
                15000, 20000, 25000, 30000, 35000,
                40000, 45000, 50000, 60000],
    "status": "measured",
    "sample-pages": 512,
    "dirty-rate": 98,
    "mode": "page-sampling",
    "n-dirty-pages": [33, 78, 119, 151, 217, 236, 293, 336,
                      425, 505, 620, 756, 898, 1204, 1457,
                      1723, 1934, 2141, 2328, 2522, 2675, 2958],
    "n-sampled-pages": 16392,
    "n-zero-pages": 10060,
    "n-total-pages": 8392704,
    "start-time": 2916750,
    "calc-time": 60
}
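From the @periods and @n-dirty-pages arrays, the predictor scales the
sampled counts up to whole-VM estimates and interpolates between the
measured points. A minimal sketch along the lines of MemoryModel in the
patch 4 script (the extrapolation beyond the measured range that the
script also performs is omitted here):

    # scale sampled counts up to whole-VM estimates
    sample_ratio = data['n-sampled-pages'] / data['n-total-pages']
    points = [(float(t), n / sample_ratio)
              for t, n in zip(data['periods'], data['n-dirty-pages'])]

    def num_dirty_pages(millis):
        # piece-wise linear interpolation between measured points
        if millis <= points[0][0]:
            return points[0][1]
        for (t0, d0), (t1, d1) in zip(points, points[1:]):
            if millis <= t1:
                return d0 + (d1 - d0) * (millis - t0) / (t1 - t0)
        return points[-1][1]  # clamped; the real script extrapolates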
Passing this data into the prediction script, we get the following estimates:
Downtime> | 125ms | 250ms | 500ms | 1000ms | 5000ms | unlim
---------------------------------------------------------------------------
100 Mbps | - | - | - | - | - | 16m59s
1 Gbps | - | - | - | - | - | 1m40s
2 Gbps | - | - | - | - | 1m41s | 50s
2.5 Gbps | - | - | - | - | 1m07s | 40s
5 Gbps | 48s | 46s | 31s | 28s | 25s | 20s
10 Gbps | 13s | 12s | 12s | 12s | 12s | 10s
25 Gbps | 5s | 5s | 5s | 5s | 4s | 4s
40 Gbps | 3s | 3s | 3s | 3s | 3s | 3s
Quality of prediction was tested with the YCSB benchmark. A Memcached
instance was installed into a 32GiB VM, and a client generated a stream
of requests. Between experiments we varied the request size distribution,
the number of threads, and the location of the client (inside or outside
the VM). After a short preheat phase, we measured calc-dirty-rate:
1. {"execute": "calc-dirty-rate", "arguments":{"calc-time":60}}
2. Wait 60 seconds
3. Collect results with {"execute": "query-dirty-rate"}
Afterwards we migrated the VM with a randomly selected max downtime and
bandwidth limit. The typical prediction error is 6-7%, with only 180 out
of 5779 experiments failing badly: prediction error >= 25%, or migration
success predicted when in fact it did not converge.
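For convenience, both the measurement and the prediction steps can be
driven by the helper script from patch 4 (invocation as documented in
the script's header comment):

    $ scripts/predict_migration.py calc-dirty-rate <qmphost> <qmpport> >dirty.json
    $ scripts/predict_migration.py predict < dirty.json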
Andrei Gudkov (4):
migration/calc-dirty-rate: replaced CRC32 with xxHash
migration/calc-dirty-rate: detailed stats in sampling mode
migration/calc-dirty-rate: added n-zero-pages metric
migration/calc-dirty-rate: tool to predict migration time
MAINTAINERS | 1 +
migration/dirtyrate.c | 228 +++++++++++++++++++++-------
migration/dirtyrate.h | 26 +++-
migration/trace-events | 4 +-
qapi/migration.json | 28 +++-
scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
6 files changed, 511 insertions(+), 59 deletions(-)
create mode 100644 scripts/predict_migration.py
--
2.30.2
* [PATCH v2 1/4] migration/calc-dirty-rate: replaced CRC32 with xxHash
From: Andrei Gudkov via @ 2023-04-27 12:42 UTC (permalink / raw)
To: qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan, Andrei Gudkov
This significantly reduces overhead of dirty page
rate calculation in sampling mode.
Tested using 32GiB VM on E5-2690 CPU.
With CRC32:
total_pages=8388608 sampled_pages=16384 millis=71
With xxHash:
total_pages=8388608 sampled_pages=16384 millis=14
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
migration/dirtyrate.c | 45 +++++++++++++++++++++++++++++++++---------
migration/trace-events | 4 ++--
2 files changed, 38 insertions(+), 11 deletions(-)
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 180ba38c7a..acba3213a3 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -29,6 +29,7 @@
#include "sysemu/kvm.h"
#include "sysemu/runstate.h"
#include "exec/memory.h"
+#include "qemu/xxhash.h"
/*
* total_dirty_pages is protected by BQL and is used
@@ -308,6 +309,33 @@ static void update_dirtyrate(uint64_t msec)
DirtyStat.dirty_rate = dirtyrate;
}
+/*
+ * Compute hash of a single page of size TARGET_PAGE_SIZE.
+ */
+static uint32_t compute_page_hash(void *ptr)
+{
+ uint32_t i;
+ uint64_t v1, v2, v3, v4;
+ uint64_t res;
+ const uint64_t *p = ptr;
+
+ v1 = QEMU_XXHASH_SEED + XXH_PRIME64_1 + XXH_PRIME64_2;
+ v2 = QEMU_XXHASH_SEED + XXH_PRIME64_2;
+ v3 = QEMU_XXHASH_SEED + 0;
+ v4 = QEMU_XXHASH_SEED - XXH_PRIME64_1;
+ for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
+ v1 = XXH64_round(v1, p[i + 0]);
+ v2 = XXH64_round(v2, p[i + 1]);
+ v3 = XXH64_round(v3, p[i + 2]);
+ v4 = XXH64_round(v4, p[i + 3]);
+ }
+ res = XXH64_mergerounds(v1, v2, v3, v4);
+ res += TARGET_PAGE_SIZE;
+ res = XXH64_avalanche(res);
+ return (uint32_t)(res & UINT32_MAX);
+}
+
+
/*
* get hash result for the sampled memory with length of TARGET_PAGE_SIZE
* in ramblock, which starts from ramblock base address.
@@ -315,13 +343,12 @@ static void update_dirtyrate(uint64_t msec)
static uint32_t get_ramblock_vfn_hash(struct RamblockDirtyInfo *info,
uint64_t vfn)
{
- uint32_t crc;
+ uint32_t hash;
- crc = crc32(0, (info->ramblock_addr +
- vfn * TARGET_PAGE_SIZE), TARGET_PAGE_SIZE);
+ hash = compute_page_hash(info->ramblock_addr + vfn * TARGET_PAGE_SIZE);
- trace_get_ramblock_vfn_hash(info->idstr, vfn, crc);
- return crc;
+ trace_get_ramblock_vfn_hash(info->idstr, vfn, hash);
+ return hash;
}
static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
@@ -454,13 +481,13 @@ out:
static void calc_page_dirty_rate(struct RamblockDirtyInfo *info)
{
- uint32_t crc;
+ uint32_t hash;
int i;
for (i = 0; i < info->sample_pages_count; i++) {
- crc = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
- if (crc != info->hash_result[i]) {
- trace_calc_page_dirty_rate(info->idstr, crc, info->hash_result[i]);
+ hash = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
+ if (hash != info->hash_result[i]) {
+ trace_calc_page_dirty_rate(info->idstr, hash, info->hash_result[i]);
info->sample_dirty_count++;
}
}
diff --git a/migration/trace-events b/migration/trace-events
index 92161eeac5..f39818c329 100644
--- a/migration/trace-events
+++ b/migration/trace-events
@@ -342,8 +342,8 @@ dirty_bitmap_load_success(void) ""
# dirtyrate.c
dirtyrate_set_state(const char *new_state) "new state %s"
query_dirty_rate_info(const char *new_state) "current state %s"
-get_ramblock_vfn_hash(const char *idstr, uint64_t vfn, uint32_t crc) "ramblock name: %s, vfn: %"PRIu64 ", crc: %" PRIu32
-calc_page_dirty_rate(const char *idstr, uint32_t new_crc, uint32_t old_crc) "ramblock name: %s, new crc: %" PRIu32 ", old crc: %" PRIu32
+get_ramblock_vfn_hash(const char *idstr, uint64_t vfn, uint32_t hash) "ramblock name: %s, vfn: %"PRIu64 ", hash: %" PRIu32
+calc_page_dirty_rate(const char *idstr, uint32_t new_hash, uint32_t old_hash) "ramblock name: %s, new hash: %" PRIu32 ", old hash: %" PRIu32
skip_sample_ramblock(const char *idstr, uint64_t ramblock_size) "ramblock name: %s, ramblock size: %" PRIu64
find_page_matched(const char *idstr) "ramblock %s addr or size changed"
dirtyrate_calculate(int64_t dirtyrate) "dirty rate: %" PRIi64 " MB/s"
--
2.30.2
* [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
From: Andrei Gudkov via @ 2023-04-27 12:42 UTC (permalink / raw)
To: qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan, Andrei Gudkov
Collect number of dirty pages for progressively increasing time
periods starting with 125ms up to the number of seconds specified with
calc-dirty-rate. Report through QMP and HMP: 1) vector of dirty page
measurements, 2) page size, 3) total number of VM pages, 4) number
of sampled pages.
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
migration/dirtyrate.h | 25 ++++++-
qapi/migration.json | 24 +++++-
3 files changed, 160 insertions(+), 54 deletions(-)
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index acba3213a3..4491bbe91a 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
info->calc_time = DirtyStat.calc_time;
info->sample_pages = DirtyStat.sample_pages;
info->mode = dirtyrate_mode;
+ info->page_size = TARGET_PAGE_SIZE;
if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
info->has_dirty_rate = true;
@@ -245,6 +246,29 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
info->vcpu_dirty_rate = head;
}
+ if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING) {
+ int64List *periods_head = NULL;
+ int64List **periods_tail = &periods_head;
+ int64List *n_dirty_pages_head = NULL;
+ int64List **n_dirty_pages_tail = &n_dirty_pages_head;
+
+ info->n_total_pages = DirtyStat.page_sampling.n_total_pages;
+ info->has_n_total_pages = true;
+
+ info->n_sampled_pages = DirtyStat.page_sampling.n_sampled_pages;
+ info->has_n_sampled_pages = true;
+
+ for (i = 0; i < DirtyStat.page_sampling.n_readings; i++) {
+ DirtyReading *dr = &DirtyStat.page_sampling.readings[i];
+ QAPI_LIST_APPEND(periods_tail, dr->period);
+ QAPI_LIST_APPEND(n_dirty_pages_tail, dr->n_dirty_pages);
+ }
+ info->n_dirty_pages = n_dirty_pages_head;
+ info->periods = periods_head;
+ info->has_n_dirty_pages = true;
+ info->has_periods = true;
+ }
+
if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_DIRTY_BITMAP) {
info->sample_pages = 0;
}
@@ -265,9 +289,11 @@ static void init_dirtyrate_stat(int64_t start_time,
switch (config.mode) {
case DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING:
- DirtyStat.page_sampling.total_dirty_samples = 0;
- DirtyStat.page_sampling.total_sample_count = 0;
- DirtyStat.page_sampling.total_block_mem_MB = 0;
+ DirtyStat.page_sampling.n_total_pages = 0;
+ DirtyStat.page_sampling.n_sampled_pages = 0;
+ DirtyStat.page_sampling.n_readings = 0;
+ DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
+ sizeof(DirtyReading));
break;
case DIRTY_RATE_MEASURE_MODE_DIRTY_RING:
DirtyStat.dirty_ring.nvcpu = -1;
@@ -285,28 +311,10 @@ static void cleanup_dirtyrate_stat(struct DirtyRateConfig config)
free(DirtyStat.dirty_ring.rates);
DirtyStat.dirty_ring.rates = NULL;
}
-}
-
-static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
-{
- DirtyStat.page_sampling.total_dirty_samples += info->sample_dirty_count;
- DirtyStat.page_sampling.total_sample_count += info->sample_pages_count;
- /* size of total pages in MB */
- DirtyStat.page_sampling.total_block_mem_MB += (info->ramblock_pages *
- TARGET_PAGE_SIZE) >> 20;
-}
-
-static void update_dirtyrate(uint64_t msec)
-{
- uint64_t dirtyrate;
- uint64_t total_dirty_samples = DirtyStat.page_sampling.total_dirty_samples;
- uint64_t total_sample_count = DirtyStat.page_sampling.total_sample_count;
- uint64_t total_block_mem_MB = DirtyStat.page_sampling.total_block_mem_MB;
-
- dirtyrate = total_dirty_samples * total_block_mem_MB *
- 1000 / (total_sample_count * msec);
-
- DirtyStat.dirty_rate = dirtyrate;
+ if (DirtyStat.page_sampling.readings) {
+ free(DirtyStat.page_sampling.readings);
+ DirtyStat.page_sampling.readings = NULL;
+ }
}
/*
@@ -377,12 +385,14 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
return false;
}
- rand = g_rand_new();
+ rand = g_rand_new();
+ DirtyStat.page_sampling.n_total_pages += info->ramblock_pages;
for (i = 0; i < sample_pages_count; i++) {
info->sample_page_vfn[i] = g_rand_int_range(rand, 0,
info->ramblock_pages - 1);
info->hash_result[i] = get_ramblock_vfn_hash(info,
info->sample_page_vfn[i]);
+ DirtyStat.page_sampling.n_sampled_pages++;
}
g_rand_free(rand);
@@ -479,18 +489,20 @@ out:
return ret;
}
-static void calc_page_dirty_rate(struct RamblockDirtyInfo *info)
+static int64_t calc_page_dirty_rate(struct RamblockDirtyInfo *info)
{
uint32_t hash;
int i;
+ int64_t n_dirty = 0;
for (i = 0; i < info->sample_pages_count; i++) {
hash = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
if (hash != info->hash_result[i]) {
+ n_dirty++;
trace_calc_page_dirty_rate(info->idstr, hash, info->hash_result[i]);
- info->sample_dirty_count++;
}
}
+ return n_dirty;
}
static struct RamblockDirtyInfo *
@@ -519,11 +531,12 @@ find_block_matched(RAMBlock *block, int count,
return &infos[i];
}
-static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
+static int64_t compare_page_hash_info(struct RamblockDirtyInfo *info,
int block_count)
{
struct RamblockDirtyInfo *block_dinfo = NULL;
RAMBlock *block = NULL;
+ int64_t n_dirty = 0;
RAMBLOCK_FOREACH_MIGRATABLE(block) {
if (skip_sample_ramblock(block)) {
@@ -533,15 +546,10 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
if (block_dinfo == NULL) {
continue;
}
- calc_page_dirty_rate(block_dinfo);
- update_dirtyrate_stat(block_dinfo);
- }
-
- if (DirtyStat.page_sampling.total_sample_count == 0) {
- return false;
+ n_dirty += calc_page_dirty_rate(block_dinfo);
}
- return true;
+ return n_dirty;
}
static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
@@ -642,34 +650,77 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
DirtyStat.dirty_rate = dirtyrate_sum;
}
+static int64_t increase_period(int64_t prev_period, int64_t max_period)
+{
+ int64_t delta;
+ int64_t next_period;
+
+ if (prev_period < 500) {
+ delta = 125;
+ } else if (prev_period < 1000) {
+ delta = 250;
+ } else if (prev_period < 2000) {
+ delta = 500;
+ } else if (prev_period < 4000) {
+ delta = 1000;
+ } else if (prev_period < 10000) {
+ delta = 2000;
+ } else {
+ delta = 5000;
+ }
+
+ next_period = prev_period + delta;
+ if (next_period + delta >= max_period) {
+ next_period = max_period;
+ }
+ return next_period;
+}
+
+
static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
{
struct RamblockDirtyInfo *block_dinfo = NULL;
int block_count = 0;
- int64_t msec = 0;
int64_t initial_time;
+ int64_t current_time;
+ /* initial pass */
rcu_read_lock();
initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
- if (!record_ramblock_hash_info(&block_dinfo, config, &block_count)) {
+ bool ok = record_ramblock_hash_info(&block_dinfo, config, &block_count);
+ rcu_read_unlock();
+ if ((!ok) || (DirtyStat.page_sampling.n_sampled_pages == 0)) {
goto out;
}
- rcu_read_unlock();
- msec = config.sample_period_seconds * 1000;
- msec = dirty_stat_wait(msec, initial_time);
- DirtyStat.start_time = initial_time / 1000;
- DirtyStat.calc_time = msec / 1000;
+ int64_t period = INITIAL_PERIOD_MS;
+ while (true) {
+ current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ int64_t delta = initial_time + period - current_time;
+ if (delta > 0) {
+ g_usleep(delta * 1000);
+ }
- rcu_read_lock();
- if (!compare_page_hash_info(block_dinfo, block_count)) {
- goto out;
- }
+ rcu_read_lock();
+ current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+ int64_t n_dirty = compare_page_hash_info(block_dinfo, block_count);
+ rcu_read_unlock();
- update_dirtyrate(msec);
+ SampleVMStat *ps = &DirtyStat.page_sampling;
+ ps->readings[ps->n_readings].period = current_time - initial_time;
+ ps->readings[ps->n_readings].n_dirty_pages = n_dirty;
+ ps->n_readings++;
+
+ if (period >= DirtyStat.calc_time * 1000) {
+ int64_t mb_total = (ps->n_total_pages * TARGET_PAGE_SIZE) >> 20;
+ int64_t mb_dirty = n_dirty * mb_total / ps->n_sampled_pages;
+ DirtyStat.dirty_rate = mb_dirty * 1000 / period;
+ break;
+ }
+ period = increase_period(period, DirtyStat.calc_time * 1000);
+ }
out:
- rcu_read_unlock();
free_ramblock_dirty_info(block_dinfo, block_count);
}
@@ -836,7 +887,23 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "(not ready)\n");
}
+ if (info->has_n_total_pages) {
+ monitor_printf(mon, "Page count (page size %d):\n", TARGET_PAGE_SIZE);
+ monitor_printf(mon, " Total: %"PRIi64"\n", info->n_total_pages);
+ monitor_printf(mon, " Sampled: %"PRIi64"\n", info->n_sampled_pages);
+ int64List *periods = info->periods;
+ int64List *n_dirty_pages = info->n_dirty_pages;
+ while (periods) {
+ monitor_printf(mon, " Dirty(%"PRIi64"ms): %"PRIi64"\n",
+ periods->value, n_dirty_pages->value);
+ periods = periods->next;
+ n_dirty_pages = n_dirty_pages->next;
+ }
+ }
+
qapi_free_DirtyRateVcpuList(info->vcpu_dirty_rate);
+ qapi_free_int64List(info->periods);
+ qapi_free_int64List(info->n_dirty_pages);
g_free(info);
}
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 594a5c0bb6..7a97e2b076 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -42,6 +42,18 @@
#define MIN_SAMPLE_PAGE_COUNT 128
#define MAX_SAMPLE_PAGE_COUNT 16384
+/*
+ * Initial sampling period expressed in milliseconds
+ */
+#define INITIAL_PERIOD_MS 125
+
+/*
+ * Upper bound on the number of DirtyReadings calculated based on
+ * INITIAL_PERIOD_MS, MAX_FETCH_DIRTYRATE_TIME_SEC and increase_period()
+ */
+#define MAX_DIRTY_READINGS 32
+
+
struct DirtyRateConfig {
uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
int64_t sample_period_seconds; /* time duration between two sampling */
@@ -57,14 +69,19 @@ struct RamblockDirtyInfo {
uint64_t ramblock_pages; /* ramblock size in TARGET_PAGE_SIZE */
uint64_t *sample_page_vfn; /* relative offset address for sampled page */
uint64_t sample_pages_count; /* count of sampled pages */
- uint64_t sample_dirty_count; /* count of dirty pages we measure */
uint32_t *hash_result; /* array of hash result for sampled pages */
};
+typedef struct DirtyReading {
+ int64_t period; /* time period in milliseconds */
+ int64_t n_dirty_pages; /* number of observed dirty pages */
+} DirtyReading;
+
typedef struct SampleVMStat {
- uint64_t total_dirty_samples; /* total dirty sampled page */
- uint64_t total_sample_count; /* total sampled pages */
- uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
+ int64_t n_total_pages; /* total number of pages */
+ int64_t n_sampled_pages; /* number of sampled pages */
+ int64_t n_readings;
+ DirtyReading *readings;
} SampleVMStat;
/*
diff --git a/qapi/migration.json b/qapi/migration.json
index 2c35b7b9cf..f818f51e0e 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1805,6 +1805,22 @@
# @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
# mode specified (Since 6.2)
#
+# @page-size: page size in bytes (since 8.1)
+#
+# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
+#
+# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
+#
+# @periods: [page-sampling] array of time periods expressed in milliseconds
+# for which dirty-sample measurements were collected (since 8.1)
+#
+# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
+# that were observed as changed during respective time period.
+# i-th element of this array corresponds to the i-th element
+# of the @periods array, i.e. @n-dirty-pages[i] is the number
+# of dirtied pages during period of @periods[i] milliseconds
+# after the initiation of calc-dirty-rate (since 8.1)
+#
# Since: 5.2
##
{ 'struct': 'DirtyRateInfo',
@@ -1814,7 +1830,13 @@
'calc-time': 'int64',
'sample-pages': 'uint64',
'mode': 'DirtyRateMeasureMode',
- '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
+ '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
+ 'page-size': 'int64',
+ '*n-total-pages': 'int64',
+ '*n-sampled-pages': 'int64',
+ '*periods': ['int64'],
+ '*n-dirty-pages': ['int64'] } }
+
##
# @calc-dirty-rate:
--
2.30.2
* [PATCH v2 3/4] migration/calc-dirty-rate: added n-zero-pages metric
From: Andrei Gudkov via @ 2023-04-27 12:42 UTC (permalink / raw)
To: qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan, Andrei Gudkov
In sampling mode, a new metric is collected and reported:
number of pages entirely filled with zeroes.
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
migration/dirtyrate.c | 40 +++++++++++++++++++++++++++++++++++-----
migration/dirtyrate.h | 1 +
qapi/migration.json | 4 ++++
3 files changed, 40 insertions(+), 5 deletions(-)
diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
index 4491bbe91a..55ef69927e 100644
--- a/migration/dirtyrate.c
+++ b/migration/dirtyrate.c
@@ -258,6 +258,9 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
info->n_sampled_pages = DirtyStat.page_sampling.n_sampled_pages;
info->has_n_sampled_pages = true;
+ info->n_zero_pages = DirtyStat.page_sampling.n_zero_pages;
+ info->has_n_zero_pages = true;
+
for (i = 0; i < DirtyStat.page_sampling.n_readings; i++) {
DirtyReading *dr = &DirtyStat.page_sampling.readings[i];
QAPI_LIST_APPEND(periods_tail, dr->period);
@@ -291,6 +294,7 @@ static void init_dirtyrate_stat(int64_t start_time,
case DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING:
DirtyStat.page_sampling.n_total_pages = 0;
DirtyStat.page_sampling.n_sampled_pages = 0;
+ DirtyStat.page_sampling.n_zero_pages = 0;
DirtyStat.page_sampling.n_readings = 0;
DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
sizeof(DirtyReading));
@@ -319,6 +323,7 @@ static void cleanup_dirtyrate_stat(struct DirtyRateConfig config)
/*
* Compute hash of a single page of size TARGET_PAGE_SIZE.
+ * If ptr is NULL, then compute hash of a page entirely filled with zeros.
*/
static uint32_t compute_page_hash(void *ptr)
{
@@ -331,11 +336,20 @@ static uint32_t compute_page_hash(void *ptr)
v2 = QEMU_XXHASH_SEED + XXH_PRIME64_2;
v3 = QEMU_XXHASH_SEED + 0;
v4 = QEMU_XXHASH_SEED - XXH_PRIME64_1;
- for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
- v1 = XXH64_round(v1, p[i + 0]);
- v2 = XXH64_round(v2, p[i + 1]);
- v3 = XXH64_round(v3, p[i + 2]);
- v4 = XXH64_round(v4, p[i + 3]);
+ if (ptr) {
+ for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
+ v1 = XXH64_round(v1, p[i + 0]);
+ v2 = XXH64_round(v2, p[i + 1]);
+ v3 = XXH64_round(v3, p[i + 2]);
+ v4 = XXH64_round(v4, p[i + 3]);
+ }
+ } else {
+ for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
+ v1 = XXH64_round(v1, 0);
+ v2 = XXH64_round(v2, 0);
+ v3 = XXH64_round(v3, 0);
+ v4 = XXH64_round(v4, 0);
+ }
}
res = XXH64_mergerounds(v1, v2, v3, v4);
res += TARGET_PAGE_SIZE;
@@ -343,6 +357,17 @@ static uint32_t compute_page_hash(void *ptr)
return (uint32_t)(res & UINT32_MAX);
}
+static uint32_t get_zero_page_hash(void)
+{
+ static uint32_t hash;
+ static int is_computed;
+
+ if (!is_computed) {
+ hash = compute_page_hash(NULL);
+ is_computed = 1;
+ }
+ return hash;
+}
/*
* get hash result for the sampled memory with length of TARGET_PAGE_SIZE
@@ -364,6 +389,7 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
unsigned int sample_pages_count;
int i;
GRand *rand;
+ uint32_t zero_page_hash = get_zero_page_hash();
sample_pages_count = info->sample_pages_count;
@@ -393,6 +419,9 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
info->hash_result[i] = get_ramblock_vfn_hash(info,
info->sample_page_vfn[i]);
DirtyStat.page_sampling.n_sampled_pages++;
+ if (info->hash_result[i] == zero_page_hash) {
+ DirtyStat.page_sampling.n_zero_pages++;
+ }
}
g_rand_free(rand);
@@ -891,6 +920,7 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
monitor_printf(mon, "Page count (page size %d):\n", TARGET_PAGE_SIZE);
monitor_printf(mon, " Total: %"PRIi64"\n", info->n_total_pages);
monitor_printf(mon, " Sampled: %"PRIi64"\n", info->n_sampled_pages);
+ monitor_printf(mon, " Zero: %"PRIi64"\n", info->n_zero_pages);
int64List *periods = info->periods;
int64List *n_dirty_pages = info->n_dirty_pages;
while (periods) {
diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
index 7a97e2b076..e2af72fb8c 100644
--- a/migration/dirtyrate.h
+++ b/migration/dirtyrate.h
@@ -80,6 +80,7 @@ typedef struct DirtyReading {
typedef struct SampleVMStat {
int64_t n_total_pages; /* total number of pages */
int64_t n_sampled_pages; /* number of sampled pages */
+ int64_t n_zero_pages; /* number of observed zero pages */
int64_t n_readings;
DirtyReading *readings;
} SampleVMStat;
diff --git a/qapi/migration.json b/qapi/migration.json
index f818f51e0e..2c48a9ef80 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -1811,6 +1811,9 @@
#
# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
#
+# @n-zero-pages: [page-sampling] number of observed all-zero pages among all
+# sampled pages (since 8.1)
+#
# @periods: [page-sampling] array of time periods expressed in milliseconds
# for which dirty-sample measurements were collected (since 8.1)
#
@@ -1834,6 +1837,7 @@
'page-size': 'int64',
'*n-total-pages': 'int64',
'*n-sampled-pages': 'int64',
+ '*n-zero-pages': 'int64',
'*periods': ['int64'],
'*n-dirty-pages': ['int64'] } }
--
2.30.2
* [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time
From: Andrei Gudkov via @ 2023-04-27 12:43 UTC (permalink / raw)
To: qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan, Andrei Gudkov
Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
---
MAINTAINERS | 1 +
scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
2 files changed, 284 insertions(+)
create mode 100644 scripts/predict_migration.py
diff --git a/MAINTAINERS b/MAINTAINERS
index fc225e66df..0c578446cf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3167,6 +3167,7 @@ F: docs/devel/migration.rst
F: qapi/migration.json
F: tests/migration/
F: util/userfaultfd.c
+F: scripts/predict_migration.py
D-Bus
M: Marc-André Lureau <marcandre.lureau@redhat.com>
diff --git a/scripts/predict_migration.py b/scripts/predict_migration.py
new file mode 100644
index 0000000000..c92a97585f
--- /dev/null
+++ b/scripts/predict_migration.py
@@ -0,0 +1,283 @@
+#!/usr/bin/env python3
+#
+# Predicts time required to migrate VM under given max downtime constraint.
+#
+# Copyright (c) 2023 HUAWEI TECHNOLOGIES CO.,LTD.
+#
+# Authors:
+# Andrei Gudkov <gudkov.andrei@huawei.com>
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later. See the COPYING file in the top-level directory.
+
+
+# Usage:
+#
+# Step 1. Collect dirty page statistics from live VM:
+# $ scripts/predict_migration.py calc-dirty-rate <qmphost> <qmpport> >dirty.json
+# <...takes 1 minute by default...>
+#
+# Step 2. Run predictor against collected data:
+# $ scripts/predict_migration.py predict < dirty.json
+# Downtime> | 125ms | 250ms | 500ms | 1000ms | 5000ms | unlim |
+# -----------------------------------------------------------------------------
+# 100 Mbps | - | - | - | - | - | 16m45s |
+# 1 Gbps | - | - | - | - | - | 1m39s |
+# 2 Gbps | - | - | - | - | 1m55s | 50s |
+# 2.5 Gbps | - | - | - | - | 1m12s | 40s |
+# 5 Gbps | - | - | - | 29s | 25s | 20s |
+# 10 Gbps | 13s | 13s | 12s | 12s | 12s | 10s |
+# 25 Gbps | 5s | 5s | 5s | 5s | 4s | 4s |
+# 40 Gbps | 3s | 3s | 3s | 3s | 3s | 3s |
+#
+# The latter prints a table that lists the estimated time it will take to
+# migrate the VM. This time depends on the network bandwidth and the max
+# allowed downtime. A dash indicates that migration does not converge.
+# The prediction covers only RAM migration and only the pre-copy mode.
+# Other features, such as compression or local disk migration, are not
+# supported.
+
+
+import sys
+import os
+import math
+import json
+from dataclasses import dataclass
+import asyncio
+import argparse
+
+sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'python'))
+from qemu.qmp import QMPClient
+
+async def calc_dirty_rate(host, port, calc_time, sample_pages):
+ client = QMPClient()
+ try:
+ await client.connect((host, port))
+ args = {
+ 'calc-time': calc_time,
+ 'sample-pages': sample_pages
+ }
+ await client.execute('calc-dirty-rate', args)
+ await asyncio.sleep(calc_time)
+ while True:
+ data = await client.execute('query-dirty-rate')
+ if data['status'] == 'measuring':
+ await asyncio.sleep(0.5)
+ elif data['status'] == 'measured':
+ return data
+ else:
+ raise ValueError(data['status'])
+ finally:
+ await client.disconnect()
+
+
+class MemoryModel:
+ """
+ Models RAM state during pre-copy migration using calc-dirty-rate results.
+ Its primary function is to estimate how many pages will be dirtied
+ after given time starting from "clean" state.
+ This function is non-linear and saturates at some point.
+ """
+
+ @dataclass
+ class Point:
+ period_millis:float
+ dirty_pages:float
+
+ def __init__(self, data):
+ """
+ :param data: dictionary returned by calc-dirty-rate
+ """
+ self.__points = self.__make_points(data)
+ self.__page_size = data['page-size']
+ self.__num_total_pages = data['n-total-pages']
+ self.__num_zero_pages = data['n-zero-pages'] / \
+ (data['n-sampled-pages'] / data['n-total-pages'])
+
+ def __make_points(self, data):
+ points = list()
+
+ # Add observed points
+ sample_ratio = data['n-sampled-pages'] / data['n-total-pages']
+ for millis,dirty_pages in zip(data['periods'], data['n-dirty-pages']):
+ millis = float(millis)
+ dirty_pages = dirty_pages / sample_ratio
+ points.append(MemoryModel.Point(millis, dirty_pages))
+
+ # Extrapolate function to the left.
+ # Assuming that the function is convex, the worst case is achieved
+ # when dirty page count immediately jumps to some value at zero time
+ # (infinite slope), and next keeps the same slope as in the region
+ # between the first two observed points: points[0]..points[1]
+ slope, offset = self.__fit_line(points[0], points[1])
+ points.insert(0, MemoryModel.Point(0.0, max(offset, 0.0)))
+
+ # Extrapolate function to the right.
+ # The worst case is achieved when the function has the same slope
+ # as in the last observed region.
+ slope, offset = self.__fit_line(points[-2], points[-1])
+ max_dirty_pages = \
+ data['n-total-pages'] - (data['n-zero-pages'] / sample_ratio)
+ if slope > 0.0:
+ saturation_millis = (max_dirty_pages - offset) / slope
+ points.append(MemoryModel.Point(saturation_millis, max_dirty_pages))
+ points.append(MemoryModel.Point(math.inf, max_dirty_pages))
+
+ return points
+
+ def __fit_line(self, lhs:Point, rhs:Point):
+ slope = (rhs.dirty_pages - lhs.dirty_pages) / \
+ (rhs.period_millis - lhs.period_millis)
+ offset = lhs.dirty_pages - slope * lhs.period_millis
+ return slope, offset
+
+ def page_size(self):
+ """
+ Return page size in bytes
+ """
+ return self.__page_size
+
+ def num_total_pages(self):
+ return self.__num_total_pages
+
+ def num_zero_pages(self):
+ """
+ Estimated total number of zero pages. Assumed to be constant.
+ """
+ return self.__num_zero_pages
+
+ def num_dirty_pages(self, millis):
+ """
+ Estimate the number of dirty pages after a given time starting from
+ a "clean" state. The estimation is based on piece-wise linear interpolation.
+ """
+ for i in range(len(self.__points)):
+ if self.__points[i].period_millis == millis:
+ return self.__points[i].dirty_pages
+ elif self.__points[i].period_millis > millis:
+ slope, offset = self.__fit_line(self.__points[i-1],
+ self.__points[i])
+ return offset + slope * millis
+ raise RuntimeError("unreachable")
+
+
+def predict_migration_time(model, bandwidth, downtime, deadline=3600*1000):
+ """
+ Predict how much time it will take to migrate VM under under given
+ deadline constraint.
+
+ :param model: `MemoryModel` object for a given VM
+ :param bandwidth: Bandwidth available for migration [bytes/s]
+ :param downtime: Max allowed downtime [milliseconds]
+ :param deadline: Max total time to migrate VM before timeout [milliseconds]
+ :return: Predicted migration time [milliseconds] or `None`
+ if migration process doesn't converge before given deadline
+ """
+
+ left_zero_pages = model.num_zero_pages()
+ left_normal_pages = model.num_total_pages() - model.num_zero_pages()
+ header_size = 8
+
+ total_millis = 0.0
+ while True:
+ iter_bytes = 0.0
+ iter_bytes += left_normal_pages * (model.page_size() + header_size)
+ iter_bytes += left_zero_pages * header_size
+
+ iter_millis = iter_bytes * 1000.0 / bandwidth
+
+ total_millis += iter_millis
+
+ if iter_millis <= downtime:
+ return int(math.ceil(total_millis))
+ elif total_millis > deadline:
+ return None
+ else:
+ left_zero_pages = 0
+ left_normal_pages = model.num_dirty_pages(iter_millis)
+
+
+def run_predict_cmd(model):
+ @dataclass
+ class ValStr:
+ value:object
+ string:str
+
+ def gbps(value):
+ return ValStr(value*1024*1024*1024/8, f'{value} Gbps')
+
+ def mbps(value):
+ return ValStr(value*1024*1024/8, f'{value} Mbps')
+
+ def dt(millis):
+ if millis is not None:
+ return ValStr(millis, f'{millis}ms')
+ else:
+ return ValStr(math.inf, 'unlim')
+
+ def eta(millis):
+ if millis is not None:
+ seconds = int(math.ceil(millis/1000.0))
+ minutes, seconds = divmod(seconds, 60)
+ s = ''
+ if minutes > 0:
+ s += f'{minutes}m'
+ if len(s) > 0:
+ s += f'{seconds:02d}s'
+ else:
+ s += f'{seconds}s'
+ else:
+ s = '-'
+ return ValStr(millis, s)
+
+
+ bandwidths = [mbps(100), gbps(1), gbps(2), gbps(2.5), gbps(5), gbps(10),
+ gbps(25), gbps(40)]
+ downtimes = [dt(125), dt(250), dt(500), dt(1000), dt(5000), dt(None)]
+
+ out = ''
+ out += 'Downtime> |'
+ for downtime in downtimes:
+ out += f' {downtime.string:>7} |'
+ print(out)
+
+ print('-'*len(out))
+
+ for bandwidth in bandwidths:
+ print(f'{bandwidth.string:>9} | ', '', end='')
+ for downtime in downtimes:
+ millis = predict_migration_time(model,
+ bandwidth.value,
+ downtime.value)
+ print(f'{eta(millis).string:>7} | ', '', end='')
+ print()
+
+def main():
+ parser = argparse.ArgumentParser()
+ subparsers = parser.add_subparsers(dest='command', required=True)
+
+ parser_cdr = subparsers.add_parser('calc-dirty-rate',
+ help='Collect and print dirty page statistics from live VM')
+ parser_cdr.add_argument('--calc-time', type=int, default=60,
+ help='Calculation time in seconds')
+ parser_cdr.add_argument('--sample-pages', type=int, default=512,
+ help='Number of sampled pages per one gigabyte of RAM')
+ parser_cdr.add_argument('host', metavar='host', type=str, help='QMP host')
+ parser_cdr.add_argument('port', metavar='port', type=int, help='QMP port')
+
+ subparsers.add_parser('predict', help='Predict migration time')
+
+ args = parser.parse_args()
+
+ if args.command == 'calc-dirty-rate':
+ data = asyncio.run(calc_dirty_rate(host=args.host,
+ port=args.port,
+ calc_time=args.calc_time,
+ sample_pages=args.sample_pages))
+ print(json.dumps(data))
+ elif args.command == 'predict':
+ data = json.load(sys.stdin)
+ model = MemoryModel(data)
+ run_predict_cmd(model)
+
+if __name__ == '__main__':
+ main()
--
2.30.2
* Re: [PATCH v2 1/4] migration/calc-dirty-rate: replaced CRC32 with xxHash
From: Juan Quintela @ 2023-05-10 16:54 UTC (permalink / raw)
To: Andrei Gudkov; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> This significantly reduces overhead of dirty page
> rate calculation in sampling mode.
> Tested using 32GiB VM on E5-2690 CPU.
>
> With CRC32:
> total_pages=8388608 sampled_pages=16384 millis=71
>
> With xxHash:
> total_pages=8388608 sampled_pages=16384 millis=14
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
Reviewed-by: Juan Quintela <quintela@redhat.com>
This changes nothing outside of dirtyrate, and if the hash function is
faster, we should be good.
queued.
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
From: Juan Quintela @ 2023-05-10 17:36 UTC (permalink / raw)
To: Andrei Gudkov; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> Collect number of dirty pages for progressively increasing time
> periods starting with 125ms up to the number of seconds specified with
> calc-dirty-rate. Report through QMP and HMP: 1) vector of dirty page
> measurements, 2) page size, 3) total number of VM pages, 4) number
> of sampled pages.
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
> migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
> migration/dirtyrate.h | 25 ++++++-
> qapi/migration.json | 24 +++++-
You need the equivalent of this in your .git/config file.
[diff]
orderFile = scripts/git.orderfile
In particular:
* json files come first
* .h files second
* .c files third
> 3 files changed, 160 insertions(+), 54 deletions(-)
>
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index acba3213a3..4491bbe91a 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> info->calc_time = DirtyStat.calc_time;
> info->sample_pages = DirtyStat.sample_pages;
> info->mode = dirtyrate_mode;
> + info->page_size = TARGET_PAGE_SIZE;
I thought we exported this through "info migrate",
but we do it only if we are in the middle of a migration. Perhaps we
should print it always.
> if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
> info->has_dirty_rate = true;
> @@ -245,6 +246,29 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> info->vcpu_dirty_rate = head;
> }
>
> + if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING)
> {
see my comment at the end about int64 vs uint64/size
> + DirtyStat.page_sampling.n_total_pages = 0;
> + DirtyStat.page_sampling.n_sampled_pages = 0;
> + DirtyStat.page_sampling.n_readings = 0;
> + DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
> + sizeof(DirtyReading));
> break;
If you do g_try_malloc0(), you should check for a NULL return.
Even better, I think you can use here:
foo = g_new0(DirtyReading, MAX_DIRTY_READINGS);
MAX_DIRTY_READINGS is small enough that you can assume that there is
enough free memory.
> - DirtyStat.dirty_rate = dirtyrate;
> + if (DirtyStat.page_sampling.readings) {
> + free(DirtyStat.page_sampling.readings);
> + DirtyStat.page_sampling.readings = NULL;
> + }
What glib gives, glib takes.
g_malloc() -> g_free()
g_free() knows how to handle the NULL case so:
g_free(DirtyStat.page_sampling.readings);
DirtyStat.page_sampling.readings = NULL;
Without the if, it should be good enough.
> -static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
> +static int64_t compare_page_hash_info(struct RamblockDirtyInfo *info,
> int block_count)
bad indentation.
> +static int64_t increase_period(int64_t prev_period, int64_t max_period)
> +{
> + int64_t delta;
> + int64_t next_period;
> +
> + if (prev_period < 500) {
> + delta = 125;
> + } else if (prev_period < 1000) {
> + delta = 250;
> + } else if (prev_period < 2000) {
> + delta = 500;
> + } else if (prev_period < 4000) {
> + delta = 1000;
> + } else if (prev_period < 10000) {
> + delta = 2000;
> + } else {
> + delta = 5000;
> + }
> +
> + next_period = prev_period + delta;
> + if (next_period + delta >= max_period) {
> + next_period = max_period;
> + }
> + return next_period;
> +}
Is there any reason for this to be so complicated?
int64_t periods[] = { 125, 250, 375, 500, 750, 1000, 1500, 2000,
                      3000, 4000, 6000, 8000, 10000, 15000, 20000,
                      25000, 30000, 35000, 40000, 45000 };
And access it in the loop? This way you get what the values are.
You can put a limit on config.sample_period_seconds, or do you want it
unlimited?
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2c35b7b9cf..f818f51e0e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1805,6 +1805,22 @@
> # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
> # mode specified (Since 6.2)
> #
> +# @page-size: page size in bytes (since 8.1)
> +#
> +# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> +#
> +# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
> +#
> +# @periods: [page-sampling] array of time periods expressed in milliseconds
> +# for which dirty-sample measurements were collected (since 8.1)
> +#
> +# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
> +# that were observed as changed during respective time period.
> +# i-th element of this array corresponds to the i-th element
> +# of the @periods array, i.e. @n-dirty-pages[i] is the number
> +# of dirtied pages during period of @periods[i] milliseconds
> +# after the initiation of calc-dirty-rate (since 8.1)
> +#
> # Since: 5.2
> ##
> { 'struct': 'DirtyRateInfo',
> @@ -1814,7 +1830,13 @@
> 'calc-time': 'int64',
> 'sample-pages': 'uint64',
> 'mode': 'DirtyRateMeasureMode',
> - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
> + 'page-size': 'int64',
2 things:
a- this is exported in info migrate, so you can get it from there.
b- even if we export it here, it should be as size or uint64; a negative
page size makes no sense (not that I am expecting to have pages that
don't fit in an int O:-)
Same for the rest of the counters.
> + '*n-total-pages': 'int64',
> + '*n-sampled-pages': 'int64',
> + '*periods': ['int64'],
> + '*n-dirty-pages': ['int64'] } }
>
> ##
> # @calc-dirty-rate:
* Re: [PATCH v2 3/4] migration/calc-dirty-rate: added n-zero-pages metric
From: Juan Quintela @ 2023-05-10 17:57 UTC (permalink / raw)
To: Andrei Gudkov; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> In sampling mode, a new metric is collected and reported:
> number of pages entirely filled with zeroes.
Good idea.
> @@ -331,11 +336,20 @@ static uint32_t compute_page_hash(void *ptr)
> v2 = QEMU_XXHASH_SEED + XXH_PRIME64_2;
> v3 = QEMU_XXHASH_SEED + 0;
> v4 = QEMU_XXHASH_SEED - XXH_PRIME64_1;
> - for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
> - v1 = XXH64_round(v1, p[i + 0]);
> - v2 = XXH64_round(v2, p[i + 1]);
> - v3 = XXH64_round(v3, p[i + 2]);
> - v4 = XXH64_round(v4, p[i + 3]);
> + if (ptr) {
It smells like a hack, that is only going to be used once in the
program.
But the only other option that I can think of is repeating the function for
the zero case.
No way to win here.
> + for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
> + v1 = XXH64_round(v1, p[i + 0]);
> + v2 = XXH64_round(v2, p[i + 1]);
> + v3 = XXH64_round(v3, p[i + 2]);
> + v4 = XXH64_round(v4, p[i + 3]);
> + }
> + } else {
> + for (i = 0; i < TARGET_PAGE_SIZE / 8; i += 4) {
> + v1 = XXH64_round(v1, 0);
> + v2 = XXH64_round(v2, 0);
> + v3 = XXH64_round(v3, 0);
> + v4 = XXH64_round(v4, 0);
> + }
> }
> res = XXH64_mergerounds(v1, v2, v3, v4);
> res += TARGET_PAGE_SIZE;
> @@ -343,6 +357,17 @@ static uint32_t compute_page_hash(void *ptr)
> return (uint32_t)(res & UINT32_MAX);
> }
>
> +static uint32_t get_zero_page_hash(void)
> +{
> + static uint32_t hash;
> + static int is_computed;
bool?
Later, Juan.
* Re: [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time
From: Juan Quintela @ 2023-05-10 18:01 UTC (permalink / raw)
To: Andrei Gudkov; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
my python is very rusty, so I will let someone else comment here.
Later, Juan.
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
From: Markus Armbruster @ 2023-05-11 6:14 UTC (permalink / raw)
To: Andrei Gudkov via; +Cc: Andrei Gudkov, quintela, eblake, berrange, zhengchuan
Andrei Gudkov via <qemu-devel@nongnu.org> writes:
> Collect number of dirty pages for progressively increasing time
> periods starting with 125ms up to the number of seconds specified with
> calc-dirty-rate. Report through QMP and HMP: 1) vector of dirty page
> measurements, 2) page size, 3) total number of VM pages, 4) number
> of sampled pages.
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
[...]
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2c35b7b9cf..f818f51e0e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1805,6 +1805,22 @@
##
# @DirtyRateInfo:
#
# Information about current dirty page rate of vm.
#
# @dirty-rate: an estimate of the dirty page rate of the VM in units
# of MB/s, present only when estimating the rate has completed.
#
# @status: status containing dirtyrate query status includes
# 'unstarted' or 'measuring' or 'measured'
Not this patch's fault, but here goes anyway:
0. "dirtyrate" isn't a word. Spell it "dirty rate". Many more
instances elsewhere.
1. "status containing status"... what has the poor English language done
to us that we torture it so?
2. "includes 'unstarted' or 'measuring' or 'measured' is confusing and
entirely redundant with the type. @status doesn't "include" these,
these are the possible values, and all of them.
Suggest:
@status: dirty rate measuring status
I do understand how difficult writing good English is for non-native
speakers. This is mainly a failure of patch review.
#
# @start-time: start time in units of second for calculation
#
# @calc-time: time in units of second for sample dirty pages
#
# @sample-pages: page count per GB for sample dirty pages the default
# value is 512 (since 6.1)
#
# @mode: mode containing method of calculate dirtyrate includes
# 'page-sampling' and 'dirty-ring' (Since 6.2)
Still not this patch's fault:
1. "mode containing method": more torture :)
2. "includes 'page-sampling' and 'dirty-ring'" is confusing.
When it was added in commit 0e21bf24608, it was confusing and
redundant like the text for @status above.
Then commit 826b8bc80cb added value 'dirty-bitmap' without updating
the member doc here.
Suggest:
@mode: dirty rate measuring mode
#
# @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring mode
# specified (Since 6.2)
#
> +# @page-size: page size in bytes (since 8.1)
> +#
> +# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> +#
> +# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
> +#
> +# @periods: [page-sampling] array of time periods expressed in milliseconds
> +# for which dirty-sample measurements were collected (since 8.1)
> +#
> +# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
> +# that were observed as changed during respective time period.
> +# i-th element of this array corresponds to the i-th element
> +# of the @periods array, i.e. @n-dirty-pages[i] is the number
> +# of dirtied pages during period of @periods[i] milliseconds
> +# after the initiation of calc-dirty-rate (since 8.1)
> +#
Changed doc comment conventions landed yesterday (merge commit
568992e3440). Please format like this:
# @page-size: page size in bytes (since 8.1)
#
# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
#
# @n-sampled-pages: [page-sampling] number of sampled VM pages (since
# 8.1)
#
# @n-zero-pages: [page-sampling] number of observed all-zero pages
# among all sampled pages (since 8.1)
#
# @periods: [page-sampling] array of time periods expressed in
# milliseconds for which dirty-sample measurements were collected
# (since 8.1)
#
# @n-dirty-pages: [page-sampling] number of pages among all sampled
# pages that were observed as changed during respective time
# period. i-th element of this array corresponds to the i-th
# element of the @periods array, i.e. @n-dirty-pages[i] is the
# number of dirtied pages during period of @periods[i]
# milliseconds after the initiation of calc-dirty-rate (since 8.1)
The meaning of "[page-sampling]" is unclear. What are you trying to
express?
For better or worse, we try to avoid abbreviations in QMP. The "n-"
prefix is one. What does it stand for?
It's quite unclear how all these numbers relate to each other. What's
the difference between @n-sampled-pages and @sample-pages? I think
we're missing an overview of the dirty rate measurement feature.
> # Since: 5.2
> ##
> { 'struct': 'DirtyRateInfo',
> @@ -1814,7 +1830,13 @@
> 'calc-time': 'int64',
> 'sample-pages': 'uint64',
> 'mode': 'DirtyRateMeasureMode',
> - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
> + 'page-size': 'int64',
Shouldn't this be of type 'size'?
> + '*n-total-pages': 'int64',
> + '*n-sampled-pages': 'int64',
> + '*periods': ['int64'],
> + '*n-dirty-pages': ['int64'] } }
'uint64', like @sample-pages?
> +
>
> ##
> # @calc-dirty-rate:
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
From: gudkov.andrei--- via @ 2023-05-12 13:18 UTC (permalink / raw)
To: Juan Quintela
Cc: Andrei Gudkov, qemu-devel, eblake, armbru, berrange, zhengchuan
On Wed, May 10, 2023 at 07:36:33PM +0200, Juan Quintela wrote:
> Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> > Collect number of dirty pages for progressively increasing time
> > periods starting with 125ms up to the number of seconds specified with
> > calc-dirty-rate. Report through QMP and HMP: 1) vector of dirty page
> > measurements, 2) page size, 3) total number of VM pages, 4) number
> > of sampled pages.
> >
> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> > ---
> > migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
> > migration/dirtyrate.h | 25 ++++++-
> > qapi/migration.json | 24 +++++-
>
> You need the equivalent of this in your .git/config file.
>
> [diff]
> orderFile = scripts/git.orderfile
>
> In particular:
> * json files come first
> * .h files second
> * .c files third
>
> > 3 files changed, 160 insertions(+), 54 deletions(-)
> >
> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> > index acba3213a3..4491bbe91a 100644
> > --- a/migration/dirtyrate.c
> > +++ b/migration/dirtyrate.c
> > @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> > info->calc_time = DirtyStat.calc_time;
> > info->sample_pages = DirtyStat.sample_pages;
> > info->mode = dirtyrate_mode;
> > + info->page_size = TARGET_PAGE_SIZE;
>
> I thought we exported this through "info migrate",
> but we do it only if we are in the middle of a migration. Perhaps we
> should print it always.
So, which one do you prefer? To keep it here or to make "info migrate" print it always (or both)?
>
> > if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
> > info->has_dirty_rate = true;
> > @@ -245,6 +246,29 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> > info->vcpu_dirty_rate = head;
> > }
> >
> > + if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING)
> > {
>
> see my comment at the end about int64 vs uint64/size
>
> > + DirtyStat.page_sampling.n_total_pages = 0;
> > + DirtyStat.page_sampling.n_sampled_pages = 0;
> > + DirtyStat.page_sampling.n_readings = 0;
> > + DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
> > + sizeof(DirtyReading));
> > break;
>
> If you do g_try_malloc0(), you should check for a NULL return.
>
> Even better, I think you can use here:
>
> foo = g_new0(DirtyReading, MAX_DIRTY_READINGS);
>
> MAX_DIRTY_READINGS is small enough that you can assume that there is
> enough free memory.
>
> > - DirtyStat.dirty_rate = dirtyrate;
> > + if (DirtyStat.page_sampling.readings) {
> > + free(DirtyStat.page_sampling.readings);
> > + DirtyStat.page_sampling.readings = NULL;
> > + }
>
> What glib gives, glib takes.
>
> g_malloc() -> g_free()
>
> g_free() knows how to handle the NULL case so:
>
> g_free(DirtyStat.page_sampling.readings);
> DirtyStat.page_sampling.readings = NULL;
>
> Without the if, it should be good enough.
>
> > -static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
> > +static int64_t compare_page_hash_info(struct RamblockDirtyInfo *info,
> > int block_count)
>
> bad indentation.
>
> > +static int64_t increase_period(int64_t prev_period, int64_t max_period)
> > +{
> > + int64_t delta;
> > + int64_t next_period;
> > +
> > + if (prev_period < 500) {
> > + delta = 125;
> > + } else if (prev_period < 1000) {
> > + delta = 250;
> > + } else if (prev_period < 2000) {
> > + delta = 500;
> > + } else if (prev_period < 4000) {
> > + delta = 1000;
> > + } else if (prev_period < 10000) {
> > + delta = 2000;
> > + } else {
> > + delta = 5000;
> > + }
> > +
> > + next_period = prev_period + delta;
> > + if (next_period + delta >= max_period) {
> > + next_period = max_period;
> > + }
> > + return next_period;
> > +}
>
> Is there any reason for this to be so complicated?
I think it was because I initially computed prev_period using qemu_clock_get_ms().
But now I see that I got rid of that idea, and prev_period can only be
one of the predefined values... So, you are right. I will change it to an array.
>
>
> int64_t periods[] = { 125, 250, 375, 500, 750, 1000, 1500, 2000, 3000, 4000, 6000, 8000, 10000,
> 15000, 20000, 25000, 30000, 35000, 40000, 45000 };
>
> And access it in the loop? This way you can see exactly what the values are.
>
> You can put a limit on config.sample_period_seconds, or do you want it
> unlimited?
It cannot be unbounded. The max period can be at most
MAX_FETCH_DIRTYRATE_TIME_SEC*1000 = 60000 ms. This is already checked inside
qmp_calc_dirty_rate().
>
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 2c35b7b9cf..f818f51e0e 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1805,6 +1805,22 @@
> > # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
> > # mode specified (Since 6.2)
> > #
> > +# @page-size: page size in bytes (since 8.1)
> > +#
> > +# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> > +#
> > +# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
> > +#
> > +# @periods: [page-sampling] array of time periods expressed in milliseconds
> > +# for which dirty-sample measurements were collected (since 8.1)
> > +#
> > +# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
> > +# that were observed as changed during respective time period.
> > +# i-th element of this array corresponds to the i-th element
> > +# of the @periods array, i.e. @n-dirty-pages[i] is the number
> > +# of dirtied pages during period of @periods[i] milliseconds
> > +# after the initiation of calc-dirty-rate (since 8.1)
> > +#
> > # Since: 5.2
> > ##
> > { 'struct': 'DirtyRateInfo',
> > @@ -1814,7 +1830,13 @@
> > 'calc-time': 'int64',
> > 'sample-pages': 'uint64',
> > 'mode': 'DirtyRateMeasureMode',
> > - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
> > + 'page-size': 'int64',
>
> 2 things:
> a- this is exported in info migrate, so you can get it from there.
> b- even if we export it here, it is as size or uint64, negative page
> size make no sense (not that I am expecting to have page that don't
> fit in a int O:-)
But can you be sure that in the future you are not going to return
a sentinel value like "-1"? :)
>
> Same for the rest of the counters.
Ok, but I still insist on using 64-bit types for the page number counters.
It looks to me that a 16TiB VM is a matter of the near future.
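(Indeed, with 4KiB pages a 16TiB VM holds 2^44 / 2^12 = 2^32 pages, which
already overflows a 32-bit counter.)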
>
> > + '*n-total-pages': 'int64',
> > + '*n-sampled-pages': 'int64',
> > + '*periods': ['int64'],
> > + '*n-dirty-pages': ['int64'] } }
> >
> > ##
> > # @calc-dirty-rate:
>
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-05-11 6:14 ` Markus Armbruster
@ 2023-05-12 14:24 ` gudkov.andrei--- via
0 siblings, 0 replies; 22+ messages in thread
From: gudkov.andrei--- via @ 2023-05-12 14:24 UTC (permalink / raw)
To: Markus Armbruster
Cc: Andrei Gudkov via, quintela, eblake, berrange, zhengchuan,
gudkov.andrei
On Thu, May 11, 2023 at 08:14:29AM +0200, Markus Armbruster wrote:
> Andrei Gudkov via <qemu-devel@nongnu.org> writes:
>
> > Collect number of dirty pages for progressively increasing time
> > periods starting with 125ms up to number of seconds specified with
> > calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
> > measurements, 2) page size, 3) total number of VM pages, 4) number
> > of sampled pages.
> >
> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
>
> [...]
>
> > diff --git a/qapi/migration.json b/qapi/migration.json
> > index 2c35b7b9cf..f818f51e0e 100644
> > --- a/qapi/migration.json
> > +++ b/qapi/migration.json
> > @@ -1805,6 +1805,22 @@
> ##
> # @DirtyRateInfo:
> #
> # Information about current dirty page rate of vm.
> #
> # @dirty-rate: an estimate of the dirty page rate of the VM in units
> # of MB/s, present only when estimating the rate has completed.
> #
> # @status: status containing dirtyrate query status includes
> # 'unstarted' or 'measuring' or 'measured'
>
> Not this patch's fault, but here goes anyway:
>
> 0. "dirtyrate" isn't a word. Spell it "dirty rate". Many more
> instances elsewhere.
>
> 1. "status containing status"... what has the poor English language done
> to us that we torture it so?
>
> 2. "includes 'unstarted' or 'measuring' or 'measured' is confusing and
> entirely redundant with the type. @status doesn't "include" these,
> these are the possible values, and all of them.
>
> Suggest:
>
> @status: dirty rate measuring status
>
> I do understand how difficult writing good English is for non-native
> speakers. This is mainly a failure of patch review.
>
> #
> # @start-time: start time in units of second for calculation
> #
> # @calc-time: time in units of second for sample dirty pages
> #
> # @sample-pages: page count per GB for sample dirty pages the default
> # value is 512 (since 6.1)
> #
> # @mode: mode containing method of calculate dirtyrate includes
> # 'page-sampling' and 'dirty-ring' (Since 6.2)
>
> Still not this patch's fault:
>
> 1. "mode containing method": more torture :)
>
> 2. "includes 'page-sampling' and 'dirty-ring'" is confusing.
>
> When it was added in commit 0e21bf24608, it was confusing and
> redundant like the text for @status above.
>
> Then commit 826b8bc80cb added value 'dirty-bitmap' without updating
> the member doc here.
>
> Suggest:
>
> @mode: dirty rate measuring mode
>
> #
> # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring mode
> # specified (Since 6.2)
> #
> > +# @page-size: page size in bytes (since 8.1)
> > +#
> > +# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> > +#
> > +# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
> > +#
> > +# @periods: [page-sampling] array of time periods expressed in milliseconds
> > +# for which dirty-sample measurements were collected (since 8.1)
> > +#
> > +# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
> > +# that were observed as changed during respective time period.
> > +# i-th element of this array corresponds to the i-th element
> > +# of the @periods array, i.e. @n-dirty-pages[i] is the number
> > +# of dirtied pages during period of @periods[i] milliseconds
> > +# after the initiation of calc-dirty-rate (since 8.1)
> > +#
>
> Changed doc comment conventions landed yesterday (merge commit
> 568992e3440). Please format like this:
>
> # @page-size: page size in bytes (since 8.1)
> #
> # @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> #
> # @n-sampled-pages: [page-sampling] number of sampled VM pages (since
> # 8.1)
> #
> # @n-zero-pages: [page-sampling] number of observed all-zero pages
> # among all sampled pages (since 8.1)
> #
> # @periods: [page-sampling] array of time periods expressed in
> # milliseconds for which dirty-sample measurements were collected
> # (since 8.1)
> #
> # @n-dirty-pages: [page-sampling] number of pages among all sampled
> # pages that were observed as changed during respective time
> # period. i-th element of this array corresponds to the i-th
> # element of the @periods array, i.e. @n-dirty-pages[i] is the
> # number of dirtied pages during period of @periods[i]
> # milliseconds after the initiation of calc-dirty-rate (since 8.1)
>
> The meaning of "[page-sampling]" is unclear. What are you trying to
> express?
There are two different measurement modes: page-sampling and dirty-ring.
They gather different sets of metrics. All the metrics above are
available only in page-sampling mode.
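For example, the mode is chosen when starting a measurement (sample-pages
applies only to page-sampling mode):
{"execute": "calc-dirty-rate",
 "arguments": {"calc-time": 60, "mode": "page-sampling", "sample-pages": 512}}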
>
> For better or worse, we try to avoid abbreviations in QMP. The "n-"
> prefix is one. What does it stand for?
It stands for the number of the respective objects. I can replace it with
something like "sampled-pages-count", "zero-pages-count", etc. at the cost of
longer names.
>
> It's quite unclear how all these numbers relate to each other. What's
> the difference between @n-sampled-pages and @sample-pages? I think
> we're missing an overview of the dirty rate measurement feature.
This looks like an easy thing to do. I will add some docs to
@calc-dirty-rate related to page-sampling mode.
>
> > # Since: 5.2
> > ##
> > { 'struct': 'DirtyRateInfo',
> > @@ -1814,7 +1830,13 @@
> > 'calc-time': 'int64',
> > 'sample-pages': 'uint64',
> > 'mode': 'DirtyRateMeasureMode',
> > - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> > + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
> > + 'page-size': 'int64',
>
> Shouldn't this be of type 'size'?
>
> > + '*n-total-pages': 'int64',
> > + '*n-sampled-pages': 'int64',
> > + '*periods': ['int64'],
> > + '*n-dirty-pages': ['int64'] } }
>
> 'uint64', like @sample-pages?
>
> > +
> >
> > ##
> > # @calc-dirty-rate:
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-05-12 13:18 ` gudkov.andrei--- via
@ 2023-05-15 8:22 ` Juan Quintela
2023-05-18 14:45 ` gudkov.andrei--- via
2023-05-15 8:23 ` Juan Quintela
1 sibling, 1 reply; 22+ messages in thread
From: Juan Quintela @ 2023-05-15 8:22 UTC (permalink / raw)
To: gudkov.andrei; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
<gudkov.andrei@huawei.com> wrote:
> On Wed, May 10, 2023 at 07:36:33PM +0200, Juan Quintela wrote:
>> Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
>> > Collect number of dirty pages for progressively increasing time
>> > periods starting with 125ms up to number of seconds specified with
>> > calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
>> > measurements, 2) page size, 3) total number of VM pages, 4) number
>> > of sampled pages.
>> >
>> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
>> > ---
>> > migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
>> > migration/dirtyrate.h | 25 ++++++-
>> > qapi/migration.json | 24 +++++-
>>
>> You need the equivalent of this in your .git/config file.
>>
>> [diff]
>> orderFile = scripts/git.orderfile
>>
>> In particular:
>> *.json files come first
>> *.h second
>> *.c third
>>
>> > 3 files changed, 160 insertions(+), 54 deletions(-)
>> >
>> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> > index acba3213a3..4491bbe91a 100644
>> > --- a/migration/dirtyrate.c
>> > +++ b/migration/dirtyrate.c
>> > @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
>> > info->calc_time = DirtyStat.calc_time;
>> > info->sample_pages = DirtyStat.sample_pages;
>> > info->mode = dirtyrate_mode;
>> > + info->page_size = TARGET_PAGE_SIZE;
>>
>> I thought we exported this through "info migrate"
>> but we do it only if we are in the middle of a migration. Perhaps we
>> should print it always.
>
> So, which one do you prefer? To keep it here or to make "info migrate" print it always (or both)?
info migrate to print it always. Thanks.
>> > @@ -1814,7 +1830,13 @@
>> > 'calc-time': 'int64',
>> > 'sample-pages': 'uint64',
>> > 'mode': 'DirtyRateMeasureMode',
>> > - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
>> > + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
>> > + 'page-size': 'int64',
>>
>> 2 things:
>> a- this is exported in info migrate, so you can get it from there.
>> b- even if we export it here, it is as size or uint64, negative page
>> size make no sense (not that I am expecting to have page that don't
>> fit in a int O:-)
>
> But can you be sure that in the future you are not going to return
> sentinel value like "-1"? :)
For page-size? I don't expect it. And if I want a sentinel value,
UINT64_MAX, UINT64_MAX-1 and friends look good enough for me.
>> Same for the rest of the counters.
>
> Ok, but I still insist on using 64-bit types for the page number counters.
> It looks to me that a 16TiB VM is a matter of the near future.
size_t is 64 bits on any host that is able to handle 16TiB guests O:-)
But I am indifferent to uint64_t or size_t (I am only on 64-bit machines,
so it is the same for me). That would only help on 32-bit hosts, but I
am pretty sure that nobody is using them seriously (for migration)
because all the 32-bit migration bugs that I get come from the build
bots, not real users.
Later, Juan.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-05-12 13:18 ` gudkov.andrei--- via
2023-05-15 8:22 ` Juan Quintela
@ 2023-05-15 8:23 ` Juan Quintela
1 sibling, 0 replies; 22+ messages in thread
From: Juan Quintela @ 2023-05-15 8:23 UTC (permalink / raw)
To: gudkov.andrei; +Cc: qemu-devel, eblake, armbru, berrange, zhengchuan
<gudkov.andrei@huawei.com> wrote:
> On Wed, May 10, 2023 at 07:36:33PM +0200, Juan Quintela wrote:
>> Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
>> > Collect number of dirty pages for progressively increasing time
>> > periods starting with 125ms up to number of seconds specified with
>> > calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
>> > measurements, 2) page size, 3) total number of VM pages, 4) number
>> > of sampled pages.
>> >
>> > Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
>> > ---
>> > migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
>> > migration/dirtyrate.h | 25 ++++++-
>> > qapi/migration.json | 24 +++++-
>>
>> You need the equivalent of this in your .git/config file.
>>
>> [diff]
>> orderFile = scripts/git.orderfile
>>
>> In particular:
>> *.json files come first
>> *.h second
>> *.c third
>>
>> > 3 files changed, 160 insertions(+), 54 deletions(-)
>> >
>> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> > index acba3213a3..4491bbe91a 100644
>> > --- a/migration/dirtyrate.c
>> > +++ b/migration/dirtyrate.c
>> > @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
>> > info->calc_time = DirtyStat.calc_time;
>> > info->sample_pages = DirtyStat.sample_pages;
>> > info->mode = dirtyrate_mode;
>> > + info->page_size = TARGET_PAGE_SIZE;
>>
>> I thought we exported this through "info migrate"
>> but we do it only if we are in the middle of a migration. Perhaps we
>> should print it always.
>
> So, which one do you prefer? To keep it here or to make "info migrate" print it always (or both)?
info migrate to print it always. Thanks.
>> > @@ -1814,7 +1830,13 @@
>> > 'calc-time': 'int64',
>> > 'sample-pages': 'uint64',
>> > 'mode': 'DirtyRateMeasureMode',
>> > - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
>> > + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
>> > + 'page-size': 'int64',
>>
>> 2 things:
>> a- this is exported in info migrate, so you can get it from there.
>> b- even if we export it here, it is as size or uint64, negative page
>> size make no sense (not that I am expecting to have page that don't
>> fit in a int O:-)
>
> But can you be sure that in the future you are not going to return
> sentinel value like "-1"? :)
For page-size? I don't expect it. And if I want a sentinel value,
UINT64_MAX, UINT64_MAX-1 and friends look good enough for me.
>> Same for the rest of the counters.
>
> Ok, but I still insist on using 64-bit types for the page number counters.
> It looks to me that a 16TiB VM is a matter of the near future.
size_t is 64 bits on any host that is able to handle 16TiB guests O:-)
But I am indifferent to uint64_t or size_t (I am only on 64-bit machines,
so it is the same for me). That would only help on 32-bit hosts, but I
am pretty sure that nobody is using them seriously (for migration)
because all the 32-bit migration bugs that I get come from the build
bots, not real users.
Later, Juan.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-05-15 8:22 ` Juan Quintela
@ 2023-05-18 14:45 ` gudkov.andrei--- via
2023-05-18 15:13 ` Juan Quintela
0 siblings, 1 reply; 22+ messages in thread
From: gudkov.andrei--- via @ 2023-05-18 14:45 UTC (permalink / raw)
To: Juan Quintela
Cc: gudkov.andrei, qemu-devel, eblake, armbru, berrange, zhengchuan,
eduardo, marcel.apfelbaum
On Mon, May 15, 2023 at 10:22:43AM +0200, Juan Quintela wrote:
> <gudkov.andrei@huawei.com> wrote:
> > On Wed, May 10, 2023 at 07:36:33PM +0200, Juan Quintela wrote:
> >> Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
> >> > Collect number of dirty pages for progressively increasing time
> >> > periods starting with 125ms up to number of seconds specified with
> >> > calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
> >> > measurements, 2) page size, 3) total number of VM pages, 4) number
> >> > of sampled pages.
> >> >
> >> > 3 files changed, 160 insertions(+), 54 deletions(-)
> >> >
> >> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> >> > index acba3213a3..4491bbe91a 100644
> >> > --- a/migration/dirtyrate.c
> >> > +++ b/migration/dirtyrate.c
> >> > @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> >> > info->calc_time = DirtyStat.calc_time;
> >> > info->sample_pages = DirtyStat.sample_pages;
> >> > info->mode = dirtyrate_mode;
> >> > + info->page_size = TARGET_PAGE_SIZE;
> >>
> >> I thought we exported this through "info migrate"
> >> but we do it only if we are in the middle of a migration. Perhaps we
> >> should print it always.
> >
> > So, which one do you prefer? To keep it here or to make "info migrate" print it always (or both)?
>
> info migrate to print it always. Thanks.
I looked into "info migrate". To print page size regardless of migration status,
all other 17 fields of MigrationInfo.ram would have to be made optional.
On top of that, it feels like page size doesn't belong in "info migrate",
since it is the only "static" value while all the others are "dynamic" counters.
I think I found a better place where page size can be reported --
query-memory-size-summary. After the change it would look as follows:
{"execute": "query-memory-size-summary"}
{"return": {"page-size": 4096, "base-memory": 34359738368, "plugged-memory": 0}}
What do you think about it?
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-05-18 14:45 ` gudkov.andrei--- via
@ 2023-05-18 15:13 ` Juan Quintela
0 siblings, 0 replies; 22+ messages in thread
From: Juan Quintela @ 2023-05-18 15:13 UTC (permalink / raw)
To: gudkov.andrei
Cc: qemu-devel, eblake, armbru, berrange, zhengchuan, eduardo,
marcel.apfelbaum
<gudkov.andrei@huawei.com> wrote:
> On Mon, May 15, 2023 at 10:22:43AM +0200, Juan Quintela wrote:
>> <gudkov.andrei@huawei.com> wrote:
>> > On Wed, May 10, 2023 at 07:36:33PM +0200, Juan Quintela wrote:
>> >> Andrei Gudkov <gudkov.andrei@huawei.com> wrote:
>> >> > Collect number of dirty pages for progressively increasing time
>> >> > periods starting with 125ms up to number of seconds specified with
>> >> > calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
>> >> > measurements, 2) page size, 3) total number of VM pages, 4) number
>> >> > of sampled pages.
>> >> >
>> >> > 3 files changed, 160 insertions(+), 54 deletions(-)
>> >> >
>> >> > diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
>> >> > index acba3213a3..4491bbe91a 100644
>> >> > --- a/migration/dirtyrate.c
>> >> > +++ b/migration/dirtyrate.c
>> >> > @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
>> >> > info->calc_time = DirtyStat.calc_time;
>> >> > info->sample_pages = DirtyStat.sample_pages;
>> >> > info->mode = dirtyrate_mode;
>> >> > + info->page_size = TARGET_PAGE_SIZE;
>> >>
>> >> I thought we exported this through "info migrate"
>> >> but we do it only if we are in the middle of a migration. Perhaps we
>> >> should print it always.
>> >
>> > So, which one do you prefer? To keep it here or to make "info migrate" print it always (or both)?
>>
>> info migrate to print it always. Thanks.
>
> I looked into "info migrate". To print page size regardless of migration
> status, all other 17 fields of MigrationInfo.ram would have to be made
> optional. On top of that, it feels like page size doesn't belong in
> "info migrate", since it is the only "static" value while all the others
> are "dynamic" counters.
>
> I think I found a better place where page size can be reported --
> query-memory-size-summary. After the change it would look as follows:
> {"execute": "query-memory-size-summary"}
> {"return": {"page-size": 4096, "base-memory": 34359738368, "plugged-memory": 0}}
>
> What do you think about it?
Perfect for me.
Thanks.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode
2023-04-27 12:42 ` [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode Andrei Gudkov via
2023-05-10 17:36 ` Juan Quintela
2023-05-11 6:14 ` Markus Armbruster
@ 2023-05-30 3:06 ` Wang, Lei
2 siblings, 0 replies; 22+ messages in thread
From: Wang, Lei @ 2023-05-30 3:06 UTC (permalink / raw)
To: Andrei Gudkov, qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan
On 4/27/2023 20:42, Andrei Gudkov via wrote:
> Collect number of dirty pages for progressively increasing time
> periods starting with 125ms up to number of seconds specified with
> calc-dirty-rate. Report through qmp and hmp: 1) vector of dirty page
> measurements, 2) page size, 3) total number of VM pages, 4) number
> of sampled pages.
>
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
> migration/dirtyrate.c | 165 +++++++++++++++++++++++++++++-------------
> migration/dirtyrate.h | 25 ++++++-
> qapi/migration.json | 24 +++++-
> 3 files changed, 160 insertions(+), 54 deletions(-)
>
> diff --git a/migration/dirtyrate.c b/migration/dirtyrate.c
> index acba3213a3..4491bbe91a 100644
> --- a/migration/dirtyrate.c
> +++ b/migration/dirtyrate.c
> @@ -224,6 +224,7 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> info->calc_time = DirtyStat.calc_time;
> info->sample_pages = DirtyStat.sample_pages;
> info->mode = dirtyrate_mode;
> + info->page_size = TARGET_PAGE_SIZE;
>
> if (qatomic_read(&CalculatingState) == DIRTY_RATE_STATUS_MEASURED) {
> info->has_dirty_rate = true;
> @@ -245,6 +246,29 @@ static struct DirtyRateInfo *query_dirty_rate_info(void)
> info->vcpu_dirty_rate = head;
> }
>
> + if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING) {
> + int64List *periods_head = NULL;
> + int64List **periods_tail = &periods_head;
> + int64List *n_dirty_pages_head = NULL;
> + int64List **n_dirty_pages_tail = &n_dirty_pages_head;
> +
> + info->n_total_pages = DirtyStat.page_sampling.n_total_pages;
> + info->has_n_total_pages = true;
> +
> + info->n_sampled_pages = DirtyStat.page_sampling.n_sampled_pages;
> + info->has_n_sampled_pages = true;
> +
> + for (i = 0; i < DirtyStat.page_sampling.n_readings; i++) {
> + DirtyReading *dr = &DirtyStat.page_sampling.readings[i];
> + QAPI_LIST_APPEND(periods_tail, dr->period);
> + QAPI_LIST_APPEND(n_dirty_pages_tail, dr->n_dirty_pages);
> + }
> + info->n_dirty_pages = n_dirty_pages_head;
> + info->periods = periods_head;
> + info->has_n_dirty_pages = true;
> + info->has_periods = true;
> + }
> +
> if (dirtyrate_mode == DIRTY_RATE_MEASURE_MODE_DIRTY_BITMAP) {
> info->sample_pages = 0;
> }
> @@ -265,9 +289,11 @@ static void init_dirtyrate_stat(int64_t start_time,
>
> switch (config.mode) {
> case DIRTY_RATE_MEASURE_MODE_PAGE_SAMPLING:
> - DirtyStat.page_sampling.total_dirty_samples = 0;
> - DirtyStat.page_sampling.total_sample_count = 0;
> - DirtyStat.page_sampling.total_block_mem_MB = 0;
> + DirtyStat.page_sampling.n_total_pages = 0;
> + DirtyStat.page_sampling.n_sampled_pages = 0;
> + DirtyStat.page_sampling.n_readings = 0;
> + DirtyStat.page_sampling.readings = g_try_malloc0_n(MAX_DIRTY_READINGS,
> + sizeof(DirtyReading));
> break;
> case DIRTY_RATE_MEASURE_MODE_DIRTY_RING:
> DirtyStat.dirty_ring.nvcpu = -1;
> @@ -285,28 +311,10 @@ static void cleanup_dirtyrate_stat(struct DirtyRateConfig config)
> free(DirtyStat.dirty_ring.rates);
> DirtyStat.dirty_ring.rates = NULL;
> }
> -}
> -
> -static void update_dirtyrate_stat(struct RamblockDirtyInfo *info)
> -{
> - DirtyStat.page_sampling.total_dirty_samples += info->sample_dirty_count;
> - DirtyStat.page_sampling.total_sample_count += info->sample_pages_count;
> - /* size of total pages in MB */
> - DirtyStat.page_sampling.total_block_mem_MB += (info->ramblock_pages *
> - TARGET_PAGE_SIZE) >> 20;
> -}
> -
> -static void update_dirtyrate(uint64_t msec)
> -{
> - uint64_t dirtyrate;
> - uint64_t total_dirty_samples = DirtyStat.page_sampling.total_dirty_samples;
> - uint64_t total_sample_count = DirtyStat.page_sampling.total_sample_count;
> - uint64_t total_block_mem_MB = DirtyStat.page_sampling.total_block_mem_MB;
> -
> - dirtyrate = total_dirty_samples * total_block_mem_MB *
> - 1000 / (total_sample_count * msec);
> -
> - DirtyStat.dirty_rate = dirtyrate;
> + if (DirtyStat.page_sampling.readings) {
> + free(DirtyStat.page_sampling.readings);
> + DirtyStat.page_sampling.readings = NULL;
> + }
> }
>
> /*
> @@ -377,12 +385,14 @@ static bool save_ramblock_hash(struct RamblockDirtyInfo *info)
> return false;
> }
>
> - rand = g_rand_new();
> + rand = g_rand_new();
> + DirtyStat.page_sampling.n_total_pages += info->ramblock_pages;
> for (i = 0; i < sample_pages_count; i++) {
> info->sample_page_vfn[i] = g_rand_int_range(rand, 0,
> info->ramblock_pages - 1);
> info->hash_result[i] = get_ramblock_vfn_hash(info,
> info->sample_page_vfn[i]);
> + DirtyStat.page_sampling.n_sampled_pages++;
How about moving:
DirtyStat.page_sampling.n_sampled_pages += sample_pages_count;
out of the loop, to avoid doing the increment multiple times?
> }
> g_rand_free(rand);
>
> @@ -479,18 +489,20 @@ out:
> return ret;
> }
>
> -static void calc_page_dirty_rate(struct RamblockDirtyInfo *info)
> +static int64_t calc_page_dirty_rate(struct RamblockDirtyInfo *info)
> {
> uint32_t hash;
> int i;
>
> + int64_t n_dirty = 0;
> for (i = 0; i < info->sample_pages_count; i++) {
> hash = get_ramblock_vfn_hash(info, info->sample_page_vfn[i]);
> if (hash != info->hash_result[i]) {
> + n_dirty++;
> trace_calc_page_dirty_rate(info->idstr, hash, info->hash_result[i]);
> - info->sample_dirty_count++;
> }
> }
> + return n_dirty;
> }
>
> static struct RamblockDirtyInfo *
> @@ -519,11 +531,12 @@ find_block_matched(RAMBlock *block, int count,
> return &infos[i];
> }
>
> -static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
> +static int64_t compare_page_hash_info(struct RamblockDirtyInfo *info,
> int block_count)
> {
> struct RamblockDirtyInfo *block_dinfo = NULL;
> RAMBlock *block = NULL;
> + int64_t n_dirty = 0;
>
> RAMBLOCK_FOREACH_MIGRATABLE(block) {
> if (skip_sample_ramblock(block)) {
> @@ -533,15 +546,10 @@ static bool compare_page_hash_info(struct RamblockDirtyInfo *info,
> if (block_dinfo == NULL) {
> continue;
> }
> - calc_page_dirty_rate(block_dinfo);
> - update_dirtyrate_stat(block_dinfo);
> - }
> -
> - if (DirtyStat.page_sampling.total_sample_count == 0) {
> - return false;
> + n_dirty += calc_page_dirty_rate(block_dinfo);
> }
>
> - return true;
> + return n_dirty;
> }
>
> static inline void record_dirtypages_bitmap(DirtyPageRecord *dirty_pages,
> @@ -642,34 +650,77 @@ static void calculate_dirtyrate_dirty_ring(struct DirtyRateConfig config)
> DirtyStat.dirty_rate = dirtyrate_sum;
> }
>
> +static int64_t increase_period(int64_t prev_period, int64_t max_period)
> +{
> + int64_t delta;
> + int64_t next_period;
> +
> + if (prev_period < 500) {
> + delta = 125;
> + } else if (prev_period < 1000) {
> + delta = 250;
> + } else if (prev_period < 2000) {
> + delta = 500;
> + } else if (prev_period < 4000) {
> + delta = 1000;
> + } else if (prev_period < 10000) {
> + delta = 2000;
> + } else {
> + delta = 5000;
> + }
> +
> + next_period = prev_period + delta;
> + if (next_period + delta >= max_period) {
> + next_period = max_period;
> + }
> + return next_period;
> +}
> +
> +
> static void calculate_dirtyrate_sample_vm(struct DirtyRateConfig config)
> {
> struct RamblockDirtyInfo *block_dinfo = NULL;
> int block_count = 0;
> - int64_t msec = 0;
> int64_t initial_time;
> + int64_t current_time;
>
> + /* initial pass */
> rcu_read_lock();
> initial_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> - if (!record_ramblock_hash_info(&block_dinfo, config, &block_count)) {
> + bool ok = record_ramblock_hash_info(&block_dinfo, config, &block_count);
> + rcu_read_unlock();
> + if ((!ok) || (DirtyStat.page_sampling.n_sampled_pages == 0)) {
> goto out;
> }
> - rcu_read_unlock();
>
> - msec = config.sample_period_seconds * 1000;
> - msec = dirty_stat_wait(msec, initial_time);
> - DirtyStat.start_time = initial_time / 1000;
> - DirtyStat.calc_time = msec / 1000;
> + int64_t period = INITIAL_PERIOD_MS;
> + while (true) {
> + current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + int64_t delta = initial_time + period - current_time;
> + if (delta > 0) {
> + g_usleep(delta * 1000);
> + }
>
> - rcu_read_lock();
> - if (!compare_page_hash_info(block_dinfo, block_count)) {
> - goto out;
> - }
> + rcu_read_lock();
> + current_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
> + int64_t n_dirty = compare_page_hash_info(block_dinfo, block_count);
> + rcu_read_unlock();
>
> - update_dirtyrate(msec);
> + SampleVMStat *ps = &DirtyStat.page_sampling;
> + ps->readings[ps->n_readings].period = current_time - initial_time;
> + ps->readings[ps->n_readings].n_dirty_pages = n_dirty;
> + ps->n_readings++;
> +
> + if (period >= DirtyStat.calc_time * 1000) {
> + int64_t mb_total = (ps->n_total_pages * TARGET_PAGE_SIZE) >> 20;
> + int64_t mb_dirty = n_dirty * mb_total / ps->n_sampled_pages;
> + DirtyStat.dirty_rate = mb_dirty * 1000 / period;
> + break;
> + }
> + period = increase_period(period, DirtyStat.calc_time * 1000);
> + }
>
> out:
> - rcu_read_unlock();
> free_ramblock_dirty_info(block_dinfo, block_count);
> }
>
> @@ -836,7 +887,23 @@ void hmp_info_dirty_rate(Monitor *mon, const QDict *qdict)
> monitor_printf(mon, "(not ready)\n");
> }
>
> + if (info->has_n_total_pages) {
> + monitor_printf(mon, "Page count (page size %d):\n", TARGET_PAGE_SIZE);
> + monitor_printf(mon, " Total: %"PRIi64"\n", info->n_total_pages);
> + monitor_printf(mon, " Sampled: %"PRIi64"\n", info->n_sampled_pages);
> + int64List *periods = info->periods;
> + int64List *n_dirty_pages = info->n_dirty_pages;
> + while (periods) {
> + monitor_printf(mon, " Dirty(%"PRIi64"ms): %"PRIi64"\n",
> + periods->value, n_dirty_pages->value);
> + periods = periods->next;
> + n_dirty_pages = n_dirty_pages->next;
> + }
> + }
> +
> qapi_free_DirtyRateVcpuList(info->vcpu_dirty_rate);
> + qapi_free_int64List(info->periods);
> + qapi_free_int64List(info->n_dirty_pages);
> g_free(info);
> }
>
> diff --git a/migration/dirtyrate.h b/migration/dirtyrate.h
> index 594a5c0bb6..7a97e2b076 100644
> --- a/migration/dirtyrate.h
> +++ b/migration/dirtyrate.h
> @@ -42,6 +42,18 @@
> #define MIN_SAMPLE_PAGE_COUNT 128
> #define MAX_SAMPLE_PAGE_COUNT 16384
>
> +/*
> + * Initial sampling period expressed in milliseconds
> + */
> +#define INITIAL_PERIOD_MS 125
> +
> +/*
> + * Upper bound on the number of DirtyReadings calculcated based on
Nit: calculcated -> calculated
> + * INITIAL_PERIOD_MS, MAX_FETCH_DIRTYRATE_TIME_SEC and increase_period()
> + */
> +#define MAX_DIRTY_READINGS 32
> +
> +
> struct DirtyRateConfig {
> uint64_t sample_pages_per_gigabytes; /* sample pages per GB */
> int64_t sample_period_seconds; /* time duration between two sampling */
> @@ -57,14 +69,19 @@ struct RamblockDirtyInfo {
> uint64_t ramblock_pages; /* ramblock size in TARGET_PAGE_SIZE */
> uint64_t *sample_page_vfn; /* relative offset address for sampled page */
> uint64_t sample_pages_count; /* count of sampled pages */
> - uint64_t sample_dirty_count; /* count of dirty pages we measure */
> uint32_t *hash_result; /* array of hash result for sampled pages */
> };
>
> +typedef struct DirtyReading {
> + int64_t period; /* time period in milliseconds */
> + int64_t n_dirty_pages; /* number of observed dirty pages */
> +} DirtyReading;
> +
> typedef struct SampleVMStat {
> - uint64_t total_dirty_samples; /* total dirty sampled page */
> - uint64_t total_sample_count; /* total sampled pages */
> - uint64_t total_block_mem_MB; /* size of total sampled pages in MB */
> + int64_t n_total_pages; /* total number of pages */
> + int64_t n_sampled_pages; /* number of sampled pages */
> + int64_t n_readings;
> + DirtyReading *readings;
> } SampleVMStat;
>
> /*
> diff --git a/qapi/migration.json b/qapi/migration.json
> index 2c35b7b9cf..f818f51e0e 100644
> --- a/qapi/migration.json
> +++ b/qapi/migration.json
> @@ -1805,6 +1805,22 @@
> # @vcpu-dirty-rate: dirtyrate for each vcpu if dirty-ring
> # mode specified (Since 6.2)
> #
> +# @page-size: page size in bytes (since 8.1)
> +#
> +# @n-total-pages: [page-sampling] total number of VM pages (since 8.1)
> +#
> +# @n-sampled-pages: [page-sampling] number of sampled VM pages (since 8.1)
> +#
> +# @periods: [page-sampling] array of time periods expressed in milliseconds
> +# for which dirty-sample measurements were collected (since 8.1)
> +#
> +# @n-dirty-pages: [page-sampling] number of pages among all sampled pages
> +# that were observed as changed during respective time period.
> +# i-th element of this array corresponds to the i-th element
> +# of the @periods array, i.e. @n-dirty-pages[i] is the number
> +# of dirtied pages during period of @periods[i] milliseconds
> +# after the initiation of calc-dirty-rate (since 8.1)
> +#
> # Since: 5.2
> ##
> { 'struct': 'DirtyRateInfo',
> @@ -1814,7 +1830,13 @@
> 'calc-time': 'int64',
> 'sample-pages': 'uint64',
> 'mode': 'DirtyRateMeasureMode',
> - '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ] } }
> + '*vcpu-dirty-rate': [ 'DirtyRateVcpu' ],
> + 'page-size': 'int64',
> + '*n-total-pages': 'int64',
> + '*n-sampled-pages': 'int64',
> + '*periods': ['int64'],
> + '*n-dirty-pages': ['int64'] } }
> +
>
> ##
> # @calc-dirty-rate:
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time
2023-04-27 12:43 ` [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
2023-05-10 18:01 ` Juan Quintela
@ 2023-05-30 3:21 ` Wang, Lei
2023-06-02 13:06 ` gudkov.andrei--- via
1 sibling, 1 reply; 22+ messages in thread
From: Wang, Lei @ 2023-05-30 3:21 UTC (permalink / raw)
To: Andrei Gudkov, qemu-devel; +Cc: quintela, eblake, armbru, berrange, zhengchuan
On 4/27/2023 20:43, Andrei Gudkov via wrote:
> Signed-off-by: Andrei Gudkov <gudkov.andrei@huawei.com>
> ---
> MAINTAINERS | 1 +
> scripts/predict_migration.py | 283 +++++++++++++++++++++++++++++++++++
> 2 files changed, 284 insertions(+)
> create mode 100644 scripts/predict_migration.py
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index fc225e66df..0c578446cf 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -3167,6 +3167,7 @@ F: docs/devel/migration.rst
> F: qapi/migration.json
> F: tests/migration/
> F: util/userfaultfd.c
> +F: scripts/predict_migration.py
>
> D-Bus
> M: Marc-André Lureau <marcandre.lureau@redhat.com>
> diff --git a/scripts/predict_migration.py b/scripts/predict_migration.py
> new file mode 100644
> index 0000000000..c92a97585f
> --- /dev/null
> +++ b/scripts/predict_migration.py
> @@ -0,0 +1,283 @@
> +#!/usr/bin/env python3
> +#
> +# Predicts time required to migrate VM under given max downtime constraint.
> +#
> +# Copyright (c) 2023 HUAWEI TECHNOLOGIES CO.,LTD.
> +#
> +# Authors:
> +# Andrei Gudkov <gudkov.andrei@huawei.com>
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later. See the COPYING file in the top-level directory.
> +
> +
> +# Usage:
> +#
> +# Step 1. Collect dirty page statistics from live VM:
> +# $ scripts/predict_migration.py calc-dirty-rate <qmphost> <qmpport> >dirty.json
> +# <...takes 1 minute by default...>
> +#
> +# Step 2. Run predictor against collected data:
> +# $ scripts/predict_migration.py predict < dirty.json
> +# Downtime> | 125ms | 250ms | 500ms | 1000ms | 5000ms | unlim |
> +# -----------------------------------------------------------------------------
> +# 100 Mbps | - | - | - | - | - | 16m45s |
> +# 1 Gbps | - | - | - | - | - | 1m39s |
> +# 2 Gbps | - | - | - | - | 1m55s | 50s |
> +# 2.5 Gbps | - | - | - | - | 1m12s | 40s |
> +# 5 Gbps | - | - | - | 29s | 25s | 20s |
> +# 10 Gbps | 13s | 13s | 12s | 12s | 12s | 10s |
> +# 25 Gbps | 5s | 5s | 5s | 5s | 4s | 4s |
> +# 40 Gbps | 3s | 3s | 3s | 3s | 3s | 3s |
> +#
> +# The latter prints table that lists estimated time it will take to migrate VM.
> +# This time depends on the network bandwidth and max allowed downtime.
> +# Dash indicates that migration does not converge.
> +# Prediction takes care only about migrating RAM and only in pre-copy mode.
> +# Other features, such as compression or local disk migration, are not supported
> +
> +
> +import sys
> +import os
> +import math
> +import json
> +from dataclasses import dataclass
> +import asyncio
> +import argparse
> +
> +sys.path.append(os.path.join(os.path.dirname(__file__), '..', 'python'))
> +from qemu.qmp import QMPClient
> +
> +async def calc_dirty_rate(host, port, calc_time, sample_pages):
> + client = QMPClient()
> + try:
> + await client.connect((host, port))
> + args = {
> + 'calc-time': calc_time,
> + 'sample-pages': sample_pages
> + }
> + await client.execute('calc-dirty-rate', args)
> + await asyncio.sleep(calc_time)
> + while True:
> + data = await client.execute('query-dirty-rate')
> + if data['status'] == 'measuring':
> + await asyncio.sleep(0.5)
> + elif data['status'] == 'measured':
> + return data
> + else:
> + raise ValueError(data['status'])
> + finally:
> + await client.disconnect()
> +
> +
> +class MemoryModel:
> + """
> + Models RAM state during pre-copy migration using calc-dirty-rate results.
> + Its primary function is to estimate how many pages will be dirtied
> + after given time starting from "clean" state.
> + This function is non-linear and saturates at some point.
> + """
> +
> + @dataclass
> + class Point:
> + period_millis:float
> + dirty_pages:float
> +
> + def __init__(self, data):
> + """
> + :param data: dictionary returned by calc-dirty-rate
> + """
> + self.__points = self.__make_points(data)
> + self.__page_size = data['page-size']
> + self.__num_total_pages = data['n-total-pages']
> + self.__num_zero_pages = data['n-zero-pages'] / \
> + (data['n-sampled-pages'] / data['n-total-pages'])
> +
> + def __make_points(self, data):
> + points = list()
> +
> + # Add observed points
> + sample_ratio = data['n-sampled-pages'] / data['n-total-pages']
> + for millis,dirty_pages in zip(data['periods'], data['n-dirty-pages']):
> + millis = float(millis)
> + dirty_pages = dirty_pages / sample_ratio
> + points.append(MemoryModel.Point(millis, dirty_pages))
> +
> + # Extrapolate function to the left.
> + # Assuming that the function is convex, the worst case is achieved
> + # when dirty page count immediately jumps to some value at zero time
> + # (infinite slope), and next keeps the same slope as in the region
> + # between the first two observed points: points[0]..points[1]
> + slope, offset = self.__fit_line(points[0], points[1])
> + points.insert(0, MemoryModel.Point(0.0, max(offset, 0.0)))
> +
> + # Extrapolate function to the right.
> + # The worst case is achieved when the function has the same slope
> + # as in the last observed region.
> + slope, offset = self.__fit_line(points[-2], points[-1])
> + max_dirty_pages = \
> + data['n-total-pages'] - (data['n-zero-pages'] / sample_ratio)
> + if slope > 0.0:
> + saturation_millis = (max_dirty_pages - offset) / slope
> + points.append(MemoryModel.Point(saturation_millis, max_dirty_pages))
> + points.append(MemoryModel.Point(math.inf, max_dirty_pages))
> +
> + return points
> +
> + def __fit_line(self, lhs:Point, rhs:Point):
> + slope = (rhs.dirty_pages - lhs.dirty_pages) / \
> + (rhs.period_millis - lhs.period_millis)
> + offset = lhs.dirty_pages - slope * lhs.period_millis
> + return slope, offset
> +
> + def page_size(self):
> + """
> + Return page size in bytes
> + """
> + return self.__page_size
> +
> + def num_total_pages(self):
> + return self.__num_total_pages
> +
> + def num_zero_pages(self):
> + """
> + Estimated total number of zero pages. Assumed to be constant.
> + """
> + return self.__num_zero_pages
> +
> + def num_dirty_pages(self, millis):
> + """
> + Estimate number of dirty pages after given time starting from "clean"
> + state. The estimation is based on piece-wise linear interpolation.
> + """
> + for i in range(len(self.__points)):
> + if self.__points[i].period_millis == millis:
> + return self.__points[i].dirty_pages
> + elif self.__points[i].period_millis > millis:
> + slope, offset = self.__fit_line(self.__points[i-1],
> + self.__points[i])
Seems the indentation is broken here.
> + return offset + slope * millis
> + raise RuntimeError("unreachable")
> +
> +
> +def predict_migration_time(model, bandwidth, downtime, deadline=3600*1000):
> + """
> + Predict how much time it will take to migrate VM under under given
Nit: Duplicated "under".
> + deadline constraint.
> +
> + :param model: `MemoryModel` object for a given VM
> + :param bandwidth: Bandwidth available for migration [bytes/s]
> + :param downtime: Max allowed downtime [milliseconds]
> + :param deadline: Max total time to migrate VM before timeout [milliseconds]
> + :return: Predicted migration time [milliseconds] or `None`
> + if migration process doesn't converge before given deadline
> + """
> +
> + left_zero_pages = model.num_zero_pages()
> + left_normal_pages = model.num_total_pages() - model.num_zero_pages()
> + header_size = 8
In the cover letter: "Typical prediction error is 6-7%". I'm wondering whether
the prediction tends to be below or above the real migration time. I think 2
potential factors will lead to underestimation:
1. Network protocol stack's headers are not counted in, e.g., TCP's header can
be 20 ~ 60 bytes.
2. The bandwidth may not be saturated all the time.
> +
> + total_millis = 0.0
> + while True:
> + iter_bytes = 0.0
> + iter_bytes += left_normal_pages * (model.page_size() + header_size)
> + iter_bytes += left_zero_pages * header_size
> +
> + iter_millis = iter_bytes * 1000.0 / bandwidth
> +
> + total_millis += iter_millis
> +
> + if iter_millis <= downtime:
> + return int(math.ceil(total_millis))
> + elif total_millis > deadline:
> + return None
> + else:
> + left_zero_pages = 0
> + left_normal_pages = model.num_dirty_pages(iter_millis)
> +
> +
> +def run_predict_cmd(model):
> + @dataclass
> + class ValStr:
> + value:object
> + string:str
> +
> + def gbps(value):
> + return ValStr(value*1024*1024*1024/8, f'{value} Gbps')
> +
> + def mbps(value):
> + return ValStr(value*1024*1024/8, f'{value} Mbps')
> +
> + def dt(millis):
> + if millis is not None:
> + return ValStr(millis, f'{millis}ms')
> + else:
> + return ValStr(math.inf, 'unlim')
> +
> + def eta(millis):
> + if millis is not None:
> + seconds = int(math.ceil(millis/1000.0))
> + minutes, seconds = divmod(seconds, 60)
> + s = ''
> + if minutes > 0:
> + s += f'{minutes}m'
> + if len(s) > 0:
> + s += f'{seconds:02d}s'
> + else:
> + s += f'{seconds}s'
> + else:
> + s = '-'
> + return ValStr(millis, s)
> +
> +
> + bandwidths = [mbps(100), gbps(1), gbps(2), gbps(2.5), gbps(5), gbps(10),
> + gbps(25), gbps(40)]
> + downtimes = [dt(125), dt(250), dt(500), dt(1000), dt(5000), dt(None)]
> +
> + out = ''
> + out += 'Downtime> |'
> + for downtime in downtimes:
> + out += f' {downtime.string:>7} |'
> + print(out)
> +
> + print('-'*len(out))
> +
> + for bandwidth in bandwidths:
> + print(f'{bandwidth.string:>9} | ', '', end='')
> + for downtime in downtimes:
> + millis = predict_migration_time(model,
> + bandwidth.value,
> + downtime.value)
> + print(f'{eta(millis).string:>7} | ', '', end='')
> + print()
> +
> +def main():
> + parser = argparse.ArgumentParser()
> + subparsers = parser.add_subparsers(dest='command', required=True)
> +
> + parser_cdr = subparsers.add_parser('calc-dirty-rate',
> + help='Collect and print dirty page statistics from live VM')
> + parser_cdr.add_argument('--calc-time', type=int, default=60,
> + help='Calculation time in seconds')
> + parser_cdr.add_argument('--sample-pages', type=int, default=512,
> + help='Number of sampled pages per one gigabyte of RAM')
> + parser_cdr.add_argument('host', metavar='host', type=str, help='QMP host')
> + parser_cdr.add_argument('port', metavar='port', type=int, help='QMP port')
> +
> + subparsers.add_parser('predict', help='Predict migration time')
> +
> + args = parser.parse_args()
> +
> + if args.command == 'calc-dirty-rate':
> + data = asyncio.run(calc_dirty_rate(host=args.host,
> + port=args.port,
> + calc_time=args.calc_time,
> + sample_pages=args.sample_pages))
> + print(json.dumps(data))
> + elif args.command == 'predict':
> + data = json.load(sys.stdin)
> + model = MemoryModel(data)
> + run_predict_cmd(model)
> +
> +if __name__ == '__main__':
> + main()
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
2023-04-27 12:42 [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Andrei Gudkov via
` (3 preceding siblings ...)
2023-04-27 12:43 ` [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
@ 2023-05-30 15:46 ` Peter Xu
2023-05-31 14:46 ` gudkov.andrei--- via
4 siblings, 1 reply; 22+ messages in thread
From: Peter Xu @ 2023-05-30 15:46 UTC (permalink / raw)
To: Andrei Gudkov; +Cc: qemu-devel, quintela, eblake, armbru, berrange, zhengchuan
Hi, Andrei,
On Thu, Apr 27, 2023 at 03:42:56PM +0300, Andrei Gudkov via wrote:
> Afterwards we tried to migrate VM after randomly selecting max downtime
> and bandwidth limit. Typical prediction error is 6-7%, with only 180 out
> of 5779 experiments failing badly: prediction error >=25% or incorrectly
> predicting migration success when in fact it didn't converge.
What's the normal size of the VMs when you did the measurements?
A major challenge of convergence issues comes from huge VMs, and I'm
wondering whether those are covered in the prediction verifications.
Thanks,
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
2023-05-30 15:46 ` [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Peter Xu
@ 2023-05-31 14:46 ` gudkov.andrei--- via
2023-05-31 15:03 ` Peter Xu
0 siblings, 1 reply; 22+ messages in thread
From: gudkov.andrei--- via @ 2023-05-31 14:46 UTC (permalink / raw)
To: Peter Xu; +Cc: qemu-devel, quintela, eblake, armbru, berrange, zhengchuan
On Tue, May 30, 2023 at 11:46:50AM -0400, Peter Xu wrote:
> Hi, Andrei,
>
> On Thu, Apr 27, 2023 at 03:42:56PM +0300, Andrei Gudkov via wrote:
> > Afterwards we tried to migrate VM after randomly selecting max downtime
> > and bandwidth limit. Typical prediction error is 6-7%, with only 180 out
> > of 5779 experiments failing badly: prediction error >=25% or incorrectly
> > predicting migration success when in fact it didn't converge.
>
> What's the normal size of the VMs when you did the measurements?
VM size in all experiments was 32GiB. However, since some of the pages
are zero, the effective VM size was smaller. I checked the value of the
precopy-bytes counter after the first migration iteration. The median value
among all experiments is 24.3GiB.
>
> A major challenge of convergence issues comes from huge VMs, and I'm
> wondering whether those are covered in the prediction verifications.
Hmmm... My understanding is that convergence primarily depends on how
aggressively the VM dirties pages, not on VM size. A small VM with aggressive
writes would be impossible to migrate without throttling. On the contrary,
migration of a huge dormant VM will converge in just a single iteration
(although a long one). The only reason I can imagine why a large VM size can
negatively affect convergence is the following chain: larger VM
size => bigger number of vCPUs => more memory writes per second.
Or do you perhaps mean that during each iteration we perform
KVM_CLEAR_DIRTY_LOG, which is (I suspect) linear in time and can become
a bottleneck for large VMs? Anyway, I will conduct experiments with
large VMs.
I think the easiest way to predict whether VM migration will converge
is the following: run calc-dirty-rate with calc-time equal to the
desired downtime. If it reports that the volume of memory dirtied over the
calc-time period is larger than what you can copy over the network in the
same time, then you are out of luck. Alas, at the moment calc-time accepts
values in units of seconds, while reasonable downtimes lie in the range of
50-300ms. I am preparing a separate patch that will allow specifying
calc-time in milliseconds. I hope that this approach will be cleaner than the
array of hardcoded values I introduced in my original patch.
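A rough sketch of that check, using the field names from the calc-dirty-rate
output shown earlier in this thread (the bandwidth figure is an input
supplied by the caller, not something QEMU reports):

  # Sketch only: decide convergence from one calc-dirty-rate run where
  # calc-time was set equal to the desired downtime (in seconds).
  def converges(stats, bandwidth_bytes_per_s, downtime_s):
      sample_ratio = stats['n-sampled-pages'] / stats['n-total-pages']
      # Scale the last dirty-page reading up to the whole VM.
      dirtied_bytes = (stats['n-dirty-pages'][-1] / sample_ratio
                       * stats['page-size'])
      # The final iteration fits into the downtime only if the memory
      # dirtied during one downtime interval can be copied in that time.
      return dirtied_bytes <= bandwidth_bytes_per_s * downtime_s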
>
> Thanks,
>
> --
> Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 0/4] Migration time prediction using calc-dirty-rate
2023-05-31 14:46 ` gudkov.andrei--- via
@ 2023-05-31 15:03 ` Peter Xu
0 siblings, 0 replies; 22+ messages in thread
From: Peter Xu @ 2023-05-31 15:03 UTC (permalink / raw)
To: gudkov.andrei; +Cc: qemu-devel, quintela, eblake, armbru, berrange, zhengchuan
On Wed, May 31, 2023 at 05:46:40PM +0300, gudkov.andrei@huawei.com wrote:
> On Tue, May 30, 2023 at 11:46:50AM -0400, Peter Xu wrote:
> > Hi, Andrei,
> >
> > On Thu, Apr 27, 2023 at 03:42:56PM +0300, Andrei Gudkov via wrote:
> > > Afterwards we tried to migrate VM after randomly selecting max downtime
> > > and bandwidth limit. Typical prediction error is 6-7%, with only 180 out
> > > of 5779 experiments failing badly: prediction error >=25% or incorrectly
> > > predicting migration success when in fact it didn't converge.
> >
> > What's the normal size of the VMs when you did the measurements?
>
> VM size in all experiments was 32GiB. However, since some of the pages
> are zero, the effective VM size was smaller. I checked the value of the
> precopy-bytes counter after the first migration iteration. The median value
> among all experiments is 24.3GiB.
>
> >
> > A major challenge of convergence issues comes from huge VMs, and I'm
> > wondering whether those are covered in the prediction verifications.
>
> Hmmm... My understanding is that convergence primarily depends on how
> aggressively the VM dirties pages, not on VM size. A small VM with aggressive
> writes would be impossible to migrate without throttling. On the contrary,
> migration of a huge dormant VM will converge in just a single iteration
> (although a long one). The only reason I can imagine why a large VM size can
> negatively affect convergence is the following chain: larger VM
> size => bigger number of vCPUs => more memory writes per second.
> Or do you perhaps mean that during each iteration we perform
> KVM_CLEAR_DIRTY_LOG, which is (I suspect) linear in time and can become
> a bottleneck for large VMs?
Partly yes, but not explicitly about CLEAR_LOG; more about the whole process,
which may still depend on the size of guest memory. I was curious whether the
prediction can stay accurate as memory size grows.
I assume a huge VM normally has more cores too, and it's even less
likely to be idle if there's a real customer using it (rather than in a lab;
if I were a huge-VM tenant I wouldn't want to keep it idle). Then with
more cores there's definitely a higher chance of higher dirty rates,
especially with the larger memory pool.
> Anyway, I will conduct experiments with large VMs.
Thanks.
>
> I think the easiest way to predict whether VM migration will converge
> is the following: run calc-dirty-rate with calc-time equal to the
> desired downtime. If it reports that the volume of memory dirtied over the
> calc-time period is larger than what you can copy over the network in the
> same time, then you are out of luck. Alas, at the moment calc-time accepts
> values in units of seconds, while reasonable downtimes lie in the range of
> 50-300ms. I am preparing a separate patch that will allow specifying
> calc-time in milliseconds. I hope that this approach will be cleaner than
> the array of hardcoded values I introduced in my original patch.
I actually haven't personally gone through the details of the new
interface, but what you said sounds reasonable, and I'm happy to read the
new version.
--
Peter Xu
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time
2023-05-30 3:21 ` Wang, Lei
@ 2023-06-02 13:06 ` gudkov.andrei--- via
0 siblings, 0 replies; 22+ messages in thread
From: gudkov.andrei--- via @ 2023-06-02 13:06 UTC (permalink / raw)
To: Wang, Lei; +Cc: qemu-devel, quintela, eblake, armbru, berrange, zhengchuan
On Tue, May 30, 2023 at 11:21:38AM +0800, Wang, Lei wrote:
> On 4/27/2023 20:43, Andrei Gudkov via wrote:
> > + deadline constraint.
> > +
> > + :param model: `MemoryModel` object for a given VM
> > + :param bandwidth: Bandwidth available for migration [bytes/s]
> > + :param downtime: Max allowed downtime [milliseconds]
> > + :param deadline: Max total time to migrate VM before timeout [milliseconds]
> > + :return: Predicted migration time [milliseconds] or `None`
> > + if migration process doesn't converge before given deadline
> > + """
> > +
> > + left_zero_pages = model.num_zero_pages()
> > + left_normal_pages = model.num_total_pages() - model.num_zero_pages()
> > + header_size = 8
>
> In the cover letter: "Typical prediction error is 6-7%". I'm wondering
> whether the prediction tends to be below or above the real migration time.
> I think 2 potential factors will lead to underestimation:
>
> 1. Network protocol stack's headers are not counted in, e.g., TCP's header can
> be 20 ~ 60 bytes.
>
> 2. The bandwidth may not be saturated all the time.
Yes, I totally missed network headers in my model!
I do not see any noticeable asymmetry in the prediction error. Sometimes the
prediction overestimates the actual migration time, sometimes it
underestimates it. I think the prediction error is caused mainly by the VM
workload not being perfectly stable over time.
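If I were to account for the headers, a simple approximation would be to
amortize per-packet headers over the TCP MSS when computing the per-page
wire cost. Sketch only; the 1500-byte MTU and 40 bytes of IPv4+TCP headers
are assumptions, not measured values:

  MTU = 1500            # assumed IP MTU
  TCP_IP_HEADERS = 40   # assumed IPv4 + TCP headers without options
  MIGRATION_HEADER = 8  # per-page migration stream header

  def wire_bytes_per_page(page_size):
      payload = page_size + MIGRATION_HEADER
      mss = MTU - TCP_IP_HEADERS
      packets = -(-payload // mss)  # ceiling division
      return payload + packets * TCP_IP_HEADERS

For a 4KiB page this gives 4224 bytes on the wire instead of 4104, i.e.
about 3% of extra traffic, so headers alone would not explain the whole
6-7% error.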
^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread
Thread overview: 22+ messages
2023-04-27 12:42 [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Andrei Gudkov via
2023-04-27 12:42 ` [PATCH v2 1/4] migration/calc-dirty-rate: replaced CRC32 with xxHash Andrei Gudkov via
2023-05-10 16:54 ` Juan Quintela
2023-04-27 12:42 ` [PATCH v2 2/4] migration/calc-dirty-rate: detailed stats in sampling mode Andrei Gudkov via
2023-05-10 17:36 ` Juan Quintela
2023-05-12 13:18 ` gudkov.andrei--- via
2023-05-15 8:22 ` Juan Quintela
2023-05-18 14:45 ` gudkov.andrei--- via
2023-05-18 15:13 ` Juan Quintela
2023-05-15 8:23 ` Juan Quintela
2023-05-11 6:14 ` Markus Armbruster
2023-05-12 14:24 ` gudkov.andrei--- via
2023-05-30 3:06 ` Wang, Lei
2023-04-27 12:42 ` [PATCH v2 3/4] migration/calc-dirty-rate: added n-zero-pages metric Andrei Gudkov via
2023-05-10 17:57 ` Juan Quintela
2023-04-27 12:43 ` [PATCH v2 4/4] migration/calc-dirty-rate: tool to predict migration time Andrei Gudkov via
2023-05-10 18:01 ` Juan Quintela
2023-05-30 3:21 ` Wang, Lei
2023-06-02 13:06 ` gudkov.andrei--- via
2023-05-30 15:46 ` [PATCH v2 0/4] Migration time prediction using calc-dirty-rate Peter Xu
2023-05-31 14:46 ` gudkov.andrei--- via
2023-05-31 15:03 ` Peter Xu