* [PATCH V13 0/7] support dirty restraint on vCPU
@ 2022-02-10 16:06 huangy81
[not found] ` <cover.1644508591.git.huangy81@chinatelecom.cn>
2022-02-10 16:12 ` [PATCH V13 0/7] support dirty restraint on vCPU Hyman
0 siblings, 2 replies; 4+ messages in thread
From: huangy81 @ 2022-02-10 16:06 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
v13
- rebase on master
- pass NULL to kvm_dirty_ring_reap in the commit
  "refactor per-vcpu dirty ring reaping" to keep the logic unchanged.
  In other words, we still try our best to reap as many PFNs as possible
  when dirtylimit is not in service.
- move the cpu list gen id changes into a separate patch.
- release the lock before sleep during dirty page rate calculation.
- move the dirty ring size fetch logic into a separate patch.
- drop the DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK macro.
- substitute the bh with a function pointer when implementing dirtylimit.
- merge dirtylimit_start/stop into dirtylimit_change.
- change the "cpu-index" parameter type to "int" for consistency.
- fix some syntax errors in the documentation.
Please review. Thanks,
Yong
v12
- rebase on master
- add a new commit to refactor per-vcpu dirty ring reaping, which resolves
  the "vcpu misses the chance to sleep" problem
- remove the dirtylimit_thread and implement the throttle in a bottom half instead.
- let the dirty ring reaper thread keep sleeping while dirtylimit is in service
- introduce cpu_list_generation_id to identify cpu_list changes.
- keep holding the cpu_list_lock during dirty_stat_wait to prevent vcpu plug/unplug
  while calculating the dirty page rate
- move the dirtylimit global initializations out of dirtylimit_set_vcpu and do
  some code cleanup
- add DIRTYLIMIT_LINEAR_ADJUSTMENT_WATERMARK to avoid oscillation when throttling
- remove the unmatched count field in dirtylimit_state
- add stub to fix build on non-x86
- refactor the documents
Thanks Peter and Markus for reviewing the previous versions; please review.
Thanks,
Yong
v11
- rebase on master
- add a commit "refactor dirty page rate calculation" so that the dirty page rate limit
  can reuse the calculation logic.
- handle the cpu hotplug/unplug case in the dirty page rate calculation logic.
- modify the qmp commands according to Markus's advice.
- introduce a standalone file, dirtylimit.c, to implement the dirty page rate limit
- check whether the dirty limit is in service via the dirtylimit_state pointer instead of a global variable
- introduce dirtylimit_mutex to protect dirtylimit_state
- do some code cleanup and documentation work
See the commits for more detail. Thanks Markus and Peter very much for the code
review and the experienced and insightful advice; most modifications are
based on it.
v10:
- rebase on master
- make the following modifications on patch [1/3]:
1. Make "dirtylimit-calc" thread joinable and join it after quitting.
2. Add finalize function to free dirtylimit_calc_state
3. Do some code clean work
- make the following modifications on patch [2/3]:
1. Remove the original implementation of throttle according to
Peter's advice.
2. Introduce a negative feedback system and implement the throttle
   on all vcpus in a single thread named "dirtylimit".
3. Simplify the algorithm for calculating throttle_us_per_full:
   scale it linearly when there is a wide difference between the quota
   and the current dirty page rate, and adjust it by a fixed time slice
   when the difference is narrow. This makes the throttle respond
   faster and reach the quota smoothly (a rough sketch of the idea
   follows this list of [2/3] changes).
4. Introduce an unfit_cnt in the algorithm to make sure the throttle
   really takes effect.
5. Set the max sleep time to 99 times the "ring_full_time_us".
6. Make "dirtylimit" thread joinable and join it after quitting.
- make the following modifications on patch [3/3]:
1. Remove the unplug cpu handling logic.
2. "query-vcpu-dirty-limit" only return dirtylimit information of
vcpus that enable dirtylimit
3. Remove the "dirtylimit_setup" function
4. Trigger the dirtylimit and initialize the global state only
when someone enable dirtylimit, and finalize it after the last
dirtylimit be canceled.
5. Redefine the qmp command vcpu-dirty-limit/query-vcpu-dirty-limit:
enable/disable dirtylimit use a single command "vcpu-dirty-limit",
to enable/disabled dirtylimit on specified vcpu only if "cpu-index"
is specified, otherwise, all vcpu will be affected.
6. Redefine the hmp command vcpu_dirty_limit/info vcpu_dirty_limit
- other points about the code review:
1. "merge the code of calculation dirty page rate"
I think maybe it's not suitable to touch the 'calc-dirty-rate',
because 'calc-dirty-rate' will stop sync log after calculating
the dirtyrate and the 'dirtylimit-cal' will not untill the last
dirtylimit be canceled, if we merge the GLOBAL_DIRTY_LIMIT into
GLOBAL_DIRTY_DIRTYRATE, the two are interacted with each other.
2. The new throttle algorithm implementation inspired by Peter
   responds faster and consumes less cpu than the old one, which is
   a noticeable improvement.
   One point worth discussing is that the new throttle logic is
   "passive": a vcpu sleeps only after its dirty ring is full, unlike
   "auto-converge", which kicks the vcpu at a fixed time slice. If the
   vcpu is memory-write intensive and the ring size is large, it keeps
   producing dirty memory until the ring fills, so the throttle works
   less well; in other words, throttle effectiveness depends on the
   dirty ring size.
   I tested the new algorithm in two cases:
   case 1: dirty-ring-size: 4096, dirtyrate: 1170MB/s
           result: minimum quota dirtyrate is 25MB/s or even less,
                   minimum vcpu util is 6%
   case 2: dirty-ring-size: 65536, dirtyrate: 1170MB/s
           result: minimum quota dirtyrate is 256MB/s,
                   minimum vcpu util is 24%
   (With a 65536-entry ring and 4 KiB pages, one full ring already
   represents 256 MB of dirty memory, which matches the observed floor.)
   I post this just for discussion; I think it is not a big deal,
   because if we set dirty-ring-size to the maximum value (65536),
   we assume the server's bandwidth is capable of handling it.
3. I hard-coded the acceptable deviation value to 25MB/s, see the
   macro DIRTYLIMIT_TOLERANCE_RANGE. I'm still undecided whether to
   make it configurable.
4. Another point is unplugged cpu handling: the current algorithm also
   affects unplugged vcpus. If we set a dirty limit on one, we fork the
   two threads "dirtylimit" and "dirtylimit-calc" but they do nothing;
   once the vcpu is hot-plugged, the dirty limit takes effect. I think
   the logic is ok, but different advice is welcome.
- to make it easier for developers to play with, here is an hmp usage example:
(qemu) vcpu_dirty_limit -g on -1 500
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[0], limit rate 500 (MB/s), current rate 415 (MB/s)
vcpu[1], limit rate 500 (MB/s), current rate 496 (MB/s)
vcpu[2], limit rate 500 (MB/s), current rate 0 (MB/s)
vcpu[3], limit rate 500 (MB/s), current rate 0 (MB/s)
(qemu) vcpu_dirty_limit -g off
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
Dirty page limit not enabled!
(qemu) vcpu_dirty_limit on 0 300
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) vcpu_dirty_limit on 1 500
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[0], limit rate 300 (MB/s), current rate 342 (MB/s)
vcpu[1], limit rate 500 (MB/s), current rate 485 (MB/s)
(qemu) vcpu_dirty_limit off 0
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
vcpu[1], limit rate 500 (MB/s), current rate 528 (MB/s)
(qemu) vcpu_dirty_limit off 1
[Please use 'info vcpu_dirty_limit' to query dirty limit for virtual CPU]
(qemu) info vcpu_dirty_limit
Dirty page limit not enabled!
Thanks very much for the instructive algorithm suggestions given by Peter
and for the comments and other code reviews made by Markus.
Please review, thanks!
v9:
- rebase on master
- fix the meson directory change, keep it untouched.
v8:
- rebase on master
- polish the error messages and remove the "unlikely" compiler hint
  according to the advice given by Markus.
- keep dirty tracking enabled during the "dirtylimit-calc" lifecycle
  so that the overhead can be reduced, according to the advice given by
  Peter.
- merge the "set/cancel" qmp commands into one named "vcpu-dirty-limit"
and introduce qmp command "query-vcpu-dirty-limit" to query dirty
limit information about virtual CPU, according to the advice given by
Peter.
- check whether the vcpu index is valid and handle the unplug case before
  enabling/disabling the dirty limit for a virtual CPU.
- introduce hmp commands so developers can play with them more easily: use
  "vcpu_dirty_limit" to enable the dirty limit and "info vcpu_dirty_limit"
  to query it.
The patch [2/3] has not been touched so far. Any corrections and
suggestions are welcome.
Please review, thanks!
v7:
- rebase on master
- polish the comments and error messages according to the
  advice given by Markus
- introduce a dirtylimit_enabled function to pre-check whether the dirty
  page limit is enabled before canceling.
v6:
- rebase on master
- fix dirtylimit setup crash found by Markus
- polish the comments according to the advice given by Markus
- adjust the qemu qmp command tag to 7.0
v5:
- rebase on master
- adjust the throttle algorithm by removing the tuning in the
  RESTRAINT_RATIO case so that the dirty page rate can reach the quota
  more quickly.
- fix percentage update in throttle iteration.
v4:
- rebase on master
- modify the following points according to the advice given by Markus
  1. move the definition into migration.json
  2. polish the comments of set-dirty-limit
  3. do the syntax check and change "dirty rate" to "dirty page rate"
Thanks for the careful reviews made by Markus.
Please review, thanks!
v3:
- rebase on master
- modify the following points according to the advice given by Markus
1. remove the DirtyRateQuotaVcpu and use its fields as options directly
2. add comments to show details of what dirtylimit setup does
3. explain how to use dirtylimit in combination with the existing qmp
   commands "calc-dirty-rate" and "query-dirty-rate" in the documentation.
Thanks for the careful reviews made by Markus.
Please review, thanks!
Hyman
v2:
- rebase on master
- modify the following points according to the advice given by Juan
  1. rename dirtyrestraint to dirtylimit
  2. implement the full lifecycle functions of dirtylimit_calc, including
     dirtylimit_calc and dirtylimit_calc_quit
  3. introduce a 'quit' field in dirtylimit_calc_state to implement
     dirtylimit_calc_quit
  4. remove the ready_cond and ready_mtx since they may not be suitable
  5. put the 'record_dirtypage' function code at the beginning of the
     file
  6. remove the unnecessary return;
- other modifications made after code review
  1. introduce 'bmap' and 'nr' fields in dirtylimit_state to record the
     number of running threads forked by dirtylimit
  2. stop the dirtyrate calculation thread if all the dirtylimit threads
     are stopped
  3. do some renaming work
     dirtyrate calculation thread -> dirtylimit-calc
     dirtylimit thread -> dirtylimit-{cpu_index}
     function name do_dirtyrestraint -> dirtylimit_check
     qmp command dirty-restraint -> set-dirty-limit
     qmp command dirty-restraint-cancel -> cancel-dirty-limit
     header file dirtyrestraint.h -> dirtylimit.h
qmp command dirty-restraint-cancel -> cancel-dirty-limit
header file dirtyrestraint.h -> dirtylimit.h
Please review, thanks!
Thanks for the accurate and timely advice given by Juan. We would really
appreciate any corrections and suggestions about this patchset.
Best Regards!
Hyman
v1:
This patchset introduces a mechanism to impose a dirty restraint on
vCPUs, aiming to keep each vCPU running within a certain dirty page rate
given by the user. Dirty restraint on vCPUs may be an alternative way to
implement the convergence logic for live migration, which in theory
could improve guest memory performance during migration compared with
the traditional method.
For the current live migration implementation, the convergence
logic throttles all vCPUs of the VM, which has some side effects:
- 'read processes' on a vCPU are unnecessarily penalized
- the throttle increases the percentage step by step, and seems to
  struggle to find the optimal throttle percentage when the dirty
  rate is high
- it is hard to predict the remaining migration time once the
  throttling percentage reaches 99%
To a certain extent, the dirty restraint mechanism can fix these
effects by throttling at vCPU granularity during migration.
The implementation is rather straightforward: we calculate the vCPU
dirty rate via the Dirty Ring mechanism periodically, as commit
0e21bf246 "implement dirty-ring dirtyrate calculation" does. For a vCPU
that is specified to have a dirty restraint imposed, we throttle it
periodically as auto-converge does; after each throttling round we
compare the quota dirty rate with the current dirty rate, and if the
current dirty rate is not under the quota, we increase the throttling
percentage until it is.
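As a back-of-the-envelope reference, the per-vCPU dirty rate boils down
to counting the pages reaped from that vCPU's dirty ring during a
sampling window; a minimal sketch of the arithmetic follows (it assumes
4 KiB pages and is not the exact code in migration/dirtyrate.c):

#include <stdint.h>

/*
 * Dirty page rate in MB/s for one vCPU, given how many pages it dirtied
 * during a sampling window of period_ms milliseconds.  A 4 KiB page size
 * is assumed here purely for illustration.
 */
static uint64_t dirty_page_rate_mbps(uint64_t dirty_pages, uint64_t period_ms)
{
    uint64_t bytes_per_sec = dirty_pages * 4096 * 1000 / period_ms;

    return bytes_per_sec >> 20; /* bytes/s -> MB/s */
}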
This patchset is the basis for implementing a new auto-converge method
for live migration. We introduce two qmp commands to impose/cancel the
dirty restraint on a specified vCPU, so it can also serve as an
independent api for upper-layer apps such as libvirt, which can use it
to implement convergence logic during live migration, supplemented with
the qmp 'calc-dirty-rate' command or the like.
We post this patchset for RFC; any corrections and suggestions about
the implementation, api, throttling algorithm or anything else are very
much appreciated!
Please review, thanks!
Best Regards!
Hyman Huang (7):
accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
cpus: Introduce cpu_list_generation_id
migration/dirtyrate: Refactor dirty page rate calculation
softmmu/dirtylimit: Implement vCPU dirtyrate calculation periodically
accel/kvm/kvm-all: Introduce kvm_dirty_ring_size function
softmmu/dirtylimit: Implement virtual CPU throttle
softmmu/dirtylimit: Implement dirty page rate limit
accel/kvm/kvm-all.c | 40 ++-
accel/stubs/kvm-stub.c | 5 +
cpus-common.c | 8 +
hmp-commands-info.hx | 13 +
hmp-commands.hx | 32 +++
include/exec/cpu-common.h | 1 +
include/exec/memory.h | 5 +-
include/hw/core/cpu.h | 6 +
include/monitor/hmp.h | 3 +
include/sysemu/dirtylimit.h | 37 +++
include/sysemu/dirtyrate.h | 29 +++
include/sysemu/kvm.h | 2 +
migration/dirtyrate.c | 222 ++++++++++------
migration/dirtyrate.h | 7 +-
qapi/migration.json | 80 ++++++
softmmu/dirtylimit.c | 619 ++++++++++++++++++++++++++++++++++++++++++++
softmmu/meson.build | 1 +
softmmu/trace-events | 8 +
18 files changed, 1022 insertions(+), 96 deletions(-)
create mode 100644 include/sysemu/dirtylimit.h
create mode 100644 include/sysemu/dirtyrate.h
create mode 100644 softmmu/dirtylimit.c
--
1.8.3.1
* [PATCH V13 1/7] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping
[not found] ` <cover.1644508591.git.huangy81@chinatelecom.cn>
@ 2022-02-10 16:06 ` huangy81
2022-02-10 16:06 ` [PATCH V13 2/7] cpus: Introduce cpu_list_generation_id huangy81
1 sibling, 0 replies; 4+ messages in thread
From: huangy81 @ 2022-02-10 16:06 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Add an optional 'CPUState' argument to kvm_dirty_ring_reap so
that it can cover the single-vcpu dirty-ring-reaping scenario.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
accel/kvm/kvm-all.c | 23 +++++++++++++----------
1 file changed, 13 insertions(+), 10 deletions(-)
diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 0e66ebb..7b06b8a 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -756,17 +756,20 @@ static uint32_t kvm_dirty_ring_reap_one(KVMState *s, CPUState *cpu)
}
/* Must be with slots_lock held */
-static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
+static uint64_t kvm_dirty_ring_reap_locked(KVMState *s, CPUState *cpu)
{
int ret;
- CPUState *cpu;
uint64_t total = 0;
int64_t stamp;
stamp = get_clock();
- CPU_FOREACH(cpu) {
- total += kvm_dirty_ring_reap_one(s, cpu);
+ if (cpu) {
+ total = kvm_dirty_ring_reap_one(s, cpu);
+ } else {
+ CPU_FOREACH(cpu) {
+ total += kvm_dirty_ring_reap_one(s, cpu);
+ }
}
if (total) {
@@ -787,7 +790,7 @@ static uint64_t kvm_dirty_ring_reap_locked(KVMState *s)
* Currently for simplicity, we must hold BQL before calling this. We can
* consider to drop the BQL if we're clear with all the race conditions.
*/
-static uint64_t kvm_dirty_ring_reap(KVMState *s)
+static uint64_t kvm_dirty_ring_reap(KVMState *s, CPUState *cpu)
{
uint64_t total;
@@ -807,7 +810,7 @@ static uint64_t kvm_dirty_ring_reap(KVMState *s)
* reset below.
*/
kvm_slots_lock();
- total = kvm_dirty_ring_reap_locked(s);
+ total = kvm_dirty_ring_reap_locked(s, cpu);
kvm_slots_unlock();
return total;
@@ -854,7 +857,7 @@ static void kvm_dirty_ring_flush(void)
* vcpus out in a synchronous way.
*/
kvm_cpu_synchronize_kick_all();
- kvm_dirty_ring_reap(kvm_state);
+ kvm_dirty_ring_reap(kvm_state, NULL);
trace_kvm_dirty_ring_flush(1);
}
@@ -1398,7 +1401,7 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
* Not easy. Let's cross the fingers until it's fixed.
*/
if (kvm_state->kvm_dirty_ring_size) {
- kvm_dirty_ring_reap_locked(kvm_state);
+ kvm_dirty_ring_reap_locked(kvm_state, NULL);
} else {
kvm_slot_get_dirty_log(kvm_state, mem);
}
@@ -1470,7 +1473,7 @@ static void *kvm_dirty_ring_reaper_thread(void *data)
r->reaper_state = KVM_DIRTY_RING_REAPER_REAPING;
qemu_mutex_lock_iothread();
- kvm_dirty_ring_reap(s);
+ kvm_dirty_ring_reap(s, NULL);
qemu_mutex_unlock_iothread();
r->reaper_iteration++;
@@ -2956,7 +2959,7 @@ int kvm_cpu_exec(CPUState *cpu)
*/
trace_kvm_dirty_ring_full(cpu->cpu_index);
qemu_mutex_lock_iothread();
- kvm_dirty_ring_reap(kvm_state);
+ kvm_dirty_ring_reap(kvm_state, NULL);
qemu_mutex_unlock_iothread();
ret = 0;
break;
--
1.8.3.1
* [PATCH V13 2/7] cpus: Introduce cpu_list_generation_id
[not found] ` <cover.1644508591.git.huangy81@chinatelecom.cn>
2022-02-10 16:06 ` [PATCH V13 1/7] accel/kvm/kvm-all: Refactor per-vcpu dirty ring reaping huangy81
@ 2022-02-10 16:06 ` huangy81
1 sibling, 0 replies; 4+ messages in thread
From: huangy81 @ 2022-02-10 16:06 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Hyman, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
From: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
Introduce cpu_list_generation_id to track cpu list generation so
that cpu hotplug/unplug can be detected during measurement of
dirty page rate.
Signed-off-by: Hyman Huang(黄勇) <huangy81@chinatelecom.cn>
---
cpus-common.c | 8 ++++++++
include/exec/cpu-common.h | 1 +
2 files changed, 9 insertions(+)
diff --git a/cpus-common.c b/cpus-common.c
index 6e73d3e..31c6415 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -73,6 +73,12 @@ static int cpu_get_free_index(void)
}
CPUTailQ cpus = QTAILQ_HEAD_INITIALIZER(cpus);
+static unsigned int cpu_list_generation_id;
+
+unsigned int cpu_list_generation_id_get(void)
+{
+ return cpu_list_generation_id;
+}
void cpu_list_add(CPUState *cpu)
{
@@ -84,6 +90,7 @@ void cpu_list_add(CPUState *cpu)
assert(!cpu_index_auto_assigned);
}
QTAILQ_INSERT_TAIL_RCU(&cpus, cpu, node);
+ cpu_list_generation_id++;
}
void cpu_list_remove(CPUState *cpu)
@@ -96,6 +103,7 @@ void cpu_list_remove(CPUState *cpu)
QTAILQ_REMOVE_RCU(&cpus, cpu, node);
cpu->cpu_index = UNASSIGNED_CPU_INDEX;
+ cpu_list_generation_id++;
}
CPUState *qemu_get_cpu(int index)
diff --git a/include/exec/cpu-common.h b/include/exec/cpu-common.h
index de5f444..eb33642 100644
--- a/include/exec/cpu-common.h
+++ b/include/exec/cpu-common.h
@@ -20,6 +20,7 @@ extern intptr_t qemu_host_page_mask;
void qemu_init_cpu_list(void);
void cpu_list_lock(void);
void cpu_list_unlock(void);
+unsigned int cpu_list_generation_id_get(void);
void tcg_flush_softmmu_tlb(CPUState *cs);
--
1.8.3.1
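As a usage illustration, a dirty page rate measurement pass can snapshot
the generation id before sampling and discard the result if it changed
by the end; only cpu_list_generation_id_get() below comes from this
patch, the sampling helper is hypothetical:

#include "qemu/osdep.h"
#include "exec/cpu-common.h"

/* Hypothetical sampler that walks the vcpu list and records dirty pages. */
static bool sample_dirty_pages_all_vcpus(void);

static bool measure_dirty_rate_once(void)
{
    unsigned int gen = cpu_list_generation_id_get();

    if (!sample_dirty_pages_all_vcpus()) {
        return false;
    }

    /*
     * A vCPU was plugged or unplugged while we were sampling, so the
     * per-vCPU records may be stale; drop this round and let the caller
     * retry.
     */
    if (gen != cpu_list_generation_id_get()) {
        return false;
    }

    return true;
}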
* Re: [PATCH V13 0/7] support dirty restraint on vCPU
2022-02-10 16:06 [PATCH V13 0/7] support dirty restraint on vCPU huangy81
[not found] ` <cover.1644508591.git.huangy81@chinatelecom.cn>
@ 2022-02-10 16:12 ` Hyman
1 sibling, 0 replies; 4+ messages in thread
From: Hyman @ 2022-02-10 16:12 UTC (permalink / raw)
To: qemu-devel
Cc: Eduardo Habkost, David Hildenbrand, Juan Quintela,
Richard Henderson, Markus Armbruster, Peter Xu,
Dr. David Alan Gilbert, Paolo Bonzini,
Philippe Mathieu-Daudé
"Sent by accident, please ignore, I'll send v14 when ready."
On 2022/2/11 0:06, huangy81@chinatelecom.cn wrote:
> [...]