qemu-devel.nongnu.org archive mirror
From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
To: "Dr. David Alan Gilbert" <dave@treblig.org>
Cc: "Alex Bennée" <alex.bennee@linaro.org>,
	qemu-devel@nongnu.org, "David Hildenbrand" <david@redhat.com>,
	"Ilya Leoshkevich" <iii@linux.ibm.com>,
	"Daniel Henrique Barboza" <danielhb413@gmail.com>,
	"Marcelo Tosatti" <mtosatti@redhat.com>,
	"Paolo Bonzini" <pbonzini@redhat.com>,
	"Philippe Mathieu-Daudé" <philmd@linaro.org>,
	"Mark Burton" <mburton@qti.qualcomm.com>,
	qemu-s390x@nongnu.org, "Peter Maydell" <peter.maydell@linaro.org>,
	kvm@vger.kernel.org, "Laurent Vivier" <lvivier@redhat.com>,
	"Halil Pasic" <pasic@linux.ibm.com>,
	"Christian Borntraeger" <borntraeger@linux.ibm.com>,
	"Alexandre Iooss" <erdnaxe@crans.org>,
	qemu-arm@nongnu.org, "Alexander Graf" <agraf@csgraf.de>,
	"Nicholas Piggin" <npiggin@gmail.com>,
	"Marco Liebel" <mliebel@qti.qualcomm.com>,
	"Thomas Huth" <thuth@redhat.com>,
	"Roman Bolshakov" <rbolshakov@ddn.com>,
	qemu-ppc@nongnu.org, "Mahmoud Mandour" <ma.mandourr@gmail.com>,
	"Cameron Esfahani" <dirty@apple.com>,
	"Jamie Iles" <quic_jiles@quicinc.com>,
	"Richard Henderson" <richard.henderson@linaro.org>
Subject: Re: [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling
Date: Mon, 17 Jun 2024 12:19:42 -0700
Message-ID: <616df287-a167-4a05-8f08-70a78a544929@linaro.org>
In-Reply-To: <Zmy9g1U1uP1Vhx9N@gallifrey>

On 6/14/24 15:00, Dr. David Alan Gilbert wrote:
> * Pierrick Bouvier (pierrick.bouvier@linaro.org) wrote:
>> Hi Dave,
>>
>> On 6/12/24 14:02, Dr. David Alan Gilbert wrote:
>>> * Alex Bennée (alex.bennee@linaro.org) wrote:
>>>> From: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>>
>>>> This plugin uses the new time control interface to make decisions
>>>> about the state of time during the emulation. The algorithm is
>>>> currently very simple. The user specifies an ips rate which applies
>>>> per core. If the core runs ahead of its allocated execution time the
>>>> plugin sleeps for a bit to let real time catch up. Either way time is
>>>> updated for the emulation as a function of total executed instructions
>>>> with some adjustments for cores that idle.
>>>
>>> A few random thoughts:
>>>     a) Are there any definitions of what a plugin that controls time
>>>        should do with a live migration?
>>
>> It's not something that was considered as part of this work.
> 
> That's OK, the only thing is we need to stop anyone from hitting problems
> when they don't realise it's not been addressed.
> One way might be to add a migration blocker; see include/migration/blocker.h
> then you might print something like 'Migration not available due to plugin ....'
> 

So basically, we could call migrate_add_blocker() when someone
requests time control through the plugin API?

IMHO, this should be done by the plugin API itself (whenever a plugin
calls qemu_plugin_request_time_control()) rather than in the plugin
code. This way, any plugin that gets time control automatically blocks
migration.

>>>     b) The sleep in migration/dirtyrate.c points out g_usleep might
>>>        sleep for longer, so reads the actual wall clock time to
>>>        figure out a new 'now'.
>>
>> The current API mentions time starts at 0 from qemu startup. Maybe we could
>> consider changing this behavior in the future to retrieve time from an
>> existing migrated machine.
> 
> Ah, I meant for (b) to be independent of (a) - not related to migration; just
> down to the fact you used g_usleep in the plugin and a g_usleep might sleep
> for a different amount of time than you asked.
> 

We know that, and the plugin is not meant to be "cycle accurate" in
general; we just set an upper bound on the number of instructions we
can execute in a given amount of time (1/10 second for now).

We compute the new time from how many instructions actually ran on
the busiest vCPU, so even if we slept a bit longer than expected, the
resulting virtual time is still correct.

>>>     c) A fun thing to do with this would be to follow an external simulation
>>>        or 2nd qemu, trying to keep the two from running too far past
>>>        each other.
>>>
>>
>> Basically, to slow the first one, waiting for the replicated one to catch
>> up?
> 
> Yes, something like that.
> 
> Dave
> 
>>> Dave
>>>
>>>> Examples
>>>> --------
>>>>
>>>> Slow down execution of /bin/true:
>>>> $ num_insn=$(./build/qemu-x86_64 -plugin ./build/tests/plugin/libinsn.so -d plugin /bin/true |& grep total | sed -e 's/.*: //')
>>>> $ time ./build/qemu-x86_64 -plugin ./build/contrib/plugins/libips.so,ips=$(($num_insn/4)) /bin/true
>>>> real 4.000s
>>>>
>>>> Boot a Linux kernel simulating a 250MHz cpu:
>>>> $ ./build/qemu-system-x86_64 -kernel /boot/vmlinuz-6.1.0-21-amd64 -append "console=ttyS0" -plugin ./build/contrib/plugins/libips.so,ips=$((250*1000*1000)) -smp 1 -m 512
>>>> Check the time until the kernel panics on serial0.
>>>>
>>>> Tested in system mode by booting a full Debian system, and using:
>>>> $ sysbench cpu run
>>>> Performance decreases linearly with the given number of ips.
>>>>
>>>> Signed-off-by: Pierrick Bouvier <pierrick.bouvier@linaro.org>
>>>> Message-Id: <20240530220610.1245424-7-pierrick.bouvier@linaro.org>
>>>> ---
>>>>    contrib/plugins/ips.c    | 164 +++++++++++++++++++++++++++++++++++++++
>>>>    contrib/plugins/Makefile |   1 +
>>>>    2 files changed, 165 insertions(+)
>>>>    create mode 100644 contrib/plugins/ips.c
>>>>
>>>> diff --git a/contrib/plugins/ips.c b/contrib/plugins/ips.c
>>>> new file mode 100644
>>>> index 0000000000..db77729264
>>>> --- /dev/null
>>>> +++ b/contrib/plugins/ips.c
>>>> @@ -0,0 +1,164 @@
>>>> +/*
>>>> + * ips rate limiting plugin.
>>>> + *
>>>> + * This plugin can be used to restrict the execution of a system to a
>>>> + * particular number of Instructions Per Second (ips). This controls
>>>> + * time as seen by the guest, so while wall-clock time may be longer,
>>>> + * from the guest's point of view time will pass at the normal rate.
>>>> + *
>>>> + * This uses the new plugin API which allows the plugin to control
>>>> + * system time.
>>>> + *
>>>> + * Copyright (c) 2023 Linaro Ltd
>>>> + *
>>>> + * SPDX-License-Identifier: GPL-2.0-or-later
>>>> + */
>>>> +
>>>> +#include <stdio.h>
>>>> +#include <glib.h>
>>>> +#include <qemu-plugin.h>
>>>> +
>>>> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
>>>> +
>>>> +/* how many times do we update time per sec */
>>>> +#define NUM_TIME_UPDATE_PER_SEC 10
>>>> +#define NSEC_IN_ONE_SEC (1000 * 1000 * 1000)
>>>> +
>>>> +static GMutex global_state_lock;
>>>> +
>>>> +static uint64_t max_insn_per_second = 1000 * 1000 * 1000; /* ips per core, per second */
>>>> +static uint64_t max_insn_per_quantum; /* trap every N instructions */
>>>> +static int64_t virtual_time_ns; /* last set virtual time */
>>>> +
>>>> +static const void *time_handle;
>>>> +
>>>> +typedef struct {
>>>> +    uint64_t total_insn;
>>>> +    uint64_t quantum_insn; /* insn in last quantum */
>>>> +    int64_t last_quantum_time; /* time when last quantum started */
>>>> +} vCPUTime;
>>>> +
>>>> +static struct qemu_plugin_scoreboard *vcpus;
>>>> +
>>>> +/* return epoch time in ns */
>>>> +static int64_t now_ns(void)
>>>> +{
>>>> +    return g_get_real_time() * 1000;
>>>> +}
>>>> +
>>>> +static uint64_t num_insn_during(int64_t elapsed_ns)
>>>> +{
>>>> +    double num_secs = elapsed_ns / (double) NSEC_IN_ONE_SEC;
>>>> +    return num_secs * (double) max_insn_per_second;
>>>> +}
>>>> +
>>>> +static int64_t time_for_insn(uint64_t num_insn)
>>>> +{
>>>> +    double num_secs = (double) num_insn / (double) max_insn_per_second;
>>>> +    return num_secs * (double) NSEC_IN_ONE_SEC;
>>>> +}
>>>> +
>>>> +static void update_system_time(vCPUTime *vcpu)
>>>> +{
>>>> +    int64_t elapsed_ns = now_ns() - vcpu->last_quantum_time;
>>>> +    uint64_t max_insn = num_insn_during(elapsed_ns);
>>>> +
>>>> +    if (vcpu->quantum_insn >= max_insn) {
>>>> +        /* this vcpu ran faster than expected, so it has to sleep */
>>>> +        uint64_t insn_advance = vcpu->quantum_insn - max_insn;
>>>> +        uint64_t time_advance_ns = time_for_insn(insn_advance);
>>>> +        int64_t sleep_us = time_advance_ns / 1000;
>>>> +        g_usleep(sleep_us);
>>>> +    }
>>>> +
>>>> +    vcpu->total_insn += vcpu->quantum_insn;
>>>> +    vcpu->quantum_insn = 0;
>>>> +    vcpu->last_quantum_time = now_ns();
>>>> +
>>>> +    /* based on total number of instructions, what should be the new time? */
>>>> +    int64_t new_virtual_time = time_for_insn(vcpu->total_insn);
>>>> +
>>>> +    g_mutex_lock(&global_state_lock);
>>>> +
>>>> +    /* Time only moves forward. Another vcpu might have updated it already. */
>>>> +    if (new_virtual_time > virtual_time_ns) {
>>>> +        qemu_plugin_update_ns(time_handle, new_virtual_time);
>>>> +        virtual_time_ns = new_virtual_time;
>>>> +    }
>>>> +
>>>> +    g_mutex_unlock(&global_state_lock);
>>>> +}
>>>> +
>>>> +static void vcpu_init(qemu_plugin_id_t id, unsigned int cpu_index)
>>>> +{
>>>> +    vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
>>>> +    vcpu->total_insn = 0;
>>>> +    vcpu->quantum_insn = 0;
>>>> +    vcpu->last_quantum_time = now_ns();
>>>> +}
>>>> +
>>>> +static void vcpu_exit(qemu_plugin_id_t id, unsigned int cpu_index)
>>>> +{
>>>> +    vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
>>>> +    update_system_time(vcpu);
>>>> +}
>>>> +
>>>> +static void every_quantum_insn(unsigned int cpu_index, void *udata)
>>>> +{
>>>> +    vCPUTime *vcpu = qemu_plugin_scoreboard_find(vcpus, cpu_index);
>>>> +    g_assert(vcpu->quantum_insn >= max_insn_per_quantum);
>>>> +    update_system_time(vcpu);
>>>> +}
>>>> +
>>>> +static void vcpu_tb_trans(qemu_plugin_id_t id, struct qemu_plugin_tb *tb)
>>>> +{
>>>> +    size_t n_insns = qemu_plugin_tb_n_insns(tb);
>>>> +    qemu_plugin_u64 quantum_insn =
>>>> +        qemu_plugin_scoreboard_u64_in_struct(vcpus, vCPUTime, quantum_insn);
>>>> +    /* count (and eventually trap) once per tb */
>>>> +    qemu_plugin_register_vcpu_tb_exec_inline_per_vcpu(
>>>> +        tb, QEMU_PLUGIN_INLINE_ADD_U64, quantum_insn, n_insns);
>>>> +    qemu_plugin_register_vcpu_tb_exec_cond_cb(
>>>> +        tb, every_quantum_insn,
>>>> +        QEMU_PLUGIN_CB_NO_REGS, QEMU_PLUGIN_COND_GE,
>>>> +        quantum_insn, max_insn_per_quantum, NULL);
>>>> +}
>>>> +
>>>> +static void plugin_exit(qemu_plugin_id_t id, void *udata)
>>>> +{
>>>> +    qemu_plugin_scoreboard_free(vcpus);
>>>> +}
>>>> +
>>>> +QEMU_PLUGIN_EXPORT int qemu_plugin_install(qemu_plugin_id_t id,
>>>> +                                           const qemu_info_t *info, int argc,
>>>> +                                           char **argv)
>>>> +{
>>>> +    for (int i = 0; i < argc; i++) {
>>>> +        char *opt = argv[i];
>>>> +        g_auto(GStrv) tokens = g_strsplit(opt, "=", 2);
>>>> +        if (g_strcmp0(tokens[0], "ips") == 0) {
>>>> +            max_insn_per_second = g_ascii_strtoull(tokens[1], NULL, 10);
>>>> +            if (!max_insn_per_second && errno) {
>>>> +                fprintf(stderr, "%s: couldn't parse %s (%s)\n",
>>>> +                        __func__, tokens[1], g_strerror(errno));
>>>> +                return -1;
>>>> +            }
>>>> +        } else {
>>>> +            fprintf(stderr, "option parsing failed: %s\n", opt);
>>>> +            return -1;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    vcpus = qemu_plugin_scoreboard_new(sizeof(vCPUTime));
>>>> +    max_insn_per_quantum = max_insn_per_second / NUM_TIME_UPDATE_PER_SEC;
>>>> +
>>>> +    time_handle = qemu_plugin_request_time_control();
>>>> +    g_assert(time_handle);
>>>> +
>>>> +    qemu_plugin_register_vcpu_tb_trans_cb(id, vcpu_tb_trans);
>>>> +    qemu_plugin_register_vcpu_init_cb(id, vcpu_init);
>>>> +    qemu_plugin_register_vcpu_exit_cb(id, vcpu_exit);
>>>> +    qemu_plugin_register_atexit_cb(id, plugin_exit, NULL);
>>>> +
>>>> +    return 0;
>>>> +}
>>>> diff --git a/contrib/plugins/Makefile b/contrib/plugins/Makefile
>>>> index 0b64d2c1e3..449ead1130 100644
>>>> --- a/contrib/plugins/Makefile
>>>> +++ b/contrib/plugins/Makefile
>>>> @@ -27,6 +27,7 @@ endif
>>>>    NAMES += hwprofile
>>>>    NAMES += cache
>>>>    NAMES += drcov
>>>> +NAMES += ips
>>>>    ifeq ($(CONFIG_WIN32),y)
>>>>    SO_SUFFIX := .dll
>>>> -- 
>>>> 2.39.2
>>>>

Thread overview: 38+ messages
2024-06-12 15:34 [PATCH 0/9] maintainer updates (gdbstub, plugins, time control) Alex Bennée
2024-06-12 15:35 ` [PATCH 1/9] include/exec: add missing include guard comment Alex Bennée
2024-06-12 15:56   ` Pierrick Bouvier
2024-06-18 23:08   ` Richard Henderson
2024-06-12 15:35 ` [PATCH 2/9] gdbstub: move enums into separate header Alex Bennée
2024-06-12 15:57   ` Pierrick Bouvier
2024-06-18 23:09   ` Richard Henderson
2024-06-12 15:35 ` [PATCH 3/9] plugins: Ensure register handles are not NULL Alex Bennée
2024-06-12 15:58   ` Pierrick Bouvier
2024-06-18 23:10   ` Richard Henderson
2024-06-12 15:35 ` [PATCH 4/9] sysemu: add set_virtual_time to accel ops Alex Bennée
2024-06-18 23:12   ` Richard Henderson
2024-06-12 15:35 ` [PATCH 5/9] qtest: use cpu interface in qtest_clock_warp Alex Bennée
2024-06-12 15:35 ` [PATCH 6/9] sysemu: generalise qtest_warp_clock as qemu_clock_advance_virtual_time Alex Bennée
2024-06-12 15:35 ` [PATCH 7/9] qtest: move qtest_{get, set}_virtual_clock to accel/qtest/qtest.c Alex Bennée
2024-06-12 15:35 ` [PATCH 8/9] plugins: add time control API Alex Bennée
2024-06-12 15:56   ` Pierrick Bouvier
2024-06-12 19:37     ` Alex Bennée
2024-06-12 19:54       ` Pierrick Bouvier
2024-06-13  8:57   ` Philippe Mathieu-Daudé
2024-06-13 15:56     ` Alex Bennée
2024-06-14 17:36       ` Pierrick Bouvier
2024-06-12 15:35 ` [PATCH 9/9] contrib/plugins: add ips plugin example for cost modeling Alex Bennée
2024-06-12 21:02   ` Dr. David Alan Gilbert
2024-06-14 17:42     ` Pierrick Bouvier
2024-06-14 22:00       ` Dr. David Alan Gilbert
2024-06-17 19:19         ` Pierrick Bouvier [this message]
2024-06-17 20:56           ` Dr. David Alan Gilbert
2024-06-17 22:29             ` Pierrick Bouvier
2024-06-17 22:45               ` Dr. David Alan Gilbert
2024-06-18  9:53               ` Alex Bennée
2024-06-19  4:40                 ` Pierrick Bouvier
2024-06-19  9:49                   ` Alex Bennée
2024-06-19 15:06                     ` Pierrick Bouvier
2024-06-13  8:54   ` Philippe Mathieu-Daudé
2024-06-14 17:39     ` Pierrick Bouvier
2024-06-16 18:43       ` Alex Bennée
2024-06-17 19:11         ` Pierrick Bouvier
