From: Anthony Liguori <anthony@codemonkey.ws>
To: Markus Armbruster <armbru@redhat.com>
Cc: qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
Date: Thu, 20 Jan 2011 10:34:52 -0600 [thread overview]
Message-ID: <4D38642C.5050306@codemonkey.ws> (raw)
In-Reply-To: <m38vyit1ao.fsf@blackfin.pond.sub.org>
On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> The problem: you want to do serious scalability testing (1000s of VMs)
> of your management stack. If each guest eats up a few 100MiB and
> competes for CPU, that requires a serious host machine. Which you don't
> have. You also don't want to modify the management stack at all, if you
> can help it.
>
> The solution: a perfectly normal-looking QEMU that uses minimal
> resources. Ability to execute any guest code is strictly optional ;)
>
> New option -fake-machine creates a fake machine incapable of running
> guest code. Completely compiled out by default, enable with configure
> --enable-fake-machine.
>
> With -fake-machine, CPU use is negligible, and memory use is rather
> modest.
>
> Non-fake VM running F-14 live, right after boot:
> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
> armbru 15707 2558 53 191837 414388 1 21:05 pts/3 00:00:29 [...]
>
> Same VM -fake-machine, after similar time elapsed:
> UID PID PPID C SZ RSS PSR STIME TTY TIME CMD
> armbru 15742 2558 0 85129 9412 0 21:07 pts/3 00:00:00 [...]
>
> We're using a very similar patch for RHEL scalability testing.
>
Interesting, but:
9432 anthony 20 0 153m 14m 5384 S 0 0.2 0:00.22
qemu-system-x86
That's qemu-system-x86 -m 4
In terms of memory overhead, the largest source is not really going to
be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
I don't really understand the point of not creating a VCPU with KVM. Is
there some type of overhead in doing that?
Regards,
Anthony Liguori
> HACK ALERT: Works by hacking the main loop so it never executes any
> guest code. Not implemented for KVM's main loop at this time, thus
> -fake-machine needs to force KVM off. It also replaces guest RAM by a
> token amount (pc machine only at this time), and forces -vga none,
> because VGA eats too much memory.
>
> Note the TODO and FIXME comments.
>
> Dan Berrange explored a different solution a while ago: a new do-nothing
> target, patterned after i386, and a new do-nothing machine, patterned
> after pc. His patch works. But it duplicates much target and machine
> code --- adds more than ten times as many lines as this patch. Keeping
> the duplicated code reasonably in sync would be bothersome. I didn't
> like that, talked it over with Dan, and we came up with this idea
> instead.
>
> Comments? Better ideas?
> ---
> configure | 12 ++++++++++++
> cpu-exec.c | 2 +-
> cpus.c | 3 +++
> hw/pc.c | 30 ++++++++++++++++++++----------
> qemu-options.hx | 7 +++++++
> targphys.h | 7 +++++++
> vl.c | 21 +++++++++++++++++++++
> 7 files changed, 71 insertions(+), 11 deletions(-)
>
> diff --git a/configure b/configure
> index d68f862..98b0a5f 100755
> --- a/configure
> +++ b/configure
> @@ -174,6 +174,7 @@ trace_backend="nop"
> trace_file="trace"
> spice=""
> rbd=""
> +fake_machine="no"
>
> # parse CC options first
> for opt do
> @@ -719,6 +720,10 @@ for opt do
> ;;
> --enable-rbd) rbd="yes"
> ;;
> + --disable-fake-machine) fake_machine="no"
> + ;;
> + --enable-fake-machine) fake_machine="yes"
> + ;;
> *) echo "ERROR: unknown option $opt"; show_help="yes"
> ;;
> esac
> @@ -913,6 +918,8 @@ echo " Default:trace-<pid>"
> echo " --disable-spice disable spice"
> echo " --enable-spice enable spice"
> echo " --enable-rbd enable building the rados block device (rbd)"
> +echo " --disable-fake-machine disable -fake-machine option"
> +echo " --enable-fake-machine enable -fake-machine option"
> echo ""
> echo "NOTE: The object files are built at the place where configure is launched"
> exit 1
> @@ -2455,6 +2462,7 @@ echo "Trace output file $trace_file-<pid>"
> echo "spice support $spice"
> echo "rbd support $rbd"
> echo "xfsctl support $xfs"
> +echo "-fake-machine $fake_machine"
>
> if test $sdl_too_old = "yes"; then
> echo "-> Your SDL version is too old - please upgrade to have SDL support"
> @@ -2727,6 +2735,10 @@ if test "$spice" = "yes" ; then
> echo "CONFIG_SPICE=y">> $config_host_mak
> fi
>
> +if test $fake_machine = "yes" ; then
> + echo "CONFIG_FAKE_MACHINE=y">> $config_host_mak
> +fi
> +
> # XXX: suppress that
> if [ "$bsd" = "yes" ] ; then
> echo "CONFIG_BSD=y">> $config_host_mak
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8c9fb8b..cd1259a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
> uint8_t *tc_ptr;
> unsigned long next_tb;
>
> - if (cpu_halted(env1) == EXCP_HALTED)
> + if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
> return EXCP_HALTED;
>
> cpu_single_env = env1;
> diff --git a/cpus.c b/cpus.c
> index 0309189..91e708f 100644
> --- a/cpus.c
> +++ b/cpus.c
> @@ -128,6 +128,9 @@ static int cpu_can_run(CPUState *env)
>
> static int cpu_has_work(CPUState *env)
> {
> + if (fake_machine) {
> + return 0;
> + }
> if (env->stop)
> return 1;
> if (env->queued_work_first)
> diff --git a/hw/pc.c b/hw/pc.c
> index fface7d..809f53e 100644
> --- a/hw/pc.c
> +++ b/hw/pc.c
> @@ -993,18 +993,28 @@ void pc_memory_init(ram_addr_t ram_size,
> linux_boot = (kernel_filename != NULL);
>
> /* allocate RAM */
> - ram_addr = qemu_ram_alloc(NULL, "pc.ram",
> - below_4g_mem_size + above_4g_mem_size);
> - cpu_register_physical_memory(0, 0xa0000, ram_addr);
> - cpu_register_physical_memory(0x100000,
> - below_4g_mem_size - 0x100000,
> - ram_addr + 0x100000);
> + if (fake_machine) {
> + /* If user boots with -m 1000 We don't actually want to
> + * allocate a GB of RAM, so lets force all RAM allocs to one
> + * page to keep our memory footprint nice and low.
> + *
> + * TODO try to use -m 1k instead
> + */
> + ram_addr = qemu_ram_alloc(NULL, "pc.ram", 1);
> + } else {
> + ram_addr = qemu_ram_alloc(NULL, "pc.ram",
> + below_4g_mem_size + above_4g_mem_size);
> + cpu_register_physical_memory(0, 0xa0000, ram_addr);
> + cpu_register_physical_memory(0x100000,
> + below_4g_mem_size - 0x100000,
> + ram_addr + 0x100000);
> #if TARGET_PHYS_ADDR_BITS> 32
> - if (above_4g_mem_size> 0) {
> - cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> - ram_addr + below_4g_mem_size);
> - }
> + if (above_4g_mem_size> 0) {
> + cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
> + ram_addr + below_4g_mem_size);
> + }
> #endif
> + }
>
> /* BIOS load */
> if (bios_name == NULL)
> diff --git a/qemu-options.hx b/qemu-options.hx
> index 898561d..8a8ef4b 100644
> --- a/qemu-options.hx
> +++ b/qemu-options.hx
> @@ -2324,6 +2324,13 @@ Specify a trace file to log output traces to.
> ETEXI
> #endif
>
> +#ifdef CONFIG_FAKE_MACHINE
> +DEF("fake-machine", 0, QEMU_OPTION_fake_machine,
> + "-fake-machine create a fake machine incapable of running guest code\n"
> + " mimimal resource use, use for scalability testing\n",
> + QEMU_ARCH_ALL)
> +#endif
> +
> HXCOMM This is the last statement. Insert new options before this line!
> STEXI
> @end table
> diff --git a/targphys.h b/targphys.h
> index 95648d6..f30530c 100644
> --- a/targphys.h
> +++ b/targphys.h
> @@ -18,4 +18,11 @@ typedef uint64_t target_phys_addr_t;
> #endif
> #endif
>
> +/* FIXME definitely in the wrong place here; where should it go? */
> +#ifdef CONFIG_FAKE_MACHINE
> +extern int fake_machine;
> +#else
> +#define fake_machine 0
> +#endif
> +
> #endif
> diff --git a/vl.c b/vl.c
> index 0292184..bcc60b0 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -240,6 +240,10 @@ struct FWBootEntry {
>
> QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order);
>
> +#ifdef CONFIG_FAKE_MACHINE
> +int fake_machine = 0;
> +#endif
> +
> int nb_numa_nodes;
> uint64_t node_mem[MAX_NODES];
> uint64_t node_cpumask[MAX_NODES];
> @@ -2727,6 +2731,11 @@ int main(int argc, char **argv, char **envp)
> fclose(fp);
> break;
> }
> +#ifdef CONFIG_FAKE_MACHINE
> + case QEMU_OPTION_fake_machine:
> + fake_machine = 1;
> + break;
> +#endif
> default:
> os_parse_cmd_args(popt->index, optarg);
> }
> @@ -2817,6 +2826,15 @@ int main(int argc, char **argv, char **envp)
> }
> if (default_vga)
> vga_interface_type = VGA_CIRRUS;
> + if (fake_machine) {
> + /* HACK: Ideally we'd configure VGA as usual, but this causes
> + * several MB of VGA RAM to be allocated, and we can't do the
> + * tricks we use elsewhere to just return a single 4k page,
> + * because the VGA driver immediately memsets() the entire
> + * allocation it requested.
> + */
> + vga_interface_type = VGA_NONE;
> + }
>
> socket_init();
>
> @@ -2835,6 +2853,9 @@ int main(int argc, char **argv, char **envp)
> exit(1);
> }
>
> + if (fake_machine) {
> + kvm_allowed = 0;
> + }
> if (kvm_allowed) {
> int ret = kvm_init(smp_cpus);
> if (ret< 0) {
>
next prev parent reply other threads:[~2011-01-20 16:34 UTC|newest]
Thread overview: 10+ messages / expand[flat|nested] mbox.gz Atom feed top
2011-01-18 20:16 [Qemu-devel] [RFC PATCH] Fake machine for scalability testing Markus Armbruster
2011-01-20 16:34 ` Anthony Liguori [this message]
2011-01-20 17:12 ` Markus Armbruster
2011-01-20 19:50 ` Anthony Liguori
2011-01-21 10:38 ` Markus Armbruster
2011-01-21 14:45 ` Anthony Liguori
2011-01-21 16:51 ` Markus Armbruster
2011-01-21 10:43 ` Daniel P. Berrange
2011-01-21 14:43 ` Anthony Liguori
2011-01-21 14:46 ` Daniel P. Berrange
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=4D38642C.5050306@codemonkey.ws \
--to=anthony@codemonkey.ws \
--cc=armbru@redhat.com \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).