qemu-devel.nongnu.org archive mirror
* [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
@ 2011-01-18 20:16 Markus Armbruster
  2011-01-20 16:34 ` Anthony Liguori
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2011-01-18 20:16 UTC (permalink / raw)
  To: qemu-devel

The problem: you want to do serious scalability testing (1000s of VMs)
of your management stack.  If each guest eats up a few 100MiB and
competes for CPU, that requires a serious host machine.  Which you don't
have.  You also don't want to modify the management stack at all, if you
can help it.

The solution: a perfectly normal-looking QEMU that uses minimal
resources.  Ability to execute any guest code is strictly optional ;)

New option -fake-machine creates a fake machine incapable of running
guest code.  Completely compiled out by default, enable with configure
--enable-fake-machine.

With -fake-machine, CPU use is negligible, and memory use is rather
modest.

Non-fake VM running F-14 live, right after boot:
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]

Same VM -fake-machine, after similar time elapsed:
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]

We're using a very similar patch for RHEL scalability testing.

HACK ALERT: Works by hacking the main loop so it never executes any
guest code.  Not implemented for KVM's main loop at this time, thus
-fake-machine needs to force KVM off.  It also replaces guest RAM by a
token amount (pc machine only at this time), and forces -vga none,
because VGA eats too much memory.

Note the TODO and FIXME comments.

Dan Berrange explored a different solution a while ago: a new do-nothing
target, patterned after i386, and a new do-nothing machine, patterned
after pc.  His patch works.  But it duplicates much target and machine
code --- adds more than ten times as many lines as this patch.  Keeping
the duplicated code reasonably in sync would be bothersome.  I didn't
like that, talked it over with Dan, and we came up with this idea
instead.

Comments?  Better ideas?
---
 configure       |   12 ++++++++++++
 cpu-exec.c      |    2 +-
 cpus.c          |    3 +++
 hw/pc.c         |   30 ++++++++++++++++++++----------
 qemu-options.hx |    7 +++++++
 targphys.h      |    7 +++++++
 vl.c            |   21 +++++++++++++++++++++
 7 files changed, 71 insertions(+), 11 deletions(-)

diff --git a/configure b/configure
index d68f862..98b0a5f 100755
--- a/configure
+++ b/configure
@@ -174,6 +174,7 @@ trace_backend="nop"
 trace_file="trace"
 spice=""
 rbd=""
+fake_machine="no"
 
 # parse CC options first
 for opt do
@@ -719,6 +720,10 @@ for opt do
   ;;
   --enable-rbd) rbd="yes"
   ;;
+  --disable-fake-machine) fake_machine="no"
+  ;;
+  --enable-fake-machine) fake_machine="yes"
+  ;;
   *) echo "ERROR: unknown option $opt"; show_help="yes"
   ;;
   esac
@@ -913,6 +918,8 @@ echo "                           Default:trace-<pid>"
 echo "  --disable-spice          disable spice"
 echo "  --enable-spice           enable spice"
 echo "  --enable-rbd             enable building the rados block device (rbd)"
+echo "  --disable-fake-machine   disable -fake-machine option"
+echo "  --enable-fake-machine    enable -fake-machine option"
 echo ""
 echo "NOTE: The object files are built at the place where configure is launched"
 exit 1
@@ -2455,6 +2462,7 @@ echo "Trace output file $trace_file-<pid>"
 echo "spice support     $spice"
 echo "rbd support       $rbd"
 echo "xfsctl support    $xfs"
+echo "-fake-machine     $fake_machine"
 
 if test $sdl_too_old = "yes"; then
 echo "-> Your SDL version is too old - please upgrade to have SDL support"
@@ -2727,6 +2735,10 @@ if test "$spice" = "yes" ; then
   echo "CONFIG_SPICE=y" >> $config_host_mak
 fi
 
+if test $fake_machine = "yes" ; then
+  echo "CONFIG_FAKE_MACHINE=y" >> $config_host_mak
+fi
+
 # XXX: suppress that
 if [ "$bsd" = "yes" ] ; then
   echo "CONFIG_BSD=y" >> $config_host_mak
diff --git a/cpu-exec.c b/cpu-exec.c
index 8c9fb8b..cd1259a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
     uint8_t *tc_ptr;
     unsigned long next_tb;
 
-    if (cpu_halted(env1) == EXCP_HALTED)
+    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
         return EXCP_HALTED;
 
     cpu_single_env = env1;
diff --git a/cpus.c b/cpus.c
index 0309189..91e708f 100644
--- a/cpus.c
+++ b/cpus.c
@@ -128,6 +128,9 @@ static int cpu_can_run(CPUState *env)
 
 static int cpu_has_work(CPUState *env)
 {
+    if (fake_machine) {
+        return 0;
+    }
     if (env->stop)
         return 1;
     if (env->queued_work_first)
diff --git a/hw/pc.c b/hw/pc.c
index fface7d..809f53e 100644
--- a/hw/pc.c
+++ b/hw/pc.c
@@ -993,18 +993,28 @@ void pc_memory_init(ram_addr_t ram_size,
     linux_boot = (kernel_filename != NULL);
 
     /* allocate RAM */
-    ram_addr = qemu_ram_alloc(NULL, "pc.ram",
-                              below_4g_mem_size + above_4g_mem_size);
-    cpu_register_physical_memory(0, 0xa0000, ram_addr);
-    cpu_register_physical_memory(0x100000,
-                 below_4g_mem_size - 0x100000,
-                 ram_addr + 0x100000);
+    if (fake_machine) {
+        /* If the user boots with -m 1000, we don't actually want to
+         * allocate a GB of RAM, so let's force all RAM allocations to
+         * one page to keep our memory footprint nice and low.
+         *
+         * TODO try to use -m 1k instead
+         */
+        ram_addr = qemu_ram_alloc(NULL, "pc.ram", 1);
+    } else {
+        ram_addr = qemu_ram_alloc(NULL, "pc.ram",
+                                  below_4g_mem_size + above_4g_mem_size);
+        cpu_register_physical_memory(0, 0xa0000, ram_addr);
+        cpu_register_physical_memory(0x100000,
+                                     below_4g_mem_size - 0x100000,
+                                     ram_addr + 0x100000);
 #if TARGET_PHYS_ADDR_BITS > 32
-    if (above_4g_mem_size > 0) {
-        cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
-                                     ram_addr + below_4g_mem_size);
-    }
+        if (above_4g_mem_size > 0) {
+            cpu_register_physical_memory(0x100000000ULL, above_4g_mem_size,
+                                         ram_addr + below_4g_mem_size);
+        }
 #endif
+    }
 
     /* BIOS load */
     if (bios_name == NULL)
diff --git a/qemu-options.hx b/qemu-options.hx
index 898561d..8a8ef4b 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -2324,6 +2324,13 @@ Specify a trace file to log output traces to.
 ETEXI
 #endif
 
+#ifdef CONFIG_FAKE_MACHINE
+DEF("fake-machine", 0, QEMU_OPTION_fake_machine,
+    "-fake-machine   create a fake machine incapable of running guest code\n"
+    "                minimal resource use; intended for scalability testing\n",
+    QEMU_ARCH_ALL)
+#endif
+
 HXCOMM This is the last statement. Insert new options before this line!
 STEXI
 @end table
diff --git a/targphys.h b/targphys.h
index 95648d6..f30530c 100644
--- a/targphys.h
+++ b/targphys.h
@@ -18,4 +18,11 @@ typedef uint64_t target_phys_addr_t;
 #endif
 #endif
 
+/* FIXME definitely in the wrong place here; where should it go? */
+#ifdef CONFIG_FAKE_MACHINE
+extern int fake_machine;
+#else
+#define fake_machine 0
+#endif
+
 #endif
diff --git a/vl.c b/vl.c
index 0292184..bcc60b0 100644
--- a/vl.c
+++ b/vl.c
@@ -240,6 +240,10 @@ struct FWBootEntry {
 
 QTAILQ_HEAD(, FWBootEntry) fw_boot_order = QTAILQ_HEAD_INITIALIZER(fw_boot_order);
 
+#ifdef CONFIG_FAKE_MACHINE
+int fake_machine = 0;
+#endif
+
 int nb_numa_nodes;
 uint64_t node_mem[MAX_NODES];
 uint64_t node_cpumask[MAX_NODES];
@@ -2727,6 +2731,11 @@ int main(int argc, char **argv, char **envp)
                     fclose(fp);
                     break;
                 }
+#ifdef CONFIG_FAKE_MACHINE
+            case QEMU_OPTION_fake_machine:
+                fake_machine = 1;
+                break;
+#endif
             default:
                 os_parse_cmd_args(popt->index, optarg);
             }
@@ -2817,6 +2826,15 @@ int main(int argc, char **argv, char **envp)
     }
     if (default_vga)
         vga_interface_type = VGA_CIRRUS;
+    if (fake_machine) {
+        /* HACK: Ideally we'd configure VGA as usual, but this causes
+         * several MB of VGA RAM to be allocated, and we can't do the
+         * tricks we use elsewhere to just return a single 4k page,
+         * because the VGA driver immediately memsets() the entire
+         * allocation it requested.
+         */
+        vga_interface_type = VGA_NONE;
+    }
 
     socket_init();
 
@@ -2835,6 +2853,9 @@ int main(int argc, char **argv, char **envp)
         exit(1);
     }
 
+    if (fake_machine) {
+        kvm_allowed = 0;
+    }
     if (kvm_allowed) {
         int ret = kvm_init(smp_cpus);
         if (ret < 0) {
-- 
1.7.2.3

^ permalink raw reply related	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-18 20:16 [Qemu-devel] [RFC PATCH] Fake machine for scalability testing Markus Armbruster
@ 2011-01-20 16:34 ` Anthony Liguori
  2011-01-20 17:12   ` Markus Armbruster
  0 siblings, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2011-01-20 16:34 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> The problem: you want to do serious scalability testing (1000s of VMs)
> of your management stack.  If each guest eats up a few 100MiB and
> competes for CPU, that requires a serious host machine.  Which you don't
> have.  You also don't want to modify the management stack at all, if you
> can help it.
>
> The solution: a perfectly normal-looking QEMU that uses minimal
> resources.  Ability to execute any guest code is strictly optional ;)
>
> New option -fake-machine creates a fake machine incapable of running
> guest code.  Completely compiled out by default, enable with configure
> --enable-fake-machine.
>
> With -fake-machine, CPU use is negligible, and memory use is rather
> modest.
>
> Non-fake VM running F-14 live, right after boot:
> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>
> Same VM -fake-machine, after similar time elapsed:
> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>
> We're using a very similar patch for RHEL scalability testing.
>    

Interesting, but:

  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22 qemu-system-x86

That's qemu-system-x86 -m 4

In terms of memory overhead, the largest source is not really going to 
be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).

I don't really understand the point of not creating a VCPU with KVM.  Is 
there some type of overhead in doing that?

Regards,

Anthony Liguori

> [...]


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-20 16:34 ` Anthony Liguori
@ 2011-01-20 17:12   ` Markus Armbruster
  2011-01-20 19:50     ` Anthony Liguori
  0 siblings, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2011-01-20 17:12 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>> The problem: you want to do serious scalability testing (1000s of VMs)
>> of your management stack.  If each guest eats up a few 100MiB and
>> competes for CPU, that requires a serious host machine.  Which you don't
>> have.  You also don't want to modify the management stack at all, if you
>> can help it.
>>
>> The solution: a perfectly normal-looking QEMU that uses minimal
>> resources.  Ability to execute any guest code is strictly optional ;)
>>
>> New option -fake-machine creates a fake machine incapable of running
>> guest code.  Completely compiled out by default, enable with configure
>> --enable-fake-machine.
>>
>> With -fake-machine, CPU use is negligible, and memory use is rather
>> modest.
>>
>> Non-fake VM running F-14 live, right after boot:
>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>
>> Same VM -fake-machine, after similar time elapsed:
>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>
>> We're using a very similar patch for RHEL scalability testing.
>>    
>
> Interesting, but:
>
>  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22 qemu-system-x86
>
> That's qemu-system-x86 -m 4

Sure you ran qemu-system-x86 -fake-machine?

> In terms of memory overhead, the largest source is not really going to
> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).

git-grep phys_ram_dirty finds nothing.

> I don't really understand the point of not creating a VCPU with KVM.
> Is there some type of overhead in doing that?

I briefly looked at both main loops, TCG's was the first one I happened
to crack, and I didn't feel like doing both then.  If the general
approach is okay, I'll gladly investigate how to do it with KVM.


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-20 17:12   ` Markus Armbruster
@ 2011-01-20 19:50     ` Anthony Liguori
  2011-01-21 10:38       ` Markus Armbruster
  2011-01-21 10:43       ` Daniel P. Berrange
  0 siblings, 2 replies; 10+ messages in thread
From: Anthony Liguori @ 2011-01-20 19:50 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>> [...]
>> Interesting, but:
>>
>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22 qemu-system-x86
>>
>> That's qemu-system-x86 -m 4
>>      
> Sure you ran qemu-system-x86 -fake-machine?
>    

No, I didn't try it.  My point was that -m 4 is already pretty small.

>> In terms of memory overhead, the largest source is not really going to
>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>      
> git-grep phys_ram_dirty finds nothing.
>    

Yeah, it's now ram_list[i].phys_dirty.

l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages

phys_dirty is mem_size_in_pages bytes.

>> I don't really understand the point of not creating a VCPU with KVM.
>> Is there some type of overhead in doing that?
>>      
> I briefly looked at both main loops, TCG's was the first one I happened
> to crack, and I didn't feel like doing both then.  If the general
> approach is okay, I'll gladly investigate how to do it with KVM.
>    

I guess what I don't understand is why do you need to not run guest 
code?  Specifically, if you remove the following, is it any less useful?

diff --git a/cpu-exec.c b/cpu-exec.c
index 8c9fb8b..cd1259a 100644
--- a/cpu-exec.c
+++ b/cpu-exec.c
@@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
      uint8_t *tc_ptr;
      unsigned long next_tb;

-    if (cpu_halted(env1) == EXCP_HALTED)
+    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
          return EXCP_HALTED;


Regards,

Anthony Liguori


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-20 19:50     ` Anthony Liguori
@ 2011-01-21 10:38       ` Markus Armbruster
  2011-01-21 14:45         ` Anthony Liguori
  2011-01-21 10:43       ` Daniel P. Berrange
  1 sibling, 1 reply; 10+ messages in thread
From: Markus Armbruster @ 2011-01-21 10:38 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>>    
>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>> [...]
>>> Interesting, but:
>>>
>>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22 qemu-system-x86
>>>
>>> That's qemu-system-x86 -m 4
>>>      
>> Sure you ran qemu-system-x86 -fake-machine?
>>    
>
> No, I didn't try it.  My point was that -m 4 is already pretty small.

Ah!

However, it's not as small as -fake-machine, and eats all the CPU it can
get.

Non-fake VM as above, but with -m 4:
UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]

And I believe we can make -fake-machine use even less memory than now,
with a little more work.

>>> In terms of memory overhead, the largest source is not really going to
>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>      
>> git-grep phys_ram_dirty finds nothing.
>>    
>
> Yeah, it's now ram_list[i].phys_dirty.
>
> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>
> phys_dirty is mem_size_in_pages bytes.

Thanks.

>>> I don't really understand the point of not creating a VCPU with KVM.
>>> Is there some type of overhead in doing that?
>>>      
>> I briefly looked at both main loops, TCG's was the first one I happened
>> to crack, and I didn't feel like doing both then.  If the general
>> approach is okay, I'll gladly investigate how to do it with KVM.
>>    
>
> I guess what I don't understand is why do you need to not run guest
> code?  Specifically, if you remove the following, is it any less
> useful?
>
> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8c9fb8b..cd1259a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>      uint8_t *tc_ptr;
>      unsigned long next_tb;
>
> -    if (cpu_halted(env1) == EXCP_HALTED)
> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>
>          return EXCP_HALTED;

I don't want 1000s of guests running infinite "not enough memory to do
anything useful, panic!" reboot loops.  Because that's 1000s of guests
competing for CPU.

If you think we can achieve my goals (stated in my first paragraph) in a
different way, I'm all ears.


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-20 19:50     ` Anthony Liguori
  2011-01-21 10:38       ` Markus Armbruster
@ 2011-01-21 10:43       ` Daniel P. Berrange
  2011-01-21 14:43         ` Anthony Liguori
  1 sibling, 1 reply; 10+ messages in thread
From: Daniel P. Berrange @ 2011-01-21 10:43 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Markus Armbruster, qemu-devel

On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> >Anthony Liguori<anthony@codemonkey.ws>  writes:
> >
> >>On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> >>> [...]
> >>Interesting, but:
> >>
> >>  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22 qemu-system-x86
> >>
> >>That's qemu-system-x86 -m 4
> >Sure you ran qemu-system-x86 -fake-machine?
> 
> No, I didn't try it.  My point was that -m 4 is already pretty small.

One of the core ideas/requirements behind the "fake QEMU" was
that we won't need to modify applications to adjust the command
line arguments in this kind of way.  We want all their machine
definition logic to remain unaffected.  In fact, my original
prototype did not even require passing an extra '-fake-machine'
argument; it would have been a plain drop-in alternative QEMU
binary.  It also stubbed out much of the KVM codepaths, so you
could "enable" KVM mode without actually having KVM available
on the host.

> >>In terms of memory overhead, the largest source is not really going to
> >>be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
> >git-grep phys_ram_dirty finds nothing.
> 
> Yeah, it's now ram_list[i].phys_dirty.
> 
> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
> 
> phys_dirty is mem_size_in_pages bytes.
> 
> >>I don't really understand the point of not creating a VCPU with KVM.
> >>Is there some type of overhead in doing that?
> >I briefly looked at both main loops, TCG's was the first one I happened
> >to crack, and I didn't feel like doing both then.  If the general
> >approach is okay, I'll gladly investigate how to do it with KVM.
> 
> I guess what I don't understand is why do you need to not run guest
> code?  Specifically, if you remove the following, is it any less
> useful?

IIUC, if you don't have the following change, then the guest CPUs
will actually execute the kernel/bootable disk configured, causing
host CPU utilization to rise. Even if it only adds 2% load on the
host, this quickly becomes an issue as you get 200 or more VMs on
the host. Ideally we would have the main loop completely disabled,
not merely the CPUs, because this would avoid all possible background
CPU load that any QEMU internal timers might cause.

Basically, the desired goal is to make no change to the QEMU
command line arguments, but have zero memory and CPU overhead
from running QEMU. fake-machine doesn't get as close to zero as
my original fake QEMU target managed, but it is still pretty
good, considering how much less code is involved in fake-machine.

> diff --git a/cpu-exec.c b/cpu-exec.c
> index 8c9fb8b..cd1259a 100644
> --- a/cpu-exec.c
> +++ b/cpu-exec.c
> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>      uint8_t *tc_ptr;
>      unsigned long next_tb;
> 
> -    if (cpu_halted(env1) == EXCP_HALTED)
> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
> 
>          return EXCP_HALTED;


Daniel

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-21 10:43       ` Daniel P. Berrange
@ 2011-01-21 14:43         ` Anthony Liguori
  2011-01-21 14:46           ` Daniel P. Berrange
  0 siblings, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2011-01-21 14:43 UTC (permalink / raw)
  To: Daniel P. Berrange; +Cc: Markus Armbruster, qemu-devel

On 01/21/2011 04:43 AM, Daniel P. Berrange wrote:
> On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
>    
>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>>        
>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>          
>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>> can help it.
>>>>>
>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>
>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>> --enable-fake-machine.
>>>>>
>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>> modest.
>>>>>
>>>>> Non-fake VM running F-14 live, right after boot:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>
>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>
>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>
>>>>>            
>>>> Interesting, but:
>>>>
>>>>   9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>> qemu-system-x86
>>>>
>>>> That's qemu-system-x86 -m 4
>>>>          
>>> Sure you ran qemu-system-x86 -fake-machine?
>>>        
>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>      
> One of the core ideas/requirements behind the "fake QEMU" was
> that we won't need to modify applications to adjust the command
> line arguments in this kind of way. We want all their machine
> definition logic to remain unaffected. In fact my original
> prototype did not even require addition of the passing of an
> extra '-fake-machine' argument, it would have just been a plain
> drop-in alternative QEMU binary. It also stubbed out much of
> the KVM codepaths, so you could "enable"  KVM mode without
> actually really having KVM available on the host.
>
>    
>>>> In terms of memory overhead, the largest source is not really going to
>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>          
>>> git-grep phys_ram_dirty finds nothing.
>>>        
>> Yeah, it's now ram_list[i].phys_dirty.
>>
>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>
>> phys_dirty is mem_size_in_pages bytes.
>>
>>      
>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>> Is there some type of overhead in doing that?
>>>>          
>>> I briefly looked at both main loops, TCG's was the first one I happened
>>> to crack, and I didn't feel like doing both then.  If the general
>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>        
>> I guess what I don't understand is why do you need to not run guest
>> code?  Specifically, if you remove the following, is it any less
>> useful?
>>      
> IIUC, if you don't have the following change, then the guest CPUs
> will actually execute the kernel/bootable disk configured, causing
> host CPU utilization to rise. Even if it only adds 2% load on the
> host, this quickly becomes an issue as you get 200 or more VMs on
> the host. Ideally we would have the main loop completely disabled,
> not merely the CPUs, because this would avoid all possible background
> CPU load that any QEMU internal timers might cause.
>
> Basically the desired goal is, make no change to the QEMU command
> line arguments, but have zero memory and CPU overhead by running
> QEMU. fake-machine doesn't get as close to zero as my original
> fake QEMU target managed, but it is still pretty good, considering
> how much less code is involved in fake-machine.
>    

Oh, so what you really want to do is:

#!/bin/sh
/usr/libexec/qemu-kvm -m 4

Ignore all command line parameters and just run a micro guest.  If you 
don't specify any kernel/boot disks, you don't need to disable VCPU 
execution because it'll spin in a HLT loop once the BIOS executes.
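Such a wrapper can be sketched as follows (hypothetical: the qemu-kvm path, the extra -vga none flag, and the use of echo rather than exec for illustration are all assumptions, not part of the original suggestion):

```shell
#!/bin/sh
# Hypothetical drop-in wrapper: discard whatever arguments the
# management stack passed in and launch a minimal 4 MiB guest instead.
QEMU=${QEMU:-/usr/libexec/qemu-kvm}

# Replace the caller's argument list with a fixed, minimal one.
set -- -m 4 -vga none

# A real wrapper would end with:  exec "$QEMU" "$@"
echo "would run: $QEMU $*"
```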

Regards,

Anthony Liguori

>    
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 8c9fb8b..cd1259a 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>       uint8_t *tc_ptr;
>>       unsigned long next_tb;
>>
>> -    if (cpu_halted(env1) == EXCP_HALTED)
>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>
>>           return EXCP_HALTED;
>>      
>
> Daniel
>
>    


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-21 10:38       ` Markus Armbruster
@ 2011-01-21 14:45         ` Anthony Liguori
  2011-01-21 16:51           ` Markus Armbruster
  0 siblings, 1 reply; 10+ messages in thread
From: Anthony Liguori @ 2011-01-21 14:45 UTC (permalink / raw)
  To: Markus Armbruster; +Cc: qemu-devel

On 01/21/2011 04:38 AM, Markus Armbruster wrote:
> Anthony Liguori<anthony@codemonkey.ws>  writes:
>
>    
>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>      
>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>
>>>
>>>        
>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>
>>>>          
>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>> can help it.
>>>>>
>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>
>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>> --enable-fake-machine.
>>>>>
>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>> modest.
>>>>>
>>>>> Non-fake VM running F-14 live, right after boot:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>
>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>
>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>
>>>>>
>>>>>            
>>>> Interesting, but:
>>>>
>>>>    9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>> qemu-system-x86
>>>>
>>>> That's qemu-system-x86 -m 4
>>>>
>>>>          
>>> Sure you ran qemu-system-x86 -fake-machine?
>>>
>>>        
>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>      
> Ah!
>
> However, it's not as small as -fake-machine, and eats all the CPU it can
> get.
>
> Non-fake VM as above, but with -m 4:
> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]
>
> And I believe we can make -fake-machine use even less memory than now,
> with a little more work.
>
>    
>>>> In terms of memory overhead, the largest source is not really going to
>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>
>>>>          
>>> git-grep phys_ram_dirty finds nothing.
>>>
>>>        
>> Yeah, it's now ram_list[i].phys_dirty.
>>
>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>
>> phys_dirty is mem_size_in_pages bytes.
>>      
> Thanks.
>
>    
>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>> Is there some type of overhead in doing that?
>>>>
>>>>          
>>> I briefly looked at both main loops, TCG's was the first one I happened
>>> to crack, and I didn't feel like doing both then.  If the general
>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>
>>>        
>> I guess what I don't understand is why do you need to not run guest
>> code?  Specifically, if you remove the following, is it any less
>> useful?
>>
>> diff --git a/cpu-exec.c b/cpu-exec.c
>> index 8c9fb8b..cd1259a 100644
>> --- a/cpu-exec.c
>> +++ b/cpu-exec.c
>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>       uint8_t *tc_ptr;
>>       unsigned long next_tb;
>>
>> -    if (cpu_halted(env1) == EXCP_HALTED)
>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>
>>           return EXCP_HALTED;
>>      
> I don't want 1000s of guests running infinite "not enough memory to do
> anything useful, panic!" reboot loops.  Because that's 1000s of guests
> competing for CPU.
>    

Hrm, that's not the behavior I see.  With no bootable drive, the BIOS 
will spin in a HLT loop as part of int18.

Regards,

Anthony Liguori

> If you think we can achieve my goals (stated in my first paragraph) in a
> different way, I'm all ears.
>
>    


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-21 14:43         ` Anthony Liguori
@ 2011-01-21 14:46           ` Daniel P. Berrange
  0 siblings, 0 replies; 10+ messages in thread
From: Daniel P. Berrange @ 2011-01-21 14:46 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Markus Armbruster, qemu-devel

On Fri, Jan 21, 2011 at 08:43:20AM -0600, Anthony Liguori wrote:
> On 01/21/2011 04:43 AM, Daniel P. Berrange wrote:
> >On Thu, Jan 20, 2011 at 01:50:33PM -0600, Anthony Liguori wrote:
> >>On 01/20/2011 11:12 AM, Markus Armbruster wrote:
> >>>Anthony Liguori<anthony@codemonkey.ws>   writes:
> >>>
> >>>>On 01/18/2011 02:16 PM, Markus Armbruster wrote:
> >>>>>The problem: you want to do serious scalability testing (1000s of VMs)
> >>>>>of your management stack.  If each guest eats up a few 100MiB and
> >>>>>competes for CPU, that requires a serious host machine.  Which you don't
> >>>>>have.  You also don't want to modify the management stack at all, if you
> >>>>>can help it.
> >>>>>
> >>>>>The solution: a perfectly normal-looking QEMU that uses minimal
> >>>>>resources.  Ability to execute any guest code is strictly optional ;)
> >>>>>
> >>>>>New option -fake-machine creates a fake machine incapable of running
> >>>>>guest code.  Completely compiled out by default, enable with configure
> >>>>>--enable-fake-machine.
> >>>>>
> >>>>>With -fake-machine, CPU use is negligible, and memory use is rather
> >>>>>modest.
> >>>>>
> >>>>>Non-fake VM running F-14 live, right after boot:
> >>>>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>>>armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
> >>>>>
> >>>>>Same VM -fake-machine, after similar time elapsed:
> >>>>>UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
> >>>>>armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
> >>>>>
> >>>>>We're using a very similar patch for RHEL scalability testing.
> >>>>>
> >>>>Interesting, but:
> >>>>
> >>>>  9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
> >>>>qemu-system-x86
> >>>>
> >>>>That's qemu-system-x86 -m 4
> >>>Sure you ran qemu-system-x86 -fake-machine?
> >>No, I didn't try it.  My point was that -m 4 is already pretty small.
> >One of the core ideas/requirements behind the "fake QEMU" was
> >that we won't need to modify applications to adjust the command
> >line arguments in this kind of way. We want all their machine
> >definition logic to remain unaffected. In fact my original
> >prototype did not even require addition of the passing of an
> >extra '-fake-machine' argument, it would have just been a plain
> >drop-in alternative QEMU binary. It also stubbed out much of
> >the KVM codepaths, so you could "enable"  KVM mode without
> >actually really having KVM available on the host.
> >
> >>>>In terms of memory overhead, the largest source is not really going to
> >>>>be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
> >>>git-grep phys_ram_dirty finds nothing.
> >>Yeah, it's now ram_list[i].phys_dirty.
> >>
> >>l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
> >>
> >>phys_dirty is mem_size_in_pages bytes.
> >>
> >>>>I don't really understand the point of not creating a VCPU with KVM.
> >>>>Is there some type of overhead in doing that?
> >>>I briefly looked at both main loops, TCG's was the first one I happened
> >>>to crack, and I didn't feel like doing both then.  If the general
> >>>approach is okay, I'll gladly investigate how to do it with KVM.
> >>I guess what I don't understand is why do you need to not run guest
> >>code?  Specifically, if you remove the following, is it any less
> >>useful?
> >IIUC, if you don't have the following change, then the guest CPUs
> >will actually execute the kernel/bootable disk configured, causing
> >host CPU utilization to rise. Even if it only adds 2% load on the
> >host, this quickly becomes an issue as you get 200 or more VMs on
> >the host. Ideally we would have the main loop completely disabled,
> >not merely the CPUs, because this would avoid all possible background
> >CPU load that any QEMU internal timers might cause.
> >
> >Basically the desired goal is, make no change to the QEMU command
> >line arguments, but have zero memory and CPU overhead by running
> >QEMU. fake-machine doesn't get as close to zero as my original
> >fake QEMU target managed, but it is still pretty good, considering
> >how much less code is involved in fake-machine.
> 
> Oh, so what you really want to do is:
> 
> #!/bin/sh
> /usr/libexec/qemu-kvm -m 4
> 
> Ignore all command line parameters and just run a micro guest.  If
> you don't specify any kernel/boot disks, you don't need to disable a
> VCPU execution because it'll spin in a hlt loop once the bios
> executes.

That's likely to cause app confusion, because the app will be
specifying 1 GB, but when it talks to the balloon it will only
see, and be allowed to set, the balloon between 0 and 4 MB.

Daniel


* Re: [Qemu-devel] [RFC PATCH] Fake machine for scalability testing
  2011-01-21 14:45         ` Anthony Liguori
@ 2011-01-21 16:51           ` Markus Armbruster
  0 siblings, 0 replies; 10+ messages in thread
From: Markus Armbruster @ 2011-01-21 16:51 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: qemu-devel

Anthony Liguori <anthony@codemonkey.ws> writes:

> On 01/21/2011 04:38 AM, Markus Armbruster wrote:
>> Anthony Liguori<anthony@codemonkey.ws>  writes:
>>
>>    
>>> On 01/20/2011 11:12 AM, Markus Armbruster wrote:
>>>      
>>>> Anthony Liguori<anthony@codemonkey.ws>   writes:
>>>>
>>>>
>>>>        
>>>>> On 01/18/2011 02:16 PM, Markus Armbruster wrote:
>>>>>
>>>>>          
>>>>>> The problem: you want to do serious scalability testing (1000s of VMs)
>>>>>> of your management stack.  If each guest eats up a few 100MiB and
>>>>>> competes for CPU, that requires a serious host machine.  Which you don't
>>>>>> have.  You also don't want to modify the management stack at all, if you
>>>>>> can help it.
>>>>>>
>>>>>> The solution: a perfectly normal-looking QEMU that uses minimal
>>>>>> resources.  Ability to execute any guest code is strictly optional ;)
>>>>>>
>>>>>> New option -fake-machine creates a fake machine incapable of running
>>>>>> guest code.  Completely compiled out by default, enable with configure
>>>>>> --enable-fake-machine.
>>>>>>
>>>>>> With -fake-machine, CPU use is negligible, and memory use is rather
>>>>>> modest.
>>>>>>
>>>>>> Non-fake VM running F-14 live, right after boot:
>>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>>> armbru   15707  2558 53 191837 414388 1 21:05 pts/3    00:00:29 [...]
>>>>>>
>>>>>> Same VM -fake-machine, after similar time elapsed:
>>>>>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>>>>>> armbru   15742  2558  0 85129  9412   0 21:07 pts/3    00:00:00 [...]
>>>>>>
>>>>>> We're using a very similar patch for RHEL scalability testing.
>>>>>>
>>>>>>
>>>>>>            
>>>>> Interesting, but:
>>>>>
>>>>>    9432 anthony   20   0  153m  14m 5384 S    0  0.2   0:00.22
>>>>> qemu-system-x86
>>>>>
>>>>> That's qemu-system-x86 -m 4
>>>>>
>>>>>          
>>>> Sure you ran qemu-system-x86 -fake-machine?
>>>>
>>>>        
>>> No, I didn't try it.  My point was that -m 4 is already pretty small.
>>>      
>> Ah!
>>
>> However, it's not as small as -fake-machine, and eats all the CPU it can
>> get.
>>
>> Non-fake VM as above, but with -m 4:
>> UID        PID  PPID  C    SZ   RSS PSR STIME TTY          TIME CMD
>> armbru   19325  2558 93 39869 17020   1 11:30 pts/3    00:00:42 [...]
>>
>> And I believe we can make -fake-machine use even less memory than now,
>> with a little more work.
>>
>>    
>>>>> In terms of memory overhead, the largest source is not really going to
>>>>> be addressed by -fake-machine (l1_phys_map and phys_ram_dirty).
>>>>>
>>>>>          
>>>> git-grep phys_ram_dirty finds nothing.
>>>>
>>>>        
>>> Yeah, it's now ram_list[i].phys_dirty.
>>>
>>> l1_phys_map is (sizeof(void *) + sizeof(PhysPageDesc)) * mem_size_in_pages
>>>
>>> phys_dirty is mem_size_in_pages bytes.
>>>      
>> Thanks.
>>
>>    
>>>>> I don't really understand the point of not creating a VCPU with KVM.
>>>>> Is there some type of overhead in doing that?
>>>>>
>>>>>          
>>>> I briefly looked at both main loops, TCG's was the first one I happened
>>>> to crack, and I didn't feel like doing both then.  If the general
>>>> approach is okay, I'll gladly investigate how to do it with KVM.
>>>>
>>>>        
>>> I guess what I don't understand is why do you need to not run guest
>>> code?  Specifically, if you remove the following, is it any less
>>> useful?
>>>
>>> diff --git a/cpu-exec.c b/cpu-exec.c
>>> index 8c9fb8b..cd1259a 100644
>>> --- a/cpu-exec.c
>>> +++ b/cpu-exec.c
>>> @@ -230,7 +230,7 @@ int cpu_exec(CPUState *env1)
>>>       uint8_t *tc_ptr;
>>>       unsigned long next_tb;
>>>
>>> -    if (cpu_halted(env1) == EXCP_HALTED)
>>> +    if (fake_machine || cpu_halted(env1) == EXCP_HALTED)
>>>
>>>           return EXCP_HALTED;
>>>      
>> I don't want 1000s of guests running infinite "not enough memory to do
>> anything useful, panic!" reboot loops.  Because that's 1000s of guests
>> competing for CPU.
>>    
>
> Hrm, that's not the behavior I see.  With no bootable drive, the BIOS
> will spin in a HLT loop as part of int18.

Aha.  I used a bootable drive.

Using a non-bootable drive may well curb the CPU use sufficiently.  Not
sure we can always do that in our testing.  The less we have to hack up
the stack for testing, the better.


end of thread, other threads:[~2011-01-21 16:51 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-01-18 20:16 [Qemu-devel] [RFC PATCH] Fake machine for scalability testing Markus Armbruster
2011-01-20 16:34 ` Anthony Liguori
2011-01-20 17:12   ` Markus Armbruster
2011-01-20 19:50     ` Anthony Liguori
2011-01-21 10:38       ` Markus Armbruster
2011-01-21 14:45         ` Anthony Liguori
2011-01-21 16:51           ` Markus Armbruster
2011-01-21 10:43       ` Daniel P. Berrange
2011-01-21 14:43         ` Anthony Liguori
2011-01-21 14:46           ` Daniel P. Berrange
