* [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread
@ 2011-02-21  8:51 Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 1/4] do not use qemu_icount_delta in the !use_icount case Paolo Bonzini
                   ` (4 more replies)
  0 siblings, 5 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-21  8:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: edgar.iglesias

This series redoes the way time spent waiting for I/O is accounted to
the vm_clock.

The current code is advancing qemu_icount before waiting for I/O.
Instead, after the patch qemu_icount is left aside (it is a pure
instruction counter) and qemu_icount_bias is changed according to
the actual amount of time spent in the wait.  This is more
accurate, and actually works in the iothread case as well.
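
Roughly, the quantities involved relate as follows (a simplified sketch of
the idea, not the exact code; icount_vm_clock is just a made-up name here):

    /* what qemu_get_clock_ns(vm_clock) evaluates to under -icount:
       each executed instruction is worth 2^icount_time_shift ns,
       plus an explicit bias */
    static int64_t icount_vm_clock(void)
    {
        return (qemu_icount << icount_time_shift) + qemu_icount_bias;
    }

    /* before this series: make vm_clock progress across an idle wait
       by bumping the instruction counter itself */
    qemu_icount += qemu_icount_round(add);

    /* after this series: leave qemu_icount alone and fold the measured
       real time spent waiting into the bias instead (patch 3) */
    qemu_icount_bias += cpu_get_clock() - cpu_clock_last_read;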

(I started this as an experiment while trying to understand what was
going on.  But it fixes the bug and does not break the no-iothread
case, so hey...).

Patch 1 is a cleanup to Edgar's commit 225d02c (Avoid deadlock whith
iothread and icount, 2011-01-23).

Patch 2 fixes another misunderstanding in the role of qemu_next_deadline.

Patches 3 and 4 implement the actual new accounting algorithm.

With these patches, iothread "-icount N" doesn't work when the actual
execution speed cannot keep up with the requested speed; the execution
in that case is not deterministic.  It works when the requested speed
is slow enough.

(side note: all occurrences of wrong braces are in code that is purely
moved and/or reindented, _and_ that in fact disappears in subsequent
patches.  So, I plead innocent).

Paolo Bonzini (4):
  do not use qemu_icount_delta in the !use_icount case
  qemu_next_deadline should not consider host-time timers
  rewrite accounting of wait time to the vm_clock
  inline qemu_icount_delta

 qemu-timer.c |  114 ++++++++++++++++++++++++---------------------------------
 1 files changed, 48 insertions(+), 66 deletions(-)

-- 
1.7.3.5


* [Qemu-devel] [PATCH 1/4] do not use qemu_icount_delta in the !use_icount case
  2011-02-21  8:51 [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread Paolo Bonzini
@ 2011-02-21  8:51 ` Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 2/4] qemu_next_deadline should not consider host-time timers Paolo Bonzini
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-21  8:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: edgar.iglesias

The !use_icount code is the same for iothread and non-iothread,
except that the timeout is different.  Since the timeout might as
well be infinite and is only masking bugs, use the higher value.
With this change the !use_icount code is handled equivalently
in qemu_icount_delta and qemu_calculate_timeout, and we rip it
out of the former.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-timer.c |   59 ++++++++++++++++++++++++---------------------------------
 1 files changed, 25 insertions(+), 34 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index b0db780..88c7b28 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -112,9 +112,7 @@ static int64_t cpu_get_clock(void)
 
 static int64_t qemu_icount_delta(void)
 {
-    if (!use_icount) {
-        return 5000 * (int64_t) 1000000;
-    } else if (use_icount == 1) {
+    if (use_icount == 1) {
         /* When not using an adaptive execution frequency
            we tend to get badly out of sync with real time,
            so just delay for a reasonable amount of time.  */
@@ -1077,43 +1075,36 @@ void quit_timers(void)
 int qemu_calculate_timeout(void)
 {
     int timeout;
+    int64_t add;
+    int64_t delta;
 
-#ifdef CONFIG_IOTHREAD
     /* When using icount, making forward progress with qemu_icount when the
        guest CPU is idle is critical. We only use the static io-thread timeout
        for non icount runs.  */
-    if (!use_icount) {
-        return 1000;
+    if (!use_icount || !vm_running) {
+        return 5000;
     }
-#endif
 
-    if (!vm_running)
-        timeout = 5000;
-    else {
-     /* XXX: use timeout computed from timers */
-        int64_t add;
-        int64_t delta;
-        /* Advance virtual time to the next event.  */
-	delta = qemu_icount_delta();
-        if (delta > 0) {
-            /* If virtual time is ahead of real time then just
-               wait for IO.  */
-            timeout = (delta + 999999) / 1000000;
-        } else {
-            /* Wait for either IO to occur or the next
-               timer event.  */
-            add = qemu_next_deadline();
-            /* We advance the timer before checking for IO.
-               Limit the amount we advance so that early IO
-               activity won't get the guest too far ahead.  */
-            if (add > 10000000)
-                add = 10000000;
-            delta += add;
-            qemu_icount += qemu_icount_round (add);
-            timeout = delta / 1000000;
-            if (timeout < 0)
-                timeout = 0;
-        }
+    /* Advance virtual time to the next event.  */
+    delta = qemu_icount_delta();
+    if (delta > 0) {
+        /* If virtual time is ahead of real time then just
+           wait for IO.  */
+        timeout = (delta + 999999) / 1000000;
+    } else {
+        /* Wait for either IO to occur or the next
+           timer event.  */
+        add = qemu_next_deadline();
+        /* We advance the timer before checking for IO.
+           Limit the amount we advance so that early IO
+           activity won't get the guest too far ahead.  */
+        if (add > 10000000)
+            add = 10000000;
+        delta += add;
+        qemu_icount += qemu_icount_round (add);
+        timeout = delta / 1000000;
+        if (timeout < 0)
+            timeout = 0;
     }
 
     return timeout;
-- 
1.7.3.5


* [Qemu-devel] [PATCH 2/4] qemu_next_deadline should not consider host-time timers
  2011-02-21  8:51 [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 1/4] do not use qemu_icount_delta in the !use_icount case Paolo Bonzini
@ 2011-02-21  8:51 ` Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 3/4] rewrite accounting of wait time to the vm_clock Paolo Bonzini
                   ` (2 subsequent siblings)
  4 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-21  8:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: edgar.iglesias

It is purely for icount-based virtual timers.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-timer.c |    7 +------
 1 files changed, 1 insertions(+), 6 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 88c7b28..06fa507 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -689,16 +689,11 @@ int64_t qemu_next_deadline(void)
     /* To avoid problems with overflow limit this to 2^32.  */
     int64_t delta = INT32_MAX;
 
+    assert(use_icount);
     if (active_timers[QEMU_CLOCK_VIRTUAL]) {
         delta = active_timers[QEMU_CLOCK_VIRTUAL]->expire_time -
                      qemu_get_clock_ns(vm_clock);
     }
-    if (active_timers[QEMU_CLOCK_HOST]) {
-        int64_t hdelta = active_timers[QEMU_CLOCK_HOST]->expire_time -
-                 qemu_get_clock_ns(host_clock);
-        if (hdelta < delta)
-            delta = hdelta;
-    }
 
     if (delta < 0)
         delta = 0;
-- 
1.7.3.5


* [Qemu-devel] [PATCH 3/4] rewrite accounting of wait time to the vm_clock
  2011-02-21  8:51 [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 1/4] do not use qemu_icount_delta in the !use_icount case Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 2/4] qemu_next_deadline should not consider host-time timers Paolo Bonzini
@ 2011-02-21  8:51 ` Paolo Bonzini
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 4/4] inline qemu_icount_delta Paolo Bonzini
  2011-02-23 10:18 ` [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread Edgar E. Iglesias
  4 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-21  8:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: edgar.iglesias

The current code is advancing qemu_icount before waiting for I/O.
Instead, after the patch qemu_icount is left aside (it is a pure
instruction counter) and qemu_icount_bias is changed according to
the actual amount of time spent in the wait.  This is more
accurate, and actually works in the iothread case as well.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-timer.c |   78 +++++++++++++++++++++++++++++-----------------------------
 1 files changed, 39 insertions(+), 39 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 06fa507..163ec69 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -614,6 +614,39 @@ void configure_icount(const char *option)
                    qemu_get_clock(vm_clock) + get_ticks_per_sec() / 10);
 }
 
+static int64_t cpu_clock_last_read;
+
+int qemu_calculate_timeout(void)
+{
+    int64_t delta;
+
+    /* When using icount, vm_clock timers are handled outside of the alarm
+       timer.  So, wait for I/O in "small bits" to ensure forward progress of
+       vm_clock when the guest CPU is idle.  When not using icount, though, we
+       just wait for a fixed amount of time (it might as well be infinite).  */
+    if (!use_icount || !vm_running) {
+        return 5000;
+    }
+
+    delta = qemu_icount_delta();
+    if (delta > 0) {
+        /* Virtual time is ahead of real time, wait for it to sync.  Time
+           spent waiting for I/O will not be counted.  */
+        cpu_clock_last_read = -1;
+    } else {
+        /* Wait until the next virtual time event, and account the wait
+           as virtual time.  */
+        delta = qemu_next_deadline();
+        cpu_clock_last_read = cpu_get_clock();
+    }
+
+    if (delta > 0) {
+        return (delta + 999999) / 1000000;
+    } else {
+        return 0;
+    }
+}
+
 void qemu_run_all_timers(void)
 {
     alarm_timer->pending = 0;
@@ -626,6 +659,12 @@ void qemu_run_all_timers(void)
 
     /* vm time timers */
     if (vm_running) {
+        if (use_icount && cpu_clock_last_read != -1) {
+            /* Virtual time passed without executing instructions.  Increase
+               the bias between instruction count and virtual time.  */
+            qemu_icount_bias += cpu_get_clock() - cpu_clock_last_read;
+            cpu_clock_last_read = -1;
+        }
         qemu_run_timers(vm_clock);
     }
 
@@ -1066,42 +1105,3 @@ void quit_timers(void)
     alarm_timer = NULL;
     t->stop(t);
 }
-
-int qemu_calculate_timeout(void)
-{
-    int timeout;
-    int64_t add;
-    int64_t delta;
-
-    /* When using icount, making forward progress with qemu_icount when the
-       guest CPU is idle is critical. We only use the static io-thread timeout
-       for non icount runs.  */
-    if (!use_icount || !vm_running) {
-        return 5000;
-    }
-
-    /* Advance virtual time to the next event.  */
-    delta = qemu_icount_delta();
-    if (delta > 0) {
-        /* If virtual time is ahead of real time then just
-           wait for IO.  */
-        timeout = (delta + 999999) / 1000000;
-    } else {
-        /* Wait for either IO to occur or the next
-           timer event.  */
-        add = qemu_next_deadline();
-        /* We advance the timer before checking for IO.
-           Limit the amount we advance so that early IO
-           activity won't get the guest too far ahead.  */
-        if (add > 10000000)
-            add = 10000000;
-        delta += add;
-        qemu_icount += qemu_icount_round (add);
-        timeout = delta / 1000000;
-        if (timeout < 0)
-            timeout = 0;
-    }
-
-    return timeout;
-}
-
-- 
1.7.3.5


* [Qemu-devel] [PATCH 4/4] inline qemu_icount_delta
  2011-02-21  8:51 [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread Paolo Bonzini
                   ` (2 preceding siblings ...)
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 3/4] rewrite accounting of wait time to the vm_clock Paolo Bonzini
@ 2011-02-21  8:51 ` Paolo Bonzini
  2011-02-23 10:18 ` [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread Edgar E. Iglesias
  4 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-21  8:51 UTC (permalink / raw)
  To: qemu-devel; +Cc: edgar.iglesias

In order to improve accuracy with -icount N, we inline qemu_icount_delta
into qemu_calculate_timeout.

This way, we can still keep virtual time from getting too far ahead of
real time when using low speeds.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 qemu-timer.c |   26 +++++++++++---------------
 1 files changed, 11 insertions(+), 15 deletions(-)

diff --git a/qemu-timer.c b/qemu-timer.c
index 163ec69..5cc7e60 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -110,18 +110,6 @@ static int64_t cpu_get_clock(void)
     }
 }
 
-static int64_t qemu_icount_delta(void)
-{
-    if (use_icount == 1) {
-        /* When not using an adaptive execution frequency
-           we tend to get badly out of sync with real time,
-           so just delay for a reasonable amount of time.  */
-        return 0;
-    } else {
-        return cpu_get_icount() - cpu_get_clock();
-    }
-}
-
 /* enable cpu_get_ticks() */
 void cpu_enable_ticks(void)
 {
@@ -618,6 +606,7 @@ static int64_t cpu_clock_last_read;
 
 int qemu_calculate_timeout(void)
 {
+    int64_t cur_time, cur_icount;
     int64_t delta;
 
     /* When using icount, vm_clock timers are handled outside of the alarm
@@ -628,10 +617,17 @@ int qemu_calculate_timeout(void)
         return 5000;
     }
 
-    delta = qemu_icount_delta();
-    if (delta > 0) {
+    cur_time = cpu_get_clock();
+    cur_icount = cpu_get_icount();
+    if (cur_icount > cur_time) {
         /* Virtual time is ahead of real time, wait for it to sync.  Time
-           spent waiting for I/O will not be counted.  */
+           spent waiting for I/O will not be counted.  Be careful to avoid
+           overflow.  */
+        if (cur_icount > cur_time + INT32_MAX) {
+            delta = INT32_MAX;
+        } else {
+            delta = cur_icount - cur_time;
+        }
         cpu_clock_last_read = -1;
     } else {
         /* Wait until the next virtual time event, and account the wait
-- 
1.7.3.5


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-21  8:51 [Qemu-devel] [PATCH 0/4] Improve -icount, fix it with iothread Paolo Bonzini
                   ` (3 preceding siblings ...)
  2011-02-21  8:51 ` [Qemu-devel] [PATCH 4/4] inline qemu_icount_delta Paolo Bonzini
@ 2011-02-23 10:18 ` Edgar E. Iglesias
  2011-02-23 10:25   ` Paolo Bonzini
  4 siblings, 1 reply; 15+ messages in thread
From: Edgar E. Iglesias @ 2011-02-23 10:18 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Mon, Feb 21, 2011 at 09:51:22AM +0100, Paolo Bonzini wrote:
> This series redoes the way time spent waiting for I/O is accounted to
> the vm_clock.
> 
> The current code is advancing qemu_icount before waiting for I/O.
> Instead, after the patch qemu_icount is left aside (it is a pure
> instruction counter) and qemu_icount_bias is changed according to
> the actual amount of time spent in the wait.  This is more
> accurate, and actually works in the iothread case as well.
> 
> (I started this as an experiment while trying to understand what was
> going on.  But it fixes the bug and does not break the no-iothread
> case, so hey...).
> 
> Patch 1 is a cleanup to Edgar's commit 225d02c (Avoid deadlock whith
> iothread and icount, 2011-01-23).

Thanks, this one was a good cleanup, I've applied it.


> Patch 2 fixes another misunderstanding in the role of qemu_next_deadline.
> 
> Patches 3 and 4 implement the actual new accounting algorithm.

Sorry, I don't know the code well enough to give any sensible feedback
on patches 2-4. I did test them with some of my guests and things seem
to be OK with them but quite a bit slower.
I saw around 10-20% slowdown with a cris guest and -icount 10.

The slowdown might be related to the issue with super slow icount together
with iothread (addressed by Marcelo's iothread timeout patch).


> With these patches, iothread "-icount N" doesn't work when the actual
> execution speed cannot keep up with the requested speed; the execution
> in that case is not deterministic.  It works when the requested speed
> is slow enough.

Sorry, would you mind explaining this a bit?

For example, suppose I have a machine and guest sw that does no IO: it runs
the CPU and only uses devices that use the virtual time (e.g. timers
and peripherals that compute stuff). Can I expect the guest (with
fixed icount speed "-icount N") to run deterministically regardless of
host speed?

Cheers


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 10:18 ` [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread Edgar E. Iglesias
@ 2011-02-23 10:25   ` Paolo Bonzini
  2011-02-23 11:08     ` Edgar E. Iglesias
  0 siblings, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-23 10:25 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: qemu-devel

On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
> Sorry, I don't know the code well enough to give any sensible feedback
> on patches 2-4. I did test them with some of my guests and things seem
> to be OK with them but quite a bit slower.
> I saw around 10-20% slowdown with a cris guest and -icount 10.
>
> The slowdown might be related to the issue with super slow icount together
> with iothread (addressed by Marcelo's iothread timeout patch).

No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
"looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove 
the synchronization overhead in the main loop for TCG+iothread.

Have you tested without iothread too, both to check the speed and to 
ensure this introduces no regressions?

> >  With these patches, iothread "-icount N" doesn't work when the actual
> >  execution speed cannot keep up with the requested speed; the execution
> >  in that case is not deterministic.  It works when the requested speed
> >  is slow enough.
>
> Sorry, would you mind explaining this a bit?

-icount 0 doesn't give 1000 BogoMIPS unless the machine is fast enough 
to sustain it.  But that's a bug.  These patches are meant to be a start.
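
(For the arithmetic: -icount N requests one guest instruction every 2^N ns,
so -icount 0 asks for 10^9 instructions per second, i.e. roughly 1000
BogoMIPS; a larger N means a slower, easier-to-sustain target speed.)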

> For example, suppose I have a machine and guest sw that does no IO: it runs
> the CPU and only uses devices that use the virtual time (e.g. timers
> and peripherals that compute stuff). Can I expect the guest (with
> fixed icount speed "-icount N") to run deterministically regardless of
> host speed?

Right now, only if the N is large enough for the host machine to sustain 
that speed.

Paolo


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 10:25   ` Paolo Bonzini
@ 2011-02-23 11:08     ` Edgar E. Iglesias
  2011-02-23 11:39       ` Jan Kiszka
  2011-02-23 12:42       ` Paolo Bonzini
  0 siblings, 2 replies; 15+ messages in thread
From: Edgar E. Iglesias @ 2011-02-23 11:08 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: qemu-devel

On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
> On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
> > Sorry, I don't know the code well enough to give any sensible feedback
> > on patches 2-4. I did test them with some of my guests and things seem
> > to be OK with them but quite a bit slower.
> > I saw around 10-20% slowdown with a cris guest and -icount 10.
> >
> > The slowdown might be related to the issue with super slow icount together
> > with iothread (addressed by Marcelo's iothread timeout patch).
> 
> No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
> "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove 
> the synchronization overhead in the main loop for TCG+iothread.

I see. I tried booting two of my MIPS and CRIS linux guests with iothread
and -icount 4. Without your patch, the boot crawls super slow. Your patch
gives a huge improvement. This was the "deadlock" scenario which I
mentioned in previous emails.

Just to clarify the previous test where I saw slowdown with your patch:
A CRIS setup that has a CRIS and basically only two peripherals,
a timer block and a device (X) that computes stuff but delays the results
with a virtual timer. The guest CPU is 99% of the time just
busy-waiting for device X to get ready.

This latter test runs in 3.7s with icount 4 and without iothread,
with or without your patch.

With icount 4 and iothread it runs in ~1m5s without your patch and
~1m20s with your patch. That was the 20% slowdown I mentioned earlier.

Don't know if that info helps...


> Have you tested without iothread too, both to check the speed and to 
> ensure this introduces no regressions?

I tried now, I see no regression without iothread.

> > >  With these patches, iothread "-icount N" doesn't work when the actual
> > >  execution speed cannot keep up with the requested speed; the execution
> > >  in that case is not deterministic.  It works when the requested speed
> > >  is slow enough.
> >
> > Sorry, would you mind explaining this a bit?
> 
> -icount 0 doesn't give 1000 BogoMIPS unless the machine is fast enough 
> to sustain it.  But that's a bug.  These patches are meant to be a start.
> 
> > For example, suppose I have a machine and guest sw that does no IO: it runs
> > the CPU and only uses devices that use the virtual time (e.g. timers
> > and peripherals that compute stuff). Can I expect the guest (with
> > fixed icount speed "-icount N") to run deterministically regardless of
> > host speed?
> 
> Right now, only if the N is large enough for the host machine to sustain 
> that speed.

OK thanks.

Cheers


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 11:08     ` Edgar E. Iglesias
@ 2011-02-23 11:39       ` Jan Kiszka
  2011-02-23 12:40         ` Edgar E. Iglesias
  2011-02-25 19:33         ` Paolo Bonzini
  2011-02-23 12:42       ` Paolo Bonzini
  1 sibling, 2 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-02-23 11:39 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: Paolo Bonzini, qemu-devel

On 2011-02-23 12:08, Edgar E. Iglesias wrote:
> On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
>> On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
>>> Sorry, I don't know the code well enough to give any sensible feedback
>>> on patches 2-4. I did test them with some of my guests and things seem
>>> to be OK with them but quite a bit slower.
>>> I saw around 10-20% slowdown with a cris guest and -icount 10.
>>>
>>> The slowdown might be related to the issue with super slow icount together
>>> with iothread (addressed by Marcelo's iothread timeout patch).
>>
>> No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
>> "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove 
>> the synchronization overhead in the main loop for TCG+iothread.
> 
> I see. I tried booting two of my MIPS and CRIS linux guests with iothread
> and -icount 4. Without your patch, the boot crawls super slow. Your patch
> gives a huge improvement. This was the "deadlock" scenario which I
> mentioned in previous emails.
> 
> Just to clarify the previous test where I saw slowdown with your patch:
> A CRIS setup that has a CRIS and basically only two peripherals,
> a timer block and a device (X) that computes stuff but delays the results
> with a virtual timer. The guest CPU is 99% of the time just
> busy-waiting for device X to get ready.
> 
> This latter test runs in 3.7s with icount 4 and without iothread,
> with or without your patch.
> 
> With icount 4 and iothread it runs in ~1m5s without your patch and
> ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
> 
> Don't know if that info helps...

You should try to trace the event flow in qemu, either via strace, via
the built-in tracer (which likely requires a bit more tracepoints), or
via a system-level tracer (ftrace / kernelshark).
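
For example, something along these lines would show the wakeup and lock
traffic (the exact syscall list is only a guess at what matters here):

    strace -f -tt -e trace=futex,select,ppoll,read,write -p <qemu-pid>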

Did my patches contribute a bit to overhead reduction? They specifically
target the costly vcpu/iothread switches in TCG mode (caused by TCG's
excessive lock-holding times).

Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 11:39       ` Jan Kiszka
@ 2011-02-23 12:40         ` Edgar E. Iglesias
  2011-02-23 12:45           ` Jan Kiszka
  2011-02-25 19:33         ` Paolo Bonzini
  1 sibling, 1 reply; 15+ messages in thread
From: Edgar E. Iglesias @ 2011-02-23 12:40 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Paolo Bonzini, qemu-devel

On Wed, Feb 23, 2011 at 12:39:52PM +0100, Jan Kiszka wrote:
> On 2011-02-23 12:08, Edgar E. Iglesias wrote:
> > On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
> >> On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
> >>> Sorry, I don't know the code well enough to give any sensible feedback
> >>> on patches 2-4. I did test them with some of my guests and things seem
> >>> to be OK with them but quite a bit slower.
> >>> I saw around 10-20% slowdown with a cris guest and -icount 10.
> >>>
> >>> The slowdown might be related to the issue with super slow icount together
> >>> with iothread (addressed by Marcelo's iothread timeout patch).
> >>
> >> No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
> >> "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove 
> >> the synchronization overhead in the main loop for TCG+iothread.
> > 
> > I see. I tried booting two of my MIPS and CRIS linux guests with iothread
> > and -icount 4. Without your patch, the boot crawls super slow. Your patch
> > gives a huge improvement. This was the "deadlock" scenario which I
> > mentioned in previous emails.
> > 
> > Just to clarify the previous test where I saw slowdown with your patch:
> > A CRIS setup that has a CRIS and basically only two peripherals,
> > a timer block and a device (X) that computes stuff but delays the results
> > with a virtual timer. The guest CPU is 99% of the time just
> > busy-waiting for device X to get ready.
> > 
> > This latter test runs in 3.7s with icount 4 and without iothread,
> > with or without your patch.
> > 
> > With icount 4 and iothread it runs in ~1m5s without your patch and
> > ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
> > 
> > Don't know if that info helps...
> 
> You should try to trace the event flow in qemu, either via strace, via
> the built-in tracer (which likely requires a bit more tracepoints), or
> via a system-level tracer (ftrace / kernelshark).

Thanks, I'll see if I can get some time to run this more carefully during
some weekend.

> 
> Did my patches contribute a bit to overhead reduction? They specifically
> target the costly vcpu/iothread switches in TCG mode (caused by TCG's
> excessive lock-holding times).

Do you have a tree for quick access to your patches? (couldn't find them
in my inbox).

I could give them a quick go and post results.

Cheers


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 11:08     ` Edgar E. Iglesias
  2011-02-23 11:39       ` Jan Kiszka
@ 2011-02-23 12:42       ` Paolo Bonzini
  2011-02-23 16:27         ` Edgar E. Iglesias
  1 sibling, 1 reply; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-23 12:42 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: qemu-devel

On 02/23/2011 12:08 PM, Edgar E. Iglesias wrote:
>> >  No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to
>> >  "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove
>> >  the synchronization overhead in the main loop for TCG+iothread.
> I see. I tried booting two of my MIPS and CRIS linux guests with iothread
> and -icount 4. Without your patch, the boot crawls super slow. Your patch
> gives a huge improvement. This was the "deadlock" scenario which I
> mentioned in previous emails.
>
> Just to clarify the previous test where I saw slowdown with your patch:
> A CRIS setup that has a CRIS and basically only two peripherals,
> a timer block and a device (X) that computes stuff but delays the results
> with a virtual timer. The guest CPU is 99% of the time just
> busy-waiting for device X to get ready.
>
> This latter test runs in 3.7s with icount 4 and without iothread,
> with or without your patch.

Thanks for testing this.

> With icount 4 and iothread it runs in ~1m5s without your patch and
> ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.

Ok, so it is in both cases with iothread.  We go from 16x slowdown to 
19x on one testcase :) and "huge improvement" on another.  (Also, the 
CRIS images on qemu.org simply hang for me without my patch and numeric 
icount---and the watchdog triggers---so that's another factor in favor 
of the patches).  I guess we can live with the slowdown for now, if 
somebody else finds the patch okay.

Do you have images for the slow test?

Paolo


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 12:40         ` Edgar E. Iglesias
@ 2011-02-23 12:45           ` Jan Kiszka
  0 siblings, 0 replies; 15+ messages in thread
From: Jan Kiszka @ 2011-02-23 12:45 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: Paolo Bonzini, qemu-devel@nongnu.org

On 2011-02-23 13:40, Edgar E. Iglesias wrote:
> On Wed, Feb 23, 2011 at 12:39:52PM +0100, Jan Kiszka wrote:
>> On 2011-02-23 12:08, Edgar E. Iglesias wrote:
>>> On Wed, Feb 23, 2011 at 11:25:54AM +0100, Paolo Bonzini wrote:
>>>> On 02/23/2011 11:18 AM, Edgar E. Iglesias wrote:
>>>>> Sorry, I don't know the code well enough to give any sensible feedback
>>>>> on patches 2-4. I did test them with some of my guests and things seem
>>>>> to be OK with them but quite a bit slower.
>>>>> I saw around 10-20% slowdown with a cris guest and -icount 10.
>>>>>
>>>>> The slowdown might be related to the issue with super slow icount together
>>>>> with iothread (addressed by Marcelo's iothread timeout patch).
>>>>
>>>> No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to 
>>>> "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove 
>>>> the synchronization overhead in the main loop for TCG+iothread.
>>>
>>> I see. I tried booting two of my MIPS and CRIS linux guests with iothread
>>> and -icount 4. Without your patch, the boot crawls super slow. Your patch
>>> gives a huge improvement. This was the "deadlock" scenario which I
>>> mentioned in previous emails.
>>>
>>> Just to clarify the previous test where I saw slowdown with your patch:
>>> A CRIS setup that has a CRIS and basically only two peripherals,
>>> a timer block and a device (X) that computes stuff but delays the results
>>> with a virtual timer. The guest CPU is 99% of the time just
>>> busy-waiting for device X to get ready.
>>>
>>> This latter test runs in 3.7s with icount 4 and without iothread,
>>> with or without your patch.
>>>
>>> With icount 4 and iothread it runs in ~1m5s without your patch and
>>> ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
>>>
>>> Don't know if that info helps...
>>
>> You should try to trace the event flow in qemu, either via strace, via
>> the built-in tracer (which likely requires a bit more tracepoints), or
>> via a system-level tracer (ftrace / kernelshark).
> 
> Thanks, I'll see if I can get some time to run this more carefully during
> some weekend.
> 
>>
>> Did my patches contribute a bit to overhead reduction? They specifically
>> target the costly vcpu/iothread switches in TCG mode (caused by TCG's
>> excessive lock-holding times).
> 
> Do you have a tree for quick access to your patches? (couldn't find them
> in my inbox).

http://thread.gmane.org/gmane.comp.emulators.qemu/93765
(looks like I failed to CC you)

and they are also part of

git://git.kiszka.org/qemu-kvm.git queues/kvm-upstream

> 
> I could give them a quick go and post results.
> 
> Cheers

Thanks,
Jan

-- 
Siemens AG, Corporate Technology, CT T DE IT 1
Corporate Competence Center Embedded Linux


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 12:42       ` Paolo Bonzini
@ 2011-02-23 16:27         ` Edgar E. Iglesias
  2011-02-23 16:32           ` Paolo Bonzini
  0 siblings, 1 reply; 15+ messages in thread
From: Edgar E. Iglesias @ 2011-02-23 16:27 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: Jan Kiszka, qemu-devel

On Wed, Feb 23, 2011 at 01:42:59PM +0100, Paolo Bonzini wrote:
> On 02/23/2011 12:08 PM, Edgar E. Iglesias wrote:
> >> >  No, this supersedes Marcelo's patch.  10-20% doesn't seem comparable to
> >> >  "looks like it deadlocked" anyway.  Also, Jan has ideas on how to remove
> >> >  the synchronization overhead in the main loop for TCG+iothread.
> > I see. I tried booting two of my MIPS and CRIS linux guests with iothread
> > and -icount 4. Without your patch, the boot crawls super slow. Your patch
> > gives a huge improvement. This was the "deadlock" scenario which I
> > mentioned in previous emails.
> >
> > Just to clarify the previous test where I saw slowdown with your patch:
> > A CRIS setup that has a CRIS and basically only two peripherals,
> > a timer block and a device (X) that computes stuff but delays the results
> > with a virtual timer. The guest CPU is 99% of the time just
> > busy-waiting for device X to get ready.
> >
> > This latter test runs in 3.7s with icount 4 and without iothread,
> > with or without your patch.

Sorry, typo here. I ran -icount 10, not 4.

> 
> Thanks for testing this.
> 
> > With icount 4 and iothread it runs in ~1m5s without your patch and
> > ~1m20s with your patch. That was the 20% slowdown I mentioned earlier.
> 
> Ok, so it is in both cases with iothread.  We go from 16x slowdown to 
> 19x on one testcase :) and "huge improvement" on another.  (Also, the 
> CRIS images on qemu.org simply hang for me without my patch and numeric 
> icount---and the watchdog triggers---so that's another factor in favor 
> of the patches).  I guess we can live with the slowdown for now, if 
> somebody else finds the patch okay.

I agree. It would be nice to have someone else's review as well, though. Jan?


> Do you have images for the slow test?

The 16x vs 19x slowdown testcase is unfortunately on a proprietary machine
which I can't release at the moment. I'll have to debug that case at
some point.

For the "almost deadlock" testcase, I think the CRIS image on the wiki is
OK. With the following patch, the watchdog can be disabled. Run with
icount & iothread and you'll see the crawling. With your patch you'll
see the the guest booting much faster than before. Still slower than
non iothread builds though. In particular if you compare icount 10, with
and without iothread.

diff --git a/hw/etraxfs_timer.c b/hw/etraxfs_timer.c
index 133741b..2223744 100644
--- a/hw/etraxfs_timer.c
+++ b/hw/etraxfs_timer.c
@@ -197,8 +197,8 @@ static void watchdog_hit(void *opaque)
         ptimer_run(t->ptimer_wd, 1);
         qemu_irq_raise(t->nmi);
     }
-    else
-        qemu_system_reset_request();
 
     t->wd_hits++;
 }
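
Concretely, the kind of run meant above would be something like the
following (the kernel path is a placeholder for the wiki image, and "with
iothread" means a build configured with --enable-io-thread):

    qemu-system-cris -nographic -kernel vmlinux -icount 10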


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 16:27         ` Edgar E. Iglesias
@ 2011-02-23 16:32           ` Paolo Bonzini
  0 siblings, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-23 16:32 UTC (permalink / raw)
  To: Edgar E. Iglesias; +Cc: Jan Kiszka, qemu-devel

On 02/23/2011 05:27 PM, Edgar E. Iglesias wrote:
> For the "almost deadlock" testcase, I think the CRIS image on the
> wiki is OK.

Ah yes, my understanding was that the watchdog itself slowed down too and
didn't trigger.  So I was indeed seeing it (and just not patient enough to
see it boot).

> Still slower than non iothread builds though. In particular if you
> compare icount 10, with and without iothread.

Very good, thanks.  I'll look at it after this series is in, and Jan's 
optimization patch as well hopefully.

Paolo


* [Qemu-devel] Re: [PATCH 0/4] Improve -icount, fix it with iothread
  2011-02-23 11:39       ` Jan Kiszka
  2011-02-23 12:40         ` Edgar E. Iglesias
@ 2011-02-25 19:33         ` Paolo Bonzini
  1 sibling, 0 replies; 15+ messages in thread
From: Paolo Bonzini @ 2011-02-25 19:33 UTC (permalink / raw)
  To: Jan Kiszka; +Cc: Edgar E. Iglesias, qemu-devel

On 02/23/2011 12:39 PM, Jan Kiszka wrote:
> You should try to trace the event flow in qemu, either via strace, via
> the built-in tracer (which likely requires a bit more tracepoints), or
> via a system-level tracer (ftrace / kernelshark).

The apparent problem is that 25% of the cycles are spent in mutex locking and
unlocking.  But in fact, the real problem is that 90% of the time is
spent doing something other than executing code.

QEMU exits _a lot_ due to the vm_clock timers.  The deadlines are rarely more
than a few ms ahead, and at 1 MIPS that leaves room for executing a few
thousand instructions for each context switch.  The iothread overhead
is what makes the situation so bad, because it takes a lot more time to
execute those instructions.
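
(As a back-of-the-envelope check: -icount 10 is 2^10 ns, roughly 1 us, per
guest instruction, i.e. about 1 MIPS; a deadline 2 ms away then leaves room
for only about 2000 instructions before the next exit.)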

We do so many (almost) useless passes through cpu_exec_all that even
microoptimization helps, for example this:

--- a/cpus.c
+++ b/cpus.c
@@ -767,10 +767,6 @@ static void qemu_wait_io_event_common(CPUState *env)
 {
     CPUState *env;
 
-    while (all_cpu_threads_idle()) {
-        qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
-    }
-
     qemu_mutex_unlock(&qemu_global_mutex);
 
     /*
@@ -1110,7 +1111,15 @@ bool cpu_exec_all(void)
         }
     }
     exit_request = 0;
+
+#ifdef CONFIG_IOTHREAD
+    while (all_cpu_threads_idle()) {
+       qemu_cond_timedwait(tcg_halt_cond, &qemu_global_mutex, 1000);
+    }
+    return true;
+#else
     return !all_cpu_threads_idle();
+#endif
 }
 
 void set_numa_modes(void)

is enough to cut all_cpu_threads_idle from 9% to 4.5% (not unexpected: the
number of calls is halved).  But it shouldn't be that high anyway, so
I'm not proposing the patch formally.

Additionally, the fact that the execution is 99.99% lockstep means you cannot
really overlap any part of the I/O and VCPU threads.

I found a couple of inaccuracies in my patches that already cut 50% of the
time, though.

> Did my patches contribute a bit to overhead reduction? They specifically
> target the costly vcpu/iothread switches in TCG mode (caused by TCG's
> excessive lock-holding times).

Yes, they cut 15%.

Paolo

