* [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
@ 2013-04-05 13:46 Anthony Liguori
  2013-04-05 15:04 ` Eric Blake
                   ` (3 more replies)
  0 siblings, 4 replies; 8+ messages in thread
From: Anthony Liguori @ 2013-04-05 13:46 UTC (permalink / raw)
  To: qemu-devel; +Cc: Paolo Bonzini, Anthony Liguori
The char-flow refactoring introduced a busy-wait that depended on
an action from the VCPU thread.  However, the VCPU thread could
never take that action because the busy-wait starved the VCPU thread
of the BQL because it never dropped the mutex while running select.
Paolo doesn't want to drop this optimization for fear that we will
stop detecting these busy waits.  I'm afraid to keep this optimization
even with the busy-wait fixed because I think a similar problem can
occur just with heavy I/O thread load manifesting itself as VCPU pauses.
As a compromise, introduce an artificial timeout after a thousand
iterations but print a rate limited warning when this happens.  This
let's us still detect when this condition occurs without it being
a fatal error.
Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
---
 main-loop.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)
diff --git a/main-loop.c b/main-loop.c
index eb80ff3..34a3983 100644
--- a/main-loop.c
+++ b/main-loop.c
@@ -188,14 +188,39 @@ static void glib_pollfds_poll(void)
     }
 }
 
+#define MAX_MAIN_LOOP_SPIN (1000)
+
 static int os_host_main_loop_wait(uint32_t timeout)
 {
     int ret;
+    static int spin_counter;
 
     glib_pollfds_fill(&timeout);
 
+    /* If the I/O thread is very busy or we are incorrectly busy waiting in
+     * the I/O thread, this can lead to starvation of the BQL such that the
+     * VCPU threads never run.  To make sure we can detect the later case,
+     * print a message to the screen.  If we run into this condition, create
+     * an fake timeout in order to give the VCPU threads a chance to run.
+     */
+    if (spin_counter > MAX_MAIN_LOOP_SPIN) {
+        static bool notified;
+
+        if (!notified) {
+            fprintf(stderr,
+                    "main-loop: WARNING: I/O thread spun for %d iterations\n",
+                    MAX_MAIN_LOOP_SPIN);
+            notified = true;
+        }
+
+        timeout = 1;
+    }
+
     if (timeout > 0) {
+        spin_counter = 0;
         qemu_mutex_unlock_iothread();
+    } else {
+        spin_counter++;
     }
 
     ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
-- 
1.8.0
^ permalink raw reply related	[flat|nested] 8+ messages in thread- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-05 13:46 [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning Anthony Liguori
@ 2013-04-05 15:04 ` Eric Blake
  2013-04-05 15:45   ` Anthony Liguori
  2013-04-05 15:05 ` Paolo Bonzini
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 8+ messages in thread
From: Eric Blake @ 2013-04-05 15:04 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel
[-- Attachment #1: Type: text/plain, Size: 1151 bytes --]
On 04/05/2013 07:46 AM, Anthony Liguori wrote:
> The char-flow refactoring introduced a busy-wait that depended on
> an action from the VCPU thread.  However, the VCPU thread could
> never take that action because the busy-wait starved the VCPU thread
> of the BQL because it never dropped the mutex while running select.
> 
> Paolo doesn't want to drop this optimization for fear that we will
> stop detecting these busy waits.  I'm afraid to keep this optimization
> even with the busy-wait fixed because I think a similar problem can
> occur just with heavy I/O thread load manifesting itself as VCPU pauses.
> 
> As a compromise, introduce an artificial timeout after a thousand
> iterations but print a rate limited warning when this happens.  This
> let's us still detect when this condition occurs without it being
> a fatal error.
> 
> +     * print a message to the screen.  If we run into this condition, create
> +     * an fake timeout in order to give the VCPU threads a chance to run.
s/an fake/a fake/
-- 
Eric Blake   eblake redhat com    +1-919-301-3266
Libvirt virtualization library http://libvirt.org
[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 621 bytes --]
^ permalink raw reply	[flat|nested] 8+ messages in thread
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-05 15:04 ` Eric Blake
@ 2013-04-05 15:45   ` Anthony Liguori
  0 siblings, 0 replies; 8+ messages in thread
From: Anthony Liguori @ 2013-04-05 15:45 UTC (permalink / raw)
  To: Eric Blake; +Cc: Paolo Bonzini, qemu-devel
Eric Blake <eblake@redhat.com> writes:
> On 04/05/2013 07:46 AM, Anthony Liguori wrote:
>> The char-flow refactoring introduced a busy-wait that depended on
>> an action from the VCPU thread.  However, the VCPU thread could
>> never take that action because the busy-wait starved the VCPU thread
>> of the BQL because it never dropped the mutex while running select.
>> 
>> Paolo doesn't want to drop this optimization for fear that we will
>> stop detecting these busy waits.  I'm afraid to keep this optimization
>> even with the busy-wait fixed because I think a similar problem can
>> occur just with heavy I/O thread load manifesting itself as VCPU pauses.
>> 
>> As a compromise, introduce an artificial timeout after a thousand
>> iterations but print a rate limited warning when this happens.  This
>> let's us still detect when this condition occurs without it being
>> a fatal error.
>> 
>
>> +     * print a message to the screen.  If we run into this condition, create
>> +     * an fake timeout in order to give the VCPU threads a chance to run.
>
> s/an fake/a fake/
Drat, I proof read the commit message hoping to avoid such a mistake but
should have reread the comment :-)
Regards,
Anthony Liguori
>
> -- 
> Eric Blake   eblake redhat com    +1-919-301-3266
> Libvirt virtualization library http://libvirt.org
^ permalink raw reply	[flat|nested] 8+ messages in thread 
 
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-05 13:46 [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning Anthony Liguori
  2013-04-05 15:04 ` Eric Blake
@ 2013-04-05 15:05 ` Paolo Bonzini
  2013-04-07  5:12 ` Peter Crosthwaite
  2013-04-08 21:55 ` Anthony Liguori
  3 siblings, 0 replies; 8+ messages in thread
From: Paolo Bonzini @ 2013-04-05 15:05 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel
Il 05/04/2013 15:46, Anthony Liguori ha scritto:
> The char-flow refactoring introduced a busy-wait that depended on
> an action from the VCPU thread.  However, the VCPU thread could
> never take that action because the busy-wait starved the VCPU thread
> of the BQL because it never dropped the mutex while running select.
> 
> Paolo doesn't want to drop this optimization for fear that we will
> stop detecting these busy waits.  I'm afraid to keep this optimization
> even with the busy-wait fixed because I think a similar problem can
> occur just with heavy I/O thread load manifesting itself as VCPU pauses.
> 
> As a compromise, introduce an artificial timeout after a thousand
> iterations but print a rate limited warning when this happens.  This
> let's us still detect when this condition occurs without it being
> a fatal error.
Good idea, thanks!
Paolo
^ permalink raw reply	[flat|nested] 8+ messages in thread 
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-05 13:46 [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning Anthony Liguori
  2013-04-05 15:04 ` Eric Blake
  2013-04-05 15:05 ` Paolo Bonzini
@ 2013-04-07  5:12 ` Peter Crosthwaite
  2013-04-08 21:55 ` Anthony Liguori
  3 siblings, 0 replies; 8+ messages in thread
From: Peter Crosthwaite @ 2013-04-07  5:12 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel
Hi Anthony,
On Fri, Apr 5, 2013 at 11:46 PM, Anthony Liguori <aliguori@us.ibm.com> wrote:
> The char-flow refactoring introduced a busy-wait that depended on
> an action from the VCPU thread.  However, the VCPU thread could
> never take that action because the busy-wait starved the VCPU thread
> of the BQL because it never dropped the mutex while running select.
>
> Paolo doesn't want to drop this optimization for fear that we will
> stop detecting these busy waits.  I'm afraid to keep this optimization
> even with the busy-wait fixed because I think a similar problem can
> occur just with heavy I/O thread load manifesting itself as VCPU pauses.
>
> As a compromise, introduce an artificial timeout after a thousand
> iterations but print a rate limited warning when this happens.  This
> let's us still detect when this condition occurs without it being
> a fatal error.
>
> Signed-off-by: Anthony Liguori <aliguori@us.ibm.com>
> ---
>  main-loop.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
>
> diff --git a/main-loop.c b/main-loop.c
> index eb80ff3..34a3983 100644
> --- a/main-loop.c
> +++ b/main-loop.c
> @@ -188,14 +188,39 @@ static void glib_pollfds_poll(void)
>      }
>  }
>
> +#define MAX_MAIN_LOOP_SPIN (1000)
> +
>  static int os_host_main_loop_wait(uint32_t timeout)
>  {
>      int ret;
> +    static int spin_counter;
>
>      glib_pollfds_fill(&timeout);
>
> +    /* If the I/O thread is very busy or we are incorrectly busy waiting in
> +     * the I/O thread, this can lead to starvation of the BQL such that the
> +     * VCPU threads never run.  To make sure we can detect the later case,
> +     * print a message to the screen.  If we run into this condition, create
> +     * an fake timeout in order to give the VCPU threads a chance to run.
> +     */
> +    if (spin_counter > MAX_MAIN_LOOP_SPIN) {
> +        static bool notified;
> +
> +        if (!notified) {
> +            fprintf(stderr,
> +                    "main-loop: WARNING: I/O thread spun for %d iterations\n",
> +                    MAX_MAIN_LOOP_SPIN);
> +            notified = true;
> +        }
> +
> +        timeout = 1;
> +    }
> +
>      if (timeout > 0) {
> +        spin_counter = 0;
This is too slow. Resetting the spin counter on every unlock means you
suffer the 1000 iteration wait every time until the starvation is
over. Im getting ten second delays when I mouse wheel paste 40+ chars
into the serial. I'll put my fix on list.
Regards,
Peter
>          qemu_mutex_unlock_iothread();
> +    } else {
> +        spin_counter++;
>      }
>
>      ret = g_poll((GPollFD *)gpollfds->data, gpollfds->len, timeout);
> --
> 1.8.0
>
>
^ permalink raw reply	[flat|nested] 8+ messages in thread
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-05 13:46 [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning Anthony Liguori
                   ` (2 preceding siblings ...)
  2013-04-07  5:12 ` Peter Crosthwaite
@ 2013-04-08 21:55 ` Anthony Liguori
  2013-04-09 17:50   ` Andreas Färber
  3 siblings, 1 reply; 8+ messages in thread
From: Anthony Liguori @ 2013-04-08 21:55 UTC (permalink / raw)
  To: Anthony Liguori, qemu-devel; +Cc: Paolo Bonzini
Applied.  Thanks.
Regards,
Anthony Liguori
^ permalink raw reply	[flat|nested] 8+ messages in thread
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-08 21:55 ` Anthony Liguori
@ 2013-04-09 17:50   ` Andreas Färber
  2013-04-09 18:28     ` Anthony Liguori
  0 siblings, 1 reply; 8+ messages in thread
From: Andreas Färber @ 2013-04-09 17:50 UTC (permalink / raw)
  To: Anthony Liguori; +Cc: Paolo Bonzini, qemu-devel, Peter Maydell
Am 08.04.2013 23:55, schrieb Anthony Liguori:
> Applied.  Thanks.
I am seeing "main-loop: WARNING: I/O thread spun for 1000 iterations"
for both check-qtest-i386 and check-qtest-x86_64 now, is that expected?
PMM just recently cleaned up the noisy arm qtest target... :(
Andreas
-- 
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
^ permalink raw reply	[flat|nested] 8+ messages in thread 
- * Re: [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning
  2013-04-09 17:50   ` Andreas Färber
@ 2013-04-09 18:28     ` Anthony Liguori
  0 siblings, 0 replies; 8+ messages in thread
From: Anthony Liguori @ 2013-04-09 18:28 UTC (permalink / raw)
  To: Andreas Färber; +Cc: Paolo Bonzini, qemu-devel, Peter Maydell
Andreas Färber <afaerber@suse.de> writes:
> Am 08.04.2013 23:55, schrieb Anthony Liguori:
>> Applied.  Thanks.
>
> I am seeing "main-loop: WARNING: I/O thread spun for 1000 iterations"
> for both check-qtest-i386 and check-qtest-x86_64 now, is that expected?
>
> PMM just recently cleaned up the noisy arm qtest target... :(
No, it's not expected.  Sounds like there's a flow control bug in qtest.
Regards,
Anthony Liguori
>
> Andreas
>
> -- 
> SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg, Germany
> GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer; HRB 16746 AG Nürnberg
^ permalink raw reply	[flat|nested] 8+ messages in thread 
 
 
end of thread, other threads:[~2013-04-09 18:29 UTC | newest]
Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2013-04-05 13:46 [Qemu-devel] [PATCH] main-loop: drop the BQL if the I/O appears to be spinning Anthony Liguori
2013-04-05 15:04 ` Eric Blake
2013-04-05 15:45   ` Anthony Liguori
2013-04-05 15:05 ` Paolo Bonzini
2013-04-07  5:12 ` Peter Crosthwaite
2013-04-08 21:55 ` Anthony Liguori
2013-04-09 17:50   ` Andreas Färber
2013-04-09 18:28     ` Anthony Liguori
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).