qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
* [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent
@ 2015-08-12 13:38 Paolo Bonzini
  2015-09-10 16:12 ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2015-08-12 13:38 UTC (permalink / raw)
  To: qemu-devel

QemuEvents are used heavily by call_rcu.  We do not want them to be slow,
but the current implementation does a kernel call on every invocation
of qemu_event_* and won't cut it.

So, wrap a Win32 manual-reset event with a fast userspace path.  The
states and transitions are the same as for the futex and mutex/condvar
implementations, but the slow path is different of course.  The idea
is to reset the Win32 event lazily, as part of a test-reset-test-wait
sequence.  Such a sequence is, indeed, how QemuEvents are used by
RCU and other subsystems!

The patch includes a formal model of the algorithm.

Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
---
 docs/win32-qemu-event.promela | 98 +++++++++++++++++++++++++++++++++++++++++++
 include/qemu/thread-win32.h   |  1 +
 util/qemu-thread-win32.c      | 66 +++++++++++++++++++++++++++--
 3 files changed, 161 insertions(+), 4 deletions(-)
 create mode 100644 docs/win32-qemu-event.promela

diff --git a/docs/win32-qemu-event.promela b/docs/win32-qemu-event.promela
new file mode 100644
index 0000000..c446a71
--- /dev/null
+++ b/docs/win32-qemu-event.promela
@@ -0,0 +1,98 @@
+/*
+ * This model describes the implementation of QemuEvent in
+ * util/qemu-thread-win32.c.
+ *
+ * Author: Paolo Bonzini <pbonzini@redhat.com>
+ *
+ * This file is in the public domain.  If you really want a license,
+ * the WTFPL will do.
+ *
+ * To verify it:
+ *     spin -a docs/event.promela
+ *     gcc -O2 pan.c -DSAFETY
+ *     ./a.out
+ */
+
+bool event;
+int value;
+
+/* Primitives for a Win32 event */
+#define RAW_RESET event = false
+#define RAW_SET   event = true
+#define RAW_WAIT  do :: event -> break; od
+
+#if 0
+/* Basic sanity checking: test the Win32 event primitives */
+#define RESET     RAW_RESET
+#define SET       RAW_SET
+#define WAIT      RAW_WAIT
+#else
+/* Full model: layer a userspace-only fast path on top of the RAW_*
+ * primitives.  SET/RESET/WAIT have exactly the same semantics as
+ * RAW_SET/RAW_RESET/RAW_WAIT, but try to avoid invoking them.
+ */
+#define EV_SET 0
+#define EV_FREE 1
+#define EV_BUSY -1
+
+int state = EV_FREE;
+
+int xchg_result;
+#define SET   if :: state != EV_SET ->                                      \
+                    atomic { /* xchg_result=xchg(state, EV_SET) */          \
+                        xchg_result = state;                                \
+                        state = EV_SET;                                     \
+                    }                                                       \
+                    if :: xchg_result == EV_BUSY -> RAW_SET;                \
+                       :: else -> skip;                                     \
+                    fi;                                                     \
+                 :: else -> skip;                                           \
+              fi
+
+#define RESET if :: state == EV_SET -> atomic { state = state | EV_FREE; }  \
+                 :: else            -> skip;                                \
+              fi
+
+int tmp1, tmp2;
+#define WAIT  tmp1 = state;                                                 \
+              if :: tmp1 != EV_SET ->                                       \
+                    if :: tmp1 == EV_FREE ->                                \
+                          RAW_RESET;                                        \
+                          atomic { /* tmp2=cas(state, EV_FREE, EV_BUSY) */  \
+                              tmp2 = state;                                 \
+                              if :: tmp2 == EV_FREE -> state = EV_BUSY;     \
+                                 :: else            -> skip;                \
+                              fi;                                           \
+                          }                                                 \
+                          if :: tmp2 == EV_SET -> tmp1 = EV_SET;            \
+                             :: else           -> tmp1 = EV_BUSY;           \
+                          fi;                                               \
+                       :: else -> skip;                                     \
+                    fi;                                                     \
+                    assert(tmp1 != EV_FREE);                                \
+                    if :: tmp1 == EV_BUSY -> RAW_WAIT;                      \
+                       :: else -> skip;                                     \
+                    fi;                                                     \
+                 :: else -> skip;                                           \
+              fi
+#endif
+
+active proctype waiter()
+{
+     if
+         :: !value ->
+             RESET;
+             if
+                 :: !value -> WAIT;
+                 :: else   -> skip;
+             fi;
+        :: else -> skip;
+    fi;
+    assert(value);
+}
+
+active proctype notifier()
+{
+    value = true;
+    SET;
+}
diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
index 3d58081..385ff5f 100644
--- a/include/qemu/thread-win32.h
+++ b/include/qemu/thread-win32.h
@@ -18,6 +18,7 @@ struct QemuSemaphore {
 };
 
 struct QemuEvent {
+    int value;
     HANDLE event;
 };
 
diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
index 406b52f..6cdd553 100644
--- a/util/qemu-thread-win32.c
+++ b/util/qemu-thread-win32.c
@@ -238,10 +238,34 @@ void qemu_sem_wait(QemuSemaphore *sem)
     }
 }
 
+/* Wrap a Win32 manual-reset event with a fast userspace path.  The idea
+ * is to reset the Win32 event lazily, as part of a test-reset-test-wait
+ * sequence.  Such a sequence is, indeed, how QemuEvents are used by
+ * RCU and other subsystems!
+ *
+ * Valid transitions:
+ * - free->set, when setting the event
+ * - busy->set, when setting the event, followed by futex_wake
+ * - set->free, when resetting the event
+ * - free->busy, when waiting
+ *
+ * set->busy does not happen (it can be observed from the outside but
+ * it really is set->free->busy).
+ *
+ * busy->free provably cannot happen; to enforce it, the set->free transition
+ * is done with an OR, which becomes a no-op if the event has concurrently
+ * transitioned to free or busy (and is faster than cmpxchg).
+ */
+
+#define EV_SET         0
+#define EV_FREE        1
+#define EV_BUSY       -1
+
 void qemu_event_init(QemuEvent *ev, bool init)
 {
     /* Manual reset.  */
-    ev->event = CreateEvent(NULL, TRUE, init, NULL);
+    ev->event = CreateEvent(NULL, TRUE, TRUE, NULL);
+    ev->value = (init ? EV_SET : EV_FREE);
 }
 
 void qemu_event_destroy(QemuEvent *ev)
@@ -251,17 +275,51 @@ void qemu_event_destroy(QemuEvent *ev)
 
 void qemu_event_set(QemuEvent *ev)
 {
-    SetEvent(ev->event);
+    if (atomic_mb_read(&ev->value) != EV_SET) {
+        if (atomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
+            /* There were waiters, wake them up.  */
+            SetEvent(ev->event);
+        }
+    }
 }
 
 void qemu_event_reset(QemuEvent *ev)
 {
-    ResetEvent(ev->event);
+    if (atomic_mb_read(&ev->value) == EV_SET) {
+        /* If there was a concurrent reset (or even reset+wait),
+         * do nothing.  Otherwise change EV_SET->EV_FREE.
+         */
+        atomic_or(&ev->value, EV_FREE);
+    }
 }
 
 void qemu_event_wait(QemuEvent *ev)
 {
-    WaitForSingleObject(ev->event, INFINITE);
+    unsigned value;
+
+    value = atomic_mb_read(&ev->value);
+    if (value != EV_SET) {
+        if (value == EV_FREE) {
+            /* qemu_event_set is not yet going to call SetEvent, but we are
+             * going to do another check for EV_SET below when setting EV_BUSY.
+             * At that point it is safe to call WaitForSingleObject.
+             */
+            ResetEvent(ev->event);
+
+            /* Tell qemu_event_set that there are waiters.  No need to retry
+             * because there cannot be a concurent busy->free transition.
+             * After the CAS, the event will be either set or busy.
+             */
+            if (atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
+                value = EV_SET;
+            } else {
+                value = EV_BUSY;
+            }
+        }
+        if (value == EV_BUSY) {
+            WaitForSingleObject(ev->event, INFINITE);
+        }
+    }
 }
 
 struct QemuThreadData {
-- 
2.4.3

^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent
  2015-08-12 13:38 [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent Paolo Bonzini
@ 2015-09-10 16:12 ` Paolo Bonzini
  2015-09-10 19:30   ` Stefan Weil
  0 siblings, 1 reply; 4+ messages in thread
From: Paolo Bonzini @ 2015-09-10 16:12 UTC (permalink / raw)
  To: qemu-devel, Stefan Weil

On 12/08/2015 15:38, Paolo Bonzini wrote:
> QemuEvents are used heavily by call_rcu.  We do not want them to be slow,
> but the current implementation does a kernel call on every invocation
> of qemu_event_* and won't cut it.
> 
> So, wrap a Win32 manual-reset event with a fast userspace path.  The
> states and transitions are the same as for the futex and mutex/condvar
> implementations, but the slow path is different of course.  The idea
> is to reset the Win32 event lazily, as part of a test-reset-test-wait
> sequence.  Such a sequence is, indeed, how QemuEvents are used by
> RCU and other subsystems!
> 
> The patch includes a formal model of the algorithm.
> 
> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
> ---
>  docs/win32-qemu-event.promela | 98 +++++++++++++++++++++++++++++++++++++++++++
>  include/qemu/thread-win32.h   |  1 +
>  util/qemu-thread-win32.c      | 66 +++++++++++++++++++++++++++--
>  3 files changed, 161 insertions(+), 4 deletions(-)
>  create mode 100644 docs/win32-qemu-event.promela
> 
> diff --git a/docs/win32-qemu-event.promela b/docs/win32-qemu-event.promela
> new file mode 100644
> index 0000000..c446a71
> --- /dev/null
> +++ b/docs/win32-qemu-event.promela
> @@ -0,0 +1,98 @@
> +/*
> + * This model describes the implementation of QemuEvent in
> + * util/qemu-thread-win32.c.
> + *
> + * Author: Paolo Bonzini <pbonzini@redhat.com>
> + *
> + * This file is in the public domain.  If you really want a license,
> + * the WTFPL will do.
> + *
> + * To verify it:
> + *     spin -a docs/event.promela
> + *     gcc -O2 pan.c -DSAFETY
> + *     ./a.out
> + */
> +
> +bool event;
> +int value;
> +
> +/* Primitives for a Win32 event */
> +#define RAW_RESET event = false
> +#define RAW_SET   event = true
> +#define RAW_WAIT  do :: event -> break; od
> +
> +#if 0
> +/* Basic sanity checking: test the Win32 event primitives */
> +#define RESET     RAW_RESET
> +#define SET       RAW_SET
> +#define WAIT      RAW_WAIT
> +#else
> +/* Full model: layer a userspace-only fast path on top of the RAW_*
> + * primitives.  SET/RESET/WAIT have exactly the same semantics as
> + * RAW_SET/RAW_RESET/RAW_WAIT, but try to avoid invoking them.
> + */
> +#define EV_SET 0
> +#define EV_FREE 1
> +#define EV_BUSY -1
> +
> +int state = EV_FREE;
> +
> +int xchg_result;
> +#define SET   if :: state != EV_SET ->                                      \
> +                    atomic { /* xchg_result=xchg(state, EV_SET) */          \
> +                        xchg_result = state;                                \
> +                        state = EV_SET;                                     \
> +                    }                                                       \
> +                    if :: xchg_result == EV_BUSY -> RAW_SET;                \
> +                       :: else -> skip;                                     \
> +                    fi;                                                     \
> +                 :: else -> skip;                                           \
> +              fi
> +
> +#define RESET if :: state == EV_SET -> atomic { state = state | EV_FREE; }  \
> +                 :: else            -> skip;                                \
> +              fi
> +
> +int tmp1, tmp2;
> +#define WAIT  tmp1 = state;                                                 \
> +              if :: tmp1 != EV_SET ->                                       \
> +                    if :: tmp1 == EV_FREE ->                                \
> +                          RAW_RESET;                                        \
> +                          atomic { /* tmp2=cas(state, EV_FREE, EV_BUSY) */  \
> +                              tmp2 = state;                                 \
> +                              if :: tmp2 == EV_FREE -> state = EV_BUSY;     \
> +                                 :: else            -> skip;                \
> +                              fi;                                           \
> +                          }                                                 \
> +                          if :: tmp2 == EV_SET -> tmp1 = EV_SET;            \
> +                             :: else           -> tmp1 = EV_BUSY;           \
> +                          fi;                                               \
> +                       :: else -> skip;                                     \
> +                    fi;                                                     \
> +                    assert(tmp1 != EV_FREE);                                \
> +                    if :: tmp1 == EV_BUSY -> RAW_WAIT;                      \
> +                       :: else -> skip;                                     \
> +                    fi;                                                     \
> +                 :: else -> skip;                                           \
> +              fi
> +#endif
> +
> +active proctype waiter()
> +{
> +     if
> +         :: !value ->
> +             RESET;
> +             if
> +                 :: !value -> WAIT;
> +                 :: else   -> skip;
> +             fi;
> +        :: else -> skip;
> +    fi;
> +    assert(value);
> +}
> +
> +active proctype notifier()
> +{
> +    value = true;
> +    SET;
> +}
> diff --git a/include/qemu/thread-win32.h b/include/qemu/thread-win32.h
> index 3d58081..385ff5f 100644
> --- a/include/qemu/thread-win32.h
> +++ b/include/qemu/thread-win32.h
> @@ -18,6 +18,7 @@ struct QemuSemaphore {
>  };
>  
>  struct QemuEvent {
> +    int value;
>      HANDLE event;
>  };
>  
> diff --git a/util/qemu-thread-win32.c b/util/qemu-thread-win32.c
> index 406b52f..6cdd553 100644
> --- a/util/qemu-thread-win32.c
> +++ b/util/qemu-thread-win32.c
> @@ -238,10 +238,34 @@ void qemu_sem_wait(QemuSemaphore *sem)
>      }
>  }
>  
> +/* Wrap a Win32 manual-reset event with a fast userspace path.  The idea
> + * is to reset the Win32 event lazily, as part of a test-reset-test-wait
> + * sequence.  Such a sequence is, indeed, how QemuEvents are used by
> + * RCU and other subsystems!
> + *
> + * Valid transitions:
> + * - free->set, when setting the event
> + * - busy->set, when setting the event, followed by futex_wake
> + * - set->free, when resetting the event
> + * - free->busy, when waiting
> + *
> + * set->busy does not happen (it can be observed from the outside but
> + * it really is set->free->busy).
> + *
> + * busy->free provably cannot happen; to enforce it, the set->free transition
> + * is done with an OR, which becomes a no-op if the event has concurrently
> + * transitioned to free or busy (and is faster than cmpxchg).
> + */
> +
> +#define EV_SET         0
> +#define EV_FREE        1
> +#define EV_BUSY       -1
> +
>  void qemu_event_init(QemuEvent *ev, bool init)
>  {
>      /* Manual reset.  */
> -    ev->event = CreateEvent(NULL, TRUE, init, NULL);
> +    ev->event = CreateEvent(NULL, TRUE, TRUE, NULL);
> +    ev->value = (init ? EV_SET : EV_FREE);
>  }
>  
>  void qemu_event_destroy(QemuEvent *ev)
> @@ -251,17 +275,51 @@ void qemu_event_destroy(QemuEvent *ev)
>  
>  void qemu_event_set(QemuEvent *ev)
>  {
> -    SetEvent(ev->event);
> +    if (atomic_mb_read(&ev->value) != EV_SET) {
> +        if (atomic_xchg(&ev->value, EV_SET) == EV_BUSY) {
> +            /* There were waiters, wake them up.  */
> +            SetEvent(ev->event);
> +        }
> +    }
>  }
>  
>  void qemu_event_reset(QemuEvent *ev)
>  {
> -    ResetEvent(ev->event);
> +    if (atomic_mb_read(&ev->value) == EV_SET) {
> +        /* If there was a concurrent reset (or even reset+wait),
> +         * do nothing.  Otherwise change EV_SET->EV_FREE.
> +         */
> +        atomic_or(&ev->value, EV_FREE);
> +    }
>  }
>  
>  void qemu_event_wait(QemuEvent *ev)
>  {
> -    WaitForSingleObject(ev->event, INFINITE);
> +    unsigned value;
> +
> +    value = atomic_mb_read(&ev->value);
> +    if (value != EV_SET) {
> +        if (value == EV_FREE) {
> +            /* qemu_event_set is not yet going to call SetEvent, but we are
> +             * going to do another check for EV_SET below when setting EV_BUSY.
> +             * At that point it is safe to call WaitForSingleObject.
> +             */
> +            ResetEvent(ev->event);
> +
> +            /* Tell qemu_event_set that there are waiters.  No need to retry
> +             * because there cannot be a concurent busy->free transition.
> +             * After the CAS, the event will be either set or busy.
> +             */
> +            if (atomic_cmpxchg(&ev->value, EV_FREE, EV_BUSY) == EV_SET) {
> +                value = EV_SET;
> +            } else {
> +                value = EV_BUSY;
> +            }
> +        }
> +        if (value == EV_BUSY) {
> +            WaitForSingleObject(ev->event, INFINITE);
> +        }
> +    }
>  }
>  
>  struct QemuThreadData {
> 

Ping?

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent
  2015-09-10 16:12 ` Paolo Bonzini
@ 2015-09-10 19:30   ` Stefan Weil
  2015-09-10 19:36     ` Paolo Bonzini
  0 siblings, 1 reply; 4+ messages in thread
From: Stefan Weil @ 2015-09-10 19:30 UTC (permalink / raw)
  To: Paolo Bonzini, qemu-devel

Am 10.09.2015 um 18:12 schrieb Paolo Bonzini:
> On 12/08/2015 15:38, Paolo Bonzini wrote:
>> QemuEvents are used heavily by call_rcu.  We do not want them to be slow,
>> but the current implementation does a kernel call on every invocation
>> of qemu_event_* and won't cut it.
>>
>> So, wrap a Win32 manual-reset event with a fast userspace path.  The
>> states and transitions are the same as for the futex and mutex/condvar
>> implementations, but the slow path is different of course.  The idea
>> is to reset the Win32 event lazily, as part of a test-reset-test-wait
>> sequence.  Such a sequence is, indeed, how QemuEvents are used by
>> RCU and other subsystems!
>>
>> The patch includes a formal model of the algorithm.
>>
>> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
>> ---
>>  docs/win32-qemu-event.promela | 98 +++++++++++++++++++++++++++++++++++++++++++
>>  include/qemu/thread-win32.h   |  1 +
>>  util/qemu-thread-win32.c      | 66 +++++++++++++++++++++++++++--
>>  3 files changed, 161 insertions(+), 4 deletions(-)
>>  create mode 100644 docs/win32-qemu-event.promela
>>
[...]
>>
> 
> Ping?
> 

Hello Paolo,

sorry, I did not notice your patch when it was initially sent.

I have just run several passes of a simple test (booting an fdos
image and running a test sequence which involves TCP / downloads
and compilations under fdos) under wine on my KVM server,
so this was not a typical Windows environment.

Command line:

time wine32 i686-w64-mingw32/i386-softmmu/qemu-system-i386 -fda
metados.img -boot a -net nic,model=pcnet -net user -snapshot

Results without patch:

real	3m8.601s
user	1m33.972s
sys	0m29.736s

real	2m25.502s
user	1m28.168s
sys	0m24.640s

Results with patch:

real	2m27.628s
user	1m27.844s
sys	0m13.304s

real	2m0.043s
user	1m22.728s
sys	0m11.004s

Interpretation:

The real time varies because the test includes user
interactions (pressing enter, starting test,
terminating QEMU after test).

The user time (mainly emulation) does not vary much.

The system time is significantly reduced (less than
50 % in both runs with patch compared to both runs
without patch).

I also had an (unrelated) crash with the unpatched code:

wine: Unhandled page fault on read access to 0x0000004c at address
0x43ed3d (thread 0009), starting debugger...

It happened once in three runs, so this needs separate
investigations.

Would you suggest additional tests?
I'll take your patch in my queue (git://qemu.weilnetz.de/qemu.git wxx).

Regards
Stefan

Tested-by: Stefan Weil <sw@weilnetz.de>

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent
  2015-09-10 19:30   ` Stefan Weil
@ 2015-09-10 19:36     ` Paolo Bonzini
  0 siblings, 0 replies; 4+ messages in thread
From: Paolo Bonzini @ 2015-09-10 19:36 UTC (permalink / raw)
  To: Stefan Weil, qemu-devel



On 10/09/2015 21:30, Stefan Weil wrote:
> The real time varies because the test includes user
> interactions (pressing enter, starting test,
> terminating QEMU after test).
> 
> The user time (mainly emulation) does not vary much.
> 
> The system time is significantly reduced (less than
> 50 % in both runs with patch compared to both runs
> without patch).

Makes sense.  You tested it better than I did. :)

> I also had an (unrelated) crash with the unpatched code:
> 
> wine: Unhandled page fault on read access to 0x0000004c at address
> 0x43ed3d (thread 0009), starting debugger...
> 
> It happened once in three runs, so this needs separate
> investigations.
> 
> Would you suggest additional tests?

No, I think it's good.

> I'll take your patch in my queue (git://qemu.weilnetz.de/qemu.git wxx).

Thanks!

Paolo

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2015-09-10 19:36 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2015-08-12 13:38 [Qemu-devel] [PATCH] qemu-thread: add a fast path to the Win32 QemuEvent Paolo Bonzini
2015-09-10 16:12 ` Paolo Bonzini
2015-09-10 19:30   ` Stefan Weil
2015-09-10 19:36     ` Paolo Bonzini

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).