* [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear @ 2015-07-18 20:21 Paolo Bonzini 2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini 2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini 0 siblings, 2 replies; 8+ messages in thread From: Paolo Bonzini @ 2015-07-18 20:21 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha This series fixes the remaining case where aio_poll() could hang I/O on the main thread due to a missing wakeup. It consists of a bugfix and an optimization, both of which have survived hundreds of tests on aarch64. Both the bugfix and the optimization come with a formal model of the interactions between the main thread, the VCPU thread doing aio_poll, and the worker thread doing qemu_bh_schedule. The models can test the code both with and without ctx->notify_me, showing that this bug is independent from the other. The patches apply on top of the ctx->notify_me v3. The code changes are really pretty small; the second patch has a good deal of comments too. Paolo Paolo Bonzini (2): AioContext: fix broken placement of event_notifier_test_and_clear AioContext: optimize clearing the EventNotifier aio-posix.c | 2 + aio-win32.c | 2 + async.c | 16 ++++- docs/aio_notify_accept.promela | 152 +++++++++++++++++++++++++++++++++++++++++ docs/aio_notify_bug.promela | 140 +++++++++++++++++++++++++++++++++++++ include/block/aio.h | 32 ++++++++- 6 files changed, 342 insertions(+), 2 deletions(-) create mode 100644 docs/aio_notify_accept.promela create mode 100644 docs/aio_notify_bug.promela -- 2.4.3 ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear 2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini @ 2015-07-18 20:21 ` Paolo Bonzini 2015-07-20 3:55 ` Fam Zheng 2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini 1 sibling, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2015-07-18 20:21 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha event_notifier_test_and_clear must be called before processing events. Otherwise, an aio_poll could "eat" the notification before the main I/O thread invokes ppoll(). The main I/O thread then never wakes up. This is an example of what could happen: i/o thread vcpu thread worker thread --------------------------------------------------------------------- lock_iothread notify_me = 1 ... unlock_iothread lock_iothread notify_me = 3 ppoll notify_me = 1 bh->scheduled = 1 event_notifier_set event_notifier_test_and_clear ppoll *** hang *** In contrast with the previous bug, there wasn't really much debugging to do here. "Tracing" with qemu_clock_get_ns shows pretty much the same behavior as in the previous patch, so the only way to find the bug is staring at the code. One could also use a formal model, of course. The included one shows this with three processes: notifier corresponds to a QEMU thread pool worker, temporary_waiter to a VCPU thread that invokes aio_poll(), waiter to the main I/O thread. I would be happy to say that the formal model found the bug for me, but actually I wrote it after the fact. This patch is a bit of a big hammer. The next one optimizes it, with help (this time for real rather than a posteriori :)) from another, similar formal model. Reported-by: Richard W. M. 
Jones <rjones@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- aio-posix.c | 2 + aio-win32.c | 2 + async.c | 8 ++- docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++ 4 files changed, 151 insertions(+), 1 deletion(-) create mode 100644 docs/aio_notify_bug.promela diff --git a/aio-posix.c b/aio-posix.c index 249889f..5c8b266 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking) aio_context_acquire(ctx); } + event_notifier_test_and_clear(&ctx->notifier); + /* if we have any readable fds, dispatch event */ if (ret > 0) { for (i = 0; i < npfd; i++) { diff --git a/aio-win32.c b/aio-win32.c index ea655b0..3e0db20 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking) aio_context_acquire(ctx); } + event_notifier_test_and_clear(&ctx->notifier); + if (first && aio_bh_poll(ctx)) { progress = true; } diff --git a/async.c b/async.c index a232192..d625e8a 100644 --- a/async.c +++ b/async.c @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source) QEMUBH *bh; atomic_and(&ctx->notify_me, ~1); + event_notifier_test_and_clear(&ctx->notifier); + for (bh = ctx->first_bh; bh; bh = bh->next) { if (!bh->deleted && bh->scheduled) { return true; @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque) aio_notify(opaque); } +static void event_notifier_dummy_cb(EventNotifier *e) +{ +} + AioContext *aio_context_new(Error **errp) { int ret; @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp) g_source_set_can_recurse(&ctx->source, true); aio_set_event_notifier(ctx, &ctx->notifier, (EventNotifierHandler *) - event_notifier_test_and_clear); + event_notifier_dummy_cb); ctx->thread_pool = NULL; qemu_mutex_init(&ctx->bh_lock); rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx); diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela new file mode 100644 index 0000000..b3bfca1 --- /dev/null +++ b/docs/aio_notify_bug.promela 
@@ -0,0 +1,140 @@ +/* + * This model describes a bug in aio_notify. If ctx->notifier is + * cleared too late, a wakeup could be lost. + * + * Author: Paolo Bonzini <pbonzini@redhat.com> + * + * This file is in the public domain. If you really want a license, + * the WTFPL will do. + * + * To verify the buggy version: + * spin -a -DBUG docs/aio_notify_bug.promela + * gcc -O2 pan.c + * ./a.out -a -f + * + * To verify the working version: + * spin -a docs/aio_notify_bug.promela + * gcc -O2 pan.c + * ./a.out -a -f + * + * Add -DCHECK_REQ to test an alternative invariant and the + * "notify_me" optimization. + */ + +int notify_me; +bool event; +bool req; +bool notifier_done; + +#ifdef CHECK_REQ +#define USE_NOTIFY_ME 1 +#else +#define USE_NOTIFY_ME 0 +#endif + +active proctype notifier() +{ + do + :: true -> { + req = 1; + if + :: !USE_NOTIFY_ME || notify_me -> event = 1; + :: else -> skip; + fi + } + :: true -> break; + od; + notifier_done = 1; +} + +#ifdef BUG +#define AIO_POLL \ + notify_me++; \ + if \ + :: !req -> { \ + if \ + :: event -> skip; \ + fi; \ + } \ + :: else -> skip; \ + fi; \ + notify_me--; \ + \ + req = 0; \ + event = 0; +#else +#define AIO_POLL \ + notify_me++; \ + if \ + :: !req -> { \ + if \ + :: event -> skip; \ + fi; \ + } \ + :: else -> skip; \ + fi; \ + notify_me--; \ + \ + event = 0; \ + req = 0; +#endif + +active proctype waiter() +{ + do + :: true -> AIO_POLL; + od; +} + +/* Same as waiter(), but disappears after a while. */ +active proctype temporary_waiter() +{ + do + :: true -> AIO_POLL; + :: true -> break; + od; +} + +#ifdef CHECK_REQ +never { + do + :: req -> goto accept_if_req_not_eventually_false; + :: true -> skip; + od; + +accept_if_req_not_eventually_false: + if + :: req -> goto accept_if_req_not_eventually_false; + fi; + assert(0); +} + +#else +/* There must be infinitely many transitions of event as long + * as the notifier does not exit. + * + * If event stayed always true, the waiters would be busy looping. 
+ * If event stayed always false, the waiters would be sleeping + * forever. + */ +never { + do + :: !event -> goto accept_if_event_not_eventually_true; + :: event -> goto accept_if_event_not_eventually_false; + :: true -> skip; + od; + +accept_if_event_not_eventually_true: + if + :: !event && notifier_done -> do :: true -> skip; od; + :: !event && !notifier_done -> goto accept_if_event_not_eventually_true; + fi; + assert(0); + +accept_if_event_not_eventually_false: + if + :: event -> goto accept_if_event_not_eventually_false; + fi; + assert(0); +} +#endif -- 2.4.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear 2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini @ 2015-07-20 3:55 ` Fam Zheng 2015-07-20 5:32 ` Paolo Bonzini 0 siblings, 1 reply; 8+ messages in thread From: Fam Zheng @ 2015-07-20 3:55 UTC (permalink / raw) To: Paolo Bonzini; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones On Sat, 07/18 22:21, Paolo Bonzini wrote: > event_notifier_test_and_clear must be called before processing events. > Otherwise, an aio_poll could "eat" the notification before the main > I/O thread invokes ppoll(). The main I/O thread then never wakes up. > This is an example of what could happen: > > i/o thread vcpu thread worker thread > --------------------------------------------------------------------- > lock_iothread > notify_me = 1 > ... > unlock_iothread > lock_iothread > notify_me = 3 > ppoll > notify_me = 1 > bh->scheduled = 1 > event_notifier_set > event_notifier_test_and_clear > ppoll > *** hang *** It still seems possible for event_notifier_test_and_clear to happen before the main thread's ppoll; will that be a problem? How does the main thread get woken up in that case? Fam > > In contrast with the previous bug, there wasn't really much debugging > to do here. "Tracing" with qemu_clock_get_ns shows pretty much the > same behavior as in the previous patch, so the only way to find the > bug is staring at the code. > > One could also use a formal model, of course. The included one shows > this with three processes: notifier corresponds to a QEMU thread pool > worker, temporary_waiter to a VCPU thread that invokes aio_poll(), > waiter to the main I/O thread. I would be happy to say that the > formal model found the bug for me, but actually I wrote it after the > fact. > > This patch is a bit of a big hammer. 
The next one optimizes it, > with help (this time for real rather than a posteriori :)) from > another, similar formal model. > > Reported-by: Richard W. M. Jones <rjones@redhat.com> > Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> > --- > aio-posix.c | 2 + > aio-win32.c | 2 + > async.c | 8 ++- > docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 151 insertions(+), 1 deletion(-) > create mode 100644 docs/aio_notify_bug.promela > > diff --git a/aio-posix.c b/aio-posix.c > index 249889f..5c8b266 100644 > --- a/aio-posix.c > +++ b/aio-posix.c > @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking) > aio_context_acquire(ctx); > } > > + event_notifier_test_and_clear(&ctx->notifier); > + > /* if we have any readable fds, dispatch event */ > if (ret > 0) { > for (i = 0; i < npfd; i++) { > diff --git a/aio-win32.c b/aio-win32.c > index ea655b0..3e0db20 100644 > --- a/aio-win32.c > +++ b/aio-win32.c > @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking) > aio_context_acquire(ctx); > } > > + event_notifier_test_and_clear(&ctx->notifier); > + > if (first && aio_bh_poll(ctx)) { > progress = true; > } > diff --git a/async.c b/async.c > index a232192..d625e8a 100644 > --- a/async.c > +++ b/async.c > @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source) > QEMUBH *bh; > > atomic_and(&ctx->notify_me, ~1); > + event_notifier_test_and_clear(&ctx->notifier); > + > for (bh = ctx->first_bh; bh; bh = bh->next) { > if (!bh->deleted && bh->scheduled) { > return true; > @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque) > aio_notify(opaque); > } > > +static void event_notifier_dummy_cb(EventNotifier *e) > +{ > +} > + > AioContext *aio_context_new(Error **errp) > { > int ret; > @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp) > g_source_set_can_recurse(&ctx->source, true); > aio_set_event_notifier(ctx, &ctx->notifier, > (EventNotifierHandler *) > - event_notifier_test_and_clear); > + 
event_notifier_dummy_cb); > ctx->thread_pool = NULL; > qemu_mutex_init(&ctx->bh_lock); > rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx); > diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela > new file mode 100644 > index 0000000..b3bfca1 > --- /dev/null > +++ b/docs/aio_notify_bug.promela > @@ -0,0 +1,140 @@ > +/* > + * This model describes a bug in aio_notify. If ctx->notifier is > + * cleared too late, a wakeup could be lost. > + * > + * Author: Paolo Bonzini <pbonzini@redhat.com> > + * > + * This file is in the public domain. If you really want a license, > + * the WTFPL will do. > + * > + * To verify the buggy version: > + * spin -a -DBUG docs/aio_notify_bug.promela > + * gcc -O2 pan.c > + * ./a.out -a -f > + * > + * To verify the working version: > + * spin -a docs/aio_notify_bug.promela > + * gcc -O2 pan.c > + * ./a.out -a -f > + * > + * Add -DCHECK_REQ to test an alternative invariant and the > + * "notify_me" optimization. > + */ > + > +int notify_me; > +bool event; > +bool req; > +bool notifier_done; > + > +#ifdef CHECK_REQ > +#define USE_NOTIFY_ME 1 > +#else > +#define USE_NOTIFY_ME 0 > +#endif > + > +active proctype notifier() > +{ > + do > + :: true -> { > + req = 1; > + if > + :: !USE_NOTIFY_ME || notify_me -> event = 1; > + :: else -> skip; > + fi > + } > + :: true -> break; > + od; > + notifier_done = 1; > +} > + > +#ifdef BUG > +#define AIO_POLL \ > + notify_me++; \ > + if \ > + :: !req -> { \ > + if \ > + :: event -> skip; \ > + fi; \ > + } \ > + :: else -> skip; \ > + fi; \ > + notify_me--; \ > + \ > + req = 0; \ > + event = 0; > +#else > +#define AIO_POLL \ > + notify_me++; \ > + if \ > + :: !req -> { \ > + if \ > + :: event -> skip; \ > + fi; \ > + } \ > + :: else -> skip; \ > + fi; \ > + notify_me--; \ > + \ > + event = 0; \ > + req = 0; > +#endif > + > +active proctype waiter() > +{ > + do > + :: true -> AIO_POLL; > + od; > +} > + > +/* Same as waiter(), but disappears after a while. 
*/ > +active proctype temporary_waiter() > +{ > + do > + :: true -> AIO_POLL; > + :: true -> break; > + od; > +} > + > +#ifdef CHECK_REQ > +never { > + do > + :: req -> goto accept_if_req_not_eventually_false; > + :: true -> skip; > + od; > + > +accept_if_req_not_eventually_false: > + if > + :: req -> goto accept_if_req_not_eventually_false; > + fi; > + assert(0); > +} > + > +#else > +/* There must be infinitely many transitions of event as long > + * as the notifier does not exit. > + * > + * If event stayed always true, the waiters would be busy looping. > + * If event stayed always false, the waiters would be sleeping > + * forever. > + */ > +never { > + do > + :: !event -> goto accept_if_event_not_eventually_true; > + :: event -> goto accept_if_event_not_eventually_false; > + :: true -> skip; > + od; > + > +accept_if_event_not_eventually_true: > + if > + :: !event && notifier_done -> do :: true -> skip; od; > + :: !event && !notifier_done -> goto accept_if_event_not_eventually_true; > + fi; > + assert(0); > + > +accept_if_event_not_eventually_false: > + if > + :: event -> goto accept_if_event_not_eventually_false; > + fi; > + assert(0); > +} > +#endif > -- > 2.4.3 > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear 2015-07-20 3:55 ` Fam Zheng @ 2015-07-20 5:32 ` Paolo Bonzini 0 siblings, 0 replies; 8+ messages in thread From: Paolo Bonzini @ 2015-07-20 5:32 UTC (permalink / raw) To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones On 20/07/2015 05:55, Fam Zheng wrote: > On Sat, 07/18 22:21, Paolo Bonzini wrote: >> event_notifier_test_and_clear must be called before processing events. >> Otherwise, an aio_poll could "eat" the notification before the main >> I/O thread invokes ppoll(). The main I/O thread then never wakes up. >> This is an example of what could happen: >> >> i/o thread vcpu thread worker thread >> --------------------------------------------------------------------- >> lock_iothread >> notify_me = 1 >> ... >> unlock_iothread >> lock_iothread >> notify_me = 3 >> ppoll >> notify_me = 1 >> bh->scheduled = 1 >> event_notifier_set >> event_notifier_test_and_clear >> ppoll >> *** hang *** > > It still seems possible for event_notifier_test_and_clear to happen before the main > thread's ppoll; will that be a problem? How does the main thread get woken up in that > case? You could have this: i/o thread vcpu thread worker thread --------------------------------------------------------------------- lock_iothread notify_me = 1 ... unlock_iothread lock_iothread notify_me = 3 ppoll notify_me = 1 bh->scheduled = 1 event_notifier_set event_notifier_test_and_clear ppoll But then the bottom half *will* be processed by the VCPU thread. It's a similar "rule" to the one that you use with lock-free algorithms: the consumer, 99% of the time, uses the variables in the opposite order of the producer. Here the producer calls event_notifier_set after bh->scheduled (of course...); the consumer *must* clear the EventNotifier before reading bh->scheduled. Paolo > >> >> In contrast with the previous bug, there wasn't really much debugging >> to do here. 
"Tracing" with qemu_clock_get_ns shows pretty much the >> same behavior as in the previous patch, so the only wait to find the >> bug is staring at the code. >> >> One could also use a formal model, of course. The included one shows >> this with three processes: notifier corresponds to a QEMU thread pool >> worker, temporary_waiter to a VCPU thread that invokes aio_poll(), >> waiter to the main I/O thread. I would be happy to say that the >> formal model found the bug for me, but actually I wrote it after the >> fact. >> >> This patch is a bit of a big hammer. The next one optimizes it, >> with help (this time for real rather than a posteriori :)) from >> another, similar formal model. >> >> Reported-by: Richard W. M. Jones <rjones@redhat.com> >> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> >> --- >> aio-posix.c | 2 + >> aio-win32.c | 2 + >> async.c | 8 ++- >> docs/aio_notify_bug.promela | 140 ++++++++++++++++++++++++++++++++++++++++++++ >> 4 files changed, 151 insertions(+), 1 deletion(-) >> create mode 100644 docs/aio_notify_bug.promela >> >> diff --git a/aio-posix.c b/aio-posix.c >> index 249889f..5c8b266 100644 >> --- a/aio-posix.c >> +++ b/aio-posix.c >> @@ -276,6 +276,8 @@ bool aio_poll(AioContext *ctx, bool blocking) >> aio_context_acquire(ctx); >> } >> >> + event_notifier_test_and_clear(&ctx->notifier); >> + >> /* if we have any readable fds, dispatch event */ >> if (ret > 0) { >> for (i = 0; i < npfd; i++) { >> diff --git a/aio-win32.c b/aio-win32.c >> index ea655b0..3e0db20 100644 >> --- a/aio-win32.c >> +++ b/aio-win32.c >> @@ -337,6 +337,8 @@ bool aio_poll(AioContext *ctx, bool blocking) >> aio_context_acquire(ctx); >> } >> >> + event_notifier_test_and_clear(&ctx->notifier); >> + >> if (first && aio_bh_poll(ctx)) { >> progress = true; >> } >> diff --git a/async.c b/async.c >> index a232192..d625e8a 100644 >> --- a/async.c >> +++ b/async.c >> @@ -203,6 +203,8 @@ aio_ctx_check(GSource *source) >> QEMUBH *bh; >> >> atomic_and(&ctx->notify_me, ~1); >> 
+ event_notifier_test_and_clear(&ctx->notifier); >> + >> for (bh = ctx->first_bh; bh; bh = bh->next) { >> if (!bh->deleted && bh->scheduled) { >> return true; >> @@ -279,6 +281,10 @@ static void aio_rfifolock_cb(void *opaque) >> aio_notify(opaque); >> } >> >> +static void event_notifier_dummy_cb(EventNotifier *e) >> +{ >> +} >> + >> AioContext *aio_context_new(Error **errp) >> { >> int ret; >> @@ -293,7 +299,7 @@ AioContext *aio_context_new(Error **errp) >> g_source_set_can_recurse(&ctx->source, true); >> aio_set_event_notifier(ctx, &ctx->notifier, >> (EventNotifierHandler *) >> - event_notifier_test_and_clear); >> + event_notifier_dummy_cb); >> ctx->thread_pool = NULL; >> qemu_mutex_init(&ctx->bh_lock); >> rfifolock_init(&ctx->lock, aio_rfifolock_cb, ctx); >> diff --git a/docs/aio_notify_bug.promela b/docs/aio_notify_bug.promela >> new file mode 100644 >> index 0000000..b3bfca1 >> --- /dev/null >> +++ b/docs/aio_notify_bug.promela >> @@ -0,0 +1,140 @@ >> +/* >> + * This model describes a bug in aio_notify. If ctx->notifier is >> + * cleared too late, a wakeup could be lost. >> + * >> + * Author: Paolo Bonzini <pbonzini@redhat.com> >> + * >> + * This file is in the public domain. If you really want a license, >> + * the WTFPL will do. >> + * >> + * To verify the buggy version: >> + * spin -a -DBUG docs/aio_notify_bug.promela >> + * gcc -O2 pan.c >> + * ./a.out -a -f >> + * >> + * To verify the working version: >> + * spin -a docs/aio_notify_bug.promela >> + * gcc -O2 pan.c >> + * ./a.out -a -f >> + * >> + * Add -DCHECK_REQ to test an alternative invariant and the >> + * "notify_me" optimization. 
>> + */ >> + >> +int notify_me; >> +bool event; >> +bool req; >> +bool notifier_done; >> + >> +#ifdef CHECK_REQ >> +#define USE_NOTIFY_ME 1 >> +#else >> +#define USE_NOTIFY_ME 0 >> +#endif >> + >> +active proctype notifier() >> +{ >> + do >> + :: true -> { >> + req = 1; >> + if >> + :: !USE_NOTIFY_ME || notify_me -> event = 1; >> + :: else -> skip; >> + fi >> + } >> + :: true -> break; >> + od; >> + notifier_done = 1; >> +} >> + >> +#ifdef BUG >> +#define AIO_POLL \ >> + notify_me++; \ >> + if \ >> + :: !req -> { \ >> + if \ >> + :: event -> skip; \ >> + fi; \ >> + } \ >> + :: else -> skip; \ >> + fi; \ >> + notify_me--; \ >> + \ >> + req = 0; \ >> + event = 0; >> +#else >> +#define AIO_POLL \ >> + notify_me++; \ >> + if \ >> + :: !req -> { \ >> + if \ >> + :: event -> skip; \ >> + fi; \ >> + } \ >> + :: else -> skip; \ >> + fi; \ >> + notify_me--; \ >> + \ >> + event = 0; \ >> + req = 0; >> +#endif >> + >> +active proctype waiter() >> +{ >> + do >> + :: true -> AIO_POLL; >> + od; >> +} >> + >> +/* Same as waiter(), but disappears after a while. */ >> +active proctype temporary_waiter() >> +{ >> + do >> + :: true -> AIO_POLL; >> + :: true -> break; >> + od; >> +} >> + >> +#ifdef CHECK_REQ >> +never { >> + do >> + :: req -> goto accept_if_req_not_eventually_false; >> + :: true -> skip; >> + od; >> + >> +accept_if_req_not_eventually_false: >> + if >> + :: req -> goto accept_if_req_not_eventually_false; >> + fi; >> + assert(0); >> +} >> + >> +#else >> +/* There must be infinitely many transitions of event as long >> + * as the notifier does not exit. >> + * >> + * If event stayed always true, the waiters would be busy looping. >> + * If event stayed always false, the waiters would be sleeping >> + * forever. 
>> + */ >> +never { >> + do >> + :: !event -> goto accept_if_event_not_eventually_true; >> + :: event -> goto accept_if_event_not_eventually_false; >> + :: true -> skip; >> + od; >> + >> +accept_if_event_not_eventually_true: >> + if >> + :: !event && notifier_done -> do :: true -> skip; od; >> + :: !event && !notifier_done -> goto accept_if_event_not_eventually_true; >> + fi; >> + assert(0); >> + >> +accept_if_event_not_eventually_false: >> + if >> + :: event -> goto accept_if_event_not_eventually_false; >> + fi; >> + assert(0); >> +} >> +#endif >> -- >> 2.4.3 >> >> > > ^ permalink raw reply [flat|nested] 8+ messages in thread
* [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier 2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini 2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini @ 2015-07-18 20:21 ` Paolo Bonzini 2015-07-20 2:27 ` Fam Zheng 1 sibling, 1 reply; 8+ messages in thread From: Paolo Bonzini @ 2015-07-18 20:21 UTC (permalink / raw) To: qemu-devel; +Cc: kwolf, famz, lersek, rjones, stefanha It is pretty rare for aio_notify to actually set the EventNotifier. It can happen with worker threads such as thread-pool.c's, but otherwise it should never be set thanks to the ctx->notify_me optimization. The previous patch, unfortunately, added an unconditional call to event_notifier_test_and_clear; now add a userspace fast path that avoids the call. Note that it is not possible to do the same with event_notifier_set; it would break, as proved (again) by the included formal model. This patch survived over 800 reboots on aarch64 KVM. 
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> --- aio-posix.c | 2 +- aio-win32.c | 2 +- async.c | 10 ++- docs/aio_notify_accept.promela | 152 +++++++++++++++++++++++++++++++++++++++++ include/block/aio.h | 32 ++++++++- 5 files changed, 194 insertions(+), 4 deletions(-) create mode 100644 docs/aio_notify_accept.promela diff --git a/aio-posix.c b/aio-posix.c index 5c8b266..d477033 100644 --- a/aio-posix.c +++ b/aio-posix.c @@ -276,7 +276,7 @@ bool aio_poll(AioContext *ctx, bool blocking) aio_context_acquire(ctx); } - event_notifier_test_and_clear(&ctx->notifier); + aio_notify_accept(ctx); /* if we have any readable fds, dispatch event */ if (ret > 0) { diff --git a/aio-win32.c b/aio-win32.c index 3e0db20..9e6eb71 100644 --- a/aio-win32.c +++ b/aio-win32.c @@ -337,7 +337,7 @@ bool aio_poll(AioContext *ctx, bool blocking) aio_context_acquire(ctx); } - event_notifier_test_and_clear(&ctx->notifier); + aio_notify_accept(ctx); if (first && aio_bh_poll(ctx)) { progress = true; diff --git a/async.c b/async.c index d625e8a..9a98a74 100644 --- a/async.c +++ b/async.c @@ -203,7 +203,7 @@ aio_ctx_check(GSource *source) QEMUBH *bh; atomic_and(&ctx->notify_me, ~1); - event_notifier_test_and_clear(&ctx->notifier); + aio_notify_accept(ctx); for (bh = ctx->first_bh; bh; bh = bh->next) { if (!bh->deleted && bh->scheduled) { @@ -267,6 +267,14 @@ void aio_notify(AioContext *ctx) smp_mb(); if (ctx->notify_me) { event_notifier_set(&ctx->notifier); + atomic_mb_set(&ctx->notified, true); + } +} + +void aio_notify_accept(AioContext *ctx) +{ + if (atomic_xchg(&ctx->notified, false)) { + event_notifier_test_and_clear(&ctx->notifier); } } diff --git a/docs/aio_notify_accept.promela b/docs/aio_notify_accept.promela new file mode 100644 index 0000000..9cef2c9 --- /dev/null +++ b/docs/aio_notify_accept.promela @@ -0,0 +1,152 @@ +/* + * This model describes the interaction between ctx->notified + * and ctx->notifier. 
+ * + * Author: Paolo Bonzini <pbonzini@redhat.com> + * + * This file is in the public domain. If you really want a license, + * the WTFPL will do. + * + * To verify the buggy version: + * spin -a -DBUG1 docs/aio_notify_bug.promela + * gcc -O2 pan.c + * ./a.out -a -f + * (or -DBUG2) + * + * To verify the fixed version: + * spin -a docs/aio_notify_bug.promela + * gcc -O2 pan.c + * ./a.out -a -f + * + * Add -DCHECK_REQ to test an alternative invariant and the + * "notify_me" optimization. + */ + +int notify_me; +bool notified; +bool event; +bool req; +bool notifier_done; + +#ifdef CHECK_REQ +#define USE_NOTIFY_ME 1 +#else +#define USE_NOTIFY_ME 0 +#endif + +#ifdef BUG +#error Please define BUG1 or BUG2 instead. +#endif + +active proctype notifier() +{ + do + :: true -> { + req = 1; + if + :: !USE_NOTIFY_ME || notify_me -> +#if defined BUG1 + /* CHECK_REQ does not detect this bug! */ + notified = 1; + event = 1; +#elif defined BUG2 + if + :: !notified -> event = 1; + :: else -> skip; + fi; + notified = 1; +#else + event = 1; + notified = 1; +#endif + :: else -> skip; + fi + } + :: true -> break; + od; + notifier_done = 1; +} + +#define AIO_POLL \ + notify_me++; \ + if \ + :: !req -> { \ + if \ + :: event -> skip; \ + fi; \ + } \ + :: else -> skip; \ + fi; \ + notify_me--; \ + \ + atomic { old = notified; notified = 0; } \ + if \ + :: old -> event = 0; \ + :: else -> skip; \ + fi; \ + \ + req = 0; + +active proctype waiter() +{ + bool old; + + do + :: true -> AIO_POLL; + od; +} + +/* Same as waiter(), but disappears after a while. 
*/ +active proctype temporary_waiter() +{ + bool old; + + do + :: true -> AIO_POLL; + :: true -> break; + od; +} + +#ifdef CHECK_REQ +never { + do + :: req -> goto accept_if_req_not_eventually_false; + :: true -> skip; + od; + +accept_if_req_not_eventually_false: + if + :: req -> goto accept_if_req_not_eventually_false; + fi; + assert(0); +} + +#else +/* There must be infinitely many transitions of event as long + * as the notifier does not exit. + * + * If event stayed always true, the waiters would be busy looping. + * If event stayed always false, the waiters would be sleeping + * forever. + */ +never { + do + :: !event -> goto accept_if_event_not_eventually_true; + :: event -> goto accept_if_event_not_eventually_false; + :: true -> skip; + od; + +accept_if_event_not_eventually_true: + if + :: !event && notifier_done -> do :: true -> skip; od; + :: !event && !notifier_done -> goto accept_if_event_not_eventually_true; + fi; + assert(0); + +accept_if_event_not_eventually_false: + if + :: event -> goto accept_if_event_not_eventually_false; + fi; + assert(0); +} +#endif diff --git a/include/block/aio.h b/include/block/aio.h index be91e3f..9dd32e0 100644 --- a/include/block/aio.h +++ b/include/block/aio.h @@ -99,7 +99,19 @@ struct AioContext { */ int walking_bh; - /* Used for aio_notify. */ + /* Used by aio_notify. + * + * "notified" is used to avoid expensive event_notifier_test_and_clear + * calls. When it is clear, the EventNotifier is clear, or one thread + * is going to clear "notified" before processing more events. False + * positives are possible, i.e. "notified" could be set even though the + * EventNotifier is clear. + * + * Note that event_notifier_set *cannot* be optimized the same way. For + * more information on the problem that would result, see "#ifdef BUG2" + * in the docs/aio_notify_accept.promela formal model. 
+ */ + bool notified; EventNotifier notifier; /* Thread pool for performing work and receiving completion callbacks */ @@ -174,6 +186,24 @@ QEMUBH *aio_bh_new(AioContext *ctx, QEMUBHFunc *cb, void *opaque); void aio_notify(AioContext *ctx); /** + * aio_notify_accept: Acknowledge receiving an aio_notify. + * + * aio_notify() uses an EventNotifier in order to wake up a sleeping + * aio_poll() or g_main_context_iteration(). Calls to aio_notify() are + * usually rare, but the AioContext has to clear the EventNotifier on + * every aio_poll() or g_main_context_iteration() in order to avoid + * busy waiting. This event_notifier_test_and_clear() cannot be done + * using the usual aio_context_set_event_notifier(), because it must + * be done before processing all events (file descriptors, bottom halves, + * timers). + * + * aio_notify_accept() is an optimized event_notifier_test_and_clear() + * that is specific to an AioContext's notifier; it is used internally + * to clear the EventNotifier only if aio_notify() had been called. + */ +void aio_notify_accept(AioContext *ctx); + +/** * aio_bh_poll: Poll bottom halves for an AioContext. * * These are internal functions used by the QEMU main loop. -- 2.4.3 ^ permalink raw reply related [flat|nested] 8+ messages in thread
* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-20  2:27 UTC
  From: Fam Zheng
  To: Paolo Bonzini; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones

On Sat, 07/18 22:21, Paolo Bonzini wrote:
> It is pretty rare for aio_notify to actually set the EventNotifier.  It
> can happen with worker threads such as thread-pool.c's, but otherwise it
> should never be set thanks to the ctx->notify_me optimization.  The
> previous patch, unfortunately, added an unconditional call to
> event_notifier_test_and_clear; now add a userspace fast path that
> avoids the call.
>
> Note that it is not possible to do the same with event_notifier_set;
> it would break, as proved (again) by the included formal model.
>
> This patch survived over 800 reboots on aarch64 KVM.

For aio-posix, how about keeping the optimization local which doesn't need
atomic operation? (no idea for win32 :)

diff --git a/aio-posix.c b/aio-posix.c
index 5c8b266..7e98123 100644
--- a/aio-posix.c
+++ b/aio-posix.c
@@ -236,6 +236,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
     int i, ret;
     bool progress;
     int64_t timeout;
+    int aio_notifier_idx = -1;
 
     aio_context_acquire(ctx);
     progress = false;
@@ -256,11 +257,18 @@ bool aio_poll(AioContext *ctx, bool blocking)
     assert(npfd == 0);
 
     /* fill pollfds */
+    i = 0;
     QLIST_FOREACH(node, &ctx->aio_handlers, node) {
         if (!node->deleted && node->pfd.events) {
             add_pollfd(node);
+            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
+                assert(aio_notifier_idx == -1);
+                aio_notifier_idx = i;
+            }
+            i++;
         }
     }
+    assert(aio_notifier_idx != -1);
 
     timeout = blocking ? aio_compute_timeout(ctx) : 0;
 
@@ -276,7 +284,9 @@ bool aio_poll(AioContext *ctx, bool blocking)
         aio_context_acquire(ctx);
     }
 
-    event_notifier_test_and_clear(&ctx->notifier);
+    if (pollfds[aio_notifier_idx].revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
+        event_notifier_test_and_clear(&ctx->notifier);
+    }
 
     /* if we have any readable fds, dispatch event */
     if (ret > 0) {
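Fam's proposal above has two moving parts: locating the notifier's slot while filling the pollfd array, and clearing the EventNotifier only when that slot reported readiness. Both can be sketched as standalone C against plain `poll(2)` constants (the function names below are invented for the demonstration; the real code uses GLib's `G_IO_IN`/`G_IO_HUP`/`G_IO_ERR`, which correspond to `POLLIN`/`POLLHUP`/`POLLERR`).

```c
#include <poll.h>
#include <stdbool.h>
#include <assert.h>

/* Model of the fill-loop bookkeeping: find which pollfd slot belongs to
 * the notifier fd, as the patch does with aio_notifier_idx. */
static int find_notifier_idx(const struct pollfd *pfds, int n, int notifier_fd)
{
    for (int i = 0; i < n; i++) {
        if (pfds[i].fd == notifier_fd) {
            return i;
        }
    }
    return -1;   /* the patch asserts this never happens */
}

/* Model of the guard around event_notifier_test_and_clear(): only clear
 * when the notifier's pollfd actually reported an event. */
static bool notifier_needs_clear(const struct pollfd *pfds, int notifier_idx)
{
    return (pfds[notifier_idx].revents & (POLLIN | POLLHUP | POLLERR)) != 0;
}
```

The point of the guard is that when nobody called aio_notify(), `revents` for the notifier slot is 0 and the `read()` syscall inside event_notifier_test_and_clear() is skipped entirely.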
* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-20  5:25 UTC
  From: Paolo Bonzini
  To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones

On 20/07/2015 04:27, Fam Zheng wrote:
> For aio-posix, how about keeping the optimization local which doesn't need
> atomic operation? (no idea for win32 :)
>
> diff --git a/aio-posix.c b/aio-posix.c
> index 5c8b266..7e98123 100644
> --- a/aio-posix.c
> +++ b/aio-posix.c
> @@ -236,6 +236,7 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      int i, ret;
>      bool progress;
>      int64_t timeout;
> +    int aio_notifier_idx = -1;
> 
>      aio_context_acquire(ctx);
>      progress = false;
> @@ -256,11 +257,18 @@ bool aio_poll(AioContext *ctx, bool blocking)
>      assert(npfd == 0);
> 
>      /* fill pollfds */
> +    i = 0;
>      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
>          if (!node->deleted && node->pfd.events) {
>              add_pollfd(node);
> +            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
> +                assert(aio_notifier_idx == -1);
> +                aio_notifier_idx = i;
> +            }
> +            i++;
>          }
>      }

That's a good idea.  Since aio_set_fd_handler uses QLIST_INSERT_HEAD,
perhaps we can be sure that aio_notifier_idx is always the last one
(i.e. i-1)?  And the same can be done on Windows, I think.

> +    if (pollfds[aio_notifier_idx].revents & (G_IO_IN | G_IO_HUP | G_IO_ERR)) {
> +        event_notifier_test_and_clear(&ctx->notifier);
> +    }

Might as well zero pollfds[aio_notifier_idx].revents after clearing it.

The atomic operation is not too expensive, so it would really be more
about simplicity than about speed.  Not that simplicity is a bad thing!
I'll send v2 of the first patch, you can look at doing the optimization.

Thanks!

Paolo
* Re: [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier
  2015-07-20  5:34 UTC
  From: Paolo Bonzini
  To: Fam Zheng; +Cc: kwolf, lersek, qemu-devel, stefanha, rjones

On 20/07/2015 07:25, Paolo Bonzini wrote:
> >      /* fill pollfds */
> > +    i = 0;
> >      QLIST_FOREACH(node, &ctx->aio_handlers, node) {
> >          if (!node->deleted && node->pfd.events) {
> >              add_pollfd(node);
> > +            if (node->pfd.fd == event_notifier_get_fd(&ctx->notifier)) {
> > +                assert(aio_notifier_idx == -1);
> > +                aio_notifier_idx = i;
> > +            }
> > +            i++;
> >          }
> >      }
> 
> That's a good idea.  Since aio_set_fd_handler uses QLIST_INSERT_HEAD,
> perhaps we can be sure that aio_notifier_idx is always the last one
> (i.e. i-1)?  And the same can be done on Windows, I think.

BTW, if this is not true I think I prefer the optimization with atomics.
But if it's true that we can just use i-1, this one is better.

Paolo
end of thread, other threads: [~2015-07-20  5:34 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed
-- links below jump to the message on this page --)
2015-07-18 20:21 [Qemu-devel] [PATCH 0/2] AioContext: fix missing wakeups due to event_notifier_test_and_clear Paolo Bonzini
2015-07-18 20:21 ` [Qemu-devel] [PATCH 1/2] AioContext: fix broken placement of event_notifier_test_and_clear Paolo Bonzini
2015-07-20  3:55   ` Fam Zheng
2015-07-20  5:32     ` Paolo Bonzini
2015-07-18 20:21 ` [Qemu-devel] [PATCH 2/2] AioContext: optimize clearing the EventNotifier Paolo Bonzini
2015-07-20  2:27   ` Fam Zheng
2015-07-20  5:25     ` Paolo Bonzini
2015-07-20  5:34       ` Paolo Bonzini