* Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane [not found] <1401561792-13410-1-git-send-email-mreitz@redhat.com> @ 2014-06-03 14:38 ` Stefan Hajnoczi 2014-06-05 17:41 ` Max Reitz [not found] ` <1401561792-13410-2-git-send-email-mreitz@redhat.com> ` (5 subsequent siblings) 6 siblings, 1 reply; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-03 14:38 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote: > For the NBD server to work with dataplane, it needs to correctly access > the exported BDS. It makes the most sense to run both in the same > AioContext, therefore this series implements methods for tracking a > BDS's AioContext and makes NBD make use of this for keeping the clients > connected to that BDS in the same AioContext. > > The reason this is an RFC and not a PATCH is my inexperience with AIO, > coroutines and the like. Also, I'm not sure about what to do about the > coroutines. The NBD server has up to two coroutines per client: One for > receiving and one for sending. Theoretically, both have to be > "transferred" to the new AioContext if it is changed; however, as far as > I see it, coroutines are not really bound to an AioContext, they are > simply run in the AioContext entering them. Therefore, I think a > transfer is unnecessary. All coroutines are entered from nbd_read() and > nbd_restart_write(), both of which are AIO routines registered via > aio_set_fd_handler2(). > > As bs_aio_detach() unregisters all of these routines, the coroutines can > no longer be entered, but only after bs_aio_attach() is called again. > Then, when they are called, they will enter the coroutines in the new > AioContext. Therefore, I think an explicit transfer unnecessary. This reasoning sounds correct. > However, if bs_aio_detach() is called from a different thread than the > old AioContext is running in, we may still have coroutines running for > which we should wait before returning from bs_aio_detach(). The bdrv_attach/detach_aio_context() APIs have rules regarding where these functions are called from: /** * bdrv_set_aio_context: * * Changes the #AioContext used for fd handlers, timers, and BHs by this * BlockDriverState and all its children. * * This function must be called from the old #AioContext or with a lock held so * the old #AioContext is not executing. */ void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context); and: /* Remove fd handlers, timers, and other event loop callbacks so the event * loop is no longer in use. Called with no in-flight requests and in * depth-first traversal order with parents before child nodes. */ void (*bdrv_detach_aio_context)(BlockDriverState *bs); /* Add fd handlers, timers, and other event loop callbacks so I/O requests * can be processed again. Called with no in-flight requests and in * depth-first traversal order with child nodes before parent nodes. */ void (*bdrv_attach_aio_context)(BlockDriverState *bs, AioContext *new_context); These rules ensure that it's safe to perform these operations. You don't have to support arbitrary callers in NBD either. > But because of my inexperience with coroutines, I'm not sure. I now have > these patches nearly unchanged here for about a week and I'm looking for > ways of testing them, but so far I could only test whether the old use > cases work, but not whether they will work for what they are intended to > do: With BDS changing their AioContext. 
> > So, because I'm not sure what else to do and because I don't know how to > test multiple AIO threads (how do I move a BDS into another iothread?) > I'm just sending this out as an RFC. Use a Linux guest with virtio-blk: qemu -drive if=none,file=test.img,id=drive0 \ -object iothread,id=iothread0 \ -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \ ... Once the guest has booted, the virtio-blk device will be in dataplane mode. That means drive0's BlockDriverState ->aio_context will be the IOThread AioContext and not the global qemu_aio_context. Now you can exercise the run-time NBD server over QMP and check that things still work. For example, try running a few instances of dd if=/dev/vdb of=/dev/null iflag=direct inside the guest to stress guest I/O. Typically what happens if code is not dataplane-aware is that a deadlock or crash occurs due to race conditions between the QEMU main loop and the IOThread for this virtio-blk device. For an overview of dataplane programming concepts, see: https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
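To exercise the run-time NBD server against that dataplane drive, something along these lines over QMP should do (a sketch only -- the exact argument spelling can differ between QEMU versions, and the port number and export name here are made up):

  { "execute": "nbd-server-start",
    "arguments": { "addr": { "type": "inet",
                             "data": { "host": "127.0.0.1", "port": "10809" } } } }
  { "execute": "nbd-server-add",
    "arguments": { "device": "drive0", "writable": true } }

A second process can then stress the export from the outside while dd runs inside the guest, for example:

  qemu-io -c 'read 0 1M' nbd:127.0.0.1:10809:exportname=drive0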
* Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane 2014-06-03 14:38 ` [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane Stefan Hajnoczi @ 2014-06-05 17:41 ` Max Reitz 0 siblings, 0 replies; 20+ messages in thread From: Max Reitz @ 2014-06-05 17:41 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On 03.06.2014 16:38, Stefan Hajnoczi wrote: > On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote: >> For the NBD server to work with dataplane, it needs to correctly access >> the exported BDS. It makes the most sense to run both in the same >> AioContext, therefore this series implements methods for tracking a >> BDS's AioContext and makes NBD make use of this for keeping the clients >> connected to that BDS in the same AioContext. >> >> The reason this is an RFC and not a PATCH is my inexperience with AIO, >> coroutines and the like. Also, I'm not sure about what to do about the >> coroutines. The NBD server has up to two coroutines per client: One for >> receiving and one for sending. Theoretically, both have to be >> "transferred" to the new AioContext if it is changed; however, as far as >> I see it, coroutines are not really bound to an AioContext, they are >> simply run in the AioContext entering them. Therefore, I think a >> transfer is unnecessary. All coroutines are entered from nbd_read() and >> nbd_restart_write(), both of which are AIO routines registered via >> aio_set_fd_handler2(). >> >> As bs_aio_detach() unregisters all of these routines, the coroutines can >> no longer be entered, but only after bs_aio_attach() is called again. >> Then, when they are called, they will enter the coroutines in the new >> AioContext. Therefore, I think an explicit transfer unnecessary. > This reasoning sounds correct. > >> However, if bs_aio_detach() is called from a different thread than the >> old AioContext is running in, we may still have coroutines running for >> which we should wait before returning from bs_aio_detach(). > The bdrv_attach/detach_aio_context() APIs have rules regarding where > these functions are called from: > > /** > * bdrv_set_aio_context: > * > * Changes the #AioContext used for fd handlers, timers, and BHs by this > * BlockDriverState and all its children. > * > * This function must be called from the old #AioContext or with a lock held so > * the old #AioContext is not executing. Oh, that makes things easier. *g* > */ > void bdrv_set_aio_context(BlockDriverState *bs, AioContext *new_context); > > and: > > /* Remove fd handlers, timers, and other event loop callbacks so the event > * loop is no longer in use. Called with no in-flight requests and in > * depth-first traversal order with parents before child nodes. > */ > void (*bdrv_detach_aio_context)(BlockDriverState *bs); > > /* Add fd handlers, timers, and other event loop callbacks so I/O requests > * can be processed again. Called with no in-flight requests and in > * depth-first traversal order with child nodes before parent nodes. > */ > void (*bdrv_attach_aio_context)(BlockDriverState *bs, > AioContext *new_context); > > These rules ensure that it's safe to perform these operations. You > don't have to support arbitrary callers in NBD either. > >> But because of my inexperience with coroutines, I'm not sure. 
I now have >> these patches nearly unchanged here for about a week and I'm looking for >> ways of testing them, but so far I could only test whether the old use >> cases work, but not whether they will work for what they are intended to >> do: With BDS changing their AioContext. >> >> So, because I'm not sure what else to do and because I don't know how to >> test multiple AIO threads (how do I move a BDS into another iothread?) >> I'm just sending this out as an RFC. > Use a Linux guest with virtio-blk: > > qemu -drive if=none,file=test.img,id=drive0 \ > -object iothread,id=iothread0 \ > -device virtio-blk-pci,drive=drive0,x-iothread=iothread0 \ > ... > > Once the guest has booted the virtio-blk device will be in dataplane > mode. That means drive0's BlockDriverState ->aio_context will be the > IOThread AioContext and not the global qemu_aio_context. Ah, thank you. > Now you can exercise the run-time NBD server over QMP and check that > things still work. For example, try running a few instance of dd > if=/dev/vdb of=/dev/null iflag=direct inside the guest to stress guest > I/O. > > Typically what happens if code is not dataplane-aware is that a deadlock > or crash occurs due to race conditions between the QEMU main loop and > the IOThread for this virtio-blk device. > > For an overview of dataplane programming concepts, see: > https://lists.gnu.org/archive/html/qemu-devel/2014-05/msg01436.html Yes, I took this email as a reference, however it only said how to create a new iothread, but not how to use it. :-) Max > Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <1401561792-13410-2-git-send-email-mreitz@redhat.com>]
* Re: [Qemu-devel] [RFC 1/5] nbd: Correct name comparison for export_set_name() [not found] ` <1401561792-13410-2-git-send-email-mreitz@redhat.com> @ 2014-06-04 11:52 ` Stefan Hajnoczi 2014-06-05 17:28 ` Max Reitz 0 siblings, 1 reply; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 11:52 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:08PM +0200, Max Reitz wrote: > exp->name == name is certainly true if both strings are equal and will > work for both of them being NULL (which is important to check here); > however, the strings may also be equal without having the same address, > in which case there is no need to replace the export's name either. > Therefore, add a check for this case. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > nbd.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/nbd.c b/nbd.c > index e5084b6..0787cba 100644 > --- a/nbd.c > +++ b/nbd.c > @@ -832,7 +832,7 @@ NBDExport *nbd_export_find(const char *name) > > void nbd_export_set_name(NBDExport *exp, const char *name) > { > - if (exp->name == name) { > + if (exp->name == name || (exp->name && name && !strcmp(exp->name, name))) { > return; > } It's not clear to me why we even bother. The function is idempotent and there are only 2 call sites in QEMU. This is not a performance-critical function where it helps to bail early. Can we just drop the if statement completely? void nbd_export_set_name(NBDExport *exp, const char *name) { if (exp->name == name) { return; } nbd_export_get(exp); if (exp->name != NULL) { g_free(exp->name); exp->name = NULL; QTAILQ_REMOVE(&exports, exp, next); nbd_export_put(exp); } if (name != NULL) { nbd_export_get(exp); exp->name = g_strdup(name); QTAILQ_INSERT_TAIL(&exports, exp, next); } nbd_export_put(exp); } ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 1/5] nbd: Correct name comparison for export_set_name() 2014-06-04 11:52 ` [Qemu-devel] [RFC 1/5] nbd: Correct name comparison for export_set_name() Stefan Hajnoczi @ 2014-06-05 17:28 ` Max Reitz 0 siblings, 0 replies; 20+ messages in thread From: Max Reitz @ 2014-06-05 17:28 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On 04.06.2014 13:52, Stefan Hajnoczi wrote: > On Sat, May 31, 2014 at 08:43:08PM +0200, Max Reitz wrote: >> exp->name == name is certainly true if both strings are equal and will >> work for both of them being NULL (which is important to check here); >> however, the strings may also be equal without having the same address, >> in which case there is no need to replace the export's name either. >> Therefore, add a check for this case. >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> nbd.c | 2 +- >> 1 file changed, 1 insertion(+), 1 deletion(-) >> >> diff --git a/nbd.c b/nbd.c >> index e5084b6..0787cba 100644 >> --- a/nbd.c >> +++ b/nbd.c >> @@ -832,7 +832,7 @@ NBDExport *nbd_export_find(const char *name) >> >> void nbd_export_set_name(NBDExport *exp, const char *name) >> { >> - if (exp->name == name) { >> + if (exp->name == name || (exp->name && name && !strcmp(exp->name, name))) { >> return; >> } > It's not clear to me why we even bother. The function is idempotent and > there are only 2 call sites in QEMU. This is not a performance-critical > function where it helps to bail early. You're probably right. I just happened to stumble over this code when looking into NBD. > Can we just drop the if statement completely? Probably, yes, but then again, I think it's not worth bothering with dropping it, either. ;-) Max > void nbd_export_set_name(NBDExport *exp, const char *name) > { > if (exp->name == name) { > return; > } > > nbd_export_get(exp); > if (exp->name != NULL) { > g_free(exp->name); > exp->name = NULL; > QTAILQ_REMOVE(&exports, exp, next); > nbd_export_put(exp); > } > if (name != NULL) { > nbd_export_get(exp); > exp->name = g_strdup(name); > QTAILQ_INSERT_TAIL(&exports, exp, next); > } > nbd_export_put(exp); > } ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <1401561792-13410-3-git-send-email-mreitz@redhat.com>]
* Re: [Qemu-devel] [RFC 2/5] aio: Add io_read_poll() callback [not found] ` <1401561792-13410-3-git-send-email-mreitz@redhat.com> @ 2014-06-03 17:55 ` Paolo Bonzini 2014-06-05 17:29 ` Max Reitz 2014-06-04 11:59 ` Stefan Hajnoczi 1 sibling, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-06-03 17:55 UTC (permalink / raw) To: Max Reitz, qemu-devel; +Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi Il 31/05/2014 20:43, Max Reitz ha scritto: > @@ -108,7 +120,9 @@ bool aio_pending(AioContext *ctx) > int revents; > > revents = node->pfd.revents & node->pfd.events; > - if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read) { > + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && node->io_read && > + (!node->io_read_poll || node->io_read_poll(node->opaque))) > + { > return true; > } This would cause a busy loop whenever io_read_poll returns false. You would need to call io_read_poll when you fill the pollfds array, i.e. in aio_poll and aio_ctx_prepare. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
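What Paolo suggests would look roughly like this in the code that builds the poll set (a schematic sketch only -- the real aio_poll()/aio_ctx_prepare() assemble their GPollFD arrays differently, and add_pollfd() is an invented helper):

  QLIST_FOREACH(node, &ctx->aio_handlers, node) {
      int events = node->pfd.events;

      /* Consult io_read_poll() while deciding which events to wait for:
       * a handler that is not ready to read simply is not polled for
       * G_IO_IN, so a readable-but-unwanted fd cannot spin the loop. */
      if (node->io_read_poll && !node->io_read_poll(node->opaque)) {
          events &= ~G_IO_IN;
      }
      if (!node->deleted && events) {
          add_pollfd(node, events);
      }
  }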
* Re: [Qemu-devel] [RFC 2/5] aio: Add io_read_poll() callback 2014-06-03 17:55 ` [Qemu-devel] [RFC 2/5] aio: Add io_read_poll() callback Paolo Bonzini @ 2014-06-05 17:29 ` Max Reitz 0 siblings, 0 replies; 20+ messages in thread From: Max Reitz @ 2014-06-05 17:29 UTC (permalink / raw) To: Paolo Bonzini, qemu-devel; +Cc: Kevin Wolf, Fam Zheng, Stefan Hajnoczi On 03.06.2014 19:55, Paolo Bonzini wrote: > Il 31/05/2014 20:43, Max Reitz ha scritto: >> @@ -108,7 +120,9 @@ bool aio_pending(AioContext *ctx) >> int revents; >> >> revents = node->pfd.revents & node->pfd.events; >> - if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && >> node->io_read) { >> + if (revents & (G_IO_IN | G_IO_HUP | G_IO_ERR) && >> node->io_read && >> + (!node->io_read_poll || node->io_read_poll(node->opaque))) >> + { >> return true; >> } > > This would cause a busy loop whenever io_read_poll returns false. You > would need to call io_read_poll when you fill the pollfds array, i.e. > in aio_poll and aio_ctx_prepare. Thanks; considering Stefan's comment, I'll see whether I can get away without implementing io_read_poll(). Max > Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 2/5] aio: Add io_read_poll() callback [not found] ` <1401561792-13410-3-git-send-email-mreitz@redhat.com> 2014-06-03 17:55 ` [Qemu-devel] [RFC 2/5] aio: Add io_read_poll() callback Paolo Bonzini @ 2014-06-04 11:59 ` Stefan Hajnoczi 1 sibling, 0 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 11:59 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:09PM +0200, Max Reitz wrote: > Similar to how qemu_set_fd_handler2() allows defining a function which > checks whether the recipient is ready to read, add aio_set_fd_handler2() > with a parameter for defining such a callback. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > aio-posix.c | 26 ++++++++++++++++++++------ > include/block/aio.h | 12 ++++++++++++ > include/qemu/main-loop.h | 1 - > 3 files changed, 32 insertions(+), 7 deletions(-) > > diff --git a/aio-posix.c b/aio-posix.c > index f921d4f..267df35 100644 > --- a/aio-posix.c > +++ b/aio-posix.c > @@ -21,6 +21,7 @@ > struct AioHandler > { > GPollFD pfd; > + IOCanReadHandler *io_read_poll; I don't like the io_read_poll() mechanism because it is easy to deadlock. In the network layer these problems are common because it relies on io_read_poll(). Any piece of code that affects the return value of io_read_poll() must remember to call aio_notify() to force a new event loop iteration. Otherwise the thread stays stuck in poll(2) and fails to add the file descriptor back to its fdset. If you really need io_read_poll() then it's fine. Just wanted to say that code using io_read_poll() can be tricky. Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
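Concretely, the pitfall is that the IOThread may already be blocked in poll(2) without the fd in its set, so every state change that can flip io_read_poll() from false to true has to kick the event loop. A sketch (exp->ctx standing in for wherever the series ends up keeping the export's AioContext):

  /* A request finished, so nbd_can_read()/io_read_poll() may return true again... */
  client->nb_requests--;
  /* ...but poll() will not notice on its own; force a new event loop
   * iteration so the fd handler gets re-added to the poll set. */
  aio_notify(client->exp->ctx);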
[parent not found: <1401561792-13410-4-git-send-email-mreitz@redhat.com>]
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() [not found] ` <1401561792-13410-4-git-send-email-mreitz@redhat.com> @ 2014-06-04 12:37 ` Stefan Hajnoczi 2014-06-04 18:02 ` Paolo Bonzini 2014-06-05 18:18 ` Max Reitz 0 siblings, 2 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 12:37 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:10PM +0200, Max Reitz wrote: > Instead of using the main loop function qemu_set_fd_handler2(), use the > AIO function in the context of the exported BDS. Managing fd handlers shouldn't be necessary at the NBD code level. The current NBD code hasn't been fully converted to coroutines. This email explains how the NBD code can be fully converted to coroutines. It should simplify the code and reduce the chance of bugs. Whether you want to actually do the conversion is up to you, since it's somewhat orthogonal to the purpose of this patch series. The point of coroutines is that blocking operations like send/recv on a socket should look like regular blocking calls. Let coroutines handle the event loop housekeeping (registering fd handlers, callbacks). Only use aio explicitly when concurrency is needed. Here is how I would structure NBD using coroutines: 1 coroutine per connection to receive NBD commands and submit I/O requests: def nbd_server_receive_co(conn): while True: req = nbd_read_req(conn) if req is None: break if req.type == NBD_READ: bdrv_aio_readv(bs, ...) elif req.type == NBD_WRITE: ... Notice that bdrv_aio_*() is used since we want concurrent I/O requests. 1 coroutine per connection to send NBD replies: def nbd_server_send_co(conn): while True: while conn.send_queue: resp = conn.send_queue.pop() nbd_write_resp(conn, resp) qemu_coroutine_yield() And finally the bdrv_aio_*() callback to put responses on to the send queue: def nbd_server_aio_cb(conn): resp = NBDResponse(...) conn.send_queue.push(resp) conn.send_co.enter() Why is this design cleaner? Because NBD code doesn't have to worry about fd handlers. It uses straightforward coroutine send/recv for socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see that only one coroutine receives from the socket and that only one coroutine writes to the socket. Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-04 12:37 ` [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() Stefan Hajnoczi @ 2014-06-04 18:02 ` Paolo Bonzini 2014-06-05 8:12 ` Stefan Hajnoczi 2014-06-05 18:18 ` Max Reitz 1 sibling, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-06-04 18:02 UTC (permalink / raw) To: Stefan Hajnoczi, Max Reitz Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi Il 04/06/2014 14:37, Stefan Hajnoczi ha scritto: > Why is this design cleaner? Because NBD code doesn't have to worry > about fd handlers. It uses straightforward coroutine send/recv for > socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see > that only one coroutine receives from the socket and that only one > coroutine writes to the socket. I don't understand how this would work without managing fd handlers. - If you don't want to receive anything (because you reached the maximum queue depth), and the client sends something, the read handler will busy wait. The current code solves it with can_read; you could do it by enabling or disabling the read handler as you need, but the idea is the same. - If you're not sending anything, a socket that has a write handler will have POLLOUT continuously signaled and again you'd busy wait. Since there is no can_write, nbd.c instead enables/disables the write handler around nbd_co_send_reply. Removing can_read from the NBD server is simpler than Max's patch 2, so that's what I suggest. The NBD server code uses a coroutine per request. Using one coroutine per direction is nice, but doesn't avert the need to manage fd handlers. Basically, it boils down to select(2) being level-triggered. If it were edge-triggered, of course you could have other bugs but you probably could write network code without managing fd handlers. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-04 18:02 ` Paolo Bonzini @ 2014-06-05 8:12 ` Stefan Hajnoczi 2014-06-05 9:27 ` Paolo Bonzini 0 siblings, 1 reply; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-05 8:12 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Stefan Hajnoczi, Fam Zheng, qemu-devel, Max Reitz On Wed, Jun 04, 2014 at 08:02:06PM +0200, Paolo Bonzini wrote: > Il 04/06/2014 14:37, Stefan Hajnoczi ha scritto: > >Why is this design cleaner? Because NBD code doesn't have to worry > >about fd handlers. It uses straightforward coroutine send/recv for > >socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see > >that only one coroutine receives from the socket and that only one > >coroutine writes to the socket. > > I don't understand how this would work without managing fd handlers. fd handlers still need to be managed, but not by NBD code. They must be managed by coroutine recv/send utility functions. In other words, fd handlers are used locally, not globally. def co_recv(fd, buf): while True: nbytes = recv(fd, buf, len(buf)) if nbytes == -1: if errno == EINTR: continue if errno == EAGAIN or errno == EWOULDBLOCK: aio_set_fd_read_handler(fd, co_recv_cb) qemu_coroutine_yield() aio_set_fd_read_handler(fd, NULL) continue return nbytes The send function is similar. This does require an extension to the AioContext API. We need to be able to modify the read/write callback independently without clobbering the other callback. This way full duplex I/O is possible. > - If you don't want to receive anything (because you reached the maximum > queue depth), and the client sends something, the read handler will busy > wait. The current code solves it with can_read; you could do it by enabling > or disabling the read handler as you need, but the idea is the same. > > - If you're not sending anything, a socket that has a write handler will > have POLLOUT continuously signaled and again you'd busy wait. Since there > is no can_write, nbd.c instead enables/disables the write handler around > nbd_co_send_reply. You only install an fd handler when you want to read/write. This does mean that the request coroutine needs to be woken up by the response coroutine if we were at maximum queue depth. Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-05 8:12 ` Stefan Hajnoczi @ 2014-06-05 9:27 ` Paolo Bonzini 2014-06-05 13:32 ` Stefan Hajnoczi 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-06-05 9:27 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Stefan Hajnoczi, Fam Zheng, qemu-devel, Max Reitz Il 05/06/2014 10:12, Stefan Hajnoczi ha scritto: > On Wed, Jun 04, 2014 at 08:02:06PM +0200, Paolo Bonzini wrote: >> Il 04/06/2014 14:37, Stefan Hajnoczi ha scritto: >>> Why is this design cleaner? Because NBD code doesn't have to worry >>> about fd handlers. It uses straightforward coroutine send/recv for >>> socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see >>> that only one coroutine receives from the socket and that only one >>> coroutine writes to the socket. >> >> I don't understand how this would work without managing fd handlers. > > fd handlers still need to be managed, but not by NBD code. They must be > managed by coroutine recv/send utility functions. In other words, fd > handlers are used locally, not globally. > > def co_recv(fd, buf): > while True: > nbytes = recv(fd, buf, len(buf)) > if nbytes == -1: > if errno == EINTR: > continue > if errno == EAGAIN or errno == EWOULDBLOCK: > aio_set_fd_read_handler(fd, co_recv_cb) > qemu_coroutine_yield() > aio_set_fd_read_handler(fd, NULL) > continue > return nbytes > > The send function is similar. I see what you mean now---this however is not how qemu_co_recv and qemu_co_send work. NBD uses them, but they require the caller to set up the handlers manually. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-05 9:27 ` Paolo Bonzini @ 2014-06-05 13:32 ` Stefan Hajnoczi 0 siblings, 0 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-05 13:32 UTC (permalink / raw) To: Paolo Bonzini Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi, Max Reitz On Thu, Jun 05, 2014 at 11:27:49AM +0200, Paolo Bonzini wrote: > Il 05/06/2014 10:12, Stefan Hajnoczi ha scritto: > >On Wed, Jun 04, 2014 at 08:02:06PM +0200, Paolo Bonzini wrote: > >>Il 04/06/2014 14:37, Stefan Hajnoczi ha scritto: > >>>Why is this design cleaner? Because NBD code doesn't have to worry > >>>about fd handlers. It uses straightforward coroutine send/recv for > >>>socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see > >>>that only one coroutine receives from the socket and that only one > >>>coroutine writes to the socket. > >> > >>I don't understand how this would work without managing fd handlers. > > > >fd handlers still need to be managed, but not by NBD code. They must be > >managed by coroutine recv/send utility functions. In other words, fd > >handlers are used locally, not globally. > > > >def co_recv(fd, buf): > > while True: > > nbytes = recv(fd, buf, len(buf)) > > if nbytes == -1: > > if errno == EINTR: > > continue > > if errno == EAGAIN or errno == EWOULDBLOCK: > > aio_set_fd_read_handler(fd, co_recv_cb) > > qemu_coroutine_yield() > > aio_set_fd_read_handler(fd, NULL) > > continue > > return nbytes > > > >The send function is similar. > > I see what you mean now---this however is not how qemu_co_recv and > qemu_co_send work. NBD uses them, but they require the caller to set up the > handlers manually. Yep, and I think that's not a clean way to program since it forgoes the advantage of coroutines: functions that look just like their blocking equivalents. I think we should hide the fd handler management so coroutine socket users don't need to worry about it. Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
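In C, the kind of helper Stefan describes might look like this (a sketch assuming POSIX sockets and a per-direction aio_set_fd_read_handler() -- exactly the AioContext extension discussed above; co_recv_ready() would simply re-enter the yielded coroutine):

  static ssize_t coroutine_fn co_recv(AioContext *ctx, int fd, void *buf, size_t size)
  {
      for (;;) {
          ssize_t ret = recv(fd, buf, size, 0);
          if (ret >= 0) {
              return ret;
          }
          if (errno == EINTR) {
              continue;
          }
          if (errno != EAGAIN && errno != EWOULDBLOCK) {
              return -errno;
          }
          /* Arm a read handler only while we are actually blocked, so an
           * idle connection does not keep G_IO_IN registered. */
          aio_set_fd_read_handler(ctx, fd, co_recv_ready, qemu_coroutine_self());
          qemu_coroutine_yield();
          aio_set_fd_read_handler(ctx, fd, NULL, NULL);
      }
  }

Callers then just write nbytes = co_recv(ctx, fd, buf, len); with no fd handler management of their own, which is the point of the conversion.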
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-04 12:37 ` [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() Stefan Hajnoczi 2014-06-04 18:02 ` Paolo Bonzini @ 2014-06-05 18:18 ` Max Reitz 2014-06-06 7:44 ` Paolo Bonzini 1 sibling, 1 reply; 20+ messages in thread From: Max Reitz @ 2014-06-05 18:18 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On 04.06.2014 14:37, Stefan Hajnoczi wrote: > On Sat, May 31, 2014 at 08:43:10PM +0200, Max Reitz wrote: >> Instead of using the main loop function qemu_set_fd_handler2(), use the >> AIO function in the context of the exported BDS. > Managing fd handlers shouldn't be necessary at the NBD code level. The > current NBD code hasn't been fully converted to coroutines. > > This email explains how the NBD code can be fully converted to > coroutines. It should simplify the code and reduce the chance of bugs. > Whether you want to actually do the conversion is up to you, since it's > somewhat orthogonal to the purpose of this patch series. > > The point of coroutines is that blocking operations like send/recv on a > socket should look like regular blocking calls. Let coroutines handle > the event loop housekeeping (registering fd handlers, callbacks). Only > use aio explicitly when concurrency is needed. > > Here is how I would structure NBD using coroutines: > > 1 coroutine per connection to receive NBD commands and submit I/O > requests: > > def nbd_server_receive_co(conn): > while True: > req = nbd_read_req(conn) > if req is None: > break > if req.type == NBD_READ: > bdrv_aio_readv(bs, ...) > elif req.type == NBD_WRITE: > ... > > Notice that bdrv_aio_*() is used since we want concurrent I/O requests. > > 1 coroutine per connection to send NBD replies: > > def nbd_server_send_co(conn): > while True: > while conn.send_queue: > resp = conn.send_queue.pop() > nbd_write_resp(conn, resp) > qemu_coroutine_yield() > > And finally the bdrv_aio_*() callback to put responses on to the send > queue: > > def nbd_server_aio_cb(conn): > resp = NBDResponse(...) > conn.send_queue.push(resp) > conn.send_co.enter() > > Why is this design cleaner? Because NBD code doesn't have to worry > about fd handlers. It uses straightforward coroutine send/recv for > socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see > that only one coroutine receives from the socket and that only one > coroutine writes to the socket. Yes, this sounds better. I'll take a look into it and see how far I can get. Max ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-05 18:18 ` Max Reitz @ 2014-06-06 7:44 ` Paolo Bonzini 2014-06-07 19:27 ` Max Reitz 0 siblings, 1 reply; 20+ messages in thread From: Paolo Bonzini @ 2014-06-06 7:44 UTC (permalink / raw) To: Max Reitz, Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi Il 05/06/2014 20:18, Max Reitz ha scritto: >> >> Why is this design cleaner? Because NBD code doesn't have to worry >> about fd handlers. It uses straightforward coroutine send/recv for >> socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to see >> that only one coroutine receives from the socket and that only one >> coroutine writes to the socket. > > Yes, this sounds better. I'll take a look into it and see how far I can > get. But it doesn't solve any problem, and requires rethinking the aio_set_fd_handler API. I suggest you just refactor all calls to qemu_set_fd_handler2 into a function like qemu_set_fd_handler2(fd, NULL, nbd_can_read(client) ? nbd_read : NULL, client->send_coroutine ? nbd_restart_write : NULL, opaque); and call this function every time the result of nbd_can_read() changes. Then you can switch to aio_set_fd_handler easily. Paolo ^ permalink raw reply [flat|nested] 20+ messages in thread
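For illustration, such a helper could look like this (the function name is invented; nbd_can_read(), nbd_read() and nbd_restart_write() are the existing handlers in nbd.c):

  static void nbd_update_fd_handler(NBDClient *client)
  {
      /* Recompute both callbacks from the client's current state in one
       * place.  Callers invoke this whenever the value of nbd_can_read()
       * may have changed or a send coroutine starts/finishes. */
      qemu_set_fd_handler2(client->sock, NULL,
                           nbd_can_read(client) ? nbd_read : NULL,
                           client->send_coroutine ? nbd_restart_write : NULL,
                           client);
  }

The later switch to dataplane then only has to replace the qemu_set_fd_handler2() call inside this one function with aio_set_fd_handler() on the exported BDS's AioContext.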
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-06 7:44 ` Paolo Bonzini @ 2014-06-07 19:27 ` Max Reitz 2014-06-09 13:35 ` Stefan Hajnoczi 0 siblings, 1 reply; 20+ messages in thread From: Max Reitz @ 2014-06-07 19:27 UTC (permalink / raw) To: Paolo Bonzini, Stefan Hajnoczi Cc: Kevin Wolf, Fam Zheng, qemu-devel, Stefan Hajnoczi On 06.06.2014 09:44, Paolo Bonzini wrote: > Il 05/06/2014 20:18, Max Reitz ha scritto: >>> >>> Why is this design cleaner? Because NBD code doesn't have to worry >>> about fd handlers. It uses straightforward coroutine send/recv for >>> socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to >>> see >>> that only one coroutine receives from the socket and that only one >>> coroutine writes to the socket. >> >> Yes, this sounds better. I'll take a look into it and see how far I can >> get. > > But it doesn't solve any problem, and requires rethinking the > aio_set_fd_handler API. It might clean things up. :-) On the other hand, you are right, it's probably too complicated right now. > I suggest you just refactor all calls to qemu_set_fd_handler2 into a > function like > > qemu_set_fd_handler2(fd, NULL, > nbd_can_read(client) ? nbd_read : NULL, > client->send_coroutine ? > nbd_restart_write : NULL, > opaque); > > and call this function every time the result of nbd_can_read() > changes. Then you can switch to aio_set_fd_handler easily. Seems good, I'll do that. Max ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 3/5] nbd: Use aio_set_fd_handler2() 2014-06-07 19:27 ` Max Reitz @ 2014-06-09 13:35 ` Stefan Hajnoczi 0 siblings, 0 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-09 13:35 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi [-- Attachment #1: Type: text/plain, Size: 1392 bytes --] On Sat, Jun 07, 2014 at 09:27:26PM +0200, Max Reitz wrote: > On 06.06.2014 09:44, Paolo Bonzini wrote: > >Il 05/06/2014 20:18, Max Reitz ha scritto: > >>> > >>>Why is this design cleaner? Because NBD code doesn't have to worry > >>>about fd handlers. It uses straightforward coroutine send/recv for > >>>socket I/O inside nbd_read_req() and nbd_write_resp(). It's easy to > >>>see > >>>that only one coroutine receives from the socket and that only one > >>>coroutine writes to the socket. > >> > >>Yes, this sounds better. I'll take a look into it and see how far I can > >>get. > > > >But it doesn't solve any problem, and requires rethinking the > >aio_set_fd_handler API. > > It might clean things up. :-) > > On the other hand, you are right, it's probably too complicated right now. > > >I suggest you just refactor all calls to qemu_set_fd_handler2 into a > >function like > > > > qemu_set_fd_handler2(fd, NULL, > > nbd_can_read(client) ? nbd_read : NULL, > > client->send_coroutine ? > > nbd_restart_write : NULL, > > opaque); > > > >and call this function every time the result of nbd_can_read() changes. > >Then you can switch to aio_set_fd_handler easily. > > Seems good, I'll do that. Sounds fine to me. Restructuring NBD can be done later, if desirable. Stefan [-- Attachment #2: Type: application/pgp-signature, Size: 473 bytes --] ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <1401561792-13410-5-git-send-email-mreitz@redhat.com>]
* Re: [Qemu-devel] [RFC 4/5] block: Add AIO followers [not found] ` <1401561792-13410-5-git-send-email-mreitz@redhat.com> @ 2014-06-04 12:41 ` Stefan Hajnoczi 2014-06-05 17:31 ` Max Reitz 0 siblings, 1 reply; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 12:41 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:11PM +0200, Max Reitz wrote: > If a long-running operation on a BDS wants to always remain in the same > AIO context, it somehow needs to keep track of the BDS changing its > context. This adds a function for registering callbacks on a BDS which > are called whenever the BDS is attached or detached from an AIO context. > > Signed-off-by: Max Reitz <mreitz@redhat.com> > --- > block.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++ > include/block/block_int.h | 40 ++++++++++++++++++++++++++++++++++ > 2 files changed, 95 insertions(+) Great, we definitely need this mechanism. Block jobs can also use it later. Usually QEMU calls this a "notifier" rather than a "follower". For example, see include/qemu/notifier.h and bdrv_add_close_notifier(). Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
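A notifier-style version of the interface might look roughly like this (all names invented for the example -- nothing like this is in the tree yet):

  typedef struct BdrvAioContextNotifier {
      /* Called after bs has been attached to new_context. */
      void (*attached_aio_context)(AioContext *new_context, void *opaque);
      /* Called before bs is detached from its current context. */
      void (*detach_aio_context)(void *opaque);
      void *opaque;
      QLIST_ENTRY(BdrvAioContextNotifier) list;
  } BdrvAioContextNotifier;

  void bdrv_add_aio_context_notifier(BlockDriverState *bs,
          void (*attached_aio_context)(AioContext *new_context, void *opaque),
          void (*detach_aio_context)(void *opaque),
          void *opaque);

bdrv_attach_aio_context()/bdrv_detach_aio_context() would walk the notifier list at the appropriate points, and the NBD server (and later block jobs) would register one notifier pair per export.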
* Re: [Qemu-devel] [RFC 4/5] block: Add AIO followers 2014-06-04 12:41 ` [Qemu-devel] [RFC 4/5] block: Add AIO followers Stefan Hajnoczi @ 2014-06-05 17:31 ` Max Reitz 0 siblings, 0 replies; 20+ messages in thread From: Max Reitz @ 2014-06-05 17:31 UTC (permalink / raw) To: Stefan Hajnoczi Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On 04.06.2014 14:41, Stefan Hajnoczi wrote: > On Sat, May 31, 2014 at 08:43:11PM +0200, Max Reitz wrote: >> If a long-running operation on a BDS wants to always remain in the same >> AIO context, it somehow needs to keep track of the BDS changing its >> context. This adds a function for registering callbacks on a BDS which >> are called whenever the BDS is attached or detached from an AIO context. >> >> Signed-off-by: Max Reitz <mreitz@redhat.com> >> --- >> block.c | 55 +++++++++++++++++++++++++++++++++++++++++++++++ >> include/block/block_int.h | 40 ++++++++++++++++++++++++++++++++++ >> 2 files changed, 95 insertions(+) > Great, we definitely need this mechanism. Block jobs can also use it > later. > > Usually QEMU calls this a "notifier" rather than a "follower". For > example, see include/qemu/notifier.h and bdrv_add_close_notifier(). Ah, I see, thank you. Max > Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
[parent not found: <538A3A8F.3060508@redhat.com>]
* Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane [not found] ` <538A3A8F.3060508@redhat.com> @ 2014-06-04 12:47 ` Stefan Hajnoczi 0 siblings, 0 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 12:47 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 10:24:47PM +0200, Max Reitz wrote: > On 31.05.2014 20:43, Max Reitz wrote: > >[snip] > > > >However, if bs_aio_detach() is called from a different thread than the > >old AioContext is running in, we may still have coroutines running for > >which we should wait before returning from bs_aio_detach(). > > After re-reading Stefan's RFC and the AIO locking code, would it suffice to > call aio_context_acquire() and aio_context_release() on the old AioContext > at the end of bs_aio_detach() to ensure all coroutines are settled? If not, > I guess I have to wait until the coroutine variables are set to NULL (which > indicates they have completely finished execution; however, this is not > actually necessary and might lead to an infinite loop if the block driver > keeps yielding due to some condition related to the AioContext switch), or I > really have to switch the existing coroutines to the new AioContext while > they are running. Doing this from the outside will probably be even messier > than it would already be from the inside, so I sure hope to be able to avoid > this… I think you don't need to worry about this at all. bdrv_set_aio_context() is always invoked with the QEMU global mutex held. Regarding switching a coroutine from one AioContext to another (possibly in another thread), I've implemented it but it's a little involved (complex and slow). Let's avoid it, if possible. I also lost the patch during the great "make distclean twice deletes .git/" fiasco of 2014 so would need to reimplement it :). Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: [Qemu-devel] [RFC 0/5] nbd: Adapt for dataplane [not found] <1401561792-13410-1-git-send-email-mreitz@redhat.com> ` (5 preceding siblings ...) [not found] ` <538A3A8F.3060508@redhat.com> @ 2014-06-04 12:50 ` Stefan Hajnoczi 6 siblings, 0 replies; 20+ messages in thread From: Stefan Hajnoczi @ 2014-06-04 12:50 UTC (permalink / raw) To: Max Reitz Cc: Kevin Wolf, Paolo Bonzini, Fam Zheng, qemu-devel, Stefan Hajnoczi On Sat, May 31, 2014 at 08:43:07PM +0200, Max Reitz wrote: > For the NBD server to work with dataplane, it needs to correctly access > the exported BDS. It makes the most sense to run both in the same > AioContext, therefore this series implements methods for tracking a > BDS's AioContext and makes NBD make use of this for keeping the clients > connected to that BDS in the same AioContext. > > The reason this is an RFC and not a PATCH is my inexperience with AIO, > coroutines and the like. Also, I'm not sure about what to do about the > coroutines. The NBD server has up to two coroutines per client: One for > receiving and one for sending. Theoretically, both have to be > "transferred" to the new AioContext if it is changed; however, as far as > I see it, coroutines are not really bound to an AioContext, they are > simply run in the AioContext entering them. Therefore, I think a > transfer is unnecessary. All coroutines are entered from nbd_read() and > nbd_restart_write(), both of which are AIO routines registered via > aio_set_fd_handler2(). > > As bs_aio_detach() unregisters all of these routines, the coroutines can > no longer be entered, but only after bs_aio_attach() is called again. > Then, when they are called, they will enter the coroutines in the new > AioContext. Therefore, I think an explicit transfer unnecessary. > > However, if bs_aio_detach() is called from a different thread than the > old AioContext is running in, we may still have coroutines running for > which we should wait before returning from bs_aio_detach(). > > But because of my inexperience with coroutines, I'm not sure. I now have > these patches nearly unchanged here for about a week and I'm looking for > ways of testing them, but so far I could only test whether the old use > cases work, but not whether they will work for what they are intended to > do: With BDS changing their AioContext. > > So, because I'm not sure what else to do and because I don't know how to > test multiple AIO threads (how do I move a BDS into another iothread?) > I'm just sending this out as an RFC. > > > Max Reitz (5): > nbd: Correct name comparison for export_set_name() > aio: Add io_read_poll() callback > nbd: Use aio_set_fd_handler2() > block: Add AIO followers > nbd: Follow the BDS' AIO context > > aio-posix.c | 26 ++++++++++++++++----- > block.c | 55 +++++++++++++++++++++++++++++++++++++++++++ > include/block/aio.h | 12 ++++++++++ > include/block/block_int.h | 40 ++++++++++++++++++++++++++++++++ > include/qemu/main-loop.h | 1 - > nbd.c | 59 ++++++++++++++++++++++++++++++++++++++--------- > 6 files changed, 175 insertions(+), 18 deletions(-) Thanks for the RFC. It looks like the concept will work correctly. I left comments, a lot of them are just ideas and you don't have to implement them if you don't want to. Stefan ^ permalink raw reply [flat|nested] 20+ messages in thread