* [Qemu-devel] restart a coroutine? @ 2012-03-04 12:41 Michael Tokarev 2012-03-04 16:08 ` Avi Kivity 0 siblings, 1 reply; 5+ messages in thread From: Michael Tokarev @ 2012-03-04 12:41 UTC (permalink / raw) To: qemu-devel Since all block (bdrv) layer is now implemented using coroutines, I thought I'd give it a try. But immediately hit a question to which I don't know a good answer. Suppose we've some networking block device (like NBD) and want to be able to support reconnection - this is actually very useful feature, in order to be able to reboot/restart the NBD server without a need to restart all the clients. For this to work, we should have an ability to reconnect to the server and re-issue all requests which were waiting for reply. Traditionally, in asyncronous event-loop-based scheme, this is implemented as a queue of requests linked to the block driver state structure, and in case of reconnection we just walk over all requests and requeue these. But if the block driver is implemented as a set of coroutines (like nbd currently does), I see no sane/safe way to restart the requests. Setjmp/longjmp can be uses with extra care there, but with these it is extremly fragile. Any hints on how to do that? Thanks, /mjt ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] restart a coroutine? 2012-03-04 12:41 [Qemu-devel] restart a coroutine? Michael Tokarev @ 2012-03-04 16:08 ` Avi Kivity 2012-03-04 20:01 ` Michael Tokarev 0 siblings, 1 reply; 5+ messages in thread From: Avi Kivity @ 2012-03-04 16:08 UTC (permalink / raw) To: Michael Tokarev; +Cc: qemu-devel On 03/04/2012 02:41 PM, Michael Tokarev wrote: > Since all block (bdrv) layer is now implemented using > coroutines, I thought I'd give it a try. But immediately > hit a question to which I don't know a good answer. > > Suppose we've some networking block device (like NBD) and > want to be able to support reconnection - this is actually > very useful feature, in order to be able to reboot/restart > the NBD server without a need to restart all the clients. > > For this to work, we should have an ability to reconnect > to the server and re-issue all requests which were waiting > for reply. > > Traditionally, in asyncronous event-loop-based scheme, this > is implemented as a queue of requests linked to the block > driver state structure, and in case of reconnection we just > walk over all requests and requeue these. > > But if the block driver is implemented as a set of coroutines > (like nbd currently does), I see no sane/safe way to restart > the requests. Setjmp/longjmp can be uses with extra care > there, but with these it is extremly fragile. > > Any hints on how to do that? > >From the block layer's point of view, the requests should still be pending. For example, if a read request sees a dropped connection, it adds itself to a list of coroutines waiting for a reconnect, wakes up a connection manager coroutine (or thread), and sleeps. The connection manager periodically tries to connect, and if it succeeds, it wakes up the coroutines waiting for a reconnection. It's important to implement request cancellation correctly here, or we can end up with a device that cannot be unplugged or a guest that cannot be shutdown. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] restart a coroutine? 2012-03-04 16:08 ` Avi Kivity @ 2012-03-04 20:01 ` Michael Tokarev 2012-03-05 8:07 ` Paolo Bonzini 2012-03-05 9:20 ` Avi Kivity 0 siblings, 2 replies; 5+ messages in thread From: Michael Tokarev @ 2012-03-04 20:01 UTC (permalink / raw) To: Avi Kivity; +Cc: qemu-devel On 04.03.2012 20:08, Avi Kivity wrote: > On 03/04/2012 02:41 PM, Michael Tokarev wrote: >> Since all block (bdrv) layer is now implemented using >> coroutines, I thought I'd give it a try. But immediately >> hit a question to which I don't know a good answer. >> >> Suppose we've some networking block device (like NBD) and >> want to be able to support reconnection - this is actually >> very useful feature, in order to be able to reboot/restart >> the NBD server without a need to restart all the clients. >> >> For this to work, we should have an ability to reconnect >> to the server and re-issue all requests which were waiting >> for reply. >> >> Traditionally, in asyncronous event-loop-based scheme, this >> is implemented as a queue of requests linked to the block >> driver state structure, and in case of reconnection we just >> walk over all requests and requeue these. >> >> But if the block driver is implemented as a set of coroutines >> (like nbd currently does), I see no sane/safe way to restart >> the requests. Setjmp/longjmp can be uses with extra care >> there, but with these it is extremly fragile. >> >> Any hints on how to do that? > > From the block layer's point of view, the requests should still be > pending. For example, if a read request sees a dropped connection, it > adds itself to a list of coroutines waiting for a reconnect, wakes up a > connection manager coroutine (or thread), and sleeps. The connection > manager periodically tries to connect, and if it succeeds, it wakes up > the coroutines waiting for a reconnection. This is all fine, except of one thing: restarting (resending) of the requests which has been sent to the remote and for which we were waiting for reply already. For these requests, they should be resent using new socket, when the connection manager wakes corresponding coroutine up. That's where my question comes. The request handling coroutine looks like regular function (pseudocode): offset = 0; while(offset < sendsize) { ret = send(sock, senddata+offset, sendsize-offset); if (EAGAIN) { coroutine_yeld(); continue; } if (ret < 0) return -EIO; offset += ret; } offset = 0; while(offset < recvsize) { ret = recv(sock, recvdata+offset, recvsize-offset); if (EAGAIN) { coroutine_yeld(); continue; } if (ret < 0) return -EIO; offset += ret; } return status(recvdata); This function will need to have a ton of "goto begin" in all places where it calls yeld() -- in order to actually start _sending_ the request to the new sock after a reconnect. It is all good when it is in one function, but if the code is split into several functions, it becomes very clumsy, to a point where regular asyncronous state machine bay be more appropriate. It also requires to open-code all helper functions (like qiov handlers). So I was asking if it can be done using a setjmp/longjmp maybe, to simplify it all somehow. > It's important to implement request cancellation correctly here, or we > can end up with a device that cannot be unplugged or a guest that cannot > be shutdown. Is there some mechanism to cancel bdrv_co_{read,write}v()? I see a way to cancel bdrv_aio_{read,write}v(), but even these are now implemented as coroutines... Thanks! /mjt ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] restart a coroutine? 2012-03-04 20:01 ` Michael Tokarev @ 2012-03-05 8:07 ` Paolo Bonzini 2012-03-05 9:20 ` Avi Kivity 1 sibling, 0 replies; 5+ messages in thread From: Paolo Bonzini @ 2012-03-05 8:07 UTC (permalink / raw) To: qemu-devel Il 04/03/2012 21:01, Michael Tokarev ha scritto: > This is all fine, except of one thing: restarting (resending) of > the requests which has been sent to the remote and for which we > were waiting for reply already. > > For these requests, they should be resent using new socket, when > the connection manager wakes corresponding coroutine up. That's > where my question comes. > > The request handling coroutine looks like regular function > (pseudocode): > > offset = 0; > while(offset < sendsize) { > ret = send(sock, senddata+offset, sendsize-offset); > if (EAGAIN) { coroutine_yeld(); continue; } > if (ret < 0) return -EIO; > offset += ret; > } > offset = 0; > while(offset < recvsize) { > ret = recv(sock, recvdata+offset, recvsize-offset); > if (EAGAIN) { coroutine_yeld(); continue; } > if (ret < 0) return -EIO; > offset += ret; > } > return status(recvdata); > > This function will need to have a ton of "goto begin" in all places > where it calls yeld() -- in order to actually start _sending_ the > request to the new sock after a reconnect. That's why Avi mentioned (I think) having a list of requests. As soon as you close the socket, or at least shut it down, all recv and send will fail and requests will fail. However, if you save the list of requests before closing, you can restart them on the new socket. > Is there some mechanism to cancel bdrv_co_{read,write}v()? I see a > way to cancel bdrv_aio_{read,write}v(), but even these are now > implemented as coroutines... Not yet. But I don't think it is necessary. Paolo ^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] restart a coroutine? 2012-03-04 20:01 ` Michael Tokarev 2012-03-05 8:07 ` Paolo Bonzini @ 2012-03-05 9:20 ` Avi Kivity 1 sibling, 0 replies; 5+ messages in thread From: Avi Kivity @ 2012-03-05 9:20 UTC (permalink / raw) To: Michael Tokarev; +Cc: qemu-devel On 03/04/2012 10:01 PM, Michael Tokarev wrote: > On 04.03.2012 20:08, Avi Kivity wrote: > > On 03/04/2012 02:41 PM, Michael Tokarev wrote: > >> Since all block (bdrv) layer is now implemented using > >> coroutines, I thought I'd give it a try. But immediately > >> hit a question to which I don't know a good answer. > >> > >> Suppose we've some networking block device (like NBD) and > >> want to be able to support reconnection - this is actually > >> very useful feature, in order to be able to reboot/restart > >> the NBD server without a need to restart all the clients. > >> > >> For this to work, we should have an ability to reconnect > >> to the server and re-issue all requests which were waiting > >> for reply. > >> > >> Traditionally, in asyncronous event-loop-based scheme, this > >> is implemented as a queue of requests linked to the block > >> driver state structure, and in case of reconnection we just > >> walk over all requests and requeue these. > >> > >> But if the block driver is implemented as a set of coroutines > >> (like nbd currently does), I see no sane/safe way to restart > >> the requests. Setjmp/longjmp can be uses with extra care > >> there, but with these it is extremly fragile. > >> > >> Any hints on how to do that? > > > > From the block layer's point of view, the requests should still be > > pending. For example, if a read request sees a dropped connection, it > > adds itself to a list of coroutines waiting for a reconnect, wakes up a > > connection manager coroutine (or thread), and sleeps. The connection > > manager periodically tries to connect, and if it succeeds, it wakes up > > the coroutines waiting for a reconnection. > > This is all fine, except of one thing: restarting (resending) of > the requests which has been sent to the remote and for which we > were waiting for reply already. > > For these requests, they should be resent using new socket, when > the connection manager wakes corresponding coroutine up. That's > where my question comes. > > The request handling coroutine looks like regular function > (pseudocode): > > offset = 0; > while(offset < sendsize) { > ret = send(sock, senddata+offset, sendsize-offset); > if (EAGAIN) { coroutine_yeld(); continue; } > if (ret < 0) return -EIO; > offset += ret; > } > offset = 0; > while(offset < recvsize) { > ret = recv(sock, recvdata+offset, recvsize-offset); > if (EAGAIN) { coroutine_yeld(); continue; } > if (ret < 0) return -EIO; > offset += ret; > } > return status(recvdata); > > This function will need to have a ton of "goto begin" in all places > where it calls yeld() No, just wrap this function with another one that checks for the error code, and calls it again, after obtaining a new fd. You can use shutdown(2) from the connection manager to force the request coroutines to fail. > -- in order to actually start _sending_ the > request to the new sock after a reconnect. It is all good when it > is in one function, but if the code is split into several functions, > it becomes very clumsy, to a point where regular asyncronous state > machine bay be more appropriate. The whole point of coroutines was to avoid state machines. Anyway I don't see why a retry needs recoding, just wrap the thing-to-be-retried. > It also requires to open-code all > helper functions (like qiov handlers). > > So I was asking if it can be done using a setjmp/longjmp maybe, to > simplify it all somehow. > > > It's important to implement request cancellation correctly here, or we > > can end up with a device that cannot be unplugged or a guest that cannot > > be shutdown. > > Is there some mechanism to cancel bdrv_co_{read,write}v()? I see a > way to cancel bdrv_aio_{read,write}v(), but even these are now > implemented as coroutines... > bdrv_aio_cancel(). Of course the mechanism behind it has to be implemented for each block format driver. -- error compiling committee.c: too many arguments to function ^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2012-03-05 9:21 UTC | newest] Thread overview: 5+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2012-03-04 12:41 [Qemu-devel] restart a coroutine? Michael Tokarev 2012-03-04 16:08 ` Avi Kivity 2012-03-04 20:01 ` Michael Tokarev 2012-03-05 8:07 ` Paolo Bonzini 2012-03-05 9:20 ` Avi Kivity
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).