[Qemu-devel] restart a coroutine?

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

* [Qemu-devel] restart a coroutine?
@ 2012-03-04 12:41 Michael Tokarev
  2012-03-04 16:08 ` Avi Kivity
  0 siblings, 1 reply; 5+ messages in thread
From: Michael Tokarev @ 2012-03-04 12:41 UTC (permalink / raw)
  To: qemu-devel

Since all block (bdrv) layer is now implemented using
coroutines, I thought I'd give it a try.  But immediately
hit a question to which I don't know a good answer.

Suppose we've some networking block device (like NBD) and
want to be able to support reconnection - this is actually
very useful feature, in order to be able to reboot/restart
the NBD server without a need to restart all the clients.

For this to work, we should have an ability to reconnect
to the server and re-issue all requests which were waiting
for reply.

Traditionally, in asyncronous event-loop-based scheme, this
is implemented as a queue of requests linked to the block
driver state structure, and in case of reconnection we just
walk over all requests and requeue these.

But if the block driver is implemented as a set of coroutines
(like nbd currently does), I see no sane/safe way to restart
the requests.  Setjmp/longjmp can be uses with extra care
there, but with these it is extremly fragile.

Any hints on how to do that?

Thanks,

/mjt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] restart a coroutine?
  2012-03-04 12:41 [Qemu-devel] restart a coroutine? Michael Tokarev
@ 2012-03-04 16:08 ` Avi Kivity
  2012-03-04 20:01   ` Michael Tokarev
  0 siblings, 1 reply; 5+ messages in thread
From: Avi Kivity @ 2012-03-04 16:08 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: qemu-devel

On 03/04/2012 02:41 PM, Michael Tokarev wrote:
> Since all block (bdrv) layer is now implemented using
> coroutines, I thought I'd give it a try.  But immediately
> hit a question to which I don't know a good answer.
>
> Suppose we've some networking block device (like NBD) and
> want to be able to support reconnection - this is actually
> very useful feature, in order to be able to reboot/restart
> the NBD server without a need to restart all the clients.
>
> For this to work, we should have an ability to reconnect
> to the server and re-issue all requests which were waiting
> for reply.
>
> Traditionally, in asyncronous event-loop-based scheme, this
> is implemented as a queue of requests linked to the block
> driver state structure, and in case of reconnection we just
> walk over all requests and requeue these.
>
> But if the block driver is implemented as a set of coroutines
> (like nbd currently does), I see no sane/safe way to restart
> the requests.  Setjmp/longjmp can be uses with extra care
> there, but with these it is extremly fragile.
>
> Any hints on how to do that?
>

>From the block layer's point of view, the requests should still be
pending.  For example, if a read request sees a dropped connection, it
adds itself to a list of coroutines waiting for a reconnect, wakes up a
connection manager coroutine (or thread), and sleeps.  The connection
manager periodically tries to connect, and if it succeeds, it wakes up
the coroutines waiting for a reconnection.

It's important to implement request cancellation correctly here, or we
can end up with a device that cannot be unplugged or a guest that cannot
be shutdown.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] restart a coroutine?
  2012-03-04 16:08 ` Avi Kivity
@ 2012-03-04 20:01   ` Michael Tokarev
  2012-03-05  8:07     ` Paolo Bonzini
  2012-03-05  9:20     ` Avi Kivity
  0 siblings, 2 replies; 5+ messages in thread
From: Michael Tokarev @ 2012-03-04 20:01 UTC (permalink / raw)
  To: Avi Kivity; +Cc: qemu-devel

On 04.03.2012 20:08, Avi Kivity wrote:
> On 03/04/2012 02:41 PM, Michael Tokarev wrote:
>> Since all block (bdrv) layer is now implemented using
>> coroutines, I thought I'd give it a try.  But immediately
>> hit a question to which I don't know a good answer.
>>
>> Suppose we've some networking block device (like NBD) and
>> want to be able to support reconnection - this is actually
>> very useful feature, in order to be able to reboot/restart
>> the NBD server without a need to restart all the clients.
>>
>> For this to work, we should have an ability to reconnect
>> to the server and re-issue all requests which were waiting
>> for reply.
>>
>> Traditionally, in asyncronous event-loop-based scheme, this
>> is implemented as a queue of requests linked to the block
>> driver state structure, and in case of reconnection we just
>> walk over all requests and requeue these.
>>
>> But if the block driver is implemented as a set of coroutines
>> (like nbd currently does), I see no sane/safe way to restart
>> the requests.  Setjmp/longjmp can be uses with extra care
>> there, but with these it is extremly fragile.
>>
>> Any hints on how to do that?
> 
> From the block layer's point of view, the requests should still be
> pending.  For example, if a read request sees a dropped connection, it
> adds itself to a list of coroutines waiting for a reconnect, wakes up a
> connection manager coroutine (or thread), and sleeps.  The connection
> manager periodically tries to connect, and if it succeeds, it wakes up
> the coroutines waiting for a reconnection.

This is all fine, except of one thing: restarting (resending) of
the requests which has been sent to the remote and for which we
were waiting for reply already.

For these requests, they should be resent using new socket, when
the connection manager wakes corresponding coroutine up.  That's
where my question comes.

The request handling coroutine looks like regular function
(pseudocode):

 offset = 0;
 while(offset < sendsize) {
   ret = send(sock, senddata+offset, sendsize-offset);
   if (EAGAIN) { coroutine_yeld(); continue; }
   if (ret < 0) return -EIO;
   offset += ret;
 }
 offset = 0;
 while(offset < recvsize) {
   ret = recv(sock, recvdata+offset, recvsize-offset);
   if (EAGAIN) { coroutine_yeld(); continue; }
   if (ret < 0) return -EIO;
   offset += ret;
 }
 return status(recvdata);

This function will need to have a ton of "goto begin" in all places
where it calls yeld() -- in order to actually start _sending_ the
request to the new sock after a reconnect.  It is all good when it
is in one function, but if the code is split into several functions,
it becomes very clumsy, to a point where regular asyncronous state
machine bay be more appropriate.  It also requires to open-code all
helper functions (like qiov handlers).

So I was asking if it can be done using a setjmp/longjmp maybe, to
simplify it all somehow.

> It's important to implement request cancellation correctly here, or we
> can end up with a device that cannot be unplugged or a guest that cannot
> be shutdown.

Is there some mechanism to cancel bdrv_co_{read,write}v()?  I see a
way to cancel bdrv_aio_{read,write}v(), but even these are now
implemented as coroutines...

Thanks!

/mjt

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] restart a coroutine?
  2012-03-04 20:01   ` Michael Tokarev
@ 2012-03-05  8:07     ` Paolo Bonzini
  2012-03-05  9:20     ` Avi Kivity
  1 sibling, 0 replies; 5+ messages in thread
From: Paolo Bonzini @ 2012-03-05  8:07 UTC (permalink / raw)
  To: qemu-devel

Il 04/03/2012 21:01, Michael Tokarev ha scritto:
> This is all fine, except of one thing: restarting (resending) of
> the requests which has been sent to the remote and for which we
> were waiting for reply already.
> 
> For these requests, they should be resent using new socket, when
> the connection manager wakes corresponding coroutine up.  That's
> where my question comes.
> 
> The request handling coroutine looks like regular function
> (pseudocode):
> 
>  offset = 0;
>  while(offset < sendsize) {
>    ret = send(sock, senddata+offset, sendsize-offset);
>    if (EAGAIN) { coroutine_yeld(); continue; }
>    if (ret < 0) return -EIO;
>    offset += ret;
>  }
>  offset = 0;
>  while(offset < recvsize) {
>    ret = recv(sock, recvdata+offset, recvsize-offset);
>    if (EAGAIN) { coroutine_yeld(); continue; }
>    if (ret < 0) return -EIO;
>    offset += ret;
>  }
>  return status(recvdata);
> 
> This function will need to have a ton of "goto begin" in all places
> where it calls yeld() -- in order to actually start _sending_ the
> request to the new sock after a reconnect.

That's why Avi mentioned (I think) having a list of requests.  As soon
as you close the socket, or at least shut it down, all recv and send
will fail and requests will fail.  However, if you save the list of
requests before closing, you can restart them on the new socket.

> Is there some mechanism to cancel bdrv_co_{read,write}v()?  I see a
> way to cancel bdrv_aio_{read,write}v(), but even these are now
> implemented as coroutines...

Not yet.  But I don't think it is necessary.

Paolo

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [Qemu-devel] restart a coroutine?
  2012-03-04 20:01   ` Michael Tokarev
  2012-03-05  8:07     ` Paolo Bonzini
@ 2012-03-05  9:20     ` Avi Kivity
  1 sibling, 0 replies; 5+ messages in thread
From: Avi Kivity @ 2012-03-05  9:20 UTC (permalink / raw)
  To: Michael Tokarev; +Cc: qemu-devel

On 03/04/2012 10:01 PM, Michael Tokarev wrote:
> On 04.03.2012 20:08, Avi Kivity wrote:
> > On 03/04/2012 02:41 PM, Michael Tokarev wrote:
> >> Since all block (bdrv) layer is now implemented using
> >> coroutines, I thought I'd give it a try.  But immediately
> >> hit a question to which I don't know a good answer.
> >>
> >> Suppose we've some networking block device (like NBD) and
> >> want to be able to support reconnection - this is actually
> >> very useful feature, in order to be able to reboot/restart
> >> the NBD server without a need to restart all the clients.
> >>
> >> For this to work, we should have an ability to reconnect
> >> to the server and re-issue all requests which were waiting
> >> for reply.
> >>
> >> Traditionally, in asyncronous event-loop-based scheme, this
> >> is implemented as a queue of requests linked to the block
> >> driver state structure, and in case of reconnection we just
> >> walk over all requests and requeue these.
> >>
> >> But if the block driver is implemented as a set of coroutines
> >> (like nbd currently does), I see no sane/safe way to restart
> >> the requests.  Setjmp/longjmp can be uses with extra care
> >> there, but with these it is extremly fragile.
> >>
> >> Any hints on how to do that?
> > 
> > From the block layer's point of view, the requests should still be
> > pending.  For example, if a read request sees a dropped connection, it
> > adds itself to a list of coroutines waiting for a reconnect, wakes up a
> > connection manager coroutine (or thread), and sleeps.  The connection
> > manager periodically tries to connect, and if it succeeds, it wakes up
> > the coroutines waiting for a reconnection.
>
> This is all fine, except of one thing: restarting (resending) of
> the requests which has been sent to the remote and for which we
> were waiting for reply already.
>
> For these requests, they should be resent using new socket, when
> the connection manager wakes corresponding coroutine up.  That's
> where my question comes.
>
> The request handling coroutine looks like regular function
> (pseudocode):
>
>  offset = 0;
>  while(offset < sendsize) {
>    ret = send(sock, senddata+offset, sendsize-offset);
>    if (EAGAIN) { coroutine_yeld(); continue; }
>    if (ret < 0) return -EIO;
>    offset += ret;
>  }
>  offset = 0;
>  while(offset < recvsize) {
>    ret = recv(sock, recvdata+offset, recvsize-offset);
>    if (EAGAIN) { coroutine_yeld(); continue; }
>    if (ret < 0) return -EIO;
>    offset += ret;
>  }
>  return status(recvdata);
>
> This function will need to have a ton of "goto begin" in all places
> where it calls yeld()

No, just wrap this function with another one that checks for the error
code, and calls it again, after obtaining a new fd.  You can use
shutdown(2) from the connection manager to force the request coroutines
to fail.

>  -- in order to actually start _sending_ the
> request to the new sock after a reconnect.  It is all good when it
> is in one function, but if the code is split into several functions,
> it becomes very clumsy, to a point where regular asyncronous state
> machine bay be more appropriate.  

The whole point of coroutines was to avoid state machines.  Anyway I
don't see why a retry needs recoding, just wrap the thing-to-be-retried.

> It also requires to open-code all
> helper functions (like qiov handlers).
>
> So I was asking if it can be done using a setjmp/longjmp maybe, to
> simplify it all somehow.
>
> > It's important to implement request cancellation correctly here, or we
> > can end up with a device that cannot be unplugged or a guest that cannot
> > be shutdown.
>
> Is there some mechanism to cancel bdrv_co_{read,write}v()?  I see a
> way to cancel bdrv_aio_{read,write}v(), but even these are now
> implemented as coroutines...
>

bdrv_aio_cancel().  Of course the mechanism behind it has to be
implemented for each block format driver.

-- 
error compiling committee.c: too many arguments to function

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2012-03-05  9:21 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-04 12:41 [Qemu-devel] restart a coroutine? Michael Tokarev
2012-03-04 16:08 ` Avi Kivity
2012-03-04 20:01   ` Michael Tokarev
2012-03-05  8:07     ` Paolo Bonzini
2012-03-05  9:20     ` Avi Kivity

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).