qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Questions about nbd with QIOChannel
@ 2016-04-07 11:04 Changlong Xie
  2016-04-07 11:17 ` Paolo Bonzini
  0 siblings, 1 reply; 2+ messages in thread
From: Changlong Xie @ 2016-04-07 11:04 UTC
  To: qemu devel, Paolo Bonzini, Daniel P. Berrange
  Cc: Li Zhijian, "Xie, Changlong/谢 昌龙",
	zhanghailiang

Hi all

Recently, while testing COLO, I found that the client sometimes hangs on
the Primary side. At first I thought it might be a COLO-related issue, but
after a lot of testing I suspect it may be an NBD issue (although I'm not
sure). So I'd like to share what I found:

Since commit 1c778ef7, NBD uses the QIOChannel APIs for the actual
socket I/O.

Let's focus on nbd_reply_ready() here:

Before commit 1c778ef7:
nbd_reply_ready()
   nbd_receive_reply()
     nbd_wr_sync()
     {
      ...
      while (offset < size) {
          if (do_read) {
              len = qemu_recv(fd, buffer + offset, size - offset, 0);
          } else {
              ...
          }
          if (len < 0) {
              err = socket_error();
              if (err == EINTR ||
                  (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                  continue;
              }
              return -err;
          }
          ...
      }
      ....
     }

If len < 0 and err == EAGAIN, there are two possible outcomes:
1) If offset > 0, continue calling recv until the read is finished.
2) If offset == 0, return -EAGAIN; nbd_receive_reply() checks for this
return value and returns *successfully*.
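
The two outcomes above can be sketched as a small stand-alone C function.
This is only an illustration under stated assumptions, not the QEMU code:
old_style_read() and make_nonblocking_pair() are hypothetical names, plain
recv()/errno stand in for qemu_recv()/socket_error(), and the EOF handling
is an assumption of this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical stand-in for the read half of the pre-1c778ef7 loop. */
ssize_t old_style_read(int fd, void *buffer, size_t size)
{
    size_t offset = 0;

    assert(size > 0);
    while (offset < size) {
        ssize_t len = recv(fd, (char *)buffer + offset, size - offset, 0);
        if (len < 0) {
            int err = errno;
            /* Outcome 1: part of the data already arrived, keep retrying. */
            if (err == EINTR ||
                (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                continue;
            }
            /* Outcome 2: nothing read yet, hand -EAGAIN back to the caller. */
            return -err;
        }
        if (len == 0) {
            break; /* peer closed the connection (assumed EOF handling) */
        }
        offset += (size_t)len;
    }
    return (ssize_t)offset;
}

/* Hypothetical helper: create a connected pair of non-blocking sockets. */
int make_nonblocking_pair(int sv[2])
{
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0) {
        return -errno;
    }
    for (int i = 0; i < 2; i++) {
        int flags = fcntl(sv[i], F_GETFL, 0);
        if (flags < 0 || fcntl(sv[i], F_SETFL, flags | O_NONBLOCK) < 0) {
            int err = errno;
            close(sv[0]);
            close(sv[1]);
            return -err;
        }
    }
    return 0;
}
```

On an idle non-blocking socket the first recv() fails with EAGAIN while
offset is still 0, so the function returns -EAGAIN at once instead of
waiting for data.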
			
After commit 1c778ef7:
nbd_reply_ready()
   read_sync()
     nbd_wr_syncv()
     {
      ...
      while (nlocal_iov > 0) {
          ...
          if (do_read) {
              len = qio_channel_readv(ioc, local_iov, nlocal_iov,
                                      &local_err);
          } else {
              ...
          }
          if (len == QIO_CHANNEL_ERR_BLOCK) {
              if (qemu_in_coroutine()) {
                  qemu_coroutine_yield();
              } else {
                  qio_channel_wait(ioc,
                                   do_read ? G_IO_IN : G_IO_OUT);
              }
              continue;
          }
          ...
      }
     }

For NBD,
qio_channel_readv()
   qio_channel_readv_full
     klass->io_readv()
      qio_channel_socket_readv()
      {
         for(..) {
             ret = recv(xxx);
             if (ret < 0) {
                 if (errno == EAGAIN) {
                     if (done) {
                         return done;
                     } else {
                         return QIO_CHANNEL_ERR_BLOCK;
                     }
                 }

             }
             ...
         }
      }

Here, if ret < 0 && errno == EAGAIN && !done, we return
QIO_CHANNEL_ERR_BLOCK. nbd_wr_syncv() then invokes qio_channel_wait(),
and the guest *hangs* until I kill the NBD server.
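
A minimal sketch of this "wait instead of returning" behaviour, under
stated assumptions: poll(2) stands in for qio_channel_wait(),
wait_readable() and wait_style_read() are hypothetical names, and the
timeout_ms parameter (with the -ETIMEDOUT return) exists only so that the
demonstration terminates. It is not the QEMU implementation and assumes a
POSIX system that supports MSG_DONTWAIT.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical stand-in for a blocking "wait until readable" step.
 * timeout_ms < 0 blocks indefinitely. Returns 1 when the fd is readable
 * or hung up, 0 on timeout, -errno on failure.
 */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    int ret;

    do {
        ret = poll(&pfd, 1, timeout_ms);
    } while (ret < 0 && errno == EINTR);

    if (ret < 0) {
        return -errno;
    }
    return ret > 0 ? 1 : 0;
}

/* Read loop that waits on "would block" instead of returning -EAGAIN. */
ssize_t wait_style_read(int fd, void *buffer, size_t size, int timeout_ms)
{
    size_t offset = 0;

    assert(size > 0);
    while (offset < size) {
        ssize_t len = recv(fd, (char *)buffer + offset, size - offset,
                           MSG_DONTWAIT);
        if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            int ready = wait_readable(fd, timeout_ms);
            if (ready == 0) {
                return -ETIMEDOUT; /* demo only: stop rather than hang */
            }
            if (ready < 0) {
                return ready;
            }
            continue;
        }
        if (len < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (len == 0) {
            break; /* peer closed the connection */
        }
        offset += (size_t)len;
    }
    return (ssize_t)offset;
}
```

With timeout_ms = -1 and a peer that never sends, the call does not return
until the peer writes or closes the connection, which is consistent with
the guest recovering only once the NBD server is killed.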

It's easy to reproduce. My question: is the scenario I describe above the
expected behaviour?

Thanks
     -Xie


* Re: [Qemu-devel] Questions about nbd with QIOChannel
  2016-04-07 11:04 [Qemu-devel] Questions about nbd with QIOChannel Changlong Xie
@ 2016-04-07 11:17 ` Paolo Bonzini
  0 siblings, 0 replies; 2+ messages in thread
From: Paolo Bonzini @ 2016-04-07 11:17 UTC
  To: Changlong Xie, qemu devel, Daniel P. Berrange; +Cc: Li Zhijian, zhanghailiang



On 07/04/2016 13:04, Changlong Xie wrote:
> Hi all
> 
> Recently, while testing COLO, I found that the client sometimes hangs on
> the Primary side. At first I thought it might be a COLO-related issue, but
> after a lot of testing I suspect it may be an NBD issue (although I'm not
> sure). So I'd like to share what I found:
> 
> Since commit 1c778ef7, NBD uses the QIOChannel APIs for the actual
> socket I/O.
> 
> nbd_reply_ready()
>   read_sync()
>     nbd_wr_syncv()
>     {
>      ...
>      while (nlocal_iov > 0) {
>          ...
>          if (do_read) {
>              len = qio_channel_readv(ioc, local_iov, nlocal_iov,
>                                      &local_err);
>          } else {
>              ...
>          }
>          if (len == QIO_CHANNEL_ERR_BLOCK) {
>              if (qemu_in_coroutine()) {
>                  qemu_coroutine_yield();
>              } else {
>                  qio_channel_wait(ioc,
>                                   do_read ? G_IO_IN : G_IO_OUT);
>              }

You are right; you've found a bug.

Paolo
