qemu-devel.nongnu.org archive mirror
* [Qemu-devel] Questions about nbd with QIOChannel
@ 2016-04-07 11:04 Changlong Xie
  2016-04-07 11:17 ` Paolo Bonzini
  0 siblings, 1 reply; 2+ messages in thread
From: Changlong Xie @ 2016-04-07 11:04 UTC (permalink / raw)
  To: qemu devel, Paolo Bonzini, Daniel P. Berrange
  Cc: Li Zhijian, "Xie, Changlong/谢 昌龙",
	zhanghailiang

Hi all

Recently, while testing COLO, I found that the client sometimes hangs
on the Primary side. At first I thought it might be a COLO-related
issue, but after a ton of tests I suspect it may be an NBD issue
(although I'm not sure). So I'd like to share what I found:

Since commit 1c778ef7, we have converted to using the QIOChannel APIs
for the actual socket I/O.

Let's focus on nbd_reply_ready() here:

Before commit 1c778ef7
nbd_reply_ready()
   nbd_receive_reply()
     nbd_wr_sync()
     {
      ...
      while (offset < size) {
          if (do_read) {
              len = qemu_recv(fd, buffer + offset, size - offset, 0);
          } else {
              ...
          }
          if (len < 0) {
              err = socket_error();
              if (err == EINTR ||
                  (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                  continue;
              }
              return -err;
          }
          ...
      }
      ....
     }

If len < 0 && errno == EAGAIN, we have two choices:
1) when offset > 0: continue to recv until the reply is finished.
2) when offset == 0: return -EAGAIN; nbd_receive_reply() will check
this return value and will return *successfully*.
			
After commit 1c778ef7:
nbd_reply_ready()
   read_sync()
     nbd_wr_syncv()
     {
      ...
      while (nlocal_iov > 0) {
          ...
          if (do_read) {
              len = qio_channel_readv(ioc, local_iov, nlocal_iov,
                                      &local_err);
          } else {
              ...
          }
          if (len == QIO_CHANNEL_ERR_BLOCK) {
              if (qemu_in_coroutine()) {
                  qemu_coroutine_yield();
              } else {
                  qio_channel_wait(ioc,
                                   do_read ? G_IO_IN : G_IO_OUT);
              }
              continue;
          }
          ...
      }
     }

For NBD,
qio_channel_readv()
   qio_channel_readv_full
     klass->io_readv()
      qio_channel_socket_readv()
      {
         for(..) {
             ret = recv(xxx);
             if (ret < 0) {
                 if (errno == EAGAIN) {
                     if (done) {
                         return done;
                     } else {
                         return QIO_CHANNEL_ERR_BLOCK;
                     }
                 }

             }
             ...
         }
      }

Here, if ret < 0 && errno == EAGAIN && !done, we return
QIO_CHANNEL_ERR_BLOCK. Then nbd_wr_syncv() invokes
qio_channel_wait() and the guest *HANGS* until I kill
the NBD server service.

It's easy to reproduce. My question: is the scenario I described
above the expected behavior?

Thanks
     -Xie
