From: Changlong Xie <xiecl.fnst@cn.fujitsu.com>
To: qemu devel <qemu-devel@nongnu.org>,
Paolo Bonzini <pbonzini@redhat.com>,
"Daniel P. Berrange" <berrange@redhat.com>
Cc: "Li Zhijian" <lizhijian@cn.fujitsu.com>,
"\"Xie, Changlong/谢 昌龙\"" <xiecl.fnst@cn.fujitsu.com>,
zhanghailiang <zhang.zhanghailiang@huawei.com>
Subject: [Qemu-devel] Questions about nbd with QIOChannel
Date: Thu, 7 Apr 2016 19:04:04 +0800
Message-ID: <57063EA4.6050402@cn.fujitsu.com>
Hi all,

Recently, while testing COLO, I found that the client sometimes hangs on
the Primary side. At first I thought it might be a COLO-related issue, but
after a ton of tests I suspect it may be an NBD issue (although I'm not sure).
So I'd like to share what I found:

Since commit 1c778ef7, we have converted to using the QIOChannel APIs for
actual socket I/O.
Let's focus on nbd_reply_ready() here:

Before commit 1c778ef7:

nbd_reply_ready()
  nbd_receive_reply()
    nbd_wr_sync()
    {
        ...
        while (offset < size) {
            if (do_read) {
                len = qemu_recv(fd, buffer + offset, size - offset, 0);
            } else {
                ...
            }
            if (len < 0) {
                err = socket_error();
                if (err == EINTR ||
                    (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                    continue;
                }
                return -err;
            }
            ...
        }
        ....
    }

If len < 0 and err == EAGAIN, there are two cases:
1) offset > 0: continue to recv until finished.
2) offset == 0: return -EAGAIN. nbd_receive_reply() will check this return
   value and will return *successfully*.
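
The two cases above can be reproduced outside QEMU with a minimal sketch. It is not the actual nbd_wr_sync() code: old_style_read() is a hypothetical name, plain recv() and errno stand in for qemu_recv() and socket_error(), and MSG_DONTWAIT is assumed as a substitute for a socket that was put into non-blocking mode.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for the pre-1c778ef7 read loop.
 * Returns the number of bytes read, or -errno on failure.
 */
static ssize_t old_style_read(int fd, void *buffer, size_t size)
{
    size_t offset = 0;

    assert(buffer != NULL);
    while (offset < size) {
        /* MSG_DONTWAIT (assumption) emulates a non-blocking socket. */
        ssize_t len = recv(fd, (char *)buffer + offset, size - offset,
                           MSG_DONTWAIT);

        if (len < 0) {
            int err = errno;

            /* Case 1: part of the message has arrived, so keep reading. */
            if (err == EINTR ||
                (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                continue;
            }
            /* Case 2: nothing read yet, so hand -EAGAIN back to the caller. */
            return -err;
        }
        if (len == 0) {
            break; /* peer closed the connection */
        }
        offset += (size_t)len;
    }
    return (ssize_t)offset;
}
```

With an idle peer, the first call returns -EAGAIN immediately instead of waiting, which is the behaviour the caller relied on.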

After commit 1c778ef7:

nbd_reply_ready()
  read_sync()
    nbd_wr_syncv()
    {
        ...
        while (nlocal_iov > 0) {
            ...
            if (do_read) {
                len = qio_channel_readv(ioc, local_iov, nlocal_iov,
                                        &local_err);
            } else {
                ...
            }
            if (len == QIO_CHANNEL_ERR_BLOCK) {
                if (qemu_in_coroutine()) {
                    qemu_coroutine_yield();
                } else {
                    qio_channel_wait(ioc,
                                     do_read ? G_IO_IN : G_IO_OUT);
                }
                continue;
            }
            ...
        }
    }
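
As a rough illustration of the loop above, the sketch below waits with poll() whenever the read would block. It is an assumption-laden stand-in, not QEMU code: sketch_read_wait() and its timeout_ms parameter are hypothetical, poll() plays the role of qio_channel_wait(), and MSG_DONTWAIT emulates a non-blocking socket. Passing a negative timeout gives an unbounded wait; the finite timeout exists only so the wait can be observed.

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>

/*
 * Hypothetical stand-in for the post-1c778ef7 read loop.
 * Returns bytes read, -ETIMEDOUT if the wait expired, or -errno.
 */
static ssize_t sketch_read_wait(int fd, void *buffer, size_t size,
                                int timeout_ms)
{
    size_t offset = 0;

    assert(buffer != NULL);
    while (offset < size) {
        ssize_t len = recv(fd, (char *)buffer + offset, size - offset,
                           MSG_DONTWAIT);

        if (len < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            /* Would block: wait for readability, even if offset == 0. */
            struct pollfd pfd = { .fd = fd, .events = POLLIN };
            int rc = poll(&pfd, 1, timeout_ms);

            if (rc == 0) {
                return -ETIMEDOUT; /* only reachable with timeout_ms >= 0 */
            }
            if (rc < 0 && errno != EINTR) {
                return -errno;
            }
            continue;
        }
        if (len < 0) {
            if (errno == EINTR) {
                continue;
            }
            return -errno;
        }
        if (len == 0) {
            break; /* peer closed the connection */
        }
        offset += (size_t)len;
    }
    return (ssize_t)offset;
}
```

Unlike a loop that returns -EAGAIN when nothing has been read, this one never gives control back to the caller while the peer stays silent.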

For NBD:

qio_channel_readv()
  qio_channel_readv_full()
    klass->io_readv()
      qio_channel_socket_readv()
      {
          for (..) {
              ret = recv(xxx);
              if (ret < 0) {
                  if (errno == EAGAIN) {
                      if (done) {
                          return done;
                      } else {
                          return QIO_CHANNEL_ERR_BLOCK;
                      }
                  }
              }
              ...
          }
      }
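
The EAGAIN mapping in the outline above can be sketched in standalone form as follows. The names sketch_socket_readv() and SKETCH_ERR_BLOCK are hypothetical (the latter only plays the role of QIO_CHANNEL_ERR_BLOCK), and MSG_DONTWAIT is assumed in place of a non-blocking socket; this is not the QEMU implementation.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <sys/uio.h>

/* Hypothetical sentinel playing the role of QIO_CHANNEL_ERR_BLOCK. */
#define SKETCH_ERR_BLOCK (-2)

/*
 * Fill the iovec array from fd. Returns the bytes read so far, or
 * SKETCH_ERR_BLOCK if nothing was read and the socket would block,
 * or -1 on any other error.
 */
static ssize_t sketch_socket_readv(int fd, const struct iovec *iov,
                                   size_t niov)
{
    ssize_t done = 0;
    size_t i = 0;

    assert(iov != NULL);
    while (i < niov) {
        ssize_t ret = recv(fd, iov[i].iov_base, iov[i].iov_len,
                           MSG_DONTWAIT);

        if (ret < 0) {
            if (errno == EAGAIN || errno == EWOULDBLOCK) {
                /* Partial data is reported; no data becomes "would block". */
                return done ? done : SKETCH_ERR_BLOCK;
            }
            if (errno == EINTR) {
                continue; /* retry the same buffer */
            }
            return -1;
        }
        done += ret;
        if ((size_t)ret < iov[i].iov_len) {
            break; /* short read: nothing more available right now */
        }
        i++;
    }
    return done;
}
```

The key point is the `done ? done : SKETCH_ERR_BLOCK` branch: an empty socket is never reported to the caller as -EAGAIN.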

Here, if ret < 0 && errno == EAGAIN && !done, we return
QIO_CHANNEL_ERR_BLOCK. Then nbd_wr_syncv() invokes
qio_channel_wait(), and the guest *HANGS* until I kill
the NBD server service.

It's easy to reproduce. My question: is the scenario I describe above the
expected behavior?

Thanks
-Xie