* [Qemu-devel] Questions about nbd with QIOChannel
@ 2016-04-07 11:04 Changlong Xie
From: Changlong Xie @ 2016-04-07 11:04 UTC (permalink / raw)
To: qemu devel, Paolo Bonzini, Daniel P. Berrange
Cc: Li Zhijian, "Xie, Changlong/谢 昌龙",
zhanghailiang
Hi all
Recently, while testing COLO, I found that the client sometimes hangs on
the Primary side. At first I thought it might be a COLO-related issue, but
after many tests I suspect it may be an NBD issue (although I'm not sure).
So I'd like to share what I found:
Since commit 1c778ef7, we have converted to using the QIOChannel APIs for
actual socket I/O.
Let's focus on nbd_reply_ready() here:
Before commit 1c778ef7:

nbd_reply_ready()
  nbd_receive_reply()
    nbd_wr_sync()
    {
        ...
        while (offset < size) {
            if (do_read) {
                len = qemu_recv(fd, buffer + offset, size - offset, 0);
            } else {
                ...
            }
            if (len < 0) {
                err = socket_error();
                if (err == EINTR ||
                    (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                    continue;
                }
                return -err;
            }
            ...
        }
        ...
    }
If len < 0 && errno == EAGAIN, we have two choices:
1) If some data has already been received (offset > 0), continue to recv
until finished.
2) Otherwise, return -EAGAIN; nbd_receive_reply() will check this return
value and will return *successfully*, so the caller simply comes back
later from the event loop.
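To make the old semantics concrete, here is a minimal standalone sketch of that retry loop. mock_recv(), mock_results, and wr_sync_sketch() are hypothetical stand-ins I made up to exercise the logic without a real socket; only the EAGAIN handling mirrors the pre-1c778ef7 nbd_wr_sync():

```c
#include <errno.h>
#include <sys/types.h>
#include <stddef.h>

/* Hypothetical stand-in for qemu_recv(): returns canned results so the
 * retry logic can be exercised without a real socket. A negative entry
 * is delivered as errno = -entry with a -1 return. */
static int mock_results[8];
static int mock_idx;
static ssize_t mock_recv(char *buf, size_t len)
{
    int r = mock_results[mock_idx++];
    if (r < 0) {
        errno = -r;
        return -1;
    }
    (void)buf; (void)len;
    return r;
}

/* Sketch of the pre-1c778ef7 loop: EAGAIN with no progress (offset == 0)
 * is returned to the caller, who treats it as "no data yet, try again
 * later"; EAGAIN after partial progress means "keep looping until the
 * whole request has been read". */
static ssize_t wr_sync_sketch(char *buf, size_t size)
{
    size_t offset = 0;
    while (offset < size) {
        ssize_t len = mock_recv(buf + offset, size - offset);
        if (len < 0) {
            int err = errno;
            if (err == EINTR ||
                (offset > 0 && (err == EAGAIN || err == EWOULDBLOCK))) {
                continue;
            }
            return -err;
        }
        offset += len;
    }
    return offset;
}
```

With no data pending the sketch returns -EAGAIN immediately (choice 2); once a partial read has happened it loops across a later EAGAIN until the buffer is full (choice 1).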
After commit 1c778ef7:
nbd_reply_ready()
  read_sync()
    nbd_wr_syncv()
    {
        ...
        while (nlocal_iov > 0) {
            ...
            if (do_read) {
                len = qio_channel_readv(ioc, local_iov, nlocal_iov,
                                        &local_err);
            } else {
                ...
            }
            if (len == QIO_CHANNEL_ERR_BLOCK) {
                if (qemu_in_coroutine()) {
                    qemu_coroutine_yield();
                } else {
                    qio_channel_wait(ioc,
                                     do_read ? G_IO_IN : G_IO_OUT);
                }
                continue;
            }
            ...
        }
    }
For NBD:

qio_channel_readv()
  qio_channel_readv_full()
    klass->io_readv()
      qio_channel_socket_readv()
      {
          for (...) {
              ret = recv(xxx);
              if (ret < 0) {
                  if (errno == EAGAIN) {
                      if (done) {
                          return done;
                      } else {
                          return QIO_CHANNEL_ERR_BLOCK;
                      }
                  }
              }
              ...
          }
      }
Here, if ret < 0 && errno == EAGAIN && !done, we return
QIO_CHANNEL_ERR_BLOCK. Then nbd_wr_syncv() invokes
qio_channel_wait() and the guest *HANGS* until I kill the
nbd server service.
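The control-flow difference can be shown with a small standalone sketch. mock_readv(), mock_wait(), and wr_syncv_sketch() are illustrative names I invented, not the real QIOChannel API; the point is that ERR_BLOCK is no longer propagated to the caller but absorbed by an in-place wait:

```c
#include <sys/types.h>
#include <stddef.h>

#define QIO_CHANNEL_ERR_BLOCK -2  /* matches the QEMU definition */

/* Mock channel: returns ERR_BLOCK a few times before the data "arrives",
 * mimicking a peer that has not yet sent the rest of the reply. */
static int reads_until_ready = 3;
static int wait_calls;

static ssize_t mock_readv(char *buf, size_t len)
{
    if (reads_until_ready > 0) {
        reads_until_ready--;
        return QIO_CHANNEL_ERR_BLOCK;
    }
    (void)buf;
    return (ssize_t)len;  /* everything arrives at once */
}

static void mock_wait(void)
{
    /* In the real code this is qio_channel_wait(): outside a coroutine
     * it runs a nested poll loop and does not return until the fd is
     * readable. If the server never sends, the caller hangs here. */
    wait_calls++;
}

/* Sketch of the post-1c778ef7 loop: instead of returning -EAGAIN to the
 * caller (old behavior), the loop waits in place and retries. */
static ssize_t wr_syncv_sketch(char *buf, size_t size)
{
    size_t done = 0;
    while (done < size) {
        ssize_t len = mock_readv(buf + done, size - done);
        if (len == QIO_CHANNEL_ERR_BLOCK) {
            mock_wait();
            continue;
        }
        done += (size_t)len;
    }
    return (ssize_t)done;
}
```

Where the old nbd_wr_sync() would have returned -EAGAIN on the first short read and handed control back to the event loop, this loop sits in mock_wait() until data shows up; with a server that never responds, that wait never ends.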
It's easy to reproduce. My question: is the scenario I describe above
expected behavior?
Thanks
-Xie
* Re: [Qemu-devel] Questions about nbd with QIOChannel
@ 2016-04-07 11:17 ` Paolo Bonzini
From: Paolo Bonzini @ 2016-04-07 11:17 UTC (permalink / raw)
To: Changlong Xie, qemu devel, Daniel P. Berrange; +Cc: Li Zhijian, zhanghailiang
On 07/04/2016 13:04, Changlong Xie wrote:
> Hi all
>
> Recently, while testing COLO, I found that the client sometimes hangs on
> the Primary side. At first I thought it might be a COLO-related issue, but
> after many tests I suspect it may be an NBD issue (although I'm not sure).
> So I'd like to share what I found:
>
> Since commit 1c778ef7, we have converted to using the QIOChannel APIs for
> actual socket I/O.
>
> nbd_reply_ready()
>   read_sync()
>     nbd_wr_syncv()
>     {
>         ...
>         while (nlocal_iov > 0) {
>             ...
>             if (do_read) {
>                 len = qio_channel_readv(ioc, local_iov, nlocal_iov,
>                                         &local_err);
>             } else {
>                 ...
>             }
>             if (len == QIO_CHANNEL_ERR_BLOCK) {
>                 if (qemu_in_coroutine()) {
>                     qemu_coroutine_yield();
>                 } else {
>                     qio_channel_wait(ioc,
>                                      do_read ? G_IO_IN : G_IO_OUT);
>                 }
You are right; you've found a bug.
Paolo