From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>, qemu-devel@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	"open list:Network Block Dev..." <qemu-block@nongnu.org>,
	Max Reitz <mreitz@redhat.com>
Subject: Re: [Qemu-devel] [PULL 7/7] nbd-client: Fix regression when server sends garbage
Date: Wed, 16 Aug 2017 15:00:35 +0300
Message-ID: <f4ca52b8-9ec7-bf99-dc19-dc732b994cd3@virtuozzo.com>
In-Reply-To: <2d3ed7c8-f561-cdae-e7e5-411dff341dd8@redhat.com>

15.08.2017 19:51, Eric Blake wrote:
> On 08/15/2017 10:50 AM, Vladimir Sementsov-Ogievskiy wrote:
>> 15.08.2017 18:09, Eric Blake wrote:
>>> When we switched NBD to use coroutines for qemu 2.9 (in particular,
>>> commit a12a712a), we introduced a regression: if a server sends us
>>> garbage (such as a corrupted magic number), we quit the read loop
>>> but do not stop sending further queued commands, resulting in the
>>> client hanging when it never reads the response to those additional
>>> commands.  In qemu 2.8, we properly detected that the server is no
>>> longer reliable, and cancelled all existing pending commands with
>>> EIO, then tore down the socket so that all further command attempts
>>> get EPIPE.
>>>
>>> +++ b/block/nbd-client.c
>>> @@ -73,7 +73,7 @@ static coroutine_fn void nbd_read_reply_entry(void *opaque)
>>>        int ret;
>>>        Error *local_err = NULL;
>>>
>>> -    for (;;) {
>>> +    while (!s->quit) {
>>>            assert(s->reply.handle == 0);
>>>            ret = nbd_receive_reply(s->ioc, &s->reply, &local_err);
>>>            if (ret < 0) {
>> I think we should check quit here, if it is true, we should not continue
>> normal path of handling reply
> I don't think it matters.  If nbd_receive_reply() correctly got data off
> the wire for this particular coroutine's request, we might as well act
> on that data, regardless of what other coroutines have learned in the
> meantime.
>
> This is already in the pull request for -rc3, but if you can come up
> with a scenario that still behaves incorrectly, we can do a followup

It just doesn't correspond to your commit message:
"... therefore end all pending commands with EIO, and quit trying to 
send any further commands"

So we should end this command (the one whose reply we just read) with
EIO, instead of continuing on the success path.

However, I don't think this leads to a scenario where the client hangs
or crashes, so it's OK for me to handle it in my refactoring for 2.11.
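
A minimal sketch of the behaviour described above, not the actual
nbd-client.c code. MockSession and mock_finish_command() are hypothetical
names invented for illustration; only the quit flag and the use of EIO
come from the patch under discussion.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for NBDClientSession. */
typedef struct {
    bool quit;  /* set once any coroutine decides the server is unreliable */
} MockSession;

/*
 * Hypothetical helper: decide how one command ends after an attempt to
 * read its reply.
 *   recv_ret    - result of reading the reply (negative on failure)
 *   reply_error - positive errno value carried in the reply, 0 on success
 */
static int mock_finish_command(MockSession *s, int recv_ret, int reply_error)
{
    if (recv_ret < 0) {
        /* Garbage on the wire: remember it and fail this command. */
        s->quit = true;
        return -EIO;
    }
    if (s->quit) {
        /* The reply itself was readable, but the session has already been
         * declared dead, so do not continue on the success path. */
        return -EIO;
    }
    return -reply_error;
}
```

With this shape, a reply that arrives after quit has been set ends with
EIO, which is what the commit message promises for all pending commands.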

> patch for -rc4 (although I'm hoping we don't have to change it any
> further for 2.10).  Otherwise, I'm fine if your refactoring work for
> 2.11 addresses the issue as part of making the code easier to read.
>
>>> @@ -154,6 +161,9 @@ static int nbd_co_send_request(BlockDriverState *bs,
>>>        } else {
>>>            rc = nbd_send_request(s->ioc, request);
>>>        }
>>> +    if (rc < 0) {
>>> +        s->quit = true;
>>> +    }
>>>        qemu_co_mutex_unlock(&s->send_mutex);
>> and here, if rc == 0 and quit is true, we should not return 0
>>
>>>        return rc;
> We don't - we return rc, which is negative.

I think it can be zero, while quit has been set to true in another coroutine.
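
A minimal sketch of the case meant here, under stated assumptions:
SendSession and mock_send_result() are hypothetical names, and returning
-EIO for the "sent fine, but the session is already dead" case is an
illustrative choice, not something the patch does.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for NBDClientSession. */
typedef struct {
    bool quit;  /* may be set by another coroutine at any yield point */
} SendSession;

/*
 * Hypothetical helper modelling the tail of the send path.
 *   rc - result of writing the request to the socket
 */
static int mock_send_result(SendSession *s, int rc)
{
    if (rc < 0) {
        /* Our own send failed: tell the other coroutines. */
        s->quit = true;
    } else if (s->quit) {
        /* Our send succeeded, but another coroutine has already given up
         * on the server, so no reply will be read for this request.
         * Assumption: report that as -EIO instead of returning 0. */
        rc = -EIO;
    }
    return rc;
}
```

Without the second branch the function returns 0 even though quit is
already true, which is the situation described above.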

>
>>>    }
>>> @@ -168,8 +178,7 @@ static void nbd_co_receive_reply(NBDClientSession *s,
>>>        /* Wait until we're woken up by nbd_read_reply_entry.  */
>>>        qemu_coroutine_yield();
>>>        *reply = s->reply;
>>> -    if (reply->handle != request->handle ||
>>> -        !s->ioc) {
>>> +    if (reply->handle != request->handle || !s->ioc || s->quit) {
>>>            reply->error = EIO;
>> here, if s->quit is false, we should set it to inform other coroutines
> We can't get into nbd_co_receive_reply() unless the two handles were
> once equal, and the only code that changes them to be not equal is when
> we are shutting down.  Checking s->quit is a safety valve if some other
> coroutine detects corruption first, but this coroutine does not need to
> set s->quit because it is either already set, or we are already shutting
> down.

OK, and since s->quit is set when we are shutting down, we can drop the
handle comparison here.
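
A minimal sketch of the simplified check, assuming (as stated above) that
s->quit is always set before the handles can diverge. ReplySession,
has_ioc and mock_reply_failed() are hypothetical names; has_ioc stands in
for the "s->ioc != NULL" test in the real code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-in for NBDClientSession. */
typedef struct {
    bool has_ioc;  /* models s->ioc being non-NULL */
    bool quit;     /* set on detected corruption or on shutdown */
} ReplySession;

/* Returns true when the woken-up request should be failed with EIO.
 * The reply->handle != request->handle comparison is intentionally
 * absent; it is assumed to be implied by quit. */
static bool mock_reply_failed(const ReplySession *s)
{
    return !s->has_ioc || s->quit;
}
```

This is only valid while every path that resets the reply handle also
sets quit first; if that invariant is ever broken, the handle comparison
is needed again.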

>
>>>        } else {
>>>            if (qiov && reply->error == 0) {
>> and here follows a call to nbd_rwv(), where s->quit should be
>> appropriately handled..
> Reading from a corrupt server is not as bad as writing to the corrupt
> server; the patch for 2.10 is solely focused on preventing writes where
> we need a followup read (because once we know the server is corrupt, we
> can't guarantee the followup reads will come).
>
> Again, if you can prove we have a scenario that is still buggy (client
> can crash or hang), then it is -rc4 material; if not, then this is all
> the more that 2.10 needs, and your refactoring work for 2.11 should
> clean up a lot of this mess in the first place as you make the
> coroutines easier to follow.

ok.

>

-- 
Best regards,
Vladimir
