From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Denis Lunev <den@virtuozzo.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"armbru@redhat.com" <armbru@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect
Date: Fri, 7 Jun 2019 16:54:22 +0200
Message-ID: <20190607145422.GF5055@dhcp-200-226.str.redhat.com>
In-Reply-To: <c1c87a1c-9f1a-b7c8-d5a7-bb496556a256@virtuozzo.com>
On 07.06.2019 at 16:27, Vladimir Sementsov-Ogievskiy wrote:
> 07.06.2019 16:26, Kevin Wolf wrote:
> > On 07.06.2019 at 14:02, Vladimir Sementsov-Ogievskiy wrote:
> >> 07.06.2019 11:06, Kevin Wolf wrote:
> >>> On 07.06.2019 at 05:17, Eric Blake wrote:
> >>>> On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> >>>>> +static coroutine_fn void nbd_reconnect_loop(NBDConnection *con)
> >>>>> +{
> >>>>> + NBDClientSession *s = nbd_get_client_session(con->bs);
> >>>>> + uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> >>>>> + uint64_t delay_ns = s->reconnect_delay * 1000000000UL;
> >>>>
> >>>> Do we have a #define constant for nanoseconds in a second to make this
> >>>> more legible than counting '0's?
> >>>>
> >>>>> + uint64_t timeout = 1000000000UL; /* 1 sec */
> >>>>> + uint64_t max_timeout = 16000000000UL; /* 16 sec */
> >>>>
> >>>> 1 * constant, 16 * constant
> >>>>
> >>>>> +
> >>>>> + nbd_reconnect_attempt(con);
> >>>>> +
> >>>>> + while (nbd_client_connecting(s)) {
> >>>>> + if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> >>>>> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> >>>>> + {
> >>>>> + s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> >>>>> + qemu_co_queue_restart_all(&s->free_sema);
> >>>>> + }
> >>>>> +
> >>>>> + bdrv_dec_in_flight(con->bs);
> >>>>> + qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout);
> >>>>
> >>>> Another place where I'd like someone more familiar with coroutines to
> >>>> also have a look.
> >>>
> >>> What's the exact question you'd like me to answer?
> >>>
> >>> But anyway, bdrv_dec/inc_in_flight() around the sleep looks wrong to me.
> >>> Either the operation must be waited for in drain, then you can't
> >>> decrease the counter even during the sleep. Or drain doesn't have to
> >>> consider it, then why is the counter even increased in the first place?
> >>>
> >>> The way it currently is, drain can return assuming that there are no
> >>> requests, but after the timeout the request automatically comes back
> >>> while the drain caller assumes that there is no more activity. This
> >>> doesn't look right.
> >>
> >> Hmm.
> >>
> >> You added this inc/dec around the whole lifetime of the connection
> >> coroutine in
> >>
> >> commit 5ad81b4946baf948c65cf7e1436d9b74803c1280
> >> Author: Kevin Wolf <kwolf@redhat.com>
> >> Date:   Fri Feb 15 16:53:51 2019 +0100
> >>
> >>     nbd: Restrict connection_co reentrance
> >>
> >>
> >> And now I try to connect in an endless loop, with qemu_co_sleep_ns()
> >> inside. I need to give bdrv_drain a chance to complete, so I have to
> >> sometimes drop this in_flight reference. But I understand your point.
> >
> > Ah, right, I see. I think it's fine to add a second point where we
> > decrease the counter (though I would add a comment telling why we do
> > this) as long as the right conditions are met.
> >
> > The right conditions are probably something like: Once drained, we
> > guarantee that we don't call any callbacks until the drained section
> > ends. In nbd_read_eof() this is true because we can't get an answer if
> > we had no request running.
> >
> > Or actually... This assumes a nice compliant server that doesn't just
> > arbitrarily send unexpected messages. Is the existing code broken if we
> > connect to a malicious server?
> >
> >> Will the following work better?
> >>
> >>     bdrv_dec_in_flight(con->bs);
> >>     qemu_co_sleep_ns(...);
> >>     while (s->drained) {
> >>         s->wait_drained_end = true;
> >>         qemu_coroutine_yield();
> >>     }
> >>     bdrv_inc_in_flight(con->bs);
> >>
> >> ...
> >>
> >>     .drained_begin() {
> >>         s->drained = true;
> >>     }
> >>
> >>     .drained_end() {
> >>         if (s->wait_drained_end) {
> >>             s->wait_drained_end = false;
> >>             aio_co_wake(s->connection_co);
> >>         }
> >>     }
> >
> > This should make sure that we don't call any callbacks before the drain
> > section has ended.
> >
> > We probably still have a problem because nbd_client_attach_aio_context()
> > reenters the coroutine with qemu_aio_coroutine_enter(), which will cause
> > an assertion failure if co->scheduled is set. So this needs to use a
> > version that can cancel a sleep.
> >
> > I see that your patch currently simply ignores attaching a new context,
> > but then the coroutine stays in the old AioContext. Did I miss some
> > additional code that moves it to the new context somehow or will it just
> > stay where it was if the coroutine happens to be reconnecting when the
> > AioContext was supposed to change?
>
>
> Hmm. Do you mean "in latter case we do nothing" in
>
> void nbd_client_attach_aio_context(BlockDriverState *bs,
>                                    AioContext *new_context)
> {
>     NBDClientSession *client = nbd_get_client_session(bs);
>
>     /*
>      * client->connection_co is either yielded from nbd_receive_reply or from
>      * nbd_reconnect_loop(), in latter case we do nothing
>      */
>     if (client->state == NBD_CLIENT_CONNECTED) {
>         qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
>
>         bdrv_inc_in_flight(bs);
>
>         /*
>          * Need to wait here for the BH to run because the BH must run while the
>          * node is still drained.
>          */
>         aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
>     }
> }
>
> ?
> Not sure why I ignored this case. So, I should run the if() body
> unconditionally here and support interrupting the timer-sleeping coroutine
> in nbd_client_attach_aio_context_bh, yes?
Yes, I think so. We now have two places where the coroutine can be
yielded: the old one, which simply yields in a loop and can be reentered
without a problem, and the new one in a sleep. Both need to be supported
when we switch the coroutine to a new AioContext.
Kevin