From: Kevin Wolf <kwolf@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
Cc: Denis Lunev <den@virtuozzo.com>,
"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
"armbru@redhat.com" <armbru@redhat.com>,
"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
"mreitz@redhat.com" <mreitz@redhat.com>,
"stefanha@redhat.com" <stefanha@redhat.com>
Subject: Re: [Qemu-devel] [PATCH v6 6/7] block/nbd-client: nbd reconnect
Date: Fri, 7 Jun 2019 16:54:22 +0200
Message-ID: <20190607145422.GF5055@dhcp-200-226.str.redhat.com>
In-Reply-To: <c1c87a1c-9f1a-b7c8-d5a7-bb496556a256@virtuozzo.com>
On 07.06.2019 at 16:27, Vladimir Sementsov-Ogievskiy wrote:
> 07.06.2019 16:26, Kevin Wolf wrote:
> > On 07.06.2019 at 14:02, Vladimir Sementsov-Ogievskiy wrote:
> >> 07.06.2019 11:06, Kevin Wolf wrote:
> >>> On 07.06.2019 at 05:17, Eric Blake wrote:
> >>>> On 4/11/19 12:27 PM, Vladimir Sementsov-Ogievskiy wrote:
> >>>>> +static coroutine_fn void nbd_reconnect_loop(NBDConnection *con)
> >>>>> +{
> >>>>> + NBDClientSession *s = nbd_get_client_session(con->bs);
> >>>>> + uint64_t start_time_ns = qemu_clock_get_ns(QEMU_CLOCK_REALTIME);
> >>>>> + uint64_t delay_ns = s->reconnect_delay * 1000000000UL;
> >>>>
> >>>> Do we have a #define constant for nanoseconds in a second to make this
> >>>> more legible than counting '0's?
> >>>>
> >>>>> + uint64_t timeout = 1000000000UL; /* 1 sec */
> >>>>> + uint64_t max_timeout = 16000000000UL; /* 16 sec */
> >>>>
> >>>> 1 * constant, 16 * constant
> >>>>
> >>>>> +
> >>>>> + nbd_reconnect_attempt(con);
> >>>>> +
> >>>>> + while (nbd_client_connecting(s)) {
> >>>>> + if (s->state == NBD_CLIENT_CONNECTING_WAIT &&
> >>>>> + qemu_clock_get_ns(QEMU_CLOCK_REALTIME) - start_time_ns > delay_ns)
> >>>>> + {
> >>>>> + s->state = NBD_CLIENT_CONNECTING_NOWAIT;
> >>>>> + qemu_co_queue_restart_all(&s->free_sema);
> >>>>> + }
> >>>>> +
> >>>>> + bdrv_dec_in_flight(con->bs);
> >>>>> + qemu_co_sleep_ns(QEMU_CLOCK_REALTIME, timeout);
> >>>>
> >>>> Another place where I'd like someone more familiar with coroutines to
> >>>> also have a look.
> >>>
> >>> What's the exact question you'd like me to answer?
> >>>
> >>> But anyway, bdrv_dec/inc_in_flight() around the sleep looks wrong to me.
> >>> Either the operation must be waited for in drain, then you can't
> >>> decrease the counter even during the sleep. Or drain doesn't have to
> >>> consider it, then why is the counter even increased in the first place?
> >>>
> >>> The way it currently is, drain can return assuming that there are no
> >>> requests, but after the timeout the request automatically comes back
> >>> while the drain caller assumes that there is no more activity. This
> >>> doesn't look right.
> >>
> >> Hmm.
> >>
> >> You added this inc/dec around the whole lifetime of the connection
> >> coroutine in
> >>
> >> commit 5ad81b4946baf948c65cf7e1436d9b74803c1280
> >> Author: Kevin Wolf <kwolf@redhat.com>
> >> Date:   Fri Feb 15 16:53:51 2019 +0100
> >>
> >>     nbd: Restrict connection_co reentrance
> >>
> >>
> >> And now I try to connect in an endless loop, with qemu_co_sleep_ns()
> >> inside. I need to give bdrv_drain a chance to complete, so I have to
> >> sometimes drop this in_flight reference. But I understand your point.
> >
> > Ah, right, I see. I think it's fine to add a second point where we
> > decrease the counter (though I would add a comment telling why we do
> > this) as long as the right conditions are met.
> >
> > The right conditions are probably something like: Once drained, we
> > guarantee that we don't call any callbacks until the drained section
> > ends. In nbd_read_eof() this is true because we can't get an answer if
> > we had no request running.
> >
> > Or actually... This assumes a nice compliant server that doesn't just
> > arbitrarily send unexpected messages. Is the existing code broken if we
> > connect to a malicious server?
> >
> >> Will the following work better?
> >>
> >>     bdrv_dec_in_flight(con->bs);
> >>     qemu_co_sleep_ns(...);
> >>     while (s->drained) {
> >>         s->wait_drained_end = true;
> >>         qemu_coroutine_yield();
> >>     }
> >>     bdrv_inc_in_flight(con->bs);
> >>
> >> ...
> >>
> >>     .drained_begin() {
> >>         s->drained = true;
> >>     }
> >>
> >>     .drained_end() {
> >>         if (s->wait_drained_end) {
> >>             s->wait_drained_end = false;
> >>             aio_co_wake(s->connection_co);
> >>         }
> >>     }
> >
> > This should make sure that we don't call any callbacks before the drain
> > section has ended.
> >
> > We probably still have a problem because nbd_client_attach_aio_context()
> > reenters the coroutine with qemu_aio_coroutine_enter(), which will cause
> > an assertion failure if co->scheduled is set. So this needs to use a
> > version that can cancel a sleep.
> >
> > I see that your patch currently simply ignores attaching a new context,
> > but then the coroutine stays in the old AioContext. Did I miss some
> > additional code that moves it to the new context somehow or will it just
> > stay where it was if the coroutine happens to be reconnecting when the
> > AioContext was supposed to change?
>
>
> Hmm. Do you mean "in latter case we do nothing" in
>
> void nbd_client_attach_aio_context(BlockDriverState *bs,
>                                    AioContext *new_context)
> {
>     NBDClientSession *client = nbd_get_client_session(bs);
>
>     /*
>      * client->connection_co is either yielded from nbd_receive_reply or from
>      * nbd_reconnect_loop(), in latter case we do nothing
>      */
>     if (client->state == NBD_CLIENT_CONNECTED) {
>         qio_channel_attach_aio_context(QIO_CHANNEL(client->ioc), new_context);
>
>         bdrv_inc_in_flight(bs);
>
>         /*
>          * Need to wait here for the BH to run because the BH must run while the
>          * node is still drained.
>          */
>         aio_wait_bh_oneshot(new_context, nbd_client_attach_aio_context_bh, bs);
>     }
> }
>
> ?
> Not sure why I ignored this case. So, I should run the if() body
> unconditionally here and support interrupting the timer-sleeping coroutine
> in nbd_client_attach_aio_context_bh, yes?
Yes, I think so. We now have two places where the coroutine can be
yielded: the old one, which simply yields in a loop and can be reentered
without a problem, and the new one in a sleep. Both need to be supported
when we switch the coroutine to a new AioContext.
Kevin