From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: Eric Blake <eblake@redhat.com>
Cc: qemu-block@nongnu.org, qemu-devel@nongnu.org, hreitz@redhat.com,
kwolf@redhat.com, rkagan@virtuozzo.com
Subject: Re: [PATCH v6 4/5] block/nbd: drop connection_co
Date: Mon, 27 Sep 2021 14:57:36 +0300 [thread overview]
Message-ID: <6bdbd25a-666e-be2e-bbe4-da9317ef9f62@virtuozzo.com> (raw)
In-Reply-To: <20210924174439.jphzecreh7usmecj@redhat.com>
24.09.2021 20:44, Eric Blake wrote:
> On Thu, Sep 02, 2021 at 01:38:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
>> OK, that's a big rewrite of the logic.
>
> And a time-consuming review on my part!
>
>>
>> Pre-patch we have an always-running coroutine - connection_co. It does
>> reply receiving and reconnecting. And it leads to a lot of difficult
>> and unobvious code around drained sections and context switches. We also
>> abuse the bs->in_flight counter, which is increased for connection_co and
>> temporarily decreased at points where we want to allow a drained section
>> to begin. One of these places is in another file: in nbd_read_eof() in
>> nbd/client.c.
>>
>> We also cancel reconnect and requests waiting for reconnect on drained
>> begin, which is not correct. This patch fixes that.
>>
>> Let's finally drop this always running coroutine and go another way:
>> do both reconnect and receiving in request coroutines.
>>
>> The detailed list of changes is below (in the sequence of diff hunks).
>
> Well, depending on the git order file in use ;)
>
>>
>> 1. receiving coroutines are woken directly from nbd_channel_error, when
>> we change s->state
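For readers of the thread, the shape of that change is roughly the following (a simplified sketch, not the verbatim patch; the unconditional channel shutdown is patch 1/5 and is trimmed here):

    static void nbd_channel_error(BDRVNBDState *s, int ret)
    {
        if (ret == -EIO) {
            if (s->state == NBD_CLIENT_CONNECTED) {
                s->state = s->reconnect_delay ? NBD_CLIENT_CONNECTING_WAIT
                                              : NBD_CLIENT_CONNECTING_NOWAIT;
            }
        } else {
            s->state = NBD_CLIENT_QUIT;
        }

        /* New: wake the receiving coroutines directly, so they notice
         * the state change themselves; there is no connection_co to do
         * it for them anymore. */
        nbd_recv_coroutines_wake_all(s);
    }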
>>
>> 2. nbd_co_establish_connection_cancel(): we don't have drain_begin now,
>> and in nbd_teardown_connection() all requests should already be
>> finished (and reconnect is done from request). So
>> nbd_co_establish_connection_cancel() is called from
>> nbd_cancel_in_flight() (to cancel the request that is doing
>> nbd_co_establish_connection()) and from reconnect_delay_timer_cb()
>> (previously we didn't need it, as reconnect delay only should cancel
>> active requests not the reconnection itself. But now reconnection
>
> Missing )
>
>> itself is done in the separate thread (we now call
>> nbd_client_connection_enable_retry() in nbd_open()), and we need to
>> cancel the requests that waits in nbd_co_establish_connection()
>
> Singular/plural disagreement. I think the intended meaning is
> 'requests that wait' and not 'request that waits'.
>
>> now).
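So reconnect_delay_timer_cb() becomes roughly this (a sketch; the exact hunk in the patch may differ in details):

    static void reconnect_delay_timer_cb(void *opaque)
    {
        BDRVNBDState *s = opaque;

        if (s->state == NBD_CLIENT_CONNECTING_WAIT) {
            s->state = NBD_CLIENT_CONNECTING_NOWAIT;
            /* New: the reconnection itself now runs in the background
             * thread, so also cancel the request that sleeps in
             * nbd_co_establish_connection(). */
            nbd_co_establish_connection_cancel(s->conn);
        }

        reconnect_delay_timer_del(s);
    }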
>>
>> 2. We do receive headers in request coroutine. But we also should
>
> Second point 2; I'll call it 2A below, because it looks related to
> point 8.
Ohh. OK.
>
>> dispatch replies for another pending requests. So,
>
> s/another/other/
>
>> nbd_connection_entry() is turned into nbd_receive_replies(), which
>> does reply dispatching until it receive another request headers, and
>
> s/until/while/, s/another/other/
>
>> returns when it receive the requested header.
>
> receives
>
>>
>> 3. All the old stuff around drained sections and context switches is
>> dropped. In detail:
>> - we don't need to move connection_co to a new aio context, as we
>> don't have connection_co anymore
>> - we don't have a fake "request" of connection_co (the extra increase
>> of in_flight), so we don't need to care about it in drain_begin/end
>> - we don't stop reconnection during a drained section anymore. This
>> means that drain_begin may wait for a long time (up to
>> reconnect_delay). But that's an improvement and more correct
>> behavior, see below [*]
>>
>> 4. In nbd_teardown_connection() we don't have to wait for
>> connection_co, as it is dropped. And the cleanup of s->ioc and nbd_yank
>> is moved here from the removed connection_co.
>>
>> 5. In nbd_co_do_establish_connection() we now should handle
>> NBD_CLIENT_CONNECTING_NOWAIT: if a new request comes while we are in
>> NBD_CLIENT_CONNECTING_NOWAIT, it still should call
>> nbd_co_establish_connection() (who knows, maybe connection already
>> established by thread in background). But we shouldn't wait: if
>
> maybe the connection was already established by another thread in the background
"another thread" sounds vague. I think better: by connection thread.
>
>> nbd_co_establish_connection() can't return a new channel immediately,
>> the request should fail (we are in NBD_CLIENT_CONNECTING_NOWAIT
>> state).
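In code, roughly (a sketch; assume nbd_client_connecting_wait() is the small helper that returns true only in the WAIT state):

    int coroutine_fn nbd_co_do_establish_connection(BlockDriverState *bs,
                                                    Error **errp)
    {
        BDRVNBDState *s = bs->opaque;
        /* Block only in the WAIT state; in NOWAIT just poll the thread's
         * result and fail fast if there is no ready channel. */
        bool blocking = nbd_client_connecting_wait(s);

        s->ioc = nbd_co_establish_connection(s->conn, &s->info, blocking, errp);
        if (!s->ioc) {
            return -ECONNREFUSED;
        }

        /* ... handshake result checks, nbd_yank registration, etc. ... */
        s->state = NBD_CLIENT_CONNECTED;
        return 0;
    }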
>>
>> 6. nbd_reconnect_attempt() is simplified: it's now easier to wait for
>> other requests in the caller, so here we just assert that fact.
>> Also, the delay timer is now initialized here: we can easily detect
>> the first attempt and start the timer.
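I.e. roughly this (sketch, error handling trimmed):

    static coroutine_fn void nbd_reconnect_attempt(BDRVNBDState *s)
    {
        bool blocking = nbd_client_connecting_wait(s);

        /* The caller has already waited out all other requests: */
        assert(nbd_client_connecting(s));
        assert(s->in_flight == 1);

        if (blocking && !s->reconnect_delay_timer) {
            /* First attempt after the failure: arm the timer that will
             * switch us to NOWAIT when reconnect_delay expires. */
            reconnect_delay_timer_init(s,
                    qemu_clock_get_ns(QEMU_CLOCK_REALTIME) +
                    s->reconnect_delay * NANOSECONDS_PER_SECOND);
        }

        nbd_co_do_establish_connection(s->bs, NULL);
    }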
>>
>> 7. nbd_co_reconnect_loop() is dropped, we don't need it. Reconnect
>> retries are fully handled by the thread (nbd/client-connection.c),
>> the delay timer is initialized in nbd_reconnect_attempt(), and we
>> don't have to bother with s->drained and friends.
>> nbd_reconnect_attempt() is now called from nbd_co_send_request().
>
> A lot going on there, but it's making sense so far.
>
>>
>> 8. nbd_connection_entry() is dropped: reconnect is now handled by
>> nbd_co_send_request(), receiving replies is now handled by
>> nbd_receive_replies(): everything is handled from request coroutines.
>>
>> 9. So, welcome the new nbd_receive_replies(), called from request
>> coroutines, which receives reply headers instead of
>> nbd_connection_entry(). Like with sending requests, only one
>> coroutine may receive at a time. So we introduce receive_mutex,
>> which is locked around nbd_receive_reply(). It also protects some
>> related fields. Still, a full audit of thread-safety in the nbd
>> driver is a separate task.
>
> It sounds like you're more worried about auditing for which locks are
> held longer than necessary, rather than expecting to find cases where
> we aren't thread-safe because we are lacking locks? At any rate, as
> this patch is already easier to reason about, I'm okay deferring that
> audit to later patches.
I just try not to introduce new thread-unsafe things. But I'm not sure we have good thread-safety in the nbd driver code as it is.
>
>> The new function waits for a reply with the specified handle to be
>> received, and works rather simply:
>>
>> Under the mutex:
>> - if the current handle is 0, do the receive by hand. If another
>> handle is received, switch to that request's coroutine, release the
>> mutex and yield. Otherwise return success
>> - if the current handle == the requested handle, we are done
>> - otherwise, release the mutex and yield
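Condensed into code, the loop looks something like this (a sketch that trims error propagation and the handle validation added in patch 5/5; not the verbatim patch):

    static coroutine_fn int nbd_receive_replies(BDRVNBDState *s, uint64_t handle)
    {
        qemu_co_mutex_lock(&s->receive_mutex);

        while (true) {
            if (s->reply.handle == handle) {
                break;                  /* our header has arrived, done */
            }

            if (s->reply.handle != 0) {
                /* Another request's reply is in progress: release the
                 * mutex and sleep until somebody wakes us. */
                qemu_co_mutex_unlock(&s->receive_mutex);
                qemu_coroutine_yield();
                qemu_co_mutex_lock(&s->receive_mutex);
                continue;
            }

            /* Nobody is receiving: read the next header by hand. */
            if (nbd_receive_reply(s->bs, s->ioc, &s->reply, NULL) <= 0) {
                qemu_co_mutex_unlock(&s->receive_mutex);
                return -EIO;
            }
            if (s->reply.handle != handle) {
                /* The header belongs to another request: wake its
                 * coroutine, then go around the loop to yield. */
                aio_co_wake(s->requests[HANDLE_TO_INDEX(s, s->reply.handle)].coroutine);
            }
        }

        qemu_co_mutex_unlock(&s->receive_mutex);
        return 0;
    }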
>>
>> 10. In nbd_co_send_request() we now do nbd_reconnect_attempt() if
>> needed. Also, when waiting in the free_sema queue, we now wait for
>> one of two conditions:
>> - connectED, in_flight < MAX_NBD_REQUESTS (so we can start a new one)
>> - connectING, in_flight == 0, so we can call
>> nbd_reconnect_attempt()
>> And this logic is protected by s->send_mutex.
>>
>> Also, on failure we don't have to care about the removed s->connection_co
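The wait loop itself is roughly (a sketch; nbd_client_connected()/nbd_client_connecting() are assumed to be the small state-test helpers):

    qemu_co_mutex_lock(&s->send_mutex);

    while (s->in_flight == MAX_NBD_REQUESTS ||
           (!nbd_client_connected(s) && s->in_flight > 0))
    {
        qemu_co_queue_wait(&s->free_sema, &s->send_mutex);
    }

    s->in_flight++;

    if (nbd_client_connecting(s)) {
        nbd_reconnect_attempt(s);       /* runs with in_flight == 1 */
    }
    if (!nbd_client_connected(s)) {
        /* reconnect failed, or we are in NOWAIT: fail this request */
        rc = -EIO;
    }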
>>
>> 11. nbd_co_do_receive_one_chunk(): now, instead of yielding and waiting
>> for s->connection_co, we just call the new nbd_receive_replies().
>>
>> 12. nbd_co_receive_one_chunk(): the place where s->reply.handle becomes
>> 0, which means that handling of the whole reply is finished. Here we
>> need to wake one of the coroutines sleeping in nbd_receive_replies().
>> If now one sleeps - do nothing. That's another behavior change: we
>
> s/now one sleeps/none are sleeping/
>
>> don't have an endless recv() during idle time anymore. It may be
>> considered a drawback; if so, it may be fixed later.
>
> The case where no coroutine is waiting to receive a reply should only
> happen due to a malicious server (a compliant server will only send
> replies matching an existing pending request).
With the new code, when we don't have any in-flight requests there are no waiting coroutines either, and nobody will notice if the connection is silently lost.
With the old code, we always had a recv() call pending in connection_co, so we would catch disconnects (at least some kinds of disconnects, or when keep-alive is set up properly) sooner and start the reconnection procedure.
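For reference, the spot in nbd_co_receive_one_chunk() is tiny (a sketch; I write the wake helper generically here, its exact name and signature come from the refactoring in patch 3/5):

    s->reply.handle = 0;            /* the "receiver slot" is free again */
    nbd_recv_coroutines_wake(s);    /* wake a coroutine sleeping in
                                     * nbd_receive_replies(), if any */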
>
>>
>> 13. nbd_reply_chunk_iter_receive(): don't care about the removed
>> connection_co, just ping the in_flight waiters.
>>
>> 14. Don't create connection_co; enable retry in the connection thread
>> (we don't have our own reconnect loop anymore)
>>
>> 15. We need now nbd_co_establish_connection_cancel() call in
>> nbd_cancel_in_flight(), to cancel the request that doing connection
>> attempt.
>
> s/need now/now need to add a/
> s/request that doing/request that is doing a/
>
>>
>> [*] OK, now we don't cancel reconnect on drain begin. That's correct:
>> the reconnect feature leads to the possibility of long-running requests
>> (up to the reconnect delay). Still, drain begin is not a reason to kill
>> long requests. We should wait for them.
>>
>> This also means that we can again reproduce the deadlock described
>> in 8c517de24a8a1dcbeb54e7e12b5b0fda42a90ace.
>> Why we are OK with it:
>> 1. Now this is not an absolutely-dead deadlock: the VM is unfrozen
>> after the reconnect delay. Actually 8c517de24a8a1dc fixed a bug in
>> the NBD logic that was not described in 8c517de24a8a1dc and led to a
>> forever dead-lock. The problem was that nobody woken free_sema
>
> s/woken/woke the/
>
>> queue, but drain_begin can't finish while there is a request in the
>> free_sema queue. Now we have a reconnect delay timer that works
>> well.
>> 2. It's not a problem of NBD driver, it's a problem of ide code,
>> that does drain_begin under global mutex
>
> I agree that this point means that it is appropriate to address that
> in a separate patch series. Hopefully someone tackles it before 6.2,
> but even if not, I agree that the worst that happens is that we have a
> command stalled waiting on a long timeout, rather than actual
> deadlock.
>
>> 3. That doesn't reproduce if chose scsi instead of ide.
>
> Seems like this sentence fragment fits better as the tail of 2, rather
> than as a separate bullet 3? I'll reword as:
>
> ...it's a problem of the ide code, because it does drain_begin under
> the global mutex; the problem doesn't reproduce when using scsi
> instead of ide.
Sounds good
>
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>> ---
[..]
>
> Verdict: I found some typos/grammar fixes, but I think I can touch
> those up. I'm now hammering on the patches in my NBD tree, and
> barring any surprises, this will be in my next pull request Monday at
> the latest.
>
> Reviewed-by: Eric Blake <eblake@redhat.com>
>
Thanks a lot for reviewing, and for the time spent understanding and fixing my English )
--
Best regards,
Vladimir
Thread overview: 14+ messages
2021-09-02 10:38 [PATCH v6 0/5] block/nbd: drop connection_co Vladimir Sementsov-Ogievskiy
2021-09-02 10:38 ` [PATCH v6 1/5] block/nbd: nbd_channel_error() shutdown channel unconditionally Vladimir Sementsov-Ogievskiy
2021-09-03 16:16 ` Eric Blake
2021-09-02 10:38 ` [PATCH v6 2/5] block/nbd: move nbd_recv_coroutines_wake_all() up Vladimir Sementsov-Ogievskiy
2021-09-02 10:38 ` [PATCH v6 3/5] block/nbd: refactor nbd_recv_coroutines_wake_all() Vladimir Sementsov-Ogievskiy
2021-09-03 17:36 ` Eric Blake
2021-09-02 10:38 ` [PATCH v6 4/5] block/nbd: drop connection_co Vladimir Sementsov-Ogievskiy
2021-09-24 17:44 ` Eric Blake
2021-09-27 11:57 ` Vladimir Sementsov-Ogievskiy [this message]
2021-09-02 10:38 ` [PATCH v6 5/5] block/nbd: check that received handle is valid Vladimir Sementsov-Ogievskiy
2021-09-03 17:54 ` Eric Blake
2021-09-22 7:48 ` [PATCH v6 0/5] block/nbd: drop connection_co Vladimir Sementsov-Ogievskiy
2021-09-22 13:12 ` Eric Blake
2021-09-22 13:49 ` Vladimir Sementsov-Ogievskiy