qemu-devel.nongnu.org archive mirror
From: Max Reitz <mreitz@redhat.com>
To: Kevin Wolf <kwolf@redhat.com>
Cc: dplotnikov@virtuozzo.com, vsementsov@virtuozzo.com,
	den@virtuozzo.com, qemu-block@nongnu.org, qemu-devel@nongnu.org
Subject: Re: [Qemu-devel] [PATCH 4/4] block-backend: Queue requests while drained
Date: Fri, 26 Jul 2019 14:34:33 +0200	[thread overview]
Message-ID: <3443822d-46f4-546f-65ba-70182425478e@redhat.com> (raw)
In-Reply-To: <20190726114930.GD6295@localhost.localdomain>



On 26.07.19 13:49, Kevin Wolf wrote:
> Am 26.07.2019 um 12:50 hat Max Reitz geschrieben:
>> On 25.07.19 18:27, Kevin Wolf wrote:
>>> This fixes device like IDE that can still start new requests from I/O
>>
>> *devices
>>
>>> handlers in the CPU thread while the block backend is drained.
>>>
>>> The basic assumption is that in a drain section, no new requests should
>>> be allowed through a BlockBackend (blk_drained_begin/end don't exist,
>>> we get drain sections only on the node level). However, there are two
>>> special cases where requests should not be queued:
>>>
>>> 1. Block jobs: We already make sure that block jobs are paused in a
>>>    drain section, so they won't start new requests. However, if the
>>>    drain_begin is called on the job's BlockBackend first, it can happen
>>>    that we deadlock because the job stays busy until it reaches a pause
>>>    point - which it can't if its requests aren't processed any more.
>>>
>>>    The proper solution here would be to make all requests through the
>>>    job's filter node instead of using a BlockBackend. For now, just
>>>    disabling request queuing on the job BlockBackend is simpler.
>>
>> Yep, seems reasonable.
>>
>> (We’d need a relationship that a BB is owned by some job, and then pause
>> the job when the BB is drained, I suppose.  But that’s exactly
>> accomplished by not making the job use a BB, but its BdrvChild
>> references instead.)
> 
> We actually had this before commit ad90feba, when we changed it to use
> the job's BdrvChild objects instead. All block jobs have both currently,
> they just don't use their BdrvChild objects much.
> 
>>> 2. In test cases where making requests through bdrv_* would be
>>>    cumbersome because we'd need a BdrvChild. As we already got the
>>>    functionality to disable request queuing from 1., use it in tests,
>>>    too, for convenience.
>>>
>>> Signed-off-by: Kevin Wolf <kwolf@redhat.com>
>>> ---
>>>  include/sysemu/block-backend.h | 11 +++---
>>>  block/backup.c                 |  1 +
>>>  block/block-backend.c          | 69 +++++++++++++++++++++++++++++-----
>>>  block/commit.c                 |  2 +
>>>  block/mirror.c                 |  6 ++-
>>>  blockjob.c                     |  3 ++
>>>  tests/test-bdrv-drain.c        |  1 +
>>>  7 files changed, 76 insertions(+), 17 deletions(-)
>>
>> [...]
>>
>>> diff --git a/block/block-backend.c b/block/block-backend.c
>>> index fdd6b01ecf..603b281743 100644
>>> --- a/block/block-backend.c
>>> +++ b/block/block-backend.c
>>
>> [...]
>>
>>> @@ -1127,13 +1136,26 @@ static int blk_check_byte_request(BlockBackend *blk, int64_t offset,
>>>      return 0;
>>>  }
>>>  
>>> +static void blk_wait_while_drained(BlockBackend *blk)
>>
>> +coroutine_fn?  (Maybe even blk_co_wait...)
>>
>>> +{
>>> +    if (blk->quiesce_counter && !blk->disable_request_queuing) {
>>> +        qemu_co_queue_wait(&blk->queued_requests, NULL);
>>> +    }
>>> +}
>>> +
>>>  int coroutine_fn blk_co_preadv(BlockBackend *blk, int64_t offset,
>>>                                 unsigned int bytes, QEMUIOVector *qiov,
>>> -                               BdrvRequestFlags flags)
>>> +                               BdrvRequestFlags flags, bool wait_while_drained)
>>
>> What’s the purpose of this parameter?  How would it hurt to always
>> wait_while_drained?
>>
>> I see the following callers of blk_co_p{read,write}v() that call it with
>> wait_while_drained=false:
>>
>> 1. blk_aio_{read,write}_entry(): They wait themselves, so they don’t
>>    need these functions to wait.  But OTOH, because they have waited, we
>>    know that the BB is not quiesced here, so we won’t wait here anyway.
>>    (These functions should be coroutine_fn, too, by the way)
> 
> I think I was worried that the coroutine might yield between the two
> places. Later I noticed that blk_wait_while_drained() must be the very
> first thing anyway, so maybe it doesn't matter any more now.
> 
> If we did yield here for requests coming from blk_aio_prwv(), in_flight
> would be increased and drain would deadlock.
> 
> Would you prefer if I just unconditionally wait if we're drained?

I think I would, yes.

>> 2. mirror: It disables request queuing anyway, so wait_while_drained
>>    doesn’t have any effect.
> 
> Yes, I wasn't sure what to use there. false seemed like it would be
> less likely to cause misunderstandings because it just repeats what
> would happen anyway.
> 
>>>  {
>>>      int ret;
>>> -    BlockDriverState *bs = blk_bs(blk);
>>> +    BlockDriverState *bs;
>>>  
>>> +    if (wait_while_drained) {
>>> +        blk_wait_while_drained(blk);
>>> +    }
>>
>> [...]
>>
>> What about blk_co_flush()?  Should that wait, too?
> 
> Hm, probably, yes.
> 
>>> @@ -2232,6 +2278,9 @@ static void blk_root_drained_end(BdrvChild *child, int *drained_end_counter)
>>>          if (blk->dev_ops && blk->dev_ops->drained_end) {
>>>              blk->dev_ops->drained_end(blk->dev_opaque);
>>>          }
>>> +        while (qemu_co_enter_next(&blk->queued_requests, NULL)) {
>>> +            /* Resume all queued requests */
>>> +        }
>>
>> Wouldn’t qemu_co_queue_restart_all(&blk->queued_requests) achieve the same?
> 
> It would fail an assertion because we're not in coroutine context.
> (Guess what my first attempt was!)

:-)

Max



