Re: [PATCH 2/4] block: Split padded I/O vectors exceeding IOV_MAX

qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed

From: Hanna Czenczek <hreitz@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>,
	qemu-block@nongnu.org
Cc: qemu-devel@nongnu.org, Eric Blake <eblake@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>,
	Stefan Hajnoczi <stefanha@redhat.com>, Fam Zheng <fam@euphon.net>
Subject: Re: [PATCH 2/4] block: Split padded I/O vectors exceeding IOV_MAX
Date: Tue, 4 Apr 2023 19:32:40 +0200	[thread overview]
Message-ID: <251b1d36-7fe0-498d-f257-b1a0d256779f@redhat.com> (raw)
In-Reply-To: <9bf47acd-9c41-b838-c6a9-fea2c586d385@yandex-team.ru>

On 04.04.23 10:10, Vladimir Sementsov-Ogievskiy wrote:
> On 03.04.23 16:33, Hanna Czenczek wrote:
>> (Sorry for the rather late reply... Thanks for the review!)
>>
>> On 20.03.23 11:31, Vladimir Sementsov-Ogievskiy wrote:
>>> On 17.03.23 20:50, Hanna Czenczek wrote:
>>
>> [...]
>>
>>>> diff --git a/block/io.c b/block/io.c
>>>> index 8974d46941..1e9cdba17a 100644
>>>> --- a/block/io.c
>>>> +++ b/block/io.c
>>>
>>> [..]
>>>
>>>> +    pad->write = write;
>>>> +
>>>>       return true;
>>>>   }
>>>>   @@ -1545,6 +1561,18 @@ zero_mem:
>>>>     static void bdrv_padding_destroy(BdrvRequestPadding *pad)
>>>
>>> Maybe, rename to _finalize, to stress that it's not only freeing 
>>> memory.
>>
>> Sounds good!
>>
>> [...]
>>
>>>> @@ -1552,6 +1580,101 @@ static void 
>>>> bdrv_padding_destroy(BdrvRequestPadding *pad)
>>>>       memset(pad, 0, sizeof(*pad));
>>>>   }
>>>>   +/*
>>>> + * Create pad->local_qiov by wrapping @iov in the padding head and 
>>>> tail, while
>>>> + * ensuring that the resulting vector will not exceed IOV_MAX 
>>>> elements.
>>>> + *
>>>> + * To ensure this, when necessary, the first couple of elements 
>>>> (up to three)
>>>
>>> maybe, "first two-three elements"
>>
>> Sure (here and...
>>
>> [...]
>>
>>>> +    /*
>>>> +     * If padded_niov > IOV_MAX, we cannot just concatenate 
>>>> everything.
>>>> +     * Instead, merge the first couple of elements of @iov to 
>>>> reduce the number
>>>
>>> maybe, "first two-three elements"
>>
>> ...here).
>>
>>>
>>>> +     * of vector elements as necessary.
>>>> +     */
>>>> +    if (padded_niov > IOV_MAX) {
>>>>
>>>
>>> [..]
>>>
>>>> @@ -1653,8 +1786,8 @@ int coroutine_fn 
>>>> bdrv_co_preadv_part(BdrvChild *child,
>>>>           flags |= BDRV_REQ_COPY_ON_READ;
>>>>       }
>>>>   -    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, 
>>>> &bytes, &pad,
>>>> -                           NULL, &flags);
>>>> +    ret = bdrv_pad_request(bs, &qiov, &qiov_offset, &offset, 
>>>> &bytes, false,
>>>> +                           &pad, NULL, &flags);
>>>>       if (ret < 0) {
>>>>           goto fail;
>>>>       }
>>>
>>> a bit later:
>>>
>>> tracked_request_end(&req);
>>> bdrv_padding_destroy(&pad);
>>>
>>>
>>> Now, the request is formally finished inside 
>>> bdrv_padding_destroy().. Not sure, does it really violate something, 
>>> but seems safer to swap these two calls. 
>>
>> I’d rather not, for two reasons: First, tracked requests are (as far 
>> as I understand) only there to implement request serialization, and 
>> so only care about metadata (offset, length, and type), which is not 
>> changed by changes to the I/O vector.
>>
>> Second, even if the state of the I/O vector were relevant to tracked 
>> requests, I think it would actually be the other way around, i.e. the 
>> tracked request must be ended before the padding is 
>> finalized/destroyed.  The tracked request is about the actual request 
>> we submit to `child` (which is why tracked_request_begin() is called 
>> after bdrv_pad_request()), and that request is done using the 
>> modified I/O vector.  So if the tracked request had any connection to 
>> the request’s I/O vector (which it doesn’t), it would be to this 
>> modified one, so we mustn’t invalidate it via bdrv_padding_finalize() 
>> while the tracked request lives.
>>
>> Or, said differently: I generally try to clean up things in the 
>> inverse way they were set up, and because bdrv_pad_requests() comes 
>> before tracked_request_begin(), I think tracked_request_end() should 
>> come before bdrv_padding_finalize().
>
> Note, that it's wise-versa in bdrv_co_pwritev_part().

Well, and it’s this way here.  We agree that for clean-up, the order 
doesn’t functionally matter, so either way is actually fine.

> For me it's just simpler to think that the whole request, including 
> filling user-given qiov with data on read part is inside 
> tracked_request_begin() / tracked_request_end().

It isn’t, though, because padding must be done before the tracked 
request is created.  The tracked request uses the request’s actual 
offset and length, after padding, so bdrv_pad_request() must always be 
done before (i.e., outside) tracked_request_begin().

> And moving the last manipulation with qiov out of it breaks this 
> simple thought.
> Guest should not care of it, as it doesn't know about request 
> tracking.. But what about internal code? Some code may depend on some 
> requests be finished after bdrv_drained_begin() call, but now they may 
> be not fully finished, and some data may be not copied back to 
> original qiov.
>
> I agree with your point about sequence of objects finalization, but 
> maybe, that just shows that copying data back to qiov should not be a 
> part of bdrv_padding_finalize(), but instead be a separate function, 
> called before tracked_request_end().

But my thought is that copying back shouldn’t be done before 
tracked_request_end(), because copying back is not part of the tracked 
request.  What we track is the padded request, which uses a modified I/O 
vector, so undoing that modification shouldn’t be done while the tracked 
request lives.

I know I’m inconsistent with regards to bdrv_co_pwritev_part(), which is 
because it doesn’t matter.  My actual position is that tracked requests 
are about metadata, so undoing/finalizing the padding (including 
potentially copying data back) has nothing to do with a tracked request, 
so the order cannot of finalizing both cannot matter.

But you’re arguing for consistency, and my position on that is, if we 
want consistency, I’d finalize the tracked request first, and the 
padding second.  This is also because tracking is done for 
serialization, so we should end it as soon as possible, so that 
concurrent requests are resumed quickly.  (Though I’m not sure if 
delaying it by a memcpy() matters for an essentially single-threaded 
block layer at this time.)

Hanna

next prev parent reply	other threads:[~2023-04-04 17:33 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-03-17 17:50 [PATCH 0/4] block: Split padded I/O vectors exceeding IOV_MAX Hanna Czenczek
2023-03-17 17:50 ` [PATCH 1/4] util/iov: Make qiov_slice() public Hanna Czenczek
2023-03-18 12:06   ` Eric Blake
2023-03-20 10:00   ` Vladimir Sementsov-Ogievskiy
2023-03-17 17:50 ` [PATCH 2/4] block: Split padded I/O vectors exceeding IOV_MAX Hanna Czenczek
2023-03-18 12:19   ` Eric Blake
2023-03-20 10:31   ` Vladimir Sementsov-Ogievskiy
2023-04-03 13:33     ` Hanna Czenczek
2023-04-04  8:10       ` Vladimir Sementsov-Ogievskiy
2023-04-04 17:32         ` Hanna Czenczek [this message]
2023-04-05  9:59           ` Vladimir Sementsov-Ogievskiy
2023-04-06 16:51             ` Hanna Czenczek
2023-04-06 21:35               ` Vladimir Sementsov-Ogievskiy
2023-03-17 17:50 ` [PATCH 3/4] util/iov: Remove qemu_iovec_init_extended() Hanna Czenczek
2023-03-18 12:22   ` Eric Blake
2023-03-20 11:59   ` Vladimir Sementsov-Ogievskiy
2023-03-17 17:50 ` [PATCH 4/4] iotests/iov-padding: New test Hanna Czenczek
2023-03-18 12:39   ` Eric Blake
2023-04-03 14:23     ` Hanna Czenczek
2023-03-20 12:12   ` Vladimir Sementsov-Ogievskiy
2023-04-05 19:44 ` [PATCH 0/4] block: Split padded I/O vectors exceeding IOV_MAX Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=251b1d36-7fe0-498d-f257-b1a0d256779f@redhat.com \
    --to=hreitz@redhat.com \
    --cc=eblake@redhat.com \
    --cc=fam@euphon.net \
    --cc=kwolf@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@yandex-team.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).