From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
qemu-block@nongnu.org
Cc: Kevin Wolf <kwolf@redhat.com>,
qemu-devel@nongnu.org, Max Reitz <mreitz@redhat.com>,
Stefan Hajnoczi <stefanha@redhat.com>,
Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [PATCH v3 2/5] block-copy: improve comments of BlockCopyTask and BlockCopyState types and functions
Date: Thu, 10 Jun 2021 16:21:11 +0200
Message-ID: <f3790bfb-b151-df6b-5c78-0e536a8ad310@redhat.com>
In-Reply-To: <ef104344-d857-c741-2893-315e9f0a77a5@virtuozzo.com>
On 10/06/2021 13:12, Vladimir Sementsov-Ogievskiy wrote:
> 10.06.2021 13:46, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 10/06/2021 12:27, Vladimir Sementsov-Ogievskiy wrote:
>>> 10.06.2021 13:14, Emanuele Giuseppe Esposito wrote:
>>>>
>>>>
>>>> On 09/06/2021 11:12, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 08.06.2021 10:33, Emanuele Giuseppe Esposito wrote:
>>>>>> As done in BlockCopyCallState, categorize BlockCopyTask
>>>>>> and BlockCopyState in IN, State and OUT fields.
>>>>>> This is just to understand which field has to be protected with a
>>>>>> lock.
>>>>>>
>>>>>> .sleep_state is handled in the series "coroutine: new sleep/wake API"
>>>>>> and thus here left as TODO.
>>>>>>
>>>>>> Signed-off-by: Emanuele Giuseppe Esposito <eesposit@redhat.com>
>>>>>> ---
>>>>>> block/block-copy.c | 47
>>>>>> ++++++++++++++++++++++++++++++----------------
>>>>>> 1 file changed, 31 insertions(+), 16 deletions(-)
>>>>>>
>>>>>> diff --git a/block/block-copy.c b/block/block-copy.c
>>>>>> index d58051288b..b3533a3003 100644
>>>>>> --- a/block/block-copy.c
>>>>>> +++ b/block/block-copy.c
>>>>>> @@ -56,25 +56,33 @@ typedef struct BlockCopyCallState {
>>>>>> QLIST_ENTRY(BlockCopyCallState) list;
>>>>>> /* State */
>>>>>
>>>>> Why previous @list field is not in the state? For sure it's not an
>>>>> IN parameter and should be protected somehow.
>>>>>
>>>>>> - int ret;
>>>>>> bool finished;
>>>>>> - QemuCoSleep sleep;
>>>>>> - bool cancelled;
>>>>>> + QemuCoSleep sleep; /* TODO: protect API with a lock */
>>>>>> /* OUT parameters */
>>>>>> + bool cancelled;
>>>>>> bool error_is_read;
>>>>>> + int ret;
>>>>>> } BlockCopyCallState;
>>>>>> typedef struct BlockCopyTask {
>>>>>> AioTask task;
>>>>>> + /*
>>>>>> + * IN parameters. Initialized in block_copy_task_create()
>>>>>> + * and never changed.
>>>>>> + */
>>>>>> BlockCopyState *s;
>>>>>> BlockCopyCallState *call_state;
>>>>>> int64_t offset;
>>>>>> - int64_t bytes;
>>>>>> - BlockCopyMethod method;
>>>>>> - QLIST_ENTRY(BlockCopyTask) list;
>>>>>> + int64_t bytes; /* only re-set in task_shrink, before running
>>>>>> the task */
>>>>>> + BlockCopyMethod method; /* initialized in
>>>>>> block_copy_dirty_clusters() */
>>>>>
>>>>> hmm. to be precise method is initialized in block_copy_task_create.
>>>>>
>>>>> And after block_copy_task_create finished, task is in the list and
>>>>> can be read by parallel block_copy_dirty_clusters(). So, @bytes is
>>>>> part of State, we must protect it..
>>>>
>>>> So if I understand correctly, you refer to the fact that a parallel
>>>> block_copy_dirty_clusters() can create another task and search with
>>>> find_conflicting_task_locked(), or in general also
>>>> block_copy_wait_one() can do the same in parallel, correct?
>>>
>>> yes
>>>
>>>>
>>>> Here there is also another problem: if we add the task to the list
>>>> and then shrink it in two different critical sections, we are going
>>>> to have problems because in the meanwhile find_conflicting_tasks can
>>>> be issued in parallel.
>>>
>>> But we shrink task only once, and we do it under mutex, so we are OK
>>> I think?
>>
>> I think you understood, but just in case: I am thinking the case where
>> we have:
>>
>> But maybe I am overcomplicating.
>>
>
> Both shrink and find_ are done under mutex, so they can't intersect. But
> yes, we should keep in mind that if we do find_ under mutex, and then
> release mutex, the information obtained from find_ may become incorrect.
>
> Check callers of find_conflicting_task_locked():
>
> block_copy_wait_one has one critical section.
>
> if no conflicting tasks we are OK.. Are we? Ok, look at the only caller
> of block_copy_wait_one() - block_copy_common().
>
> assume block_copy_dirty_clusters() returns 0, so there are no dirty bits at
> some moment...
>
> then in a parallel thread some task may finish with failure, leaving some
> new dirty bits.. Then we check that there are no conflicting tasks.. And
> then we go out of the loop, when actually we must retry for these new
> dirty bits.
>
> So I'm afraid you are right, we are not threadsafe yet in
> block_copy_common(), as we should check conflicting tasks and dirty bits
> in same critical section to be consistent.
Wait, we are talking about two different problems:
- What I wanted to point out has to do with @bytes, not (as far as I
understand) with the dirty bits. From the example I made below, I assume
there are 3 separate non-overlapping critical sections:
>>> T1: block_copy_task_create()
>>> T2: find_conflicting_tasks() <-- sees the initial task
>>> T1: task_shrink() <-- bytes are updated, T2 saw the wrong amount of
>>> bytes. This might or might not have consequences, I am not sure.
T1 creates the task, T2 iterates to search for conflicting tasks (called
from a parallel block_copy_wait_one()), then T1 shrinks the current task. I
think that T2 in this case misses the updated size, although the worst
that can happen is that the task is now smaller, so T2 gets a false
positive (a task is marked as conflicting even though it no longer is).
The outcome is that T2 waits for a task it shouldn't, but there is
no error there.
- Your point is about a task failing between block_copy_dirty_clusters()
and block_copy_wait_one(). The failing task calls block_copy_task_end() and
sets the dirty bitmap again, but at that point block_copy_wait_one() won't
check it anymore and the bitmap is left dirty. I think the default behavior
here should be that block_copy_dirty_clusters() is called again and a new
task is created. This, as you pointed out, is a proper error.
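To illustrate the first point above (the @bytes race), here is a minimal sketch, assuming a task can be reduced to a half-open byte range. Range, ranges_conflict() and range_shrink() are made-up names for illustration, not the block-copy code. It only shows why a reader that saw the pre-shrink size can get a false positive but never a missed conflict, since shrinking only makes the range smaller:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a task's byte range; not the QEMU struct. */
typedef struct Range {
    int64_t offset;
    int64_t bytes;
} Range;

/* Two half-open ranges [offset, offset + bytes) conflict if they overlap. */
static bool ranges_conflict(Range a, Range b)
{
    assert(a.bytes >= 0 && b.bytes >= 0);
    return a.offset < b.offset + b.bytes && b.offset < a.offset + a.bytes;
}

/* Assumption taken from the discussion: shrinking only ever reduces bytes. */
static Range range_shrink(Range r, int64_t new_bytes)
{
    assert(new_bytes > 0 && new_bytes <= r.bytes);
    r.bytes = new_bytes;
    return r;
}
```

A reader that checked against the old, larger range may wait for a task that no longer overlaps, while any range that did not conflict before the shrink cannot conflict after it.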
For this second problem, we need to stop iterating only when 1) the whole
bitmap is clear, and 2) no conflicting task is present.
Therefore a possible solution could be the one below:
int stop_looping = 0; /* non-zero means: not done yet */
...
do {
    // create all the tasks, clears the bitmap but
    // adds tasks to the task list
    block_copy_dirty_clusters();

    /* here a task can fail, but then the dirty map will be set */

    lock();
    // bit 0: a conflicting task is still running for this operation
    task = find_conflicting_task();
    stop_looping = (task != NULL);
    // bit 1: the dirty bitmap is not clear yet
    stop_looping |= bdrv_dirty_bitmap_next_dirty_area() << 1;
    unlock();

    /* if stop_looping is == 0, no task can fail */

    /* if a task fails here, the if below won't see it, but
     * block_copy_dirty_clusters will in the next iteration */
    if (stop_looping & 1) {
        // there is some conflicting task, wait for it
        qemu_co_queue_wait(&task->wait_queue);
    }
} while (stop_looping != 0);
...
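The key property of the loop above is that both conditions are sampled in one critical section. A minimal sketch of just that property, with loud assumptions: CopyState, state_lock(), state_unlock(), task_end() and copy_is_done() are hypothetical names, the mutex is modeled as a plain flag so the example stays single-threaded and self-contained, and counters stand in for the task list and the dirty bitmap. It is not the block-copy implementation:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified state; these are not the QEMU structures. */
typedef struct CopyState {
    bool locked;         /* models the mutex as a flag (single-threaded sketch) */
    int running_tasks;   /* stands in for the task list */
    int dirty_clusters;  /* stands in for the dirty bitmap */
} CopyState;

static void state_lock(CopyState *s)
{
    assert(!s->locked);
    s->locked = true;
}

static void state_unlock(CopyState *s)
{
    assert(s->locked);
    s->locked = false;
}

/* A finishing task leaves the list and, on failure, re-dirties its
 * clusters, all inside one critical section. */
static void task_end(CopyState *s, bool failed, int clusters)
{
    state_lock(s);
    assert(s->running_tasks > 0);
    s->running_tasks--;
    if (failed) {
        s->dirty_clusters += clusters;
    }
    state_unlock(s);
}

/* The loop may stop only if both conditions hold at the same instant,
 * so both are read inside the same critical section. */
static bool copy_is_done(CopyState *s)
{
    bool done;

    state_lock(s);
    done = s->running_tasks == 0 && s->dirty_clusters == 0;
    state_unlock(s);
    return done;
}
```

With this shape, a task that fails between two checks is always visible to the next check, either as a still-running task or as newly dirty clusters.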
What do you think?
Emanuele
>
>>
>>>
>>>>
>>>> So, is there a reason why we don't want
>>>> QLIST_INSERT_HEAD(&s->tasks, task, list);
>>>> in block_copy_dirty_clusters()?
>>>>
>>>> By doing that, I think we also spare @bytes from the critical
>>>> section, since it is only read from that point onwards.
>>>
>>> This way find_conflicting_tasks will just skip the task we are
>>> creating.. And we'll get a conflict when we try to add our new task. No,
>>> we should add the task to the list in the same critical section where we
>>> clear dirty bits from the bitmap.
>>
>>
>> I agree, with the above.
>> So to me the most correct solution would be to call create and shrink
>> in the same lock, but this creates a much wider critical section.
>>
>> Alternatively, I can leave it as it is and just update the comment.
>>
>>>
>>> Then we shrink task in another critical section, it should be OK too.
>>>
>>>>
>>>> I am also trying to see if I can group some critical sections.
>>>>
>>>> Btw I think we already talked about @bytes and it's not the first
>>>> time we switch it from IN to STATE and vice-versa...
>>>> I mean, I agree with you but it starts to be confusing.
>>>
>>> On last review it seemed to me that you actually protect bytes by
>>> critical section where it is needed. So here I'm saying only about
>>> the comment..
>>>
>>>>
>>>>
>>>> This also goes against your comment later in patch 4,
>>>>>> @@ -212,7 +222,7 @@ static BlockCopyTask
>>>>>> *block_copy_task_create(BlockCopyState *s,
>>>>>> bytes = QEMU_ALIGN_UP(bytes, s->cluster_size);
>>>>>> /* region is dirty, so no existent tasks possible in it */
>>>>>> - assert(!find_conflicting_task(s, offset, bytes));
>>>>>> + assert(!find_conflicting_task_locked(s, offset, bytes));
>>>>>> bdrv_reset_dirty_bitmap(s->copy_bitmap, offset, bytes);
>>>>>> s->in_flight_bytes += bytes;
>>>>>> @@ -248,16 +258,19 @@ static void coroutine_fn
>>>>>> block_copy_task_shrink(BlockCopyTask *task,
>>>>>
>>>>> The function reads task->bytes not under mutex.. It's safe, as only
>>>>> that function is modifying the field, and it's called once. Still,
>>>>> let's make critical section a little bit wider, just for
>>>>> simplicity. I mean, simple QEMU_LOCK_GUARD() at start of function.
>>>>
>>>> Where if I understand correctly, it is not safe, because
>>>> find_conflicting_tasks might search the non-updated task.
>>>>
>>>
>>> find_conflicting_tasks only reads bytes, so it can't make damage..
>>> Anyway making critical sections a bit wider won't hurt.
>>>
>>>
>>
>
>