From: John Snow <jsnow@redhat.com>
To: Stefan Hajnoczi <stefanha@gmail.com>,
	Paolo Bonzini <pbonzini@redhat.com>,
	Kevin Wolf <kwolf@redhat.com>
Cc: Max Reitz <mreitz@redhat.com>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	"qemu-devel@nongnu.org" <qemu-devel@nongnu.org>,
	"qemu-block@nongnu.org" <qemu-block@nongnu.org>,
	"qemu-stable@nongnu.org" <qemu-stable@nongnu.org>
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH] block/backup: install notifier during creation
Date: Wed, 18 Sep 2019 16:31:02 -0400	[thread overview]
Message-ID: <0abc4992-9322-010a-118b-62e79cbc5b58@redhat.com> (raw)
In-Reply-To: <9bf835d7-8bfa-feba-c2f7-acd6cda4a81e@redhat.com>



On 9/10/19 9:23 AM, John Snow wrote:
> 
> 
> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote:
>> On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote:
>>>
>>>
>>> On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>> 09.08.2019 23:13, John Snow wrote:
>>>>> Backup jobs may yield prior to installing their handler, because of the
>>>>> job_co_entry shim, which guarantees that a job won't begin work until
>>>>> we are ready to start an entire transaction.
>>>>>
>>>>> Unfortunately, this makes the transactional point-in-time semantics of
>>>>> backup hard to reason about. Make them explicit by moving the handler
>>>>> registration to creation time, and by making the write notifier a
>>>>> no-op until the job is started.
>>>>>
>>>>> Reported-by: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
>>>>> Signed-off-by: John Snow <jsnow@redhat.com>
>>>>> ---
>>>>>   block/backup.c     | 32 +++++++++++++++++++++++---------
>>>>>   include/qemu/job.h |  5 +++++
>>>>>   job.c              |  2 +-
>>>>>   3 files changed, 29 insertions(+), 10 deletions(-)
>>>>>
>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>> index 07d751aea4..4df5b95415 100644
>>>>> --- a/block/backup.c
>>>>> +++ b/block/backup.c
>>>>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>>>>       assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>>>>       assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>>>>   
>>>>> +    /* The handler is installed at creation time; the actual point-in-time
>>>>> +     * starts at job_start(). Transactions guarantee those two points are
>>>>> +     * the same point in time. */
>>>>> +    if (!job_started(&job->common.job)) {
>>>>> +        return 0;
>>>>> +    }
>>>>
>>>> Hmm, sorry if it is a stupid question -- I'm not well versed in
>>>> multiprocessing or in QEMU iothreads...
>>>>
>>>> job_started() just reads job->co. If bs runs in an iothread, and the
>>>> write notifier therefore runs in that iothread while job_start() is
>>>> called from the main thread... is it guaranteed that the write notifier
>>>> will see the change to job->co early enough not to miss a guest write?
>>>> Shouldn't job->co be volatile, for example, or something like that?
>>>>
>>>> That question aside, this patch looks good to me.
>>>>
>>>
>>> You know, it's a really good question.
>>> So good, in fact, that I have no idea.
>>>
>>> ¯\_(ツ)_/¯
>>>
>>> I'm fairly certain that IO will not come in until the .clean phase of a
>>> qmp_transaction, because bdrv_drained_begin(bs) is called during
>>> .prepare, and we activate the handler (by starting the job) in .commit.
>>> We do not end the drained section until .clean.
>>>
>>> I'm not fully clear on what threading guarantees we have otherwise,
>>> though; is it possible that "Thread A" would somehow lift the bdrv_drain
>>> on an IO thread ("Thread B") and, after that, "Thread B" would somehow
>>> still be able to see an outdated version of job->co that was set by
>>> "Thread A"?
>>>
>>> I doubt it; but I can't prove it.
>>
>> In the qmp_drive_backup() case (not qmp_transaction()) there is:
>>
>>   void qmp_drive_backup(DriveBackup *arg, Error **errp)
>>   {
>>       BlockJob *job;
>>       job = do_drive_backup(arg, NULL, errp);
>>       if (job) {
>>           job_start(&job->job);
>>       }
>>   }
>>
>> job_start() is called without any thread synchronization, which is
>> usually fine because the coroutine doesn't run until job_start() calls
>> aio_co_enter().
>>
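(For reference, job_start() in job.c looks roughly like this -- I'm
paraphrasing from memory, so treat it as a sketch rather than the
authoritative source:)

  void job_start(Job *job)
  {
      assert(job && !job_started(job) && job->paused &&
             job->driver && job->driver->run);
      /* job->co is written here, on the thread calling job_start() ... */
      job->co = qemu_coroutine_create(job_co_entry, job);
      job->pause_count--;
      job->busy = true;
      job->paused = false;
      job_state_transition(job, JOB_STATUS_RUNNING);
      /* ... and only now does the coroutine get a chance to run. */
      aio_co_enter(job->aio_context, job->co);
  }
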
>> Now that the before-write notifier is installed early, there is
>> indeed a race between job_start() and the write notifier accessing
>> job->co from an IOThread.
>>
>> The before-write notifier might see job->co != NULL before job_start()
>> has finished.  This could lead to issues if the write notifier invokes
>> job_*() APIs and observes an in-between job state.
>>
> 
> I see. I think in this case, as long as it sees job->co != NULL, the
> notifier is actually safe to run. I agree that this might be confusing
> to verify and could bite us in the future. The worry we had, though, is
> more the opposite: will it see NULL for too long? We want to make sure
> that it registers as true *before the first yield*.
> 
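If we wanted to make that cross-thread read explicit instead of leaning
on the drained section for ordering, something like the following could
work. (A hypothetical hardening using QEMU's atomic helpers; it is not
part of the patch as posted:)

  /* Hypothetical variant of the new helper in include/qemu/job.h.
   * job_start() would pair this with atomic_set(&job->co, co), making
   * the notifier's cross-thread read of job->co an explicit atomic
   * access rather than a plain load. */
  static inline bool job_started(Job *job)
  {
      return atomic_read(&job->co) != NULL;
  }
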
>> A safer approach is to set a BackupBlockJob variable at the beginning of
>> backup_run() and check it from the before-write notifier.
>>
> 
> That's too late, for reasons below.
> 
>> That said, I don't understand the benefit of this patch, and IMO it makes
>> the code harder to understand, because now we need to think about the
>> created-but-not-started state too.
>>
>> Stefan
>>
> 
> It's always possible I've hyped myself up into believing there's a
> problem where there isn't one, but the fear is this:
> 
> The point in time from a QMP transaction covers both job creation and
> job start, but when we start the job, it will actually yield before we
> get to backup_run() -- and there is no guarantee that the handler will
> get installed synchronously, so the point in time ends before the
> handler activates.
> 

That is: the handler might get installed AFTER the critical region of a
transaction, and if we were unlucky, we could drop initial writes.

(I think.)
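
To spell out the ordering I'm worried about (pre-patch, as I understand
it):

  .prepare  ->  bdrv_drained_begin(bs); the job is created, but no
                write notifier is installed yet
  .commit   ->  job_start(); the job coroutine yields in job_co_entry
  .clean    ->  bdrv_drained_end(bs); guest writes may resume
  (later)   ->  backup_run() finally installs the before-write notifier

Any guest write that lands between .clean and backup_run() is never
copied out, so the backup's point in time silently slips past the
transaction.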

> The yield occurs in job_co_entry as an intentional feature: it forces a
> yield and pause point at run time, so it's harder to write a job that
> accidentally hogs the thread during initialization.
> 
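(For completeness, job_co_entry -- again paraphrased from memory, so
take it as a sketch:)

  static void coroutine_fn job_co_entry(void *opaque)
  {
      Job *job = opaque;

      assert(job && job->driver && job->driver->run);
      /* Intentional pause point before any real work: during a
       * transaction, the job parks here until it is allowed to run. */
      job_pause_point(job);
      /* Only once past the pause point does backup_run()
       * (job->driver->run) execute -- pre-patch, that is where the
       * notifier was installed. */
      job->ret = job->driver->run(job, &job->err);
  }
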
> This is an attempt to get the handler installed earlier, to ensure that
> the point in time stays synchronized with creation time and provides a
> stronger transactional guarantee.
> 

Squeaky wheel gets the grease. Any comment?

