From: John Snow
To: Jeff Cody
Cc: Kevin Wolf, vsementsov@virtuozzo.com, qemu-block@nongnu.org,
 qemu-devel@nongnu.org, stefanha@redhat.com, pbonzini@redhat.com
Subject: Re: [Qemu-devel] [PATCH v3 4/6] blockjob: add block_job_start
Date: Mon, 7 Nov 2016 21:20:32 -0500
Message-ID: <33ccab69-7bda-a082-7758-79ff40290804@redhat.com>
In-Reply-To: <20161108020528.GC8956@localhost.localdomain>
References: <1478109056-25198-1-git-send-email-jsnow@redhat.com>
 <1478109056-25198-5-git-send-email-jsnow@redhat.com>
 <20161103121743.GB5352@noname.redhat.com>
 <20161108020528.GC8956@localhost.localdomain>

On 11/07/2016 09:05 PM, Jeff Cody wrote:
> On Mon, Nov 07, 2016 at 09:02:14PM -0500, John Snow wrote:
>>
>>
>> On 11/03/2016 08:17 AM, Kevin Wolf wrote:
>>> On 02.11.2016 at 18:50, John Snow wrote:
>>>> Instead of automatically starting jobs at creation time via backup_start
>>>> et al, we'd like to return a job object pointer that can be started
>>>> manually at a later point in time.
>>>>
>>>> For now, add the block_job_start mechanism and start the jobs
>>>> automatically as we have been doing, with conversions job-by-job coming
>>>> in later patches.
>>>>
>>>> Of note: cancellation of unstarted jobs will perform all the normal
>>>> cleanup as if the job had started, particularly abort and clean. The
>>>> only difference is that we will not emit any events, because the job
>>>> never actually started.
>>>>
>>>> Signed-off-by: John Snow
>>>
>>>> diff --git a/block/commit.c b/block/commit.c
>>>> index 20d27e2..5b7c454 100644
>>>> --- a/block/commit.c
>>>> +++ b/block/commit.c
>>>> @@ -289,10 +289,9 @@ void commit_start(const char *job_id, BlockDriverState *bs,
>>>>      s->backing_file_str = g_strdup(backing_file_str);
>>>>
>>>>      s->on_error = on_error;
>>>> -    s->common.co = qemu_coroutine_create(s->common.driver->start, s);
>>>>
>>>>      trace_commit_start(bs, base, top, s, s->common.co);
>>>
>>> s->common.co is now uninitialised and should probably be removed from
>>> the tracepoint arguments. The same is true for mirror and stream.
>>>
>>>> -    qemu_coroutine_enter(s->common.co);
>>>> +    block_job_start(&s->common);
>>>>  }
>>>
>>>> diff --git a/blockjob.c b/blockjob.c
>>>> index e3c458c..16c5159 100644
>>>> --- a/blockjob.c
>>>> +++ b/blockjob.c
>>>> @@ -174,7 +174,8 @@ void *block_job_create(const char *job_id, const BlockJobDriver *driver,
>>>>      job->blk = blk;
>>>>      job->cb = cb;
>>>>      job->opaque = opaque;
>>>> -    job->busy = true;
>>>> +    job->busy = false;
>>>> +    job->paused = true;
>>>>      job->refcnt = 1;
>>>>      bs->job = job;
>>>>
>>>> @@ -202,6 +203,21 @@ bool block_job_is_internal(BlockJob *job)
>>>>      return (job->id == NULL);
>>>>  }
>>>>
>>>> +static bool block_job_started(BlockJob *job)
>>>> +{
>>>> +    return job->co;
>>>> +}
>>>> +
>>>> +void block_job_start(BlockJob *job)
>>>> +{
>>>> +    assert(job && !block_job_started(job) && job->paused &&
>>>> +           !job->busy && job->driver->start);
>>>> +    job->paused = false;
>>>> +    job->busy = true;
>>>> +    job->co = qemu_coroutine_create(job->driver->start, job);
>>>> +    qemu_coroutine_enter(job->co);
>>>> +}
>>>
>>> We allow the user to pause a job while it's not started yet. You
>>> classified this as "harmless". But if we accept this, can we really
>>> unconditionally enter the coroutine even if the job has been paused?
>>> Can't a user expect that a job remains in the paused state when they
>>> explicitly requested a pause and the job was already internally paused,
>>> like in this case by block_job_create()?
>>>
>>
>> What will end up happening is that we'll enter the job, and then it'll
>> pause immediately upon entrance. Is that a problem?
>>
>> If the jobs themselves are not checking their pause state fastidiously,
>> it could be (but block/backup does -- after it creates a write notifier.)
>>
>> Do we want a stronger guarantee here?
>>
>> Naively I think it's OK as-is, but I could add a stronger boolean that
>> lets us know if it's okay to start or not, and we could delay the actual
>> creation and start until the 'resume' comes in if you'd like.
>>
>> I'd like to avoid the complexity if we can help it, but perhaps I'm not
>> thinking carefully enough about the existing edge cases.
>>
>
> Is there any reason we can't just use job->pause_count here? When the job
> is created, set job->paused = true, and job->pause_count = 1. In
> block_job_start(), check the pause_count prior to qemu_coroutine_enter():
>
> void block_job_start(BlockJob *job)
> {
>     assert(job && !block_job_started(job) && job->paused &&
>            !job->busy && job->driver->start);
>     job->co = qemu_coroutine_create(job->driver->start, job);
>     job->paused = --job->pause_count > 0;
>     if (!job->paused) {
>         job->busy = true;
>         qemu_coroutine_enter(job->co);
>     }
> }
>

Solid point. Let's do it this way. Thanks!

>
>>> The same probably also applies to the internal job pausing during
>>> bdrv_drain_all_begin/end, though as you know there is a larger problem
>>> with starting jobs under drain_all anyway. For now, we just need to keep
>>> in mind that we can neither create nor start a job in such sections.
>>>
>>
>> Yeah, there are deeper problems there. As long as the existing critical
>> sections don't allow us to create jobs (started or not), I think we're
>> probably already OK.
>>
>>> Kevin
>>>
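
For reference, below is a minimal standalone sketch of the pause_count
approach agreed on above. It is not QEMU code: the coroutine is modelled
with a plain function pointer, the BlockJob struct is cut down to the fields
mentioned in this thread, and block_job_user_pause()/block_job_user_resume()
are simplified stand-ins for the real helpers of the same name. It only
illustrates that a user pause issued before block_job_start() defers the
actual start until the matching resume.

/*
 * Standalone model of the create/start split: block_job_create() leaves
 * the job paused with pause_count == 1, and block_job_start() only enters
 * the job body once every pause request, including the implicit one from
 * creation, has been released.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct BlockJob BlockJob;
typedef void (*JobStartFn)(BlockJob *job);

struct BlockJob {
    JobStartFn start;   /* stand-in for job->driver->start          */
    bool started;       /* stand-in for "job->co has been created"  */
    bool busy;
    bool paused;
    int pause_count;
};

static int runs;        /* counts how often the job body actually ran */

static BlockJob *block_job_create(JobStartFn start)
{
    BlockJob *job = calloc(1, sizeof(*job));
    job->start = start;
    job->busy = false;
    job->paused = true;      /* created paused ...                   */
    job->pause_count = 1;    /* ... with one implicit pause request  */
    return job;
}

static void block_job_start(BlockJob *job)
{
    assert(job && !job->started && job->paused && !job->busy && job->start);
    job->started = true;     /* qemu_coroutine_create() in the real code */
    job->paused = --job->pause_count > 0;
    if (!job->paused) {
        job->busy = true;
        job->start(job);     /* qemu_coroutine_enter() in the real code */
    }
}

static void block_job_user_pause(BlockJob *job)
{
    job->pause_count++;
    job->paused = true;
}

static void block_job_user_resume(BlockJob *job)
{
    assert(job->pause_count > 0);
    job->paused = --job->pause_count > 0;
    if (!job->paused && job->started && !job->busy) {
        job->busy = true;    /* the deferred start happens here */
        job->start(job);
    }
}

static void fake_backup_run(BlockJob *job)
{
    (void)job;
    runs++;
    printf("job body ran\n");
}

int main(void)
{
    BlockJob *job = block_job_create(fake_backup_run);

    block_job_user_pause(job);   /* user pause before the job is started */
    block_job_start(job);        /* pause_count drops 2 -> 1: no entry   */
    assert(job->started && job->paused && runs == 0);

    block_job_user_resume(job);  /* pause_count drops 1 -> 0: job runs   */
    assert(!job->paused && job->busy && runs == 1);

    free(job);
    return 0;
}

main() walks through exactly the scenario Kevin raised: create, user pause,
start (deferred because a pause is pending), then resume (the job runs).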