qemu-devel.nongnu.org archive mirror
 help / color / mirror / Atom feed
From: Emanuele Giuseppe Esposito <eesposit@redhat.com>
To: Stefan Hajnoczi <stefanha@redhat.com>
Cc: Kevin Wolf <kwolf@redhat.com>, Fam Zheng <fam@euphon.net>,
	Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>,
	qemu-block@nongnu.org, Wen Congyang <wencongyang2@huawei.com>,
	Xie Changlong <xiechanglong.d@gmail.com>,
	Markus Armbruster <armbru@redhat.com>,
	qemu-devel@nongnu.org, Hanna Reitz <hreitz@redhat.com>,
	Paolo Bonzini <pbonzini@redhat.com>, John Snow <jsnow@redhat.com>
Subject: Re: [PATCH v5 08/20] jobs: protect jobs with job_lock/unlock
Date: Thu, 24 Feb 2022 17:55:03 +0100	[thread overview]
Message-ID: <d7e80df1-20d8-9d75-a972-43a72c12faa0@redhat.com> (raw)
In-Reply-To: <Yhe2zihPZARQbEDJ@stefanha-x1.localdomain>



On 24/02/2022 17:48, Stefan Hajnoczi wrote:
> On Thu, Feb 24, 2022 at 01:45:48PM +0100, Emanuele Giuseppe Esposito wrote:
>>
>>
>> On 17/02/2022 15:48, Stefan Hajnoczi wrote:
>>> On Tue, Feb 08, 2022 at 09:35:01AM -0500, Emanuele Giuseppe Esposito wrote:
>>>> diff --git a/block/replication.c b/block/replication.c
>>>> index 55c8f894aa..a03b28726e 100644
>>>> --- a/block/replication.c
>>>> +++ b/block/replication.c
>>>> @@ -149,7 +149,9 @@ static void replication_close(BlockDriverState *bs)
>>>>      if (s->stage == BLOCK_REPLICATION_FAILOVER) {
>>>>          commit_job = &s->commit_job->job;
>>>>          assert(commit_job->aio_context == qemu_get_current_aio_context());
>>>
>>> Is it safe to access commit_job->aio_context outside job_mutex?
>>
>> No, but it is currently not done. Patch 18 takes care of protecting
>> aio_context. Remember again that job lock API is still nop.
>>>
>>>> @@ -1838,7 +1840,9 @@ static void drive_backup_abort(BlkActionState *common)
>>>>          aio_context = bdrv_get_aio_context(state->bs);
>>>>          aio_context_acquire(aio_context);
>>>>  
>>>> -        job_cancel_sync(&state->job->job, true);
>>>> +        WITH_JOB_LOCK_GUARD() {
>>>> +            job_cancel_sync(&state->job->job, true);
>>>> +        }
>>>
>>> Maybe job_cancel_sync() should take the lock internally since all
>>> callers in this patch seem to need the lock?
>>
>> The _locked version is useful because it is used when lock guards are
>> already present, and cover multiple operations. There are only 3 places
>> where a lock guard is added to cover job_cance_sync_locked. Is it worth
>> defining another additional function?
>>
>>
>>>
>>> I noticed this patch does not add WITH_JOB_LOCK_GUARD() to
>>> tests/unit/test-blockjob.c:cancel_common(). Was that an oversight or is
>>> there a reason why job_mutex is not needed around the job_cancel_sync()
>>> call there?
>>
>> No, locks in unit tests are added in patch 10 "jobs: protect jobs with
>> job_lock/unlock".
> 
> I see, it's a question of how to split up the patches. When patches
> leave the code in a state with broken invariants it becomes difficult to
> review. I can't distinguish between actual bugs and temporary violations
> that will be fixed in a later patch (unless they are clearly marked).
> 
> If you can structure patches so they are self-contained and don't leave
> the broken invariants then that would make review easier, but in this
> case it is tricky so I'll do the best I can to review it if you cannot
> restructure the sequence of commits.

Yes, the main problem is that ideally we want to add job lock and remove
Aiocontext lock. But together this can't happen, and just adding proper
locks will create a ton of deadlocks, because in order to maintain
invariants sometimes job lock is inside aiocontext lock, and some other
times the opposite happens.

The way it is done in this serie is:
1) create job_lock/unlock as nop
2) make sure the nop job_lock is protecting the Job fields
3) do all API renaming, accoring with the locking used above
4) enable job_lock (not nop anymore) and at the same time remove the
AioContext

If you want, a more up-to-date branch with your feedbacks applied so far
is here:
https://gitlab.com/eesposit/qemu/-/commits/dp_jobs_reviewed

> 
>>>
>>>> @@ -252,7 +258,13 @@ int block_job_add_bdrv(BlockJob *job, const char *name, BlockDriverState *bs,
>>>>  
>>>>  static void block_job_on_idle(Notifier *n, void *opaque)
>>>>  {
>>>> +    /*
>>>> +     * we can't kick with job_mutex held, but we also want
>>>> +     * to protect the notifier list.
>>>> +     */
>>>> +    job_unlock();
>>>>      aio_wait_kick();
>>>> +    job_lock();
>>>
>>> I don't understand this. aio_wait_kick() looks safe to call with a mutex
>>> held?
>> You are right. It should be safe.
>>
>>>
>>>> @@ -292,7 +304,9 @@ bool block_job_set_speed(BlockJob *job, int64_t speed, Error **errp)
>>>>      job->speed = speed;
>>>>  
>>>>      if (drv->set_speed) {
>>>> +        job_unlock();
>>>>          drv->set_speed(job, speed);
>>>> +        job_lock();
>>>
>>> What guarantees that job stays alive during drv->set_speed(job)? We
>>> don't hold a ref here. Maybe the assumption is that
>>> block_job_set_speed() only gets called from the main loop thread and
>>> nothing else will modify the jobs list while we're in drv->set_speed()?
>>
>> What guaranteed this before? I am not sure.
> 
> I guess the reason is the one I suggested. It should be documented in
> the comments.
> 
>>
>>>
>>>> @@ -545,10 +566,15 @@ BlockErrorAction block_job_error_action(BlockJob *job, BlockdevOnError on_err,
>>>>                                          action);
>>>>      }
>>>>      if (action == BLOCK_ERROR_ACTION_STOP) {
>>>> -        if (!job->job.user_paused) {
>>>> -            job_pause(&job->job);
>>>> -            /* make the pause user visible, which will be resumed from QMP. */
>>>> -            job->job.user_paused = true;
>>>> +        WITH_JOB_LOCK_GUARD() {
>>>> +            if (!job->job.user_paused) {
>>>> +                job_pause(&job->job);
>>>> +                /*
>>>> +                 * make the pause user visible, which will be
>>>> +                 * resumed from QMP.
>>>> +                 */
>>>> +                job->job.user_paused = true;
>>>> +            }
>>>>          }
>>>>          block_job_iostatus_set_err(job, error);
>>>
>>> Does this need the lock? If not, why is block_job_iostatus_reset()
>>> called with the hold?
>>>
>> block_job_iostatus_set_err does not touch any Job fields. On the other
>> hand block_job_iostatus_reset reads job.user_paused and job.pause_count.
> 
> BlockJob->iostatus requires no locking?
> 



  reply	other threads:[~2022-02-24 16:56 UTC|newest]

Thread overview: 49+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-02-08 14:34 [PATCH v5 00/20] job: replace AioContext lock with job_mutex Emanuele Giuseppe Esposito
2022-02-08 14:34 ` [PATCH v5 01/20] job.c: make job_mutex and job_lock/unlock() public Emanuele Giuseppe Esposito
2022-02-10 15:38   ` Stefan Hajnoczi
2022-02-08 14:34 ` [PATCH v5 02/20] job.h: categorize fields in struct Job Emanuele Giuseppe Esposito
2022-02-10 15:40   ` Stefan Hajnoczi
2022-02-10 16:26     ` Emanuele Giuseppe Esposito
2022-02-10 17:35       ` Stefan Hajnoczi
2022-02-11  9:00         ` Emanuele Giuseppe Esposito
2022-02-08 14:34 ` [PATCH v5 03/20] job.c: API functions not used outside should be static Emanuele Giuseppe Esposito
2022-02-10 15:43   ` Stefan Hajnoczi
2022-02-08 14:34 ` [PATCH v5 04/20] job.c: move inner aiocontext lock in callbacks Emanuele Giuseppe Esposito
2022-02-17 13:45   ` Stefan Hajnoczi
2022-02-24  9:20     ` Emanuele Giuseppe Esposito
2022-02-24  9:29     ` Emanuele Giuseppe Esposito
2022-02-08 14:34 ` [PATCH v5 05/20] aio-wait.h: introduce AIO_WAIT_WHILE_UNLOCKED Emanuele Giuseppe Esposito
2022-02-17 13:56   ` Stefan Hajnoczi
2022-02-08 14:34 ` [PATCH v5 06/20] jobs: remove aiocontext locks since the functions are under BQL Emanuele Giuseppe Esposito
2022-02-17 14:12   ` Stefan Hajnoczi
2022-02-24  9:27     ` Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 07/20] job.h: add _locked duplicates for job API functions called with and without job_mutex Emanuele Giuseppe Esposito
2022-02-17 14:20   ` Stefan Hajnoczi
2022-02-24  9:52     ` Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 08/20] jobs: protect jobs with job_lock/unlock Emanuele Giuseppe Esposito
2022-02-17 14:48   ` Stefan Hajnoczi
2022-02-24 12:45     ` Emanuele Giuseppe Esposito
2022-02-24 16:48       ` Stefan Hajnoczi
2022-02-24 16:55         ` Emanuele Giuseppe Esposito [this message]
2022-02-08 14:35 ` [PATCH v5 09/20] jobs: add job lock in find_* functions Emanuele Giuseppe Esposito
2022-02-17 15:00   ` Stefan Hajnoczi
2022-02-24 12:36     ` Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 10/20] jobs: use job locks also in the unit tests Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 11/20] block/mirror.c: use of job helpers in drivers to avoid TOC/TOU Emanuele Giuseppe Esposito
2022-02-17 16:53   ` Stefan Hajnoczi
2022-02-08 14:35 ` [PATCH v5 12/20] jobs: rename static functions called with job_mutex held Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 13/20] job.h: rename job API " Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 14/20] block_job: rename block_job " Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 15/20] job.h: define unlocked functions Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 16/20] commit and mirror: create new nodes using bdrv_get_aio_context, and not the job aiocontext Emanuele Giuseppe Esposito
2022-03-08 11:13   ` Stefan Hajnoczi
2022-02-08 14:35 ` [PATCH v5 17/20] job: detect change of aiocontext within job coroutine Emanuele Giuseppe Esposito
2022-03-08 11:19   ` Stefan Hajnoczi
2022-02-08 14:35 ` [PATCH v5 18/20] jobs: protect job.aio_context with BQL and job_mutex Emanuele Giuseppe Esposito
2022-03-08 13:41   ` Stefan Hajnoczi
2022-03-10 10:09     ` Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 19/20] job.c: enable job lock/unlock and remove Aiocontext locks Emanuele Giuseppe Esposito
2022-03-08 14:04   ` Stefan Hajnoczi
2022-03-10 10:25     ` Emanuele Giuseppe Esposito
2022-02-08 14:35 ` [PATCH v5 20/20] block_job_query: remove atomic read Emanuele Giuseppe Esposito
2022-03-08 14:06   ` Stefan Hajnoczi

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=d7e80df1-20d8-9d75-a972-43a72c12faa0@redhat.com \
    --to=eesposit@redhat.com \
    --cc=armbru@redhat.com \
    --cc=fam@euphon.net \
    --cc=hreitz@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=kwolf@redhat.com \
    --cc=pbonzini@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    --cc=vsementsov@virtuozzo.com \
    --cc=wencongyang2@huawei.com \
    --cc=xiechanglong.d@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).