From: Kevin Wolf <kwolf@redhat.com>
To: qemu-block@nongnu.org
Cc: kwolf@redhat.com, stefanha@redhat.com, qemu-devel@nongnu.org
Subject: [PULL 40/50] job: detect change of aiocontext within job coroutine
Date: Fri,  7 Oct 2022 12:47:42 +0200
Message-ID: <20221007104752.141361-41-kwolf@redhat.com>
In-Reply-To: <20221007104752.141361-1-kwolf@redhat.com>

From: Paolo Bonzini <pbonzini@redhat.com>

We want to make sure that job->aio_context is always accessed
under either the BQL or job_mutex. The problem is that calling
aio_co_enter(job->aio_context, job->co) in job_start and job_enter_cond
makes the coroutine resume immediately, so we can't hold the job lock
across the call. Caching the value is not safe either, as it might
change.

job_start runs under the BQL, so it can freely read job->aio_context,
but job_enter_cond does not.
We want to avoid reading job->aio_context in job_enter_cond, therefore:
1) use aio_co_wake(), which does not take an AioContext argument
   but uses job->co->ctx
2) detect a possible discrepancy between job->co->ctx and
   job->aio_context by checking, right after the coroutine resumes
   from yielding, whether job->aio_context has changed. If so,
   reschedule the coroutine to the new context.

Calling bdrv_try_set_aio_context() will issue the following calls
(simplified):
* in terms of bdrv callbacks:
  .drained_begin -> .set_aio_context -> .drained_end
* in terms of child_job functions:
  child_job_drained_begin -> child_job_set_aio_context -> child_job_drained_end
* in terms of job functions:
  job_pause_locked -> job_set_aio_context -> job_resume_locked

We can see that after setting the new aio_context, job_resume_locked
calls job_enter_cond again, which then invokes aio_co_wake(). But
while job->aio_context has been set in job_set_aio_context,
job->co->ctx has not changed, so the coroutine would be entered in
the wrong AioContext.
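
The mismatch can be illustrated with a small standalone model. This is
only a sketch: WakeCtx, WakeCo, WakeJob and the wake_* helpers are
hypothetical stand-ins, not QEMU types or functions. It assumes, as
described above, that waking a coroutine targets the context stored in
the coroutine itself, while setting the job's context only updates the
job field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for AioContext, Coroutine and Job. */
typedef struct WakeCtx { int id; } WakeCtx;
typedef struct WakeCo { WakeCtx *ctx; } WakeCo;
typedef struct WakeJob {
    WakeCtx *aio_context;
    WakeCo *co;
} WakeJob;

/* Models job_set_aio_context(): only the job field is updated. */
static void wake_set_job_context(WakeJob *job, WakeCtx *ctx)
{
    assert(job != NULL);
    job->aio_context = ctx;
}

/* Models the target of aio_co_wake(): the context stored in the coroutine. */
static WakeCtx *wake_target(const WakeJob *job)
{
    return job->co->ctx;
}

/* True when the coroutine would be entered in a stale context. */
static bool wake_is_stale(const WakeJob *job)
{
    return wake_target(job) != job->aio_context;
}
```

In this model, calling wake_set_job_context() with a new context leaves
wake_target() pointing at the old one, which is the situation the patch
has to detect.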

Using aio_co_schedule in job_resume_locked() might seem like a valid
alternative, but the problem is that the BH resuming the coroutine
is not scheduled immediately, and if in the meantime another
bdrv_try_set_aio_context() is run (see test_propagate_mirror() in
test-block-iothread.c), we would have the first schedule in the
wrong AioContext, and the second set of drains won't even manage
to schedule the coroutine, as job->busy would still be true from
the previous job_resume_locked().
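
A heavily simplified model of that failure mode, assuming only what the
paragraph above states: a wakeup is recorded for the context that is
current at scheduling time, and a second attempt is refused while
job->busy is still set. BusyJob and busy_try_schedule() are hypothetical
names used for illustration, and contexts are reduced to integer ids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the relevant Job fields. */
typedef struct BusyJob {
    bool busy;          /* set when a wakeup has been queued */
    int aio_context;    /* id of the job's current context */
    int scheduled_in;   /* id the pending wakeup targets, -1 if none */
} BusyJob;

/* Models a deferred wakeup: it is queued, but does not run yet. */
static bool busy_try_schedule(BusyJob *job)
{
    assert(job != NULL);
    if (job->busy) {
        return false;   /* a wakeup is already pending, nothing is queued */
    }
    job->busy = true;
    job->scheduled_in = job->aio_context;
    return true;
}
```

If the context id changes between two calls, the first wakeup still
targets the old id and the second call returns false, so nothing ever
targets the new one.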

The solution is to stick with aio_co_wake() and check, every time
the coroutine resumes from yielding, whether job->aio_context
has changed. If so, we can reschedule it to the new context.
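
The re-check has to be a loop, because the context may change again
while the coroutine is moving. A minimal single-threaded sketch of that
loop follows. LoopCtx, LoopJob and the loop_* helpers are hypothetical,
locking is omitted, and the "pending" field is an assumption used to
simulate a context change that lands during the reschedule.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for AioContext and Job. */
typedef struct LoopCtx { const char *name; } LoopCtx;
typedef struct LoopJob {
    LoopCtx *aio_context;   /* stands in for job->aio_context */
    LoopCtx *pending;       /* change that lands during the next move */
} LoopJob;

/* Stands in for the context the coroutine is currently running in. */
static LoopCtx *loop_current_ctx;

/* Models a self-reschedule; a pending change is applied meanwhile. */
static void loop_reschedule_self(LoopJob *job, LoopCtx *target)
{
    loop_current_ctx = target;
    if (job->pending) {
        job->aio_context = job->pending;
        job->pending = NULL;
    }
}

/* Follows job->aio_context until it is stable; returns the hop count. */
static int loop_follow_context(LoopJob *job)
{
    int hops = 0;
    LoopCtx *next;

    assert(job != NULL);
    next = job->aio_context;
    while (loop_current_ctx != next) {
        loop_reschedule_self(job, next);
        next = job->aio_context;   /* re-read: it may have changed again */
        hops++;
    }
    return hops;
}
```

A single "if" instead of the "while" would leave the coroutine in the
intermediate context whenever a second change arrives during the move.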

Check for the AioContext change in job_do_yield_locked because:
1) aio_co_reschedule_self must be called from the running coroutine
2) since child_job_set_aio_context allows changing the AioContext only
   while the job is paused, this is the exact place where the coroutine
   resumes, before running the JobDriver's code.

Reviewed-by: Vladimir Sementsov-Ogievskiy <vsementsov@yandex-team.ru>
Reviewed-by: Stefan Hajnoczi <stefanha@redhat.com>
Signed-off-by: Paolo Bonzini <pbonzini@redhat.com>
Message-Id: <20220926093214.506243-13-eesposit@redhat.com>
Reviewed-by: Kevin Wolf <kwolf@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
 job.c | 19 +++++++++++++++++--
 1 file changed, 17 insertions(+), 2 deletions(-)

diff --git a/job.c b/job.c
index 926e385ac2..3ef5028751 100644
--- a/job.c
+++ b/job.c
@@ -588,7 +588,7 @@ void job_enter_cond_locked(Job *job, bool(*fn)(Job *job))
     job->busy = true;
     real_job_unlock();
     job_unlock();
-    aio_co_enter(job->aio_context, job->co);
+    aio_co_wake(job->co);
     job_lock();
 }
 
@@ -615,6 +615,8 @@ void job_enter(Job *job)
  */
 static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
 {
+    AioContext *next_aio_context;
+
     real_job_lock();
     if (ns != -1) {
         timer_mod(&job->sleep_timer, ns);
@@ -626,7 +628,20 @@ static void coroutine_fn job_do_yield_locked(Job *job, uint64_t ns)
     qemu_coroutine_yield();
     job_lock();
 
-    /* Set by job_enter_cond() before re-entering the coroutine.  */
+    next_aio_context = job->aio_context;
+    /*
+     * Coroutine has resumed, but in the meanwhile the job AioContext
+     * might have changed via bdrv_try_set_aio_context(), so we need to move
+     * the coroutine too in the new aiocontext.
+     */
+    while (qemu_get_current_aio_context() != next_aio_context) {
+        job_unlock();
+        aio_co_reschedule_self(next_aio_context);
+        job_lock();
+        next_aio_context = job->aio_context;
+    }
+
+    /* Set by job_enter_cond_locked() before re-entering the coroutine.  */
     assert(job->busy);
 }
 
-- 
2.37.3



