From: Stefan Reiter <s.reiter@proxmox.com>
To: qemu-devel@nongnu.org, qemu-block@nongnu.org
Cc: kwolf@redhat.com, vsementsov@virtuozzo.com, slp@redhat.com,
	mreitz@redhat.com, stefanha@redhat.com, jsnow@redhat.com,
	dietmar@proxmox.com
Subject: [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply
Date: Wed,  1 Apr 2020 10:15:02 +0200
Message-ID: <20200401081504.200017-2-s.reiter@proxmox.com>
In-Reply-To: <20200401081504.200017-1-s.reiter@proxmox.com>

All callers of job_txn_apply hold a single job's lock, but different
jobs within a transaction can have different contexts, so we need to
lock each one individually before applying the callback function.

Similar to job_completed_txn_abort, this also requires releasing the
caller's context before the call and reacquiring it afterwards, to
avoid recursive locking, which might break AIO_WAIT_WHILE in the
callback.
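
In isolation, the pattern looks roughly like this (a condensed sketch
of the new job_txn_apply body below; error handling and the
transaction loop are omitted):

    aio_context_release(job->aio_context);   /* drop the caller's lock */
    aio_context_acquire(other_job->aio_context);
    rc = fn(other_job);                      /* may use AIO_WAIT_WHILE */
    aio_context_release(other_job->aio_context);
    aio_context_acquire(job->aio_context);   /* re-read, not a cached
                                              * pointer: fn may have
                                              * moved the job */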

This also brings a different issue to light: when a callback function
in job_txn_apply moves its job to a different AIO context, job_exit
will try to release the wrong lock (now that we reacquire the lock
correctly; previously it would just continue with the old lock,
leaving the job unlocked for the rest of the codepath back to
job_exit). Fix this by not caching the job's context in job_exit, and
add a comment explaining why this is done.
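
Condensed from the diff below, the resulting job_exit is roughly:

    static void job_exit(void *opaque)
    {
        Job *job = (Job *)opaque;
        AioContext *ctx;

        job_ref(job);             /* job_completed must not free job */
        aio_context_acquire(job->aio_context);
        job_completed(job);       /* may move job->aio_context */
        ctx = job->aio_context;   /* re-read instead of caching early */
        job_unref(job);
        aio_context_release(ctx);
    }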

One test needed adapting: since it calls job_finalize directly, it
needs to acquire the correct context manually.

Signed-off-by: Stefan Reiter <s.reiter@proxmox.com>
---
 job.c                 | 48 ++++++++++++++++++++++++++++++++++---------
 tests/test-blockjob.c |  2 ++
 2 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/job.c b/job.c
index 134a07b92e..5fbaaabf78 100644
--- a/job.c
+++ b/job.c
@@ -136,17 +136,36 @@ static void job_txn_del_job(Job *job)
     }
 }
 
-static int job_txn_apply(JobTxn *txn, int fn(Job *))
+static int job_txn_apply(Job *job, int fn(Job *))
 {
-    Job *job, *next;
+    AioContext *inner_ctx;
+    Job *other_job, *next;
+    JobTxn *txn = job->txn;
     int rc = 0;
 
-    QLIST_FOREACH_SAFE(job, &txn->jobs, txn_list, next) {
-        rc = fn(job);
+    /*
+     * Similar to job_completed_txn_abort, we take each job's lock before
+     * applying fn, but since we assume that outer_ctx is held by the caller,
+     * we need to release it here to avoid holding the lock twice - which would
+     * break AIO_WAIT_WHILE from within fn.
+     */
+    aio_context_release(job->aio_context);
+
+    QLIST_FOREACH_SAFE(other_job, &txn->jobs, txn_list, next) {
+        inner_ctx = other_job->aio_context;
+        aio_context_acquire(inner_ctx);
+        rc = fn(other_job);
+        aio_context_release(inner_ctx);
         if (rc) {
             break;
         }
     }
+
+    /*
+     * Note that job->aio_context might have been changed by calling fn, so we
+     * can't use a local variable to cache it.
+     */
+    aio_context_acquire(job->aio_context);
     return rc;
 }
 
@@ -774,11 +793,11 @@ static void job_do_finalize(Job *job)
     assert(job && job->txn);
 
     /* prepare the transaction to complete */
-    rc = job_txn_apply(job->txn, job_prepare);
+    rc = job_txn_apply(job, job_prepare);
     if (rc) {
         job_completed_txn_abort(job);
     } else {
-        job_txn_apply(job->txn, job_finalize_single);
+        job_txn_apply(job, job_finalize_single);
     }
 }
 
@@ -824,10 +843,10 @@ static void job_completed_txn_success(Job *job)
         assert(other_job->ret == 0);
     }
 
-    job_txn_apply(txn, job_transition_to_pending);
+    job_txn_apply(job, job_transition_to_pending);
 
     /* If no jobs need manual finalization, automatically do so */
-    if (job_txn_apply(txn, job_needs_finalize) == 0) {
+    if (job_txn_apply(job, job_needs_finalize) == 0) {
         job_do_finalize(job);
     }
 }
@@ -849,9 +868,10 @@ static void job_completed(Job *job)
 static void job_exit(void *opaque)
 {
     Job *job = (Job *)opaque;
-    AioContext *ctx = job->aio_context;
+    AioContext *ctx;
 
-    aio_context_acquire(ctx);
+    job_ref(job);
+    aio_context_acquire(job->aio_context);
 
     /* This is a lie, we're not quiescent, but still doing the completion
      * callbacks. However, completion callbacks tend to involve operations that
@@ -862,6 +882,14 @@ static void job_exit(void *opaque)
 
     job_completed(job);
 
+    /*
+     * Note that calling job_completed can move the job to a different
+     * aio_context, so we cannot cache from above. job_txn_apply takes care of
+     * acquiring the new lock, and we ref/unref to avoid job_completed freeing
+     * the job underneath us.
+     */
+    ctx = job->aio_context;
+    job_unref(job);
     aio_context_release(ctx);
 }
 
diff --git a/tests/test-blockjob.c b/tests/test-blockjob.c
index 4eeb184caf..7519847912 100644
--- a/tests/test-blockjob.c
+++ b/tests/test-blockjob.c
@@ -367,7 +367,9 @@ static void test_cancel_concluded(void)
     aio_poll(qemu_get_aio_context(), true);
     assert(job->status == JOB_STATUS_PENDING);
 
+    aio_context_acquire(job->aio_context);
     job_finalize(job, &error_abort);
+    aio_context_release(job->aio_context);
     assert(job->status == JOB_STATUS_CONCLUDED);
 
     cancel_common(s);
-- 
2.26.0

Thread overview: 11+ messages
2020-04-01  8:15 [PATCH v4 0/3] Fix some AIO context locking in jobs Stefan Reiter
2020-04-01  8:15 ` Stefan Reiter [this message]
2020-04-02 12:33   ` [PATCH v4 1/3] job: take each job's lock individually in job_txn_apply Max Reitz
2020-04-02 15:04     ` Stefan Reiter
2020-04-01  8:15 ` [PATCH v4 2/3] replication: acquire aio context before calling job_cancel_sync Stefan Reiter
2020-04-02 12:41   ` Max Reitz
2020-04-02 15:05     ` Stefan Reiter
2020-04-01  8:15 ` [PATCH v4 3/3] backup: don't acquire aio_context in backup_clean Stefan Reiter
2020-04-02 12:59   ` Max Reitz
2020-04-02 12:48 ` [PATCH v4 0/3] Fix some AIO context locking in jobs Kevin Wolf
2020-04-06 20:11   ` John Snow
