From: Fam Zheng <famz@redhat.com>
To: Anton Nefedov <anton.nefedov@virtuozzo.com>
Cc: Alberto Garcia <berto@igalia.com>,
qemu-devel@nongnu.org, kwolf@redhat.com, qemu-block@nongnu.org,
mreitz@redhat.com
Subject: Re: [Qemu-devel] [Qemu-block] segfault in parallel blockjobs (iotest 30)
Date: Thu, 9 Nov 2017 14:05:26 +0800 [thread overview]
Message-ID: <20171109060526.GA30991@lemon> (raw)
In-Reply-To: <84b1b069-642e-2b33-7723-70edce3aea43@virtuozzo.com>
On Wed, 11/08 18:50, Anton Nefedov wrote:
>
>
> On 8/11/2017 5:45 PM, Alberto Garcia wrote:
> > On Tue 07 Nov 2017 05:19:41 PM CET, Anton Nefedov wrote:
> > > BlockBackend gets deleted by another job's stream_complete(), deferred
> > > to the main loop, so the fact that the job is put to sleep by
> > > bdrv_drain_all_begin() doesn't really stop it from execution.
> >
> > I was debugging this a bit, and the block_job_defer_to_main_loop() call
> > happens _after_ all jobs have been paused, so I think that when the BDS
> > is drained then stream_run() finishes the last iteration without
> > checking if it's paused.
> >
> > Without your patch (i.e. with a smaller STREAM_BUFFER_SIZE) then I
> > assume that the function would have to continue looping and
> > block_job_sleep_ns() would make the job coroutine yield, effectively
> > pausing the job and preventing the crash.
> >
> > I can fix the crash by adding block_job_pause_point(&s->common) at the
> > end of stream_run() (where the 'out' label is).
> >
> > I'm thinking that perhaps we should add the pause point directly to
> > block_job_defer_to_main_loop(), to prevent any block job from running
> > the exit function when it's paused.
> >
>
> Is it possible that the exit function is already deferred when the jobs
> are being paused? (even though it's at least less likely to happen)
>
> Then should we flush the bottom halves somehow in addition to putting
> the jobs to sleep? And also then it all probably has to happen before
> bdrv_reopen_queue()
Or we can stash away the BH temporarily during pause period:
---
diff --git a/blockjob.c b/blockjob.c
index 3a0c49137e..7058ff7ae1 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -127,6 +127,9 @@ void block_job_txn_add_job(BlockJobTxn *txn, BlockJob *job)
static void block_job_pause(BlockJob *job)
{
job->pause_count++;
+ if (job->defer_to_main_loop_bh) {
+ qemu_bh_cancel(job->defer_to_main_loop_bh);
+ }
}
static void block_job_resume(BlockJob *job)
@@ -136,6 +139,9 @@ static void block_job_resume(BlockJob *job)
if (job->pause_count) {
return;
}
+ if (job->defer_to_main_loop_bh) {
+ qemu_bh_schedule(job->defer_to_main_loop_bh);
+ }
block_job_enter(job);
}
@@ -900,6 +906,9 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
/* Prevent race with block_job_defer_to_main_loop() */
aio_context_acquire(data->aio_context);
+ qemu_bh_delete(data->job->defer_to_main_loop_bh);
+ data->job->defer_to_main_loop_bh = NULL;
+
/* Fetch BDS AioContext again, in case it has changed */
aio_context = blk_get_aio_context(data->job->blk);
if (aio_context != data->aio_context) {
@@ -927,7 +936,9 @@ void block_job_defer_to_main_loop(BlockJob *job,
data->fn = fn;
data->opaque = opaque;
job->deferred_to_main_loop = true;
+ assert(!job->defer_to_main_loop_bh);
+ job->defer_to_main_loop_bh = aio_bh_new(qemu_get_aio_context(),
+ block_job_defer_to_main_loop_bh, data);
- aio_bh_schedule_oneshot(qemu_get_aio_context(),
- block_job_defer_to_main_loop_bh, data);
+ qemu_bh_schedule(job->defer_to_main_loop_bh);
}
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index 67c0968fa5..28d29b3be6 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -97,6 +97,11 @@ typedef struct BlockJob {
*/
bool deferred_to_main_loop;
+ /**
+ * The main loop BH for "defer to main loop" operation.
+ */
+ QEMUBH *defer_to_main_loop_bh;
+
/** Element of the list of block jobs */
QLIST_ENTRY(BlockJob) job_list;
next prev parent reply other threads:[~2017-11-09 6:05 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2017-11-07 16:19 [Qemu-devel] segfault in parallel blockjobs (iotest 30) Anton Nefedov
2017-11-08 13:58 ` [Qemu-devel] [Qemu-block] " Alberto Garcia
2017-11-08 14:45 ` Alberto Garcia
2017-11-08 15:50 ` Anton Nefedov
2017-11-09 6:05 ` Fam Zheng [this message]
2017-11-09 10:03 ` Alberto Garcia
2017-11-09 16:26 ` Alberto Garcia
2017-11-10 3:02 ` Fam Zheng
2017-11-15 15:42 ` Alberto Garcia
2017-11-15 16:31 ` Anton Nefedov
2017-11-16 15:54 ` Alberto Garcia
2017-11-16 16:09 ` Anton Nefedov
2017-11-21 12:51 ` Alberto Garcia
2017-11-21 15:18 ` Anton Nefedov
2017-11-21 15:31 ` Alberto Garcia
2017-11-22 0:13 ` John Snow
2017-11-22 12:55 ` Alberto Garcia
2017-11-22 15:58 ` John Snow
2017-11-16 21:56 ` John Snow
2017-11-17 16:16 ` Alberto Garcia
2017-11-22 15:05 ` Alberto Garcia
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=20171109060526.GA30991@lemon \
--to=famz@redhat.com \
--cc=anton.nefedov@virtuozzo.com \
--cc=berto@igalia.com \
--cc=kwolf@redhat.com \
--cc=mreitz@redhat.com \
--cc=qemu-block@nongnu.org \
--cc=qemu-devel@nongnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).