Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition

All of lore.kernel.org
 help / color / mirror / Atom feed

From: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
To: John Snow <jsnow@redhat.com>, qemu-devel <qemu-devel@nongnu.org>,
	Stefan Hajnoczi <stefanha@redhat.com>,
	Jeff Cody <jcody@redhat.com>, "Denis V. Lunev" <den@openvz.org>,
	qemu block <qemu-block@nongnu.org>
Subject: Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition
Date: Wed, 28 Sep 2016 15:16:33 +0300	[thread overview]
Message-ID: <57EBB4A1.1000104@virtuozzo.com> (raw)
In-Reply-To: <bf412a34-acb7-d3df-1710-ee7917ee2060@redhat.com>

Ok, let's discuss it on list.

On 27.09.2016 19:19, John Snow wrote:
> Replying off-list to follow your lead, but this might be a good 
> discussion to have upstream where Stefan and Jeff can see it.
>
> On 09/26/2016 11:53 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi John!
>>
>> What about this series? If you are working on this, I can tell about
>> another related problem.
>>
>
> I got mired in trying to think about how to delete a BlockJob that 
> hasn't technically been started yet -- this patch series is unsafe in 
> that it will be unable to "cancel" a job that hasn't started yet.
>
> (And I don't want to start a job just to cancel it, that's gross.)
>
> I need to expand the job modifications here a little bit to make this 
> completely safe. I may need to create a job->started bool that allows 
> you to call a job_delete() function if and only if started is false.
>
> I'll try to send this along this week.
>
>> I'm working on async scheme for backup. I've started from something like
>> just wrapping backup_do_cow into coroutine and use this coroutine in
>> full-backup-loop instead of just call backup_do_cow. This simple
>> approach seems better than used in mirror - in mirror we create new
>> coroutine for each async read and write, when I create new coroutine for
>> copy of = read + write. But then I decided to rewrite this to
>> worker-coroutines scheme. I.e. we do not create new coroutine for each
>> request and control their count by " while (job->inflight_requests >=
>> MAX_IN_FLIGHT) { yield }" but we create MAX_IN_FLIGHT worker coroutines
>> at backup start, and then they just copy cluster by cluster, using
>> common cop_bitmap to synchronize. All cluster copying is done in
>> backup-workers and write-notifiers only change the order in which
>> workers copy clusters. And I liked it all until I figured out the error
>> handling.
>>
>
> Makes sense to me so far.
>
>> I can't handle errors in workers, because I need to call
>> block_job_error_action, which may stop blockjob, and it seems to be a
>> problem if several workers calls it simultaneously. So, worker have to
>
> What kind of problems are you seeing? Maybe we can fix them? is it for 
> STOP cases? (or also on IGNORE/REPORT?)
>
> Otherwise, just stash an error object in a location that the master 
> coroutine can find, pause/terminate all worker threads, and signal the 
> error.
>
>> defer error handling to main coroutine of the blockjob (this coroutine
>> in my solution doesn't do copying but only handles blockjob staff like
>> pauses, stops, throttling). So, we will have a queue of workers, waiting
>> for main coroutine to handle their errors. This looks too complicated..
>
> Really? I don't think it's too bad. The worker detects an error and 
> creates a record of the problem, then the worker yields.
>
> The controller detects the worker has yielded and sees the record of 
> the error. The controller opts not to re-schedule the worker.
>
> The controller then instructs all remaining workers to yield. Once all 
> workers are yielded, the controller invokes block_job_error_action.
>
> From there, we can handle the error accordingly.
>
> Either way it sounds like we're going to have to manage the complexity 
> of what happens when a job with worker coroutines is paused by e.g. 
> the QMP monitor: the controller needs to send and coordinate messages 
> to the workers; so this doesn't sound like too much added complexity 
> unless I'm dreaming.
>
>> And finally I come to the idea that all this problems and complications
>> are because blockjob is not created for asynchronous work, because
>> blockjob has only one working coroutine. So the true way is to maintain
>> several working coroutines on generic blockjob layer. This will lead to
>> simpler and transparent async schemes for backup and mirror (and may be
>> other blockjobs).
>>
>> So, the question is: what about refactoring blockjobs, to move from one
>> working coroutine to several ones? Is it hard and is it possible? I do
>> not understand all block-jobs related staff..
>>
>
> I think jobs will need to remain "one coroutine, one job" for now, but 
> there's no reason why drive-backup or blockdev-backup can't just 
> create multiple jobs each if that's what they need to do. (The backup 
> job object could, in theory, just have another job pointer to a helper 
> job if it really became necessary.)
>
> Let's try to solve your design problem first and see if we can't make 
> error reporting behave nicely in a contested environment.

Hmm, ok, I'll go this way for now. But anyway, I think that finally 
async scheme in backup and mirror should be the same. And it should 
share the same code, so all worker-related should be separated from 
backup to own file, or injected into block-jobs.

>
>>
>> On 08.08.2016 22:09, John Snow wrote:
>>> There are a few problems with transactional job completion right now.
>>>
>>> First, if jobs complete so quickly they complete before remaining jobs
>>> get a chance to join the transaction, the completion mode can leave 
>>> well
>>> known state and the QLIST can get corrupted and the transactional jobs
>>> can complete in batches or phases instead of all together.
>>>
>>> Second, if two or more jobs defer to the main loop at roughly the same
>>> time, it's possible for one job's cleanup to directly invoke the other
>>> job's cleanup from within the same thread, leading to a situation that
>>> will deadlock the entire transaction.
>>>
>>> Thanks to Vladimir for pointing out these modes of failure.
>>>
>>> ________________________________________________________________________________ 
>>>
>>>
>>>
>>> For convenience, this branch is available at:
>>> https://github.com/jnsnow/qemu.git  branch job-manual-start
>>> https://github.com/jnsnow/qemu/tree/job-manual-start
>>>
>>> This version is tagged job-manual-start-v1:
>>> https://github.com/jnsnow/qemu/releases/tag/job-manual-start-v1
>>>
>>> John Snow (4):
>>>    blockjob: add block_job_start
>>>    blockjob: refactor backup_start as backup_job_create
>>>    blockjob: add .clean property
>>>    iotests: add transactional failure race test
>>>
>>> Vladimir Sementsov-Ogievskiy (1):
>>>    blockjob: fix dead pointer in txn list
>>>
>>>   block/backup.c             |  50 +++++++-----
>>>   block/commit.c             |   2 +-
>>>   block/mirror.c             |   2 +-
>>>   block/stream.c             |   2 +-
>>>   blockdev.c                 | 194
>>> ++++++++++++++++++++++++++-------------------
>>>   blockjob.c                 |  24 +++++-
>>>   include/block/block_int.h  |  19 ++---
>>>   include/block/blockjob.h   |  16 ++++
>>>   tests/qemu-iotests/124     |  91 +++++++++++++++++++++
>>>   tests/qemu-iotests/124.out |   4 +-
>>>   tests/test-blockjob-txn.c  |   2 +-
>>>   11 files changed, 284 insertions(+), 122 deletions(-)
>>>
>>
>>


-- 
Best regards,
Vladimir

next prev parent reply	other threads:[~2016-09-28 12:16 UTC|newest]

Thread overview: 15+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-08-08 19:09 [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition John Snow
2016-08-08 19:09 ` [Qemu-devel] [PATCH 1/5] blockjob: fix dead pointer in txn list John Snow
2016-09-29 18:16   ` Eric Blake
2016-08-08 19:09 ` [Qemu-devel] [PATCH 2/5] blockjob: add block_job_start John Snow
2016-08-08 19:09 ` [Qemu-devel] [PATCH 3/5] blockjob: refactor backup_start as backup_job_create John Snow
2016-08-08 21:23   ` John Snow
2016-08-08 19:09 ` [Qemu-devel] [PATCH 4/5] blockjob: add .clean property John Snow
2016-08-08 19:09 ` [Qemu-devel] [PATCH 5/5] iotests: add transactional failure race test John Snow
2016-08-10 15:19   ` Vladimir Sementsov-Ogievskiy
2016-08-08 19:18 ` [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition no-reply
2016-08-08 19:19 ` John Snow
     [not found] ` <57E94491.8090501@virtuozzo.com>
     [not found]   ` <bf412a34-acb7-d3df-1710-ee7917ee2060@redhat.com>
2016-09-28 12:16     ` Vladimir Sementsov-Ogievskiy [this message]
2016-09-29 11:33       ` [Qemu-devel] [Qemu-block] " Kevin Wolf
2016-09-29 12:30         ` Stefan Hajnoczi
2016-09-29 20:58         ` John Snow

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57EBB4A1.1000104@virtuozzo.com \
    --to=vsementsov@virtuozzo.com \
    --cc=den@openvz.org \
    --cc=jcody@redhat.com \
    --cc=jsnow@redhat.com \
    --cc=qemu-block@nongnu.org \
    --cc=qemu-devel@nongnu.org \
    --cc=stefanha@redhat.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.