From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:46142) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bpDn4-00088w-Ag for qemu-devel@nongnu.org; Wed, 28 Sep 2016 08:16:51 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bpDn0-0006sm-63 for qemu-devel@nongnu.org; Wed, 28 Sep 2016 08:16:46 -0400 References: <1470683381-16680-1-git-send-email-jsnow@redhat.com> <57E94491.8090501@virtuozzo.com> From: Vladimir Sementsov-Ogievskiy Message-ID: <57EBB4A1.1000104@virtuozzo.com> Date: Wed, 28 Sep 2016 15:16:33 +0300 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 0/5] blockjobs: Fix transactional race condition List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: John Snow , qemu-devel , Stefan Hajnoczi , Jeff Cody , "Denis V. Lunev" , qemu block Ok, let's discuss it on list. On 27.09.2016 19:19, John Snow wrote: > Replying off-list to follow your lead, but this might be a good > discussion to have upstream where Stefan and Jeff can see it. > > On 09/26/2016 11:53 AM, Vladimir Sementsov-Ogievskiy wrote: >> Hi John! >> >> What about this series? If you are working on this, I can tell about >> another related problem. >> > > I got mired in trying to think about how to delete a BlockJob that > hasn't technically been started yet -- this patch series is unsafe in > that it will be unable to "cancel" a job that hasn't started yet. > > (And I don't want to start a job just to cancel it, that's gross.) > > I need to expand the job modifications here a little bit to make this > completely safe. I may need to create a job->started bool that allows > you to call a job_delete() function if and only if started is false. > > I'll try to send this along this week. > >> I'm working on async scheme for backup. I've started from something like >> just wrapping backup_do_cow into coroutine and use this coroutine in >> full-backup-loop instead of just call backup_do_cow. This simple >> approach seems better than used in mirror - in mirror we create new >> coroutine for each async read and write, when I create new coroutine for >> copy of = read + write. But then I decided to rewrite this to >> worker-coroutines scheme. I.e. we do not create new coroutine for each >> request and control their count by " while (job->inflight_requests >= >> MAX_IN_FLIGHT) { yield }" but we create MAX_IN_FLIGHT worker coroutines >> at backup start, and then they just copy cluster by cluster, using >> common cop_bitmap to synchronize. All cluster copying is done in >> backup-workers and write-notifiers only change the order in which >> workers copy clusters. And I liked it all until I figured out the error >> handling. >> > > Makes sense to me so far. > >> I can't handle errors in workers, because I need to call >> block_job_error_action, which may stop blockjob, and it seems to be a >> problem if several workers calls it simultaneously. So, worker have to > > What kind of problems are you seeing? Maybe we can fix them? is it for > STOP cases? (or also on IGNORE/REPORT?) > > Otherwise, just stash an error object in a location that the master > coroutine can find, pause/terminate all worker threads, and signal the > error. > >> defer error handling to main coroutine of the blockjob (this coroutine >> in my solution doesn't do copying but only handles blockjob staff like >> pauses, stops, throttling). So, we will have a queue of workers, waiting >> for main coroutine to handle their errors. This looks too complicated.. > > Really? I don't think it's too bad. The worker detects an error and > creates a record of the problem, then the worker yields. > > The controller detects the worker has yielded and sees the record of > the error. The controller opts not to re-schedule the worker. > > The controller then instructs all remaining workers to yield. Once all > workers are yielded, the controller invokes block_job_error_action. > > From there, we can handle the error accordingly. > > Either way it sounds like we're going to have to manage the complexity > of what happens when a job with worker coroutines is paused by e.g. > the QMP monitor: the controller needs to send and coordinate messages > to the workers; so this doesn't sound like too much added complexity > unless I'm dreaming. > >> And finally I come to the idea that all this problems and complications >> are because blockjob is not created for asynchronous work, because >> blockjob has only one working coroutine. So the true way is to maintain >> several working coroutines on generic blockjob layer. This will lead to >> simpler and transparent async schemes for backup and mirror (and may be >> other blockjobs). >> >> So, the question is: what about refactoring blockjobs, to move from one >> working coroutine to several ones? Is it hard and is it possible? I do >> not understand all block-jobs related staff.. >> > > I think jobs will need to remain "one coroutine, one job" for now, but > there's no reason why drive-backup or blockdev-backup can't just > create multiple jobs each if that's what they need to do. (The backup > job object could, in theory, just have another job pointer to a helper > job if it really became necessary.) > > Let's try to solve your design problem first and see if we can't make > error reporting behave nicely in a contested environment. Hmm, ok, I'll go this way for now. But anyway, I think that finally async scheme in backup and mirror should be the same. And it should share the same code, so all worker-related should be separated from backup to own file, or injected into block-jobs. > >> >> On 08.08.2016 22:09, John Snow wrote: >>> There are a few problems with transactional job completion right now. >>> >>> First, if jobs complete so quickly they complete before remaining jobs >>> get a chance to join the transaction, the completion mode can leave >>> well >>> known state and the QLIST can get corrupted and the transactional jobs >>> can complete in batches or phases instead of all together. >>> >>> Second, if two or more jobs defer to the main loop at roughly the same >>> time, it's possible for one job's cleanup to directly invoke the other >>> job's cleanup from within the same thread, leading to a situation that >>> will deadlock the entire transaction. >>> >>> Thanks to Vladimir for pointing out these modes of failure. >>> >>> ________________________________________________________________________________ >>> >>> >>> >>> For convenience, this branch is available at: >>> https://github.com/jnsnow/qemu.git branch job-manual-start >>> https://github.com/jnsnow/qemu/tree/job-manual-start >>> >>> This version is tagged job-manual-start-v1: >>> https://github.com/jnsnow/qemu/releases/tag/job-manual-start-v1 >>> >>> John Snow (4): >>> blockjob: add block_job_start >>> blockjob: refactor backup_start as backup_job_create >>> blockjob: add .clean property >>> iotests: add transactional failure race test >>> >>> Vladimir Sementsov-Ogievskiy (1): >>> blockjob: fix dead pointer in txn list >>> >>> block/backup.c | 50 +++++++----- >>> block/commit.c | 2 +- >>> block/mirror.c | 2 +- >>> block/stream.c | 2 +- >>> blockdev.c | 194 >>> ++++++++++++++++++++++++++------------------- >>> blockjob.c | 24 +++++- >>> include/block/block_int.h | 19 ++--- >>> include/block/blockjob.h | 16 ++++ >>> tests/qemu-iotests/124 | 91 +++++++++++++++++++++ >>> tests/qemu-iotests/124.out | 4 +- >>> tests/test-blockjob-txn.c | 2 +- >>> 11 files changed, 284 insertions(+), 122 deletions(-) >>> >> >> -- Best regards, Vladimir