From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:38328) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YYK0T-0000WN-OA for qemu-devel@nongnu.org; Wed, 18 Mar 2015 15:52:02 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YYK0Q-0006Va-Uz for qemu-devel@nongnu.org; Wed, 18 Mar 2015 15:51:57 -0400 Message-ID: <5509D74F.7010908@redhat.com> Date: Wed, 18 Mar 2015 15:51:43 -0400 From: John Snow MIME-Version: 1.0 References: <1425528911-10300-1-git-send-email-jsnow@redhat.com> <1425528911-10300-8-git-send-email-jsnow@redhat.com> <5508855D.5010505@redhat.com> <5508B85E.40604@redhat.com> <55098097.8060401@redhat.com> In-Reply-To: <55098097.8060401@redhat.com> Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: quoted-printable Subject: Re: [Qemu-devel] [PATCH 07/11] block: drive_backup transaction callback support List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: Max Reitz , qemu-block@nongnu.org Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, vsementsov@parallels.com, stefanha@redhat.com On 03/18/2015 09:41 AM, Max Reitz wrote: > On 2015-03-17 at 19:27, John Snow wrote: >> >> >> On 03/17/2015 03:49 PM, Max Reitz wrote: >>> On 2015-03-04 at 23:15, John Snow wrote: >>>> This patch actually implements the transactional callback system >>>> for the drive_backup transaction. >>>> >>>> (1) We manually pick up a reference to the bitmap if present to allo= w >>>> its cleanup to be delayed until after all drive_backup jobs >>>> launched >>>> by the transaction have fully completed. >>>> >>>> (2) We create a functional closure that envelops the original >>>> drive_backup >>>> callback, to be able to intercept the completion status and >>>> return code >>>> for the job. >>>> >>>> (3) We add the drive_backup_cb method for the drive_backup action, >>>> which >>>> unpacks the completion information and invokes the final cleanu= p. >>>> >>>> (4) backup_transaction_complete will perform the final cleanup on th= e >>>> backup job. >>>> >>>> (5) In the case of transaction cancellation, drive_backup_cb is stil= l >>>> responsible for cleaning up the mess we may have already made. >>>> >>>> Signed-off-by: John Snow >>>> --- >>>> block/backup.c | 9 +++++++ >>>> blockdev.c | 64 >>>> ++++++++++++++++++++++++++++++++++++++--------- >>>> include/block/block_int.h | 8 ++++++ >>>> 3 files changed, 69 insertions(+), 12 deletions(-) >>>> >>>> diff --git a/block/backup.c b/block/backup.c >>>> index 4332df4..3673fb0 100644 >>>> --- a/block/backup.c >>>> +++ b/block/backup.c >>>> @@ -233,6 +233,15 @@ typedef struct { >>>> int ret; >>>> } BackupCompleteData; >>>> +void backup_transaction_complete(BlockJob *job, int ret) >>>> +{ >>>> + BackupBlockJob *s =3D container_of(job, BackupBlockJob, common)= ; >>>> + >>>> + if (s->sync_bitmap) { >>>> + bdrv_dirty_bitmap_decref(job->bs, s->sync_bitmap, ret, NULL= ); >>>> + } >>>> +} >>>> + >>>> static void backup_complete(BlockJob *job, void *opaque) >>>> { >>>> BackupBlockJob *s =3D container_of(job, BackupBlockJob, common= ); >>>> diff --git a/blockdev.c b/blockdev.c >>>> index 9e3b9e9..3ff14a7 100644 >>>> --- a/blockdev.c >>>> +++ b/blockdev.c >>>> @@ -1363,14 +1363,6 @@ static void transaction_callback(void *opaque= , >>>> int ret) >>>> typedef void (CallbackFn)(void *opaque, int ret); >>>> -/* Temporary. Removed in the next patch. */ >>>> -CallbackFn *new_transaction_wrapper(BlkTransactionState *common, >>>> - void *opaque, >>>> - void (*callback)(void *, int), >>>> - void **new_opaque); >>>> - >>>> -void undo_transaction_wrapper(BlkTransactionState *common); >>>> - >>>> /** >>>> * Create a new transactional callback wrapper. >>>> * >>>> @@ -1389,7 +1381,7 @@ void >>>> undo_transaction_wrapper(BlkTransactionState *common); >>>> * >>>> * @return The callback to be used instead of @callback. >>>> */ >>>> -CallbackFn *new_transaction_wrapper(BlkTransactionState *common, >>>> +static CallbackFn *new_transaction_wrapper(BlkTransactionState >>>> *common, >>>> void *opaque, >>>> CallbackFn *callback, >>>> void **new_opaque) >>>> @@ -1416,7 +1408,7 @@ CallbackFn >>>> *new_transaction_wrapper(BlkTransactionState *common, >>>> /** >>>> * Undo any actions performed by the above call. >>>> */ >>>> -void undo_transaction_wrapper(BlkTransactionState *common) >>>> +static void undo_transaction_wrapper(BlkTransactionState *common) >>>> { >>>> BlkTransactionList *btl =3D common->list; >>>> BlkTransactionState *bts; >>>> @@ -1449,6 +1441,7 @@ void >>>> undo_transaction_wrapper(BlkTransactionState *common) >>>> blk_put_transaction_state(common); >>>> } >>>> +static void block_job_cb(void *opaque, int ret); >>>> static void _drive_backup(const char *device, const char *target, >>>> bool has_format, const char *format, >>>> enum MirrorSyncMode sync, >>>> @@ -1767,6 +1760,9 @@ static void >>>> drive_backup_prepare(BlkTransactionState *common, Error **errp) >>>> BlockDriverState *bs; >>>> DriveBackup *backup; >>>> Error *local_err =3D NULL; >>>> + CallbackFn *cb; >>>> + void *opaque; >>>> + BdrvDirtyBitmap *bmap =3D NULL; >>>> assert(common->action->kind =3D=3D >>>> TRANSACTION_ACTION_KIND_DRIVE_BACKUP); >>>> backup =3D common->action->drive_backup; >>>> @@ -1777,6 +1773,19 @@ static void >>>> drive_backup_prepare(BlkTransactionState *common, Error **errp) >>>> return; >>>> } >>>> + /* BackupBlockJob is opaque to us, so look up the bitmap >>>> ourselves */ >>>> + if (backup->has_bitmap) { >>>> + bmap =3D bdrv_find_dirty_bitmap(bs, backup->bitmap); >>>> + if (!bmap) { >>>> + error_setg(errp, "Bitmap '%s' could not be found", >>>> backup->bitmap); >>>> + return; >>>> + } >>>> + } >>>> + >>>> + /* Create our transactional callback wrapper, >>>> + and register that we'd like to call .cb() later. */ >>>> + cb =3D new_transaction_wrapper(common, bs, block_job_cb, &opaqu= e); >>>> + >>>> /* AioContext is released in .clean() */ >>>> state->aio_context =3D bdrv_get_aio_context(bs); >>>> aio_context_acquire(state->aio_context); >>>> @@ -1789,7 +1798,7 @@ static void >>>> drive_backup_prepare(BlkTransactionState *common, Error **errp) >>>> backup->has_bitmap, backup->bitmap, >>>> backup->has_on_source_error, >>>> backup->on_source_error, >>>> backup->has_on_target_error, >>>> backup->on_target_error, >>>> - NULL, NULL, >>>> + cb, opaque, >>>> &local_err); >>>> if (local_err) { >>>> error_propagate(errp, local_err); >>>> @@ -1798,6 +1807,11 @@ static void >>>> drive_backup_prepare(BlkTransactionState *common, Error **errp) >>>> state->bs =3D bs; >>>> state->job =3D state->bs->job; >>>> + /* Keep the job alive until .cb(), too. */ >>>> + block_job_incref(state->job); >>>> + if (bmap) { >>>> + bdrv_dirty_bitmap_incref(bmap); >>>> + } >>>> } >>>> static void drive_backup_abort(BlkTransactionState *common) >>>> @@ -1809,6 +1823,10 @@ static void >>>> drive_backup_abort(BlkTransactionState *common) >>>> if (bs && bs->job && bs->job =3D=3D state->job) { >>>> block_job_cancel_sync(bs->job); >>>> } >>>> + >>>> + /* Undo any callback actions we may have done. Putting down >>>> references >>>> + * obtained during prepare() is handled by cb(). */ >>>> + undo_transaction_wrapper(common); >>>> } >>>> static void drive_backup_clean(BlkTransactionState *common) >>>> @@ -1820,6 +1838,27 @@ static void >>>> drive_backup_clean(BlkTransactionState *common) >>>> } >>>> } >>>> +static void drive_backup_cb(BlkTransactionState *common) >>>> +{ >>>> + BlkTransactionData *btd =3D common->opaque; >>>> + BlockDriverState *bs =3D btd->opaque; >>>> + DriveBackupState *state =3D DO_UPCAST(DriveBackupState, common, >>>> common); >>>> + >>>> + assert(state->bs =3D=3D bs); >>>> + if (bs->job) { >>>> + assert(state->job =3D=3D bs->job); >>>> + } >>>> + >>>> + state->aio_context =3D bdrv_get_aio_context(bs); >>>> + aio_context_acquire(state->aio_context); >>>> + >>>> + /* Note: We also have the individual job's return code in >>>> btd->ret */ >>>> + backup_transaction_complete(state->job, common->list->status); >>>> + block_job_decref(state->job); >>>> + >>>> + aio_context_release(state->aio_context); >>>> +} >>>> + >>>> typedef struct BlockdevBackupState { >>>> BlkTransactionState common; >>>> BlockDriverState *bs; >>>> @@ -2004,7 +2043,8 @@ static const BdrvActionOps actions[] =3D { >>>> .instance_size =3D sizeof(DriveBackupState), >>>> .prepare =3D drive_backup_prepare, >>>> .abort =3D drive_backup_abort, >>>> - .clean =3D drive_backup_clean >>>> + .clean =3D drive_backup_clean, >>>> + .cb =3D drive_backup_cb >>>> }, >>>> [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] =3D { >>>> .instance_size =3D sizeof(BlockdevBackupState), >>>> diff --git a/include/block/block_int.h b/include/block/block_int.h >>>> index e0d5561..731684d 100644 >>>> --- a/include/block/block_int.h >>>> +++ b/include/block/block_int.h >>>> @@ -619,6 +619,14 @@ void backup_start(BlockDriverState *bs, >>>> BlockDriverState *target, >>>> BlockCompletionFunc *cb, void *opaque, >>>> Error **errp); >>>> +/* >>>> + * backup_transaction_complete >>>> + * @job The BackupJob that was associated with a transaction >>> >>> s/BackupJob/backup block job/ or s/BackupJob/backup job/? (there is n= o >>> structure named "BackupJob", but this looks like there might be one) >>> >>>> + * @ret Amalgamated return code for the entire transaction >>> >>> Hm. The call to this function you're introducing in this patch will >>> probably stay the only one so there won't be anyone who'll have to wo= rry >>> about what this means, but if there was, they probably won't reach a >>> conclusive result. >>> >>> I know what it means because I've seen patch 3 (right now it means >>> "everything OR-ed together so it's 0 for success or some non-zero (ma= ybe >>> positive, maybe negative, depending on whether you have an even or an >>> odd number of errors, and depending on whether the jobs return negati= ve >>> values for errors or not) for error"), but I wouldn't be able to infe= r >>> it from this. At the least you should add that 0 means success and >>> everything else means error (if you take my suggestion for patch 3, i= t >>> would be 0 for success and -errno for error, where that error number = is >>> one of the errors encountered). >>> >>> Other than that, looks good (as far as I can tell with my still limit= ed >>> insights into patch 3). >>> >>> Max >>> >> >> It's my opinion that this patch should give you insight into patch #3, >> instead of the other way around. This patch is useful for >> demonstrating the general flow, because I kept all drive_backup >> specific concerns strictly separated from patch #3. > > It only truly helps me understand patch 3 under the assumption that thi= s > patch is correct. For reviewing, I cannot really take that assumption. = ;-) > > I mean, it does help me with the interfaces patch 3 offers, but it stil= l > doesn't help me with the objects introduced in patch 3, for instance, > because those are all contained in patch 3 itself. And if I cannot > verify patch 3 on its own, I cannot really verify that what this patch > does is correct in regards to the interfaces offered there. > >> I'll write a more comprehensive document for the docs/ folder soon, >> but the general shape of the idea is "The cleanup actions defined in >> the .cb() method will be executed after every job in the transactional >> group has completed." > > Right, again, assuming your implementation is correct, but questioning > that is part of the natural reviewing process. :-) > Yes, that's fine! I am just trying to help you unravel this strange loop=20 (ha ha) and/or try to understand what is not necessarily clear with the=20 information as presented. >> But there's some fine print: >> >> "'every job' refers only to jobs that have the capability to register >> a post-transaction callback, which currently means only drive_backup." >> >> The general approach to this is, mechanically: >> >> (1) Extend the life of objects of interest with reference counts, >> including Jobs, Bitmaps, and BlkTransactionStates. >> >> (2) "steal" the callback from a BlockJob and, when invoked, update our >> management information for the transaction group in BlkTransactionList= . >> >> (3) Once all jobs we have sent out return, execute the .cb() methods >> as indicated in the BlkTransactionList. >> >> >> >> So, if you were adding a callback to a different QMP transaction: >> - Look at new_transaction_wrapper; you'll use this to bump up various >> references used internally for this whole system, and it'll keep >> qmp_transaction from being able to delete the list and state objects. >> >> This function will give to you (as a gimmick) a new callback and >> opaque data structure pointer for you to give to the job that you are >> starting. I obfuscate this just a bit to make it clear that you should >> be using this function every time to help manage the transactional sta= te. >> >> - Now, the job will run on its own happy way, When it finishes, it >> will call transaction_callback, which is the function that >> "intercepts" the callbacks, and dispatches the original enveloped >> callbacks. It ditches the original data we used to know how to call >> the original callback, and begins storing completion information for >> jobs instead. >> >> - transaction_callback will call blk_transaction_complete to store >> completion info for one job. In turn, it calls >> put_blk_transaction_list to decrement the pending jobs count (the >> "jobs" refcount) and once that hits zero ... >> >> - All of the callback mechanisms associated with each >> BlockTransactionState are run via blk_run_transaction_callbacks. >> >> - This is where drive_backup_cb is going to get invoked again, and >> then we will splinter back out into backup-specific concerns, with >> Jobs and Bitmaps and what not. > > Yes, I can see the control flow; my main problem lies in the three > different structs introduced in patch 3. I'll tell you why I'm confused > by them: > Paydirt! > First, there is BlkTransactionList. Sounds to me like it should be a > list of transactions. Judging from its members, it is rather a list of > operations ("actions") for a single transaction, however. Apparently, > there is a double-use for the word "transaction": I'd consider a > transaction to be the thing started by the QMP command which is a group > of atomic operations. However, that one structure is called > BlkTransactionState, but it apparently this is only the state of a > single operation in the transaction. Pre-existing, but doesn't make it > better. You added comments for its list entry members, with the first > one stating "All transactions in the current group" and the second bein= g > a subset of that. However, again, I consider these to be operations and > not transactions. The group is the transaction. > Ah, yes. Good points. We need to decide on consistent terminology. Transaction: The entire group. Action: One particular command within a transaction group. Action Op: One particular stage of one command (e.g. prepare, commit) BlkTransactionState is an existing structure that does indeed refer=20 instead to one particular action's state, not that of the entire=20 transaction. This can be changed to be clearer. Better names might be=20 BlkTransactionActionState or BlkActionState. The BlkTransactionList here is indeed intended to be a list for the=20 entire transaction, so it is actually a "list of actions" and not,=20 perhaps as its name implies, a list of transactions. (We have no data=20 structures with a scope this vast.) The BlkTransactionList is an object that is created once per transaction=20 group (once per qmp_transaction()) and keeps a count of how many jobs=20 we've launched (and thus how many to expect to return to us.) Once a job completes, it adds that *action's state*=20 (BlkTransactionState) to its list. Once all jobs sent out have returned,=20 it will iterate this list and execute their .cb() actions. So perhaps an amendment would be: typedef struct BlkTransactionList { int pending_jobs; int transaction_rc; QLIST(...) completed_actions; } BlkTransactionList; to make this part clearer. > So we have that adding to my confusion. Furthermore, I think we don't > need to list entry members (entry and list_entry) in > BlkTransactionState. I think it should be fine to drop the > snap_bdrv_states list in qmp_transaction() and just add all > BlkTransactionStates to the BlkTransactionList (or however you name it > if you rename it) in the preparation while () loop there. I don't think > we need to add them only in blk_transaction_complete() (and we should > drop that from there if it's been done in qmp_transaction() already, of > course). Then we can really just use that list generally as *the* list > of operations in a transaction. > Yes, there's not necessarily a strong reason to keep two lists. One of the existing reasons, though, is that not all transactions launch=20 jobs or have been modified to work with this system yet, so we=20 definitely don't want to consider all of these entries in THIS system=20 just yet. So one of the simplifications two lists makes is that we don't have to=20 worry about "unsupported actions" making it into the list. Everything in=20 *my* list can be processed without further thought. We can also just add more state into the BlkTransactionState (the=20 ActionState) to help us distinguish, and then we can get rid of both list= s. A state machine tag or something might be useful in that way. > Second, there is BlkTransactionData. From the name I cannot judge at al= l > how it is supposed to differ from BlkTransactionState, because, well, i= t > contains some data about a transaction... Maybe, because I consider > BlkTransactionState to be misnamed, BlkTransactionData actually contain= s > data about the transaction and not about a single operation? But from > looking at the members, I have no idea. It has "opaque" and "ret"... I > don't know how that is vital data to the transaction itself (and I don'= t > think it is, it is rather data for the completion callback, I guess?), > it only looks like some sort of pattern I have seen for AIO callbacks > and the like. > BlkTransactionData was meant to contain completion info about a "job"=20 that was launched by an action. I tried to avoid tying it explicitly to=20 a BlockJob -- it can be any action with a callback, really. Anyway, this structure contains "Data that was delivered to the original=20 callback." In our case here, it's data that would've gone to=20 block_job_complete. So it contains an opaque pointer (A BDS, we secretly know) and a return=20 code for that individual action's task. > Third, there is BlkTransactionWrapper. I suppose it wraps a > transaction...? Or maybe at least an operation, because apparently I ca= n > no longer be sure whether a transaction is a transaction or an > operation... But there is no transaction or even a single operation in > there. It's just a structure describing a callback. I don't know how > that is a "transaction wrapper"... > We have achieved maximum Strawberry. The BlkTransactionWrapper was meant to wrap any data necessary to be=20 able to deliver the original callback we're stealing. So in our case, it's for *one action's callback* and in this one=20 instance, it's being used to hold the data needed to manually invoke the=20 block_job_complete callback. So both BTD and BTW here contain callback related information for one=20 particular action. They are differentiated by "Wrapper" having a "Before we execute the=20 original callback" context, and "Data" having a "We've already executed=20 the original callback" context. Can they be combined into one structure? Yes, as long as we keep a tag=20 that lets us know what state we're in. Can they be inlined into "BlkTransactionState"? Yes -- but I opted to=20 keep it in a separate glob to not clutter up the BlkTransactionState=20 with values only useful for an optional feature. Is that wise? Maybe, maybe not. I'm open to names. The most semantically accurate would be=20 BlkTransactionActionCallbackData if we combined BTD/BTW to be one=20 structure. It's a mouthful, though. > So, I don't understand any of the names given to the objects, and while > I might understand what they are used for by looking at them, I have a > *really* hard time understanding the code that uses them, because if I > don't stare at their definitions all the time, I will immediately forge= t > what they contain and what they are used for because I don't understand > the names at all (and there is nothing which explicitly tells me what > they are used for either). The fact that all of them are starting with > BlkTransaction makes mixing them up even easier. > Understood. >> From the point of view of the "transaction action developer," the >> interface to the new feature is found via the >> "new_transaction_wrapper" function, But some care must be taken to >> make sure this callback actually gets used once it has been requested. >> The whole thing can be aborted with the "undo_transaction_wrapper" >> function. >> >> All of the other functions that got added to blockdev.c in Patch #3 >> are there to assist these two "interface" functions in doing what they >> claim to. >> >> Everything in patch 4 and 5 just adds more reference count flexibility >> to things that are only of interest to drive_backup. >> >> Patch 6 allows us to inject our special callback wrapper into the >> qmp_drive_backup function. >> >> >> Clear as mud? > > Control flow and concept, clear. Implementation, still very muddy, sorr= y... > > Max > >> --js >> >>>> + */ >>>> +void backup_transaction_complete(BlockJob *job, int ret); >>>> + >>>> + >>>> void blk_dev_change_media_cb(BlockBackend *blk, bool load); >>>> bool blk_dev_has_removable_media(BlockBackend *blk); >>>> void blk_dev_eject_request(BlockBackend *blk, bool force); >>> >>> > --=20 =E2=80=94js