From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:45729) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1YXw5p-0005wN-1B for qemu-devel@nongnu.org; Tue, 17 Mar 2015 14:19:55 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1YXw5j-0000i4-WF for qemu-devel@nongnu.org; Tue, 17 Mar 2015 14:19:52 -0400 Message-ID: <55087041.6050605@redhat.com> Date: Tue, 17 Mar 2015 14:19:45 -0400 From: Max Reitz MIME-Version: 1.0 References: <1425528911-10300-1-git-send-email-jsnow@redhat.com> <1425528911-10300-4-git-send-email-jsnow@redhat.com> <550868A1.9090508@redhat.com> <55086CC8.1020102@redhat.com> In-Reply-To: <55086CC8.1020102@redhat.com> Content-Type: text/plain; charset=utf-8; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [PATCH 03/11] block: add transactional callbacks feature List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: John Snow , qemu-block@nongnu.org Cc: kwolf@redhat.com, famz@redhat.com, qemu-devel@nongnu.org, armbru@redhat.com, vsementsov@parallels.com, stefanha@redhat.com On 2015-03-17 at 14:04, John Snow wrote: > > > On 03/17/2015 01:47 PM, Max Reitz wrote: >> On 2015-03-04 at 23:15, John Snow wrote: >>> The goal here is to add a new method to transactions that allows >>> developers to specify a callback that will get invoked only once >>> all jobs spawned by a transaction are completed, allowing developers >>> the chance to perform actions conditionally pending complete success >>> or complete failure. >>> >>> In order to register the new callback to be invoked, a user must >>> request >>> a callback pointer and closure by calling new_transaction_wrapper, >>> which >>> creates a wrapper around a closure and callback that would originally >>> have >>> been passed to e.g. backup_start(). >>> >>> The function will return a function pointer and a new closure to be >>> passed >>> instead. The transaction system will effectively intercept these >>> callbacks >>> and execute the desired actions upon reception of all intercepted >>> callbacks. >>> >>> This means that the registered callbacks will be called after all other >>> transaction actions that requested a callback have completed. The >>> feature >>> has no knowledge of jobs spawned without informing the >>> BlkTransactionList. >>> >>> For an example of how to use the feature, please skip ahead to: >>> 'block: drive_backup transaction callback support' which serves as an >>> example >>> for how to hook in drive_backup (or any block job launched by >>> transactions). >>> >>> >>> Note 1: Defining a callback method alone is not sufficient to have the >>> new >>> method invoked. You must call new_transaction_wrapper AND >>> ensure the >>> callback it returns to you is used as the callback for the >>> job. >>> >>> Note 2: You can use this feature for any system that registers >>> completions of >>> an action via a callback of the form (void *opaque, int ret), >>> not just >>> block job callbacks. >>> >>> Note 3: new_blk_transaction has no users in this patch, but will in >>> the next patch where it will become static and local to >>> blockdev.c. >>> >>> Signed-off-by: John Snow >>> --- >>> blockdev.c | 225 >>> ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++- >>> 1 file changed, 223 insertions(+), 2 deletions(-) >>> >>> diff --git a/blockdev.c b/blockdev.c >>> index 5120af1..3153ee7 100644 >>> --- a/blockdev.c >>> +++ b/blockdev.c >>> @@ -1207,6 +1207,8 @@ static BdrvDirtyBitmap >>> *block_dirty_bitmap_lookup(const char *node, >>> /* New and old BlockDriverState structs for atomic group >>> operations */ >>> typedef struct BlkTransactionState BlkTransactionState; >>> +typedef struct BlkTransactionList BlkTransactionList; >>> +typedef struct BlkTransactionData BlkTransactionData; >>> /* Only prepare() may fail. In a single transaction, only one of >>> commit() or >>> abort() will be called, clean() will always be called if it >>> present. */ >>> @@ -1221,6 +1223,8 @@ typedef struct BdrvActionOps { >>> void (*abort)(BlkTransactionState *common); >>> /* Clean up resource in the end, can be NULL. */ >>> void (*clean)(BlkTransactionState *common); >>> + /* Execute this after +all+ jobs in the transaction finish */ >>> + void (*cb)(BlkTransactionState *common); >>> } BdrvActionOps; >>> /* >>> @@ -1231,9 +1235,220 @@ typedef struct BdrvActionOps { >>> struct BlkTransactionState { >>> TransactionAction *action; >>> const BdrvActionOps *ops; >>> + BlkTransactionList *list; >>> + void *opaque; >>> + /* Allow external users (callbacks) to reference this obj past >>> .clean() */ >>> + int refcount; >>> + /* All transactions in the current group */ >>> QSIMPLEQ_ENTRY(BlkTransactionState) entry; >>> + /* Transactions in the current group with callbacks */ >>> + QSIMPLEQ_ENTRY(BlkTransactionState) list_entry; >> >> "list_entry" seems very generic and it's hard for me to see a >> fundamental difference to just "entry". How about "cb_entry", or >> "cb_list_entry", or "cb_group_entry" or something like that? >> >>> }; >>> +struct BlkTransactionList { >>> + int jobs; /* Effectively: A refcount */ >> >> Good to know, but I'd like the reason why it's called like this to be >> here anyway ("Number of jobs remaining" or I don't know). >> > > Yes, it's basically a refcount where the references are held by jobs. > >>> + int status; /* Cumulative retcode */ >> >> The only places I currently find "retcode" in are linux-user/signal.c >> and tests/image-fuzzer/runner.py. How about the more common "return >> value" (or simply "cumulative status")? >> >> ("retcode" reads a bit like "retcon" to me, that's why I had to think >> about it for a second) >> > > Sure. > >>> + QSIMPLEQ_HEAD(actions, BlkTransactionState) actions; >>> +}; >>> + >>> +typedef struct BlkTransactionData { >>> + void *opaque; /* Data given to encapsulated callback */ >>> + int ret; /* Return code given to encapsulated callback */ >>> +} BlkTransactionData; >>> + >>> +typedef struct BlkTransactionWrapper { >>> + void *opaque; /* Data to be given to encapsulated callback */ >>> + void (*callback)(void *opaque, int ret); /* Encapsulated >>> callback */ >>> +} BlkTransactionWrapper; >> >> I find it pretty difficult to figure out what these objects are each >> for. Care to add comments for that, too? >> > > Will do. > >>> + >>> +static BlkTransactionList *new_blk_transaction_list(void) >>> +{ >>> + BlkTransactionList *btl = g_malloc0(sizeof(*btl)); >> >> Maybe use g_new0(BlkTransactionList, 1); (It's typesafe! And who doesn't >> love typesafety?). >> >> (Optional, though, the foo = malloc(sizeof(*foo)) pattern is pretty >> wide-spread, and as far as I remember, Markus purposely did not replace >> it by foo = g_new(Foo, 1)) >> >>> + >>> + /* Implicit 'job' for qmp_transaction itself */ >>> + btl->jobs = 1; >> >> Well, if it is a refcount, just call it that way... >> >> (Update: Okay, I see why you didn't call it that way. Maybe the comment >> needs improvement, like "The transaction itself is a job, too, that >> needs to be completed before the callbacks are called") >> > > That's exactly the trick in play, here. > >>> + QSIMPLEQ_INIT(&btl->actions); >>> + return btl; >>> +} >>> + >>> +static BlkTransactionState >>> *blk_put_transaction_state(BlkTransactionState *bts) >>> +{ >>> + bts->refcount--; >>> + if (bts->refcount == 0) { >>> + g_free(bts); >> >> How about removing it from the lists it's in? Doesn't appear to be >> necessary, but I'd find it cleaner. >> >>> + return NULL; >>> + } >>> + return bts; >>> +} >>> + >>> +static void del_blk_transaction_list(BlkTransactionList *btl) >>> +{ >>> + BlkTransactionState *bts, *bts_next; >>> + >>> + /* The list should in normal cases be empty, >>> + * but in case someone really just wants to kibosh the whole >>> deal: */ >> >> Thank you for teaching me a new word. >> >>> + QSIMPLEQ_FOREACH_SAFE(bts, &btl->actions, list_entry, bts_next) { >>> + g_free(bts->opaque); >> >> Urp... Are you sure? :-/ >> >> I'd rather have some callback for destroying the object... Apparently >> this will (for now) be always useless overhead, because that callback is >> only calling g_free(), but it's an opaque pointer, so it's not really up >> to you to do anything with it other than passing it around. >> >>> + blk_put_transaction_state(bts); >>> + } >>> + >>> + g_free(btl); >>> +} >>> + >>> +static void blk_run_transaction_callbacks(BlkTransactionList *btl) >>> +{ >>> + BlkTransactionState *bts, *bts_next; >>> + >>> + QSIMPLEQ_FOREACH_SAFE(bts, &btl->actions, list_entry, bts_next) { >>> + if (bts->ops->cb) { >>> + bts->ops->cb(bts); >>> + } >>> + >>> + /* Free the BlkTransactionData */ >>> + g_free(bts->opaque); >> >> Again... Urp. If you know it's a BlkTransactionData object, please make >> it a BlkTransactionData pointer and don't call it "opaque" if it's not >> so opaque in the end. >> > > A good point. I use this "opaque" pointer to actually hold two very > specific and knowable things; it's just that those things change > during the lifetime of the object. > > For whatever reason, I decided it was nicer to not have two pointers > that are never used simultaneously. > > I can either make two dedicated fields, or just introduce it as an > union. The state of the object otherwise dictates which union field to > access. > > Because I am a complicated individual, I am thinking fondly of the > union right now. Good. I like unions. Other people don't. But I do. Especially anonymous unions, but I guess we don't have the luxury of C11 in qemu... > >>> + bts->opaque = NULL; >>> + blk_put_transaction_state(bts); >>> + } >>> + >>> + QSIMPLEQ_INIT(&btl->actions); >>> +} >>> + >>> +static BlkTransactionList >>> *put_blk_transaction_list(BlkTransactionList *btl) >>> +{ >>> + btl->jobs--; >>> + if (btl->jobs == 0) { >> >> Okay, this is interesting, because on the one hand it makes it clear why >> you did not call it "refcount", while on the other hand this is clearly >> a pattern which looks very much like handling a refcount. >> >> Maybe you could call the function differently, like >> "blk_transaction_list_job_completed()" or something. It's longer, but I >> think it reduces the change of people staring at this thinking "Why is >> 'jobs' actually a refcount?". >> > > OK, I'm up for suggestions on naming. I didn't really (deep in my > heart) expect v1 to be perfect. I know that feeling... And it's completely fine. That's what the reviewing process is for. > I think your main problem here is "jobs" as an integral refcount. > Maybe "num_jobs" or "jobs_remaining" etc would make it clearer. My main problem is that the function is named exactly like a function that decreases a refcount and the content of the function is exactly like a function that decreases a refcount, but the refcount isn't called 'refcount', but 'jobs'. Therefore, if you're trying to tie the refcount to a non-abstract concept (the number of jobs left to complete) by representing it as such, I'd like the function to represent that concept to (a job has been completed). > And then, sure, "blk_transaction_list_job_completed" would be a fine > alternative to "put." > >>> + blk_run_transaction_callbacks(btl); >>> + del_blk_transaction_list(btl); >>> + return NULL; >>> + } >>> + return btl; >>> +} >>> + >>> +static void blk_transaction_complete(BlkTransactionState *common) >>> +{ >>> + BlkTransactionList *btl = common->list; >>> + >>> + /* Add this action into the pile to be completed */ >>> + QSIMPLEQ_INSERT_TAIL(&btl->actions, common, list_entry); >>> + >>> + /* Inform the list that we have a completion; >>> + * possibly run all the pending actions. */ >>> + put_blk_transaction_list(btl); >>> +} >>> + >>> +/** >>> + * Intercept a callback that was issued due to a transactional action. >>> + */ >>> +static void transaction_callback(void *opaque, int ret) >>> +{ >>> + BlkTransactionState *common = opaque; >>> + BlkTransactionWrapper *btw = common->opaque; >>> + >>> + /* Prepare data for ops->cb() */ >>> + BlkTransactionData *btd = g_malloc0(sizeof(*btd)); >> >> g_new0(BlkTransactionData, 1); >> >>> + btd->opaque = btw->opaque; >>> + btd->ret = ret; >>> + >>> + /* Transaction state now tracks OUR data */ >>> + common->opaque = btd; >> >> Sorry, but I really have a hard time following this opaqueness... Again, >> if you can, please clarify what the objects are for, and it would be >> very nice to separate truly opaque objects from these internally used >> objects (which are managed by you and thus are to be managed by you >> (g_free()), because reusing void * pointers for different kinds of >> objects like this makes my brain go strawberry. >> > > I'll shoot for Raspberry in v2. Please don't try to add whipped cream on top. >>> + >>> + /* Keep track of the amalgamated return code */ >>> + common->list->status |= ret; >> >> Hm, I guess you're expecting ret to be -errno. In that case, you'd >> probably rather want "if (ret && (!common->list->status || >> common->list->status == -ENOSPC)) { common->list->status = ret; }" or >> something like that, because bit-OR-ing different -errno values will >> probably not turn out so great. >> > > I am only super interested in keeping a zero-or-nonzero cumulative > status -- the actual error code winds up not being so important. Maybe, but most places in qemu try to follow the -errno convention. I'd only deviate from it if there is a compelling reason to. If you want to keep it short, just make it "if (ret) { common->list->status = ret; }". As long as you don't intend to use bitmasks for the return values, this will give you at least the same amount of information: Some error occurred. Also, it'll even tell you the kind of at least on error that occurred (which may even be meaningful, if it's ENOSPC or something like that), so I don't think it's worse (other than requiring three lines instead of one). > The mechanisms here allow the delivery of both the "cumulative" and > the individual return code to the transactional callback, so error > messages etc. can fill in the blanks if they need to. > >>> + >>> + /* Deliver the intercepted callback FIRST */ >>> + btw->callback(btw->opaque, ret); >>> + blk_transaction_complete(common); >>> + g_free(btw); >>> +} >>> + >>> +typedef void (CallbackFn)(void *opaque, int ret); >>> + >>> +/* Temporary. Removed in the next patch. */ >> >> Actually, no. :-) >> >> (remove in patch 7) >> >> Why are you making them non-static in the first place? I see both >> functions mentioned in patch 3 and patch 7 only (except for context in >> patch 6). >> > > Won't compile otherwise, because they have no users as of this patch. > I am trying to keep the size of these patches down and in a sane order. Ah, hm, right... > Once a user is added, we can re-add the static attribute. Well, there's __attribute__((used)), but I see, yep. (__attribute__((used)) would be cleaner because it doesn't litter the global namespace; on the other hand, if there are no conflict, in practice just not making it static is probably the better choice (I don't think any compiler that doesn't understand __attribute__((used)) will build qemu at all, but you never know what people try)) >>> +CallbackFn *new_transaction_wrapper(BlkTransactionState *common, >>> + void *opaque, >>> + void (*callback)(void *, int), >>> + void **new_opaque); >>> + >>> +void undo_transaction_wrapper(BlkTransactionState *common); >>> + >>> +/** >>> + * Create a new transactional callback wrapper. >>> + * >>> + * Given a callback and a closure, generate a new >>> + * callback and closure that will invoke the >>> + * given callback with the given closure. >>> + * >>> + * After all wrappers in the transactional group have >>> + * been processed, each action's .cb() method will be >>> + * invoked. >>> + * >>> + * @common The transactional state to set a callback for. >>> + * @opaque A closure intended for the encapsulated callback. >>> + * @callback The callback we are encapsulating. >>> + * @new_opaque The closure to be used instead of @opaque. >>> + * >>> + * @return The callback to be used instead of @callback. >>> + */ >>> +CallbackFn *new_transaction_wrapper(BlkTransactionState *common, >>> + void *opaque, >>> + CallbackFn *callback, >>> + void **new_opaque) >>> +{ >>> + BlkTransactionWrapper *btw = g_malloc0(sizeof(*btw)); >> >> g_new0(BlkTransactionWrapper, 1); >> >>> + assert(new_opaque); >>> + >>> + /* Stash the original callback information */ >>> + btw->opaque = opaque; >>> + btw->callback = callback; >>> + common->opaque = btw; >>> + >>> + /* The BTS will serve as our new closure */ >>> + *new_opaque = common; >>> + common->refcount++; >>> + >>> + /* Inform the transaction BTL to expect one more return */ >>> + common->list->jobs++; >>> + >>> + /* Lastly, the actual callback function to handle the >>> interception. */ >>> + return transaction_callback; >>> +} >>> + >>> +/** >>> + * Undo any actions performed by the above call. >>> + */ >>> +void undo_transaction_wrapper(BlkTransactionState *common) >>> +{ >>> + BlkTransactionList *btl = common->list; >>> + BlkTransactionState *bts; >>> + BlkTransactionData *btd; >>> + >>> + /* Stage 0: Wrapper was never created: */ >>> + if (common->opaque == NULL && common->refcount == 1) { >>> + return; >>> + } >>> + >>> + /* Stage 2: Job already completed or was canceled. >>> + * Force an error in the callback data and just invoke the >>> completion >>> + * handler to perform appropriate cleanup for us. >>> + */ >>> + QSIMPLEQ_FOREACH(bts, &btl->actions, list_entry) { >>> + if (bts == common) { >>> + btd = common->opaque; >>> + /* Force error for callback */ >>> + btd->ret = -1; >> >> No -errno? >> > > Hmm. I suppose I should emulate one. What errno should I emulate for > "One of your siblings died, and now you also need to take a dirtnap?" That's always a hard question, isn't it? :-) I'd really like an error code like "EEVERYTHINGWENTWRONG" or "EGOSHIDONTKNOW". I'm always consulting errno(3) at this point... ... Well, there's ECANCELED, maybe that's okay. > >>> + common->ops->cb(common); >>> + QSIMPLEQ_REMOVE(&btl->actions, common, >>> + BlkTransactionState, list_entry); >>> + goto cleanup; >>> + } >>> + } >>> + /* Stage 1: Callback created, but job never launched */ >>> + put_blk_transaction_list(common->list); >>> + cleanup: >>> + g_free(common->opaque); >> >> urp >> > > ok, ok! I'm sorry! ... kind of ... > >>> + blk_put_transaction_state(common); >>> +} >>> + >>> /* internal snapshot private data */ >>> typedef struct InternalSnapshotState { >>> BlkTransactionState common; >>> @@ -1775,7 +1990,7 @@ static const BdrvActionOps actions[] = { >>> .instance_size = sizeof(DriveBackupState), >>> .prepare = drive_backup_prepare, >>> .abort = drive_backup_abort, >>> - .clean = drive_backup_clean, >>> + .clean = drive_backup_clean >> >> Probably not intended, I guess. >> > > Did an earlier patch of mine goof this up? The way this patch leaves > it looks correct. Both are correct. However, generally qemu likes to keep commas at the end of initializers because it makes it easier to add new fields later on. > >>> }, >>> [TRANSACTION_ACTION_KIND_BLOCKDEV_BACKUP] = { >>> .instance_size = sizeof(BlockdevBackupState), >>> @@ -1815,10 +2030,12 @@ void qmp_transaction(TransactionActionList >>> *dev_list, Error **errp) >>> { >>> TransactionActionList *dev_entry = dev_list; >>> BlkTransactionState *state, *next; >>> + BlkTransactionList *btl; >>> Error *local_err = NULL; >>> QSIMPLEQ_HEAD(snap_bdrv_states, BlkTransactionState) >>> snap_bdrv_states; >>> QSIMPLEQ_INIT(&snap_bdrv_states); >>> + btl = new_blk_transaction_list(); >>> /* drain all i/o before any operations */ >>> bdrv_drain_all(); >>> @@ -1837,8 +2054,10 @@ void qmp_transaction(TransactionActionList >>> *dev_list, Error **errp) >>> assert(ops->instance_size > 0); >>> state = g_malloc0(ops->instance_size); >>> + state->refcount = 1; >>> state->ops = ops; >>> state->action = dev_info; >>> + state->list = btl; >>> QSIMPLEQ_INSERT_TAIL(&snap_bdrv_states, state, entry); >>> state->ops->prepare(state, &local_err); >>> @@ -1869,8 +2088,10 @@ exit: >>> if (state->ops->clean) { >>> state->ops->clean(state); >>> } >>> - g_free(state); >>> + blk_put_transaction_state(state); >>> } >>> + >>> + put_blk_transaction_list(btl); >>> } >> >> Sorry, as I said, I need more context on the objects and very much would >> like a separation between truly opaque pointers and not-so-opaque >> pointers, before I can really understand what's going on (or I put more >> effort into it, but since it's always more probable to belong to the >> majority, I guess I won't be the only one getting confused). >> >> Maybe the next patches will clear it up, I don't know. >> >> Max > > I'll put some more elbow grease into explaining the flow. Thanks! Max