From mboxrd@z Thu Jan 1 00:00:00 1970 Received: from eggs.gnu.org ([2001:4830:134:3::10]:57653) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1aBZtz-00013l-Bu for qemu-devel@nongnu.org; Tue, 22 Dec 2015 22:15:48 -0500 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1aBZtw-0002ho-0k for qemu-devel@nongnu.org; Tue, 22 Dec 2015 22:15:47 -0500 References: <1450167779-9960-1-git-send-email-zhang.zhanghailiang@huawei.com> <1450167779-9960-26-git-send-email-zhang.zhanghailiang@huawei.com> <8737uypxi9.fsf@blackfin.pond.sub.org> <56786BA0.70400@redhat.com> From: Hailiang Zhang Message-ID: <567A1179.2040509@huawei.com> Date: Wed, 23 Dec 2015 11:14:01 +0800 MIME-Version: 1.0 In-Reply-To: <56786BA0.70400@redhat.com> Content-Type: text/plain; charset="windows-1252"; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , To: John Snow , Markus Armbruster Cc: qemu-block@nongnu.org, lizhijian@cn.fujitsu.com, quintela@redhat.com, qemu-devel@nongnu.org, yunhong.jiang@intel.com, eddie.dong@intel.com, peter.huangpeng@huawei.com, Michael Roth , arei.gonglei@huawei.com, stefanha@redhat.com, amit.shah@redhat.com, dgilbert@redhat.com, hongyang.yang@easystack.cn On 2015/12/22 5:14, John Snow wrote: > > > On 12/19/2015 05:02 AM, Markus Armbruster wrote: >> Copying qemu-block because this seems related to generalising block jobs >> to background jobs. >> >> zhanghailiang writes: >> >>> If some errors happen during VM's COLO FT stage, it's important to notify the users >>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's >>> failover work immediately. >>> If users don't want to get involved in COLO's failover verdict, >>> it is still necessary to notify users that we exited COLO mode. >>> >>> Cc: Markus Armbruster >>> Cc: Michael Roth >>> Signed-off-by: zhanghailiang >>> Signed-off-by: Li Zhijian >>> --- >>> v11: >>> - Fix several typos found by Eric >>> >>> Signed-off-by: zhanghailiang >>> --- >>> docs/qmp-events.txt | 17 +++++++++++++++++ >>> migration/colo.c | 11 +++++++++++ >>> qapi-schema.json | 16 ++++++++++++++++ >>> qapi/event.json | 17 +++++++++++++++++ >>> 4 files changed, 61 insertions(+) >>> >>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt >>> index d2f1ce4..19f68fc 100644 >>> --- a/docs/qmp-events.txt >>> +++ b/docs/qmp-events.txt >>> @@ -184,6 +184,23 @@ Example: >>> Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR >>> event. >>> >>> +COLO_EXIT >>> +--------- >>> + >>> +Emitted when VM finishes COLO mode due to some errors happening or >>> +at the request of users. >> >> How would the event's recipient distinguish between "due to error" and >> "at the user's request"? >> >>> + >>> +Data: >>> + >>> + - "mode": COLO mode, primary or secondary side (json-string) >>> + - "reason": the exit reason, internal error or external request. (json-string) >>> + - "error": error message (json-string, operation) >>> + >>> +Example: >>> + >>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172}, >>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } } >>> + >> >> Pardon my ignorance again... Does "VM finishes COLO mode" means have >> some kind of COLO background job, and it just finished for whatever >> reason? >> >> If yes, this COLO job could be an instance of the general background job >> concept we're trying to grow from the existing block job concept. >> >> I'm not asking you to rebase your work onto the background job >> infrastructure, not least for the simple reason that it doesn't exist, >> yet. But I think it would be fruitful to compare your COLO job >> management QMP interface with the one we have for block jobs. Not only >> may that avoid unnecessary inconsistency, it could also help shape the >> general background job interface. >> > > Yes. The "background job" concept doesn't exist in a formal way outside > of the block layer yet, but we're looking to expand it as we re-tool the > block jobs themselves. > > It may be the case that the COLO commands and events need to go in as > they are now, but later we can bring them back into the generalized job > infrastructure. > Agreed. ;) >> Quick overview of the block job QMP interface: >> >> * Commands to create a job: block-commit, block-stream, drive-mirror, >> drive-backup. >> >> * Get information on jobs: query-block-jobs >> >> * Pause a job: block-job-pause >> >> * Resume a job: block-job-resume >> >> * Cancel a job: block-job-cancel >> >> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED >> >> * Block job error event: BLOCK_JOB_ERROR >> >> * Block job synchronous completion: event BLOCK_JOB_READY and command >> block-job-complete >> > > The block-agnostic version of these commands would likely be: > > query-jobs > job-pause > job-resume > job-cancel > job-complete > > Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY. > > > It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it > occurred due to an error, we'd also see JOB_ERROR emitted. > Yes, if we use this job frame for COLO, the COLO_EXIT will be like that. >>> DEVICE_DELETED >>> -------------- >>> >>> diff --git a/migration/colo.c b/migration/colo.c >>> index d1dd4e1..d06c14f 100644 >>> --- a/migration/colo.c >>> +++ b/migration/colo.c >>> @@ -18,6 +18,7 @@ >>> #include "qemu/error-report.h" >>> #include "qemu/sockets.h" >>> #include "migration/failover.h" >>> +#include "qapi-event.h" >>> >>> /* colo buffer */ >>> #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024) >>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s) >>> out: >>> if (ret < 0) { >>> error_report("%s: %s", __func__, strerror(-ret)); >>> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR, >>> + true, strerror(-ret), NULL); >>> + } else { >>> + qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST, >>> + false, NULL, NULL); >>> } >>> >>> qsb_free(buffer); >>> @@ -516,6 +522,11 @@ out: >>> if (ret < 0) { >>> error_report("colo incoming thread will exit, detect error: %s", >>> strerror(-ret)); >>> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR, >>> + true, strerror(-ret), NULL); >>> + } else { >>> + qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST, >>> + false, NULL, NULL); >>> } >>> >>> if (fb) { >>> diff --git a/qapi-schema.json b/qapi-schema.json >>> index feb7d53..f6ecb88 100644 >>> --- a/qapi-schema.json >>> +++ b/qapi-schema.json >>> @@ -778,6 +778,22 @@ >>> 'data': [ 'unknown', 'primary', 'secondary'] } >>> >>> ## >>> +# @COLOExitReason >>> +# >>> +# The reason for a COLO exit >>> +# >>> +# @unknown: unknown reason >> >> How can @unknown happen? >> >>> +# >>> +# @request: COLO exit is due to an external request >>> +# >>> +# @error: COLO exit is due to an internal error >>> +# >>> +# Since: 2.6 >>> +## >>> +{ 'enum': 'COLOExitReason', >>> + 'data': [ 'unknown', 'request', 'error'] } >>> + >>> +## >>> # @x-colo-lost-heartbeat >>> # >>> # Tell qemu that heartbeat is lost, request it to do takeover procedures. >>> diff --git a/qapi/event.json b/qapi/event.json >>> index f0cef01..f63d456 100644 >>> --- a/qapi/event.json >>> +++ b/qapi/event.json >>> @@ -255,6 +255,23 @@ >>> 'data': {'status': 'MigrationStatus'}} >>> >>> ## >>> +# @COLO_EXIT >>> +# >>> +# Emitted when VM finishes COLO mode due to some errors happening or >>> +# at the request of users. >>> +# >>> +# @mode: which COLO mode the VM was in when it exited. >> >> Can we get 'unknown' here? >> >>> +# >>> +# @reason: describes the reason for the COLO exit. >> >> Can we get 'unknown' here? >> >>> +# >>> +# @error: #optional, error message. Only present on error happening. >>> +# >>> +# Since: 2.6 >>> +## >>> +{ 'event': 'COLO_EXIT', >>> + 'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } } >>> + >>> +## >>> # @ACPI_DEVICE_OST >>> # >>> # Emitted when guest executes ACPI _OST method. >> >