qemu-devel.nongnu.org archive mirror
From: Hailiang Zhang <zhang.zhanghailiang@huawei.com>
To: John Snow <jsnow@redhat.com>, Markus Armbruster <armbru@redhat.com>
Cc: qemu-block@nongnu.org, lizhijian@cn.fujitsu.com,
	quintela@redhat.com, qemu-devel@nongnu.org,
	yunhong.jiang@intel.com, eddie.dong@intel.com,
	peter.huangpeng@huawei.com,
	Michael Roth <mdroth@linux.vnet.ibm.com>,
	arei.gonglei@huawei.com, stefanha@redhat.com,
	amit.shah@redhat.com, dgilbert@redhat.com,
	hongyang.yang@easystack.cn
Subject: Re: [Qemu-devel] [Qemu-block] [PATCH COLO-Frame v12 25/38] qmp event: Add event notification for COLO error
Date: Wed, 23 Dec 2015 11:14:01 +0800
Message-ID: <567A1179.2040509@huawei.com>
In-Reply-To: <56786BA0.70400@redhat.com>

On 2015/12/22 5:14, John Snow wrote:
>
>
> On 12/19/2015 05:02 AM, Markus Armbruster wrote:
>> Copying qemu-block because this seems related to generalising block jobs
>> to background jobs.
>>
>> zhanghailiang <zhang.zhanghailiang@huawei.com> writes:
>>
>>> If some errors happen during VM's COLO FT stage, it's important to notify the users
>>> of this event. Together with 'colo_lost_heartbeat', users can intervene in COLO's
>>> failover work immediately.
>>> If users don't want to get involved in COLO's failover verdict,
>>> it is still necessary to notify users that we exited COLO mode.
>>>
>>> Cc: Markus Armbruster <armbru@redhat.com>
>>> Cc: Michael Roth <mdroth@linux.vnet.ibm.com>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> Signed-off-by: Li Zhijian <lizhijian@cn.fujitsu.com>
>>> ---
>>> v11:
>>> - Fix several typos found by Eric
>>>
>>> Signed-off-by: zhanghailiang <zhang.zhanghailiang@huawei.com>
>>> ---
>>>   docs/qmp-events.txt | 17 +++++++++++++++++
>>>   migration/colo.c    | 11 +++++++++++
>>>   qapi-schema.json    | 16 ++++++++++++++++
>>>   qapi/event.json     | 17 +++++++++++++++++
>>>   4 files changed, 61 insertions(+)
>>>
>>> diff --git a/docs/qmp-events.txt b/docs/qmp-events.txt
>>> index d2f1ce4..19f68fc 100644
>>> --- a/docs/qmp-events.txt
>>> +++ b/docs/qmp-events.txt
>>> @@ -184,6 +184,23 @@ Example:
>>>   Note: The "ready to complete" status is always reset by a BLOCK_JOB_ERROR
>>>   event.
>>>
>>> +COLO_EXIT
>>> +---------
>>> +
>>> +Emitted when VM finishes COLO mode due to some errors happening or
>>> +at the request of users.
>>
>> How would the event's recipient distinguish between "due to error" and
>> "at the user's request"?
>>
>>> +
>>> +Data:
>>> +
>>> + - "mode": COLO mode, primary or secondary side (json-string)
>>> + - "reason":  the exit reason, internal error or external request. (json-string)
>>> + - "error": error message (json-string, optional)
>>> +
>>> +Example:
>>> +
>>> +{"timestamp": {"seconds": 2032141960, "microseconds": 417172},
>>> + "event": "COLO_EXIT", "data": {"mode": "primary", "reason": "request" } }
>>> +
>>
>> Pardon my ignorance again...  Does "VM finishes COLO mode" mean we have
>> some kind of COLO background job, and it just finished for whatever
>> reason?
>>
>> If yes, this COLO job could be an instance of the general background job
>> concept we're trying to grow from the existing block job concept.
>>
>> I'm not asking you to rebase your work onto the background job
>> infrastructure, not least for the simple reason that it doesn't exist,
>> yet.  But I think it would be fruitful to compare your COLO job
>> management QMP interface with the one we have for block jobs.  Not only
>> may that avoid unnecessary inconsistency, it could also help shape the
>> general background job interface.
>>
>
> Yes. The "background job" concept doesn't exist in a formal way outside
> of the block layer yet, but we're looking to expand it as we re-tool the
> block jobs themselves.
>
> It may be the case that the COLO commands and events need to go in as
> they are now, but later we can bring them back into the generalized job
> infrastructure.
>

Agreed. ;)

>> Quick overview of the block job QMP interface:
>>
>> * Commands to create a job: block-commit, block-stream, drive-mirror,
>>    drive-backup.
>>
>> * Get information on jobs: query-block-jobs
>>
>> * Pause a job: block-job-pause
>>
>> * Resume a job: block-job-resume
>>
>> * Cancel a job: block-job-cancel
>>
>> * Block job completion events: BLOCK_JOB_COMPLETED, BLOCK_JOB_CANCELLED
>>
>> * Block job error event: BLOCK_JOB_ERROR
>>
>> * Block job synchronous completion: event BLOCK_JOB_READY and command
>>    block-job-complete
>>
>
> The block-agnostic version of these commands would likely be:
>
> query-jobs
> job-pause
> job-resume
> job-cancel
> job-complete
>
> Events: JOB_COMPLETED, JOB_CANCELLED, JOB_ERROR, JOB_READY.
>
>
> It looks like COLO_EXIT would be an instance of JOB_COMPLETED, and if it
> occurred due to an error, we'd also see JOB_ERROR emitted.
>

Yes. If we adopt this job framework for COLO, COLO_EXIT would map onto
JOB_COMPLETED, with JOB_ERROR emitted as well on the error path.
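
To make that mapping concrete, here is a minimal standalone sketch. It is not
QEMU code: the generic job events named by John do not exist yet, so
JOB_COMPLETED and JOB_ERROR are only his proposed names, the helper
colo_exit_to_job_events() is hypothetical, and the enum is a local copy of the
COLOExitReason values from this patch. Emitting JOB_ERROR before JOB_COMPLETED
is also an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the COLOExitReason values proposed in this patch. */
typedef enum {
    COLO_EXIT_REASON_UNKNOWN,
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} COLOExitReason;

/*
 * Hypothetical helper: list the generic job events that a COLO exit
 * would correspond to under the proposed naming. Every exit yields
 * JOB_COMPLETED; an exit caused by an error additionally yields
 * JOB_ERROR first (the ordering is an assumption).
 * Returns the number of entries written to 'events' (1 or 2).
 */
static size_t colo_exit_to_job_events(COLOExitReason reason,
                                      const char *events[2])
{
    size_t n = 0;

    assert(events != NULL);
    if (reason == COLO_EXIT_REASON_ERROR) {
        events[n++] = "JOB_ERROR";
    }
    events[n++] = "JOB_COMPLETED";
    return n;
}
```

A caller passes a two-element array and reads back one name for a requested
exit, or two names for an exit caused by an error.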

>>>   DEVICE_DELETED
>>>   --------------
>>>
>>> diff --git a/migration/colo.c b/migration/colo.c
>>> index d1dd4e1..d06c14f 100644
>>> --- a/migration/colo.c
>>> +++ b/migration/colo.c
>>> @@ -18,6 +18,7 @@
>>>   #include "qemu/error-report.h"
>>>   #include "qemu/sockets.h"
>>>   #include "migration/failover.h"
>>> +#include "qapi-event.h"
>>>
>>>   /* colo buffer */
>>>   #define COLO_BUFFER_BASE_SIZE (4 * 1024 * 1024)
>>> @@ -349,6 +350,11 @@ static void colo_process_checkpoint(MigrationState *s)
>>>   out:
>>>       if (ret < 0) {
>>>           error_report("%s: %s", __func__, strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_PRIMARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>       }
>>>
>>>       qsb_free(buffer);
>>> @@ -516,6 +522,11 @@ out:
>>>       if (ret < 0) {
>>>           error_report("colo incoming thread will exit, detect error: %s",
>>>                        strerror(-ret));
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_ERROR,
>>> +                                  true, strerror(-ret), NULL);
>>> +    } else {
>>> +        qapi_event_send_colo_exit(COLO_MODE_SECONDARY, COLO_EXIT_REASON_REQUEST,
>>> +                                  false, NULL, NULL);
>>>       }
>>>
>>>       if (fb) {
>>> diff --git a/qapi-schema.json b/qapi-schema.json
>>> index feb7d53..f6ecb88 100644
>>> --- a/qapi-schema.json
>>> +++ b/qapi-schema.json
>>> @@ -778,6 +778,22 @@
>>>     'data': [ 'unknown', 'primary', 'secondary'] }
>>>
>>>   ##
>>> +# @COLOExitReason
>>> +#
>>> +# The reason for a COLO exit
>>> +#
>>> +# @unknown: unknown reason
>>
>> How can @unknown happen?
>>
>>> +#
>>> +# @request: COLO exit is due to an external request
>>> +#
>>> +# @error: COLO exit is due to an internal error
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'enum': 'COLOExitReason',
>>> +  'data': [ 'unknown', 'request', 'error'] }
>>> +
>>> +##
>>>   # @x-colo-lost-heartbeat
>>>   #
>>>   # Tell qemu that heartbeat is lost, request it to do takeover procedures.
>>> diff --git a/qapi/event.json b/qapi/event.json
>>> index f0cef01..f63d456 100644
>>> --- a/qapi/event.json
>>> +++ b/qapi/event.json
>>> @@ -255,6 +255,23 @@
>>>     'data': {'status': 'MigrationStatus'}}
>>>
>>>   ##
>>> +# @COLO_EXIT
>>> +#
>>> +# Emitted when VM finishes COLO mode due to some errors happening or
>>> +# at the request of users.
>>> +#
>>> +# @mode: which COLO mode the VM was in when it exited.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @reason: describes the reason for the COLO exit.
>>
>> Can we get 'unknown' here?
>>
>>> +#
>>> +# @error: #optional, error message. Only present when an error occurred.
>>> +#
>>> +# Since: 2.6
>>> +##
>>> +{ 'event': 'COLO_EXIT',
>>> +  'data': {'mode': 'COLOMode', 'reason': 'COLOExitReason', '*error': 'str' } }
>>> +
>>> +##
>>>   # @ACPI_DEVICE_OST
>>>   #
>>>   # Emitted when guest executes ACPI _OST method.
>>
>
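
On the questions about 'unknown': the two call sites quoted above derive the
event fields purely from the sign of 'ret'. The standalone sketch below
restates that logic outside QEMU. The struct ColoExitInfo and both helper
functions are hypothetical names introduced only for illustration, and the
enum is a local copy of the proposed COLOExitReason. As written, the mapping
only ever produces 'request' or 'error'.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Local copy of the COLOExitReason values proposed in this patch. */
typedef enum {
    COLO_EXIT_REASON_UNKNOWN,
    COLO_EXIT_REASON_REQUEST,
    COLO_EXIT_REASON_ERROR,
} COLOExitReason;

/* Hypothetical container for the reason-related COLO_EXIT fields. */
typedef struct {
    COLOExitReason reason;
    bool has_error;     /* true only when 'error' is set */
    const char *error;  /* strerror() text, or NULL */
} ColoExitInfo;

/* Same decision as the 'out:' paths in the patch: ret < 0 means error. */
static ColoExitInfo colo_exit_info_from_ret(int ret)
{
    ColoExitInfo info = { COLO_EXIT_REASON_REQUEST, false, NULL };

    if (ret < 0) {
        info.reason = COLO_EXIT_REASON_ERROR;
        info.has_error = true;
        info.error = strerror(-ret);
    }
    return info;
}

/* Wire names as listed in the 'data' array of the proposed enum. */
static const char *colo_exit_reason_str(COLOExitReason reason)
{
    static const char *const names[] = { "unknown", "request", "error" };

    assert((size_t)reason < sizeof(names) / sizeof(names[0]));
    return names[reason];
}
```

With this mapping, a recipient can tell the two cases apart by "reason" alone,
and "error" is present exactly when "reason" is "error".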
