* Re: [Question] virtio-serial misses irq delivery on migration?
@ 2016-08-06 10:19 Gonglei (Arei)
2016-08-08 12:13 ` [Qemu-devel] " Dr. David Alan Gilbert
0 siblings, 1 reply; 5+ messages in thread
From: Gonglei (Arei) @ 2016-08-06 10:19 UTC (permalink / raw)
To: Gonglei (Arei), Paolo Bonzini, jan.kiszka@siemens.com,
amit.shah@redhat.com, quintela@redhat.com
Cc: Huangweidong (C), qemu-devel@nongnu.org, kvm@vger.kernel.org
Hi all,
I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.
In actually, we can't attach our initial intention under a special situation:
1. the process of migration destination is in a coroutine
Commit 82a4da79fd6
2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
if the Qemu encounter a EAGIN while reading QEMUFile.
commit 595ab64169b
static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
size_t size)
{
QEMUFileSocket *s = opaque;
ssize_t len;
for (;;) {
len = qemu_recv(s->fd, buf, size, 0);
if (len != -1) {
break;
}
if (socket_error() == EAGAIN) {
yield_until_fd_readable(s->fd);
} else if (socket_error() != EINTR) {
break;
}
}
if (len == -1) {
len = -socket_error();
}
return len;
}
3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
after invoking fetch_active_ports_list().
4. reproduce the problem by fault injection.
The debug logs:
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
[2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
[2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
[2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co() // Enter corountine
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
[2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
[2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
[2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug ********** // fault injection begin, migration coroutine yiled cpu
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592 //vm_clock timer boom
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb //calling the timer callback
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
[2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
[2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
[2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
[2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start //enter vm_start here
[2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
[2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
[2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
[2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
[2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
[2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961
The below is my debugging diff (Not based on the newest master branch)
diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
index 6a45af9..a00b02f 100644
--- a/hw/char/virtio-serial-bus.c
+++ b/hw/char/virtio-serial-bus.c
@@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
if (!s->post_load) {
return;
}
+ QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
port = s->post_load->connected[i].port;
host_connected = s->post_load->connected[i].host_connected;
@@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
*/
send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
port->host_connected);
+ QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
}
vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
if (vsc->set_guest_connected) {
@@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
}
}
timer_mod(s->post_load->timer, 1);
+ QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
+ yield_until_fd_readable(qemu_get_fd(f));
return 0;
}
@@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
if (version_id > 3) {
return -EINVAL;
}
-
+ QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
/* The virtio device */
return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
}
diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
index 5b47056..2d988cb 100644
--- a/hw/i386/kvm/apic.c
+++ b/hw/i386/kvm/apic.c
@@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
{
APICCommonState *s = APIC_COMMON(dev);
int i;
-
+ QEMU_LOG("gonglei: kvm_put_apic_state\n");
memset(kapic, 0, sizeof(*kapic));
kvm_apic_set_reg(kapic, 0x2, s->id << 24);
kvm_apic_set_reg(kapic, 0x8, s->tpr);
diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
index d2a6c4c..07699ec 100644
--- a/hw/i386/kvm/ioapic.c
+++ b/hw/i386/kvm/ioapic.c
@@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
chip.chip_id = KVM_IRQCHIP_IOAPIC;
kioapic = &chip.chip.ioapic;
-
+ QEMU_LOG("gonglei: kvm_ioapic_put\n");
kioapic->id = s->id;
kioapic->ioregsel = s->ioregsel;
kioapic->base_address = s->busdev.mmio[0].addr;
diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
index 042e960..110be20 100644
--- a/hw/intc/apic_common.c
+++ b/hw/intc/apic_common.c
@@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
if (info->post_load) {
info->post_load(s);
}
+ QEMU_LOG("gonglei: apic_dispatch_post_load\n");
return 0;
}
diff --git a/migration/migration.c b/migration/migration.c
index 92c1427..d074c0a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
const char *portName = "charchannel1";
size_t uRet;
-
+ QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
ret = qemu_loadvm_state(f);
qemu_fclose(f);
free_xbzrle_decoded_buf();
diff --git a/qemu-timer.c b/qemu-timer.c
index 5741f0d..dd36bc9 100644
--- a/qemu-timer.c
+++ b/qemu-timer.c
@@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
QEMUTimerList *tl;
bool old = clock->enabled;
clock->enabled = enabled;
+ QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
if (enabled && !old) {
qemu_clock_notify(type);
} else if (!enabled && old) {
@@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
bool progress = false;
QEMUTimerCB *cb;
void *opaque;
+ static int count = 10;
qemu_event_reset(&timer_list->timers_done_ev);
if (!timer_list->clock->enabled) {
@@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
qemu_mutex_unlock(&timer_list->active_timers_lock);
break;
}
-
+ if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
+ QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
+ ts->expire_time, current_time);
+ }
/* remove timer from the list before calling the callback */
timer_list->active_timers = ts->next;
ts->next = NULL;
diff --git a/vl.c b/vl.c
index 06f34fe..824b246 100644
--- a/vl.c
+++ b/vl.c
@@ -978,7 +978,7 @@ void vm_start(void)
{
RunState requested;
int64_t start_time, end_time;
-
+ QEMU_LOG("gonglei: enter vm_start\n");
qemu_vmstop_requested(&requested);
if (runstate_is_running() && requested == RUN_STATE_MAX) {
return;
Can we wait looply the migration process finished when EAGAIN?
Regards,
-Gonglei
> -----Original Message-----
> From: Gonglei (Arei)
> Sent: Friday, August 05, 2016 5:14 PM
> To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> Subject: [Question] virtio-serial misses irq delivery on migration?
>
> Hi Paolo , Jan, Amit
>
> Recently we encountered a problem that the virtio-serial can't work after
> Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
>
> It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> because
> the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> interrupt
> can't be injected to VM because the irq line bit had been set to 1, but the
> frontend
> driver never handle it or never know it.
>
> Bug 867366 - virtio-serial misses irq delivery on migration
> https://bugzilla.redhat.com/show_bug.cgi?id=867366
>
> But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> post_load
> send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> propagate
> guest_connected to the port on post_load" are applied.
>
> I noticed that Paolo posted another problem maybe have a pertential problem
> about
> apic in Comment 23. But this patch
> https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> haven't merged into qemu master.
>
> Would you give me some clues please? Thanks!
>
> Regards,
> -Gonglei
>
^ permalink raw reply related [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
2016-08-06 10:19 [Question] virtio-serial misses irq delivery on migration? Gonglei (Arei)
@ 2016-08-08 12:13 ` Dr. David Alan Gilbert
2016-08-08 13:05 ` Gonglei (Arei)
0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2016-08-08 12:13 UTC (permalink / raw)
To: Gonglei (Arei)
Cc: Paolo Bonzini, jan.kiszka@siemens.com, amit.shah@redhat.com,
quintela@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
kvm@vger.kernel.org
* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi all,
>
> I might catch the bug. Now, we rely on a vm_clock timer to assure that the vcpus are started
> before we invoke send_control_event(),which queue a message in the virtio ring and trigger a irq.
>
> In actually, we can't attach our initial intention under a special situation:
>
> 1. the process of migration destination is in a coroutine
> Commit 82a4da79fd6
> 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> if the Qemu encounter a EAGIN while reading QEMUFile.
> commit 595ab64169b
>
> static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> size_t size)
> {
> QEMUFileSocket *s = opaque;
> ssize_t len;
>
> for (;;) {
> len = qemu_recv(s->fd, buf, size, 0);
> if (len != -1) {
> break;
> }
> if (socket_error() == EAGAIN) {
> yield_until_fd_readable(s->fd);
> } else if (socket_error() != EINTR) {
> break;
> }
> }
>
> if (len == -1) {
> len = -socket_error();
> }
> return len;
> }
>
> 3. The main thread can get the cpu and timer will run, asumes that we get EAGIN
> after invoking fetch_active_ports_list().
I don't understand the details of this interrupt injection, or why the timer
will run if we're still not finished migration; it doesn't sound right that a QEMU_CLOCK_VIRTUAL
timer should trigger while we're still receiving the VM and the guest is paused.
However, would it be fixed by using a vm_change_state_handler like ui/spice-core.c does?
Dave
> 4. reproduce the problem by fault injection.
>
> The debug logs:
>
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "xbzrle"}]}
> [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name: query-migrate-capabilities
> [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name: migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false, "capability": "rdma-pin-all"}]}
> [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter process_incoming_migration_co() // Enter corountine
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei: apic_dispatch_post_load
> [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will init virtqueue
> [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: ************** begin to inject debug ********** // fault injection begin, migration coroutine yiled cpu
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1, current_time: 9750771061592 //vm_clock timer boom
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei: virtio_serial_post_load_timer_cb //calling the timer callback
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 0 // send a message to the guest, raise the irq line.
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 64056}, "event": "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 0
> [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei: port->host_connected: 0
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750770788478, current_time: 9750771061592
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state //Enter migration coroutine again, and the restore the lapic register, the previous lapic's info will be covered :(
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start //enter vm_start here
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that means DRIVER OK
> [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1, old: 1
> [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp": {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750772061592, current_time: 9750794269805
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send control message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750795284148, current_time: 9750795312685
> [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send control message event = 6, value = 1
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750796329656, current_time: 9750796369458
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750797396208, current_time: 9750797426752
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750798445707, current_time: 9750798477520
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750799496162, current_time: 9750799532183
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750800556817, current_time: 9750800588741
> [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 9750801606596, current_time: 9750801637961
>
>
> The below is my debugging diff (Not based on the newest master branch)
>
> diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> index 6a45af9..a00b02f 100644
> --- a/hw/char/virtio-serial-bus.c
> +++ b/hw/char/virtio-serial-bus.c
> @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
> if (!s->post_load) {
> return;
> }
> + QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> port = s->post_load->connected[i].port;
> host_connected = s->post_load->connected[i].host_connected;
> @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void *opaque)
> */
> send_control_event(s, port->id, VIRTIO_CONSOLE_PORT_OPEN,
> port->host_connected);
> + QEMU_LOG("gonglei: port->host_connected: %u\n", port->host_connected);
> }
> vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> if (vsc->set_guest_connected) {
> @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int version_id,
> }
> }
> timer_mod(s->post_load->timer, 1);
> + QEMU_LOG("gonglei: ************** begin to inject debug **********\n");
> + yield_until_fd_readable(qemu_get_fd(f));
> return 0;
> }
>
> @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void *opaque, int version_id)
> if (version_id > 3) {
> return -EINVAL;
> }
> -
> + QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> /* The virtio device */
> return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> }
> diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> index 5b47056..2d988cb 100644
> --- a/hw/i386/kvm/apic.c
> +++ b/hw/i386/kvm/apic.c
> @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct kvm_lapic_state *kapic)
> {
> APICCommonState *s = APIC_COMMON(dev);
> int i;
> -
> + QEMU_LOG("gonglei: kvm_put_apic_state\n");
> memset(kapic, 0, sizeof(*kapic));
> kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> kvm_apic_set_reg(kapic, 0x8, s->tpr);
> diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> index d2a6c4c..07699ec 100644
> --- a/hw/i386/kvm/ioapic.c
> +++ b/hw/i386/kvm/ioapic.c
> @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
>
> chip.chip_id = KVM_IRQCHIP_IOAPIC;
> kioapic = &chip.chip.ioapic;
> -
> + QEMU_LOG("gonglei: kvm_ioapic_put\n");
> kioapic->id = s->id;
> kioapic->ioregsel = s->ioregsel;
> kioapic->base_address = s->busdev.mmio[0].addr;
> diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> index 042e960..110be20 100644
> --- a/hw/intc/apic_common.c
> +++ b/hw/intc/apic_common.c
> @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int version_id)
> if (info->post_load) {
> info->post_load(s);
> }
> + QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> return 0;
> }
>
> diff --git a/migration/migration.c b/migration/migration.c
> index 92c1427..d074c0a 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void *opaque)
> const char *arpGuestCMD = "{\"execute\":\"guest-write-flag-arp\"}\n";
> const char *portName = "charchannel1";
> size_t uRet;
> -
> + QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> ret = qemu_loadvm_state(f);
> qemu_fclose(f);
> free_xbzrle_decoded_buf();
> diff --git a/qemu-timer.c b/qemu-timer.c
> index 5741f0d..dd36bc9 100644
> --- a/qemu-timer.c
> +++ b/qemu-timer.c
> @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool enabled)
> QEMUTimerList *tl;
> bool old = clock->enabled;
> clock->enabled = enabled;
> + QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled, old);
> if (enabled && !old) {
> qemu_clock_notify(type);
> } else if (!enabled && old) {
> @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
> bool progress = false;
> QEMUTimerCB *cb;
> void *opaque;
> + static int count = 10;
>
> qemu_event_reset(&timer_list->timers_done_ev);
> if (!timer_list->clock->enabled) {
> @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList *timer_list)
> qemu_mutex_unlock(&timer_list->active_timers_lock);
> break;
> }
> -
> + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> + QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> + ts->expire_time, current_time);
> + }
> /* remove timer from the list before calling the callback */
> timer_list->active_timers = ts->next;
> ts->next = NULL;
> diff --git a/vl.c b/vl.c
> index 06f34fe..824b246 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -978,7 +978,7 @@ void vm_start(void)
> {
> RunState requested;
> int64_t start_time, end_time;
> -
> + QEMU_LOG("gonglei: enter vm_start\n");
> qemu_vmstop_requested(&requested);
> if (runstate_is_running() && requested == RUN_STATE_MAX) {
> return;
>
>
> Can we wait looply the migration process finished when EAGAIN?
>
> Regards,
> -Gonglei
>
>
> > -----Original Message-----
> > From: Gonglei (Arei)
> > Sent: Friday, August 05, 2016 5:14 PM
> > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > Subject: [Question] virtio-serial misses irq delivery on migration?
> >
> > Hi Paolo , Jan, Amit
> >
> > Recently we encountered a problem that the virtio-serial can't work after
> > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> >
> > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > because
> > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > interrupt
> > can't be injected to VM because the irq line bit had been set to 1, but the
> > frontend
> > driver never handle it or never know it.
> >
> > Bug 867366 - virtio-serial misses irq delivery on migration
> > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> >
> > But my qemu is the newest qemu, Both commit 80dcfb8532"virtio-serial-bus:
> > post_load
> > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > propagate
> > guest_connected to the port on post_load" are applied.
> >
> > I noticed that Paolo posted another problem maybe have a pertential problem
> > about
> > apic in Comment 23. But this patch
> > https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > haven't merged into qemu master.
> >
> > Would you give me some clues please? Thanks!
> >
> > Regards,
> > -Gonglei
> >
>
>
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 5+ messages in thread
* RE: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
2016-08-08 12:13 ` [Qemu-devel] " Dr. David Alan Gilbert
@ 2016-08-08 13:05 ` Gonglei (Arei)
2016-08-08 14:18 ` Dr. David Alan Gilbert
0 siblings, 1 reply; 5+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 13:05 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Paolo Bonzini, jan.kiszka@siemens.com, amit.shah@redhat.com,
quintela@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
kvm@vger.kernel.org
Hi Dave,
> -----Original Message-----
> From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> Sent: Monday, August 08, 2016 8:13 PM
> To: Gonglei (Arei)
> Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> kvm@vger.kernel.org
> Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> migration?
>
> * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > Hi all,
> >
> > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> vcpus are started
> > before we invoke send_control_event(),which queue a message in the virtio
> ring and trigger a irq.
> >
> > In actually, we can't attach our initial intention under a special situation:
> >
> > 1. the process of migration destination is in a coroutine
> > Commit 82a4da79fd6
> > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > if the Qemu encounter a EAGIN while reading QEMUFile.
> > commit 595ab64169b
> >
> > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > size_t size)
> > {
> > QEMUFileSocket *s = opaque;
> > ssize_t len;
> >
> > for (;;) {
> > len = qemu_recv(s->fd, buf, size, 0);
> > if (len != -1) {
> > break;
> > }
> > if (socket_error() == EAGAIN) {
> > yield_until_fd_readable(s->fd);
> > } else if (socket_error() != EINTR) {
> > break;
> > }
> > }
> >
> > if (len == -1) {
> > len = -socket_error();
> > }
> > return len;
> > }
> >
> > 3. The main thread can get the cpu and timer will run, asumes that we get
> EAGIN
> > after invoking fetch_active_ports_list().
>
> I don't understand the details of this interrupt injection, or why the timer
> will run if we're still not finished migration; it doesn't sound right that a
> QEMU_CLOCK_VIRTUAL
> timer should trigger while we're still receiving the VM and the guest is paused.
>
Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
You can see the above step 2)
For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
static ssize_t channel_get_buffer(void *opaque,
uint8_t *buf,
int64_t pos,
size_t size)
{
QIOChannel *ioc = QIO_CHANNEL(opaque);
ssize_t ret;
do {
ret = qio_channel_read(ioc, (char *)buf, size, NULL);
if (ret < 0) {
if (ret == QIO_CHANNEL_ERR_BLOCK) {
qio_channel_yield(ioc, G_IO_IN); // yiled cpu
} else {
/* XXX handle Error * object */
return -EIO;
}
}
} while (ret == QIO_CHANNEL_ERR_BLOCK);
return ret;
}
> However, would it be fixed by using a vm_change_state_handler like
> ui/spice-core.c does?
>
I'll check this method, thanks!
Regards,
-Gonglei
> Dave
>
>
> > 4. reproduce the problem by fault injection.
> >
> > The debug logs:
> >
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "xbzrle"}]}
> > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> query-migrate-capabilities
> > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> "capability": "rdma-pin-all"}]}
> > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> process_incoming_migration_co() // Enter corountine
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> apic_dispatch_post_load
> > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> init virtqueue
> > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> begin to inject debug ********** // fault injection begin, migration
> coroutine yiled cpu
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> current_time: 9750771061592 //vm_clock timer boom
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> virtio_serial_post_load_timer_cb //calling the timer callback
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 0 // send a message to the guest, raise the
> irq line.
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 64056}, "event":
> "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 0
> > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> port->host_connected: 0
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750770788478, current_time: 9750771061592
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> //Enter migration coroutine again, and the restore the lapic register, the
> previous lapic's info will be covered :(
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> //enter vm_start here
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> that means DRIVER OK
> > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> means DRIVER OK
> > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> old: 1
> > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750772061592, current_time: 9750794269805
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750795284148, current_time: 9750795312685
> > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> control message event = 6, value = 1
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750796329656, current_time: 9750796369458
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750797396208, current_time: 9750797426752
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750798445707, current_time: 9750798477520
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750799496162, current_time: 9750799532183
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750800556817, current_time: 9750800588741
> > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> 9750801606596, current_time: 9750801637961
> >
> >
> > The below is my debugging diff (Not based on the newest master branch)
> >
> > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > index 6a45af9..a00b02f 100644
> > --- a/hw/char/virtio-serial-bus.c
> > +++ b/hw/char/virtio-serial-bus.c
> > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> > if (!s->post_load) {
> > return;
> > }
> > + QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > port = s->post_load->connected[i].port;
> > host_connected = s->post_load->connected[i].host_connected;
> > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> *opaque)
> > */
> > send_control_event(s, port->id,
> VIRTIO_CONSOLE_PORT_OPEN,
> > port->host_connected);
> > + QEMU_LOG("gonglei: port->host_connected: %u\n",
> port->host_connected);
> > }
> > vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > if (vsc->set_guest_connected) {
> > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> version_id,
> > }
> > }
> > timer_mod(s->post_load->timer, 1);
> > + QEMU_LOG("gonglei: ************** begin to inject debug
> **********\n");
> > + yield_until_fd_readable(qemu_get_fd(f));
> > return 0;
> > }
> >
> > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> *opaque, int version_id)
> > if (version_id > 3) {
> > return -EINVAL;
> > }
> > -
> > + QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > /* The virtio device */
> > return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > }
> > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > index 5b47056..2d988cb 100644
> > --- a/hw/i386/kvm/apic.c
> > +++ b/hw/i386/kvm/apic.c
> > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> kvm_lapic_state *kapic)
> > {
> > APICCommonState *s = APIC_COMMON(dev);
> > int i;
> > -
> > + QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > memset(kapic, 0, sizeof(*kapic));
> > kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > index d2a6c4c..07699ec 100644
> > --- a/hw/i386/kvm/ioapic.c
> > +++ b/hw/i386/kvm/ioapic.c
> > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> >
> > chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > kioapic = &chip.chip.ioapic;
> > -
> > + QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > kioapic->id = s->id;
> > kioapic->ioregsel = s->ioregsel;
> > kioapic->base_address = s->busdev.mmio[0].addr;
> > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > index 042e960..110be20 100644
> > --- a/hw/intc/apic_common.c
> > +++ b/hw/intc/apic_common.c
> > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> version_id)
> > if (info->post_load) {
> > info->post_load(s);
> > }
> > + QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > return 0;
> > }
> >
> > diff --git a/migration/migration.c b/migration/migration.c
> > index 92c1427..d074c0a 100644
> > --- a/migration/migration.c
> > +++ b/migration/migration.c
> > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> *opaque)
> > const char *arpGuestCMD =
> "{\"execute\":\"guest-write-flag-arp\"}\n";
> > const char *portName = "charchannel1";
> > size_t uRet;
> > -
> > + QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > ret = qemu_loadvm_state(f);
> > qemu_fclose(f);
> > free_xbzrle_decoded_buf();
> > diff --git a/qemu-timer.c b/qemu-timer.c
> > index 5741f0d..dd36bc9 100644
> > --- a/qemu-timer.c
> > +++ b/qemu-timer.c
> > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> enabled)
> > QEMUTimerList *tl;
> > bool old = clock->enabled;
> > clock->enabled = enabled;
> > + QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> old);
> > if (enabled && !old) {
> > qemu_clock_notify(type);
> > } else if (!enabled && old) {
> > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> > bool progress = false;
> > QEMUTimerCB *cb;
> > void *opaque;
> > + static int count = 10;
> >
> > qemu_event_reset(&timer_list->timers_done_ev);
> > if (!timer_list->clock->enabled) {
> > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> *timer_list)
> > qemu_mutex_unlock(&timer_list->active_timers_lock);
> > break;
> > }
> > -
> > + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > + QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > + ts->expire_time, current_time);
> > + }
> > /* remove timer from the list before calling the callback */
> > timer_list->active_timers = ts->next;
> > ts->next = NULL;
> > diff --git a/vl.c b/vl.c
> > index 06f34fe..824b246 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -978,7 +978,7 @@ void vm_start(void)
> > {
> > RunState requested;
> > int64_t start_time, end_time;
> > -
> > + QEMU_LOG("gonglei: enter vm_start\n");
> > qemu_vmstop_requested(&requested);
> > if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > return;
> >
> >
> > Can we wait looply the migration process finished when EAGAIN?
> >
> > Regards,
> > -Gonglei
> >
> >
> > > -----Original Message-----
> > > From: Gonglei (Arei)
> > > Sent: Friday, August 05, 2016 5:14 PM
> > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > >
> > > Hi Paolo , Jan, Amit
> > >
> > > Recently we encountered a problem that the virtio-serial can't work after
> > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > >
> > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > because
> > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > interrupt
> > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > frontend
> > > driver never handle it or never know it.
> > >
> > > Bug 867366 - virtio-serial misses irq delivery on migration
> > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > >
> > > But my qemu is the newest qemu, Both commit
> 80dcfb8532"virtio-serial-bus:
> > > post_load
> > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > propagate
> > > guest_connected to the port on post_load" are applied.
> > >
> > > I noticed that Paolo posted another problem maybe have a pertential
> problem
> > > about
> > > apic in Comment 23. But this patch
> > > https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > haven't merged into qemu master.
> > >
> > > Would you give me some clues please? Thanks!
> > >
> > > Regards,
> > > -Gonglei
> > >
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
2016-08-08 13:05 ` Gonglei (Arei)
@ 2016-08-08 14:18 ` Dr. David Alan Gilbert
2016-08-08 14:30 ` Gonglei (Arei)
0 siblings, 1 reply; 5+ messages in thread
From: Dr. David Alan Gilbert @ 2016-08-08 14:18 UTC (permalink / raw)
To: Gonglei (Arei)
Cc: Paolo Bonzini, jan.kiszka@siemens.com, amit.shah@redhat.com,
quintela@redhat.com, Huangweidong (C), qemu-devel@nongnu.org,
kvm@vger.kernel.org
* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
>
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> > Sent: Monday, August 08, 2016 8:13 PM
> > To: Gonglei (Arei)
> > Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> > quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> > kvm@vger.kernel.org
> > Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> > migration?
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi all,
> > >
> > > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> > vcpus are started
> > > before we invoke send_control_event(),which queue a message in the virtio
> > ring and trigger a irq.
> > >
> > > In actually, we can't attach our initial intention under a special situation:
> > >
> > > 1. the process of migration destination is in a coroutine
> > > Commit 82a4da79fd6
> > > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > > if the Qemu encounter a EAGIN while reading QEMUFile.
> > > commit 595ab64169b
> > >
> > > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > > size_t size)
> > > {
> > > QEMUFileSocket *s = opaque;
> > > ssize_t len;
> > >
> > > for (;;) {
> > > len = qemu_recv(s->fd, buf, size, 0);
> > > if (len != -1) {
> > > break;
> > > }
> > > if (socket_error() == EAGAIN) {
> > > yield_until_fd_readable(s->fd);
> > > } else if (socket_error() != EINTR) {
> > > break;
> > > }
> > > }
> > >
> > > if (len == -1) {
> > > len = -socket_error();
> > > }
> > > return len;
> > > }
> > >
> > > 3. The main thread can get the cpu and timer will run, asumes that we get
> > EAGIN
> > > after invoking fetch_active_ports_list().
> >
> > I don't understand the details of this interrupt injection, or why the timer
> > will run if we're still not finished migration; it doesn't sound right that a
> > QEMU_CLOCK_VIRTUAL
> > timer should trigger while we're still receiving the VM and the guest is paused.
> >
>
> Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
> You can see the above step 2)
>
> For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
>
> static ssize_t channel_get_buffer(void *opaque,
> uint8_t *buf,
> int64_t pos,
> size_t size)
> {
> QIOChannel *ioc = QIO_CHANNEL(opaque);
> ssize_t ret;
>
> do {
> ret = qio_channel_read(ioc, (char *)buf, size, NULL);
> if (ret < 0) {
> if (ret == QIO_CHANNEL_ERR_BLOCK) {
> qio_channel_yield(ioc, G_IO_IN); // yiled cpu
> } else {
> /* XXX handle Error * object */
> return -EIO;
> }
> }
> } while (ret == QIO_CHANNEL_ERR_BLOCK);
>
> return ret;
> }
Yes, I understand it yields; but I would have expected the timer code
not to fire virtual timers while the VM is in paused state.
> > However, would it be fixed by using a vm_change_state_handler like
> > ui/spice-core.c does?
> >
>
> I'll check this method, thanks!
>
Dave
> Regards,
> -Gonglei
>
> > Dave
> >
> >
> > > 4. reproduce the problem by fault injection.
> > >
> > > The debug logs:
> > >
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "xbzrle"}]}
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "rdma-pin-all"}]}
> > > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> > process_incoming_migration_co() // Enter corountine
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> > init virtqueue
> > > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> > begin to inject debug ********** // fault injection begin, migration
> > coroutine yiled cpu
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> > current_time: 9750771061592 //vm_clock timer boom
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> > virtio_serial_post_load_timer_cb //calling the timer callback
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 0 // send a message to the guest, raise the
> > irq line.
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 64056}, "event":
> > "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 0
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750770788478, current_time: 9750771061592
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > //Enter migration coroutine again, and the restore the lapic register, the
> > previous lapic's info will be covered :(
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> > //enter vm_start here
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> > that means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> > old: 1
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750772061592, current_time: 9750794269805
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750795284148, current_time: 9750795312685
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750796329656, current_time: 9750796369458
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750797396208, current_time: 9750797426752
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750798445707, current_time: 9750798477520
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750799496162, current_time: 9750799532183
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750800556817, current_time: 9750800588741
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750801606596, current_time: 9750801637961
> > >
> > >
> > > The below is my debugging diff (Not based on the newest master branch)
> > >
> > > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > > index 6a45af9..a00b02f 100644
> > > --- a/hw/char/virtio-serial-bus.c
> > > +++ b/hw/char/virtio-serial-bus.c
> > > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > > if (!s->post_load) {
> > > return;
> > > }
> > > + QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > > for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > > port = s->post_load->connected[i].port;
> > > host_connected = s->post_load->connected[i].host_connected;
> > > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > > */
> > > send_control_event(s, port->id,
> > VIRTIO_CONSOLE_PORT_OPEN,
> > > port->host_connected);
> > > + QEMU_LOG("gonglei: port->host_connected: %u\n",
> > port->host_connected);
> > > }
> > > vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > > if (vsc->set_guest_connected) {
> > > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> > version_id,
> > > }
> > > }
> > > timer_mod(s->post_load->timer, 1);
> > > + QEMU_LOG("gonglei: ************** begin to inject debug
> > **********\n");
> > > + yield_until_fd_readable(qemu_get_fd(f));
> > > return 0;
> > > }
> > >
> > > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> > *opaque, int version_id)
> > > if (version_id > 3) {
> > > return -EINVAL;
> > > }
> > > -
> > > + QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > > /* The virtio device */
> > > return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > > }
> > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > > index 5b47056..2d988cb 100644
> > > --- a/hw/i386/kvm/apic.c
> > > +++ b/hw/i386/kvm/apic.c
> > > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> > kvm_lapic_state *kapic)
> > > {
> > > APICCommonState *s = APIC_COMMON(dev);
> > > int i;
> > > -
> > > + QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > > memset(kapic, 0, sizeof(*kapic));
> > > kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > > kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > > index d2a6c4c..07699ec 100644
> > > --- a/hw/i386/kvm/ioapic.c
> > > +++ b/hw/i386/kvm/ioapic.c
> > > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> > >
> > > chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > > kioapic = &chip.chip.ioapic;
> > > -
> > > + QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > > kioapic->id = s->id;
> > > kioapic->ioregsel = s->ioregsel;
> > > kioapic->base_address = s->busdev.mmio[0].addr;
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 042e960..110be20 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> > version_id)
> > > if (info->post_load) {
> > > info->post_load(s);
> > > }
> > > + QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > > return 0;
> > > }
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 92c1427..d074c0a 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> > *opaque)
> > > const char *arpGuestCMD =
> > "{\"execute\":\"guest-write-flag-arp\"}\n";
> > > const char *portName = "charchannel1";
> > > size_t uRet;
> > > -
> > > + QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > > ret = qemu_loadvm_state(f);
> > > qemu_fclose(f);
> > > free_xbzrle_decoded_buf();
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 5741f0d..dd36bc9 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> > enabled)
> > > QEMUTimerList *tl;
> > > bool old = clock->enabled;
> > > clock->enabled = enabled;
> > > + QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> > old);
> > > if (enabled && !old) {
> > > qemu_clock_notify(type);
> > > } else if (!enabled && old) {
> > > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > > bool progress = false;
> > > QEMUTimerCB *cb;
> > > void *opaque;
> > > + static int count = 10;
> > >
> > > qemu_event_reset(&timer_list->timers_done_ev);
> > > if (!timer_list->clock->enabled) {
> > > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > > qemu_mutex_unlock(&timer_list->active_timers_lock);
> > > break;
> > > }
> > > -
> > > + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > > + QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > > + ts->expire_time, current_time);
> > > + }
> > > /* remove timer from the list before calling the callback */
> > > timer_list->active_timers = ts->next;
> > > ts->next = NULL;
> > > diff --git a/vl.c b/vl.c
> > > index 06f34fe..824b246 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -978,7 +978,7 @@ void vm_start(void)
> > > {
> > > RunState requested;
> > > int64_t start_time, end_time;
> > > -
> > > + QEMU_LOG("gonglei: enter vm_start\n");
> > > qemu_vmstop_requested(&requested);
> > > if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > > return;
> > >
> > >
> > > Can we wait looply the migration process finished when EAGAIN?
> > >
> > > Regards,
> > > -Gonglei
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gonglei (Arei)
> > > > Sent: Friday, August 05, 2016 5:14 PM
> > > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > > >
> > > > Hi Paolo , Jan, Amit
> > > >
> > > > Recently we encountered a problem that the virtio-serial can't work after
> > > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > > >
> > > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > > because
> > > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > > interrupt
> > > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > > frontend
> > > > driver never handle it or never know it.
> > > >
> > > > Bug 867366 - virtio-serial misses irq delivery on migration
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > > >
> > > > But my qemu is the newest qemu, Both commit
> > 80dcfb8532"virtio-serial-bus:
> > > > post_load
> > > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > > propagate
> > > > guest_connected to the port on post_load" are applied.
> > > >
> > > > I noticed that Paolo posted another problem maybe have a pertential
> > problem
> > > > about
> > > > apic in Comment 23. But this patch
> > > > https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > > haven't merged into qemu master.
> > > >
> > > > Would you give me some clues please? Thanks!
> > > >
> > > > Regards,
> > > > -Gonglei
> > > >
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 5+ messages in thread
* Re: [Question] virtio-serial misses irq delivery on migration?
2016-08-08 14:18 ` Dr. David Alan Gilbert
@ 2016-08-08 14:30 ` Gonglei (Arei)
0 siblings, 0 replies; 5+ messages in thread
From: Gonglei (Arei) @ 2016-08-08 14:30 UTC (permalink / raw)
To: Dr. David Alan Gilbert
Cc: Huangweidong (C), kvm, Juan Quintela, Jan Kiszka, qemu-devel,
Amit Shah, Paolo Bonzini
That's because the virtual vm clock is enabled at initialization. It doesn't need to wait for invoking vm_start() to fire.
发件人:Dr. David Alan Gilbert
收件人:龚磊,
抄送:Paolo Bonzini,Jan Kiszka,Amit Shah,Juan Quintela,黄伟栋,qemu-devel@nongnu.org,kvm@vger.kernel.org,
时间:2016-08-08 22:18:15
主题:Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on migration?
* Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> Hi Dave,
>
>
> > -----Original Message-----
> > From: Dr. David Alan Gilbert [mailto:dgilbert@redhat.com]
> > Sent: Monday, August 08, 2016 8:13 PM
> > To: Gonglei (Arei)
> > Cc: Paolo Bonzini; jan.kiszka@siemens.com; amit.shah@redhat.com;
> > quintela@redhat.com; Huangweidong (C); qemu-devel@nongnu.org;
> > kvm@vger.kernel.org
> > Subject: Re: [Qemu-devel] [Question] virtio-serial misses irq delivery on
> > migration?
> >
> > * Gonglei (Arei) (arei.gonglei@huawei.com) wrote:
> > > Hi all,
> > >
> > > I might catch the bug. Now, we rely on a vm_clock timer to assure that the
> > vcpus are started
> > > before we invoke send_control_event(),which queue a message in the virtio
> > ring and trigger a irq.
> > >
> > > In actually, we can't attach our initial intention under a special situation:
> > >
> > > 1. the process of migration destination is in a coroutine
> > > Commit 82a4da79fd6
> > > 2. The corountine will yiled the cpu (calling yield_until_fd_readable(s->fd))
> > > if the Qemu encounter a EAGIN while reading QEMUFile.
> > > commit 595ab64169b
> > >
> > > static ssize_t socket_get_buffer(void *opaque, uint8_t *buf, int64_t pos,
> > > size_t size)
> > > {
> > > QEMUFileSocket *s = opaque;
> > > ssize_t len;
> > >
> > > for (;;) {
> > > len = qemu_recv(s->fd, buf, size, 0);
> > > if (len != -1) {
> > > break;
> > > }
> > > if (socket_error() == EAGAIN) {
> > > yield_until_fd_readable(s->fd);
> > > } else if (socket_error() != EINTR) {
> > > break;
> > > }
> > > }
> > >
> > > if (len == -1) {
> > > len = -socket_error();
> > > }
> > > return len;
> > > }
> > >
> > > 3. The main thread can get the cpu and timer will run, asumes that we get
> > EAGIN
> > > after invoking fetch_active_ports_list().
> >
> > I don't understand the details of this interrupt injection, or why the timer
> > will run if we're still not finished migration; it doesn't sound right that a
> > QEMU_CLOCK_VIRTUAL
> > timer should trigger while we're still receiving the VM and the guest is paused.
> >
>
> Because the migration coroutine will yiled cpu if it get a EAGAIN error from unix socket.
> You can see the above step 2)
>
> For the upstream qemu, you can see the hander of QIO_CHANNEL_ERR_BLOCK in channel_get_buffer()
>
> static ssize_t channel_get_buffer(void *opaque,
> uint8_t *buf,
> int64_t pos,
> size_t size)
> {
> QIOChannel *ioc = QIO_CHANNEL(opaque);
> ssize_t ret;
>
> do {
> ret = qio_channel_read(ioc, (char *)buf, size, NULL);
> if (ret < 0) {
> if (ret == QIO_CHANNEL_ERR_BLOCK) {
> qio_channel_yield(ioc, G_IO_IN); // yiled cpu
> } else {
> /* XXX handle Error * object */
> return -EIO;
> }
> }
> } while (ret == QIO_CHANNEL_ERR_BLOCK);
>
> return ret;
> }
Yes, I understand it yields; but I would have expected the timer code
not to fire virtual timers while the VM is in paused state.
> > However, would it be fixed by using a vm_change_state_handler like
> > ui/spice-core.c does?
> >
>
> I'll check this method, thanks!
>
Dave
> Regards,
> -Gonglei
>
> > Dave
> >
> >
> > > 4. reproduce the problem by fault injection.
> > >
> > > The debug logs:
> > >
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "xbzrle"}]}
> > > [2016-08-06 16:52:56] handle_qmp_command:5106 qmp_cmd_name:
> > query-migrate-capabilities
> > > [2016-08-06 16:52:56] handle_qmp_command:5116 qmp_cmd_name:
> > migrate-set-capabilities, qmp_cmd_arguments: {"capabilities": [{"state": false,
> > "capability": "rdma-pin-all"}]}
> > > [2016-08-06 16:52:56] process_incoming_migration_co:170 gonglei: enter
> > process_incoming_migration_co() // Enter corountine
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] apic_dispatch_post_load:370 gonglei:
> > apic_dispatch_post_load
> > > [2016-08-06 16:53:01] kvm_ioapic_put:97 gonglei: kvm_ioapic_put
> > > [2016-08-06 16:53:01] virtio_serial_load:745 gonglei: virtio_serial_load will
> > init virtqueue
> > > [2016-08-06 16:53:01] fetch_active_ports_list:735 gonglei: **************
> > begin to inject debug ********** // fault injection begin, migration
> > coroutine yiled cpu
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time: 1,
> > current_time: 9750771061592 //vm_clock timer boom
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:658 gonglei:
> > virtio_serial_post_load_timer_cb //calling the timer callback
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 0 // send a message to the guest, raise the
> > irq line.
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 64056}, "event":
> > "VSERPORT_CHANGE", "data": {"open": true, "id": "channel0"}}
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 0
> > > [2016-08-06 16:53:01] virtio_serial_post_load_timer_cb:669 gonglei:
> > port->host_connected: 0
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750770788478, current_time: 9750771061592
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > //Enter migration coroutine again, and the restore the lapic register, the
> > previous lapic's info will be covered :(
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] kvm_put_apic_state:32 gonglei: kvm_put_apic_state
> > > [2016-08-06 16:53:01] handle_qmp_command:5106 qmp_cmd_name: cont
> > > [2016-08-06 16:53:01] qmp_cont:189 qmp cont is received and vm is started
> > > [2016-08-06 16:53:01] vm_start:981 gonglei: enter vm_start
> > //enter vm_start here
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-net device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-serial device status is 7
> > that means DRIVER OK
> > > [2016-08-06 16:53:01] virtio_set_status:522 virtio-scsi device status is 7 that
> > means DRIVER OK
> > > [2016-08-06 16:53:01] vm_start:1000 vm_state-notify:23ms
> > > [2016-08-06 16:53:01] qemu_clock_enable:163 gonglei: type: 1, enabled: 1,
> > old: 1
> > > [2016-08-06 16:53:01] monitor_qapi_event_emit:483 {"timestamp":
> > {"seconds": 1470473581, "microseconds": 112191}, "event": "RESUME"}
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750772061592, current_time: 9750794269805
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 1 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750795284148, current_time: 9750795312685
> > > [2016-08-06 16:53:01] send_control_event:225 virtio serial port 3 send
> > control message event = 6, value = 1
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750796329656, current_time: 9750796369458
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750797396208, current_time: 9750797426752
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750798445707, current_time: 9750798477520
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750799496162, current_time: 9750799532183
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750800556817, current_time: 9750800588741
> > > [2016-08-06 16:53:01] timerlist_run_timers:496 gonglei: expire_time:
> > 9750801606596, current_time: 9750801637961
> > >
> > >
> > > The below is my debugging diff (Not based on the newest master branch)
> > >
> > > diff --git a/hw/char/virtio-serial-bus.c b/hw/char/virtio-serial-bus.c
> > > index 6a45af9..a00b02f 100644
> > > --- a/hw/char/virtio-serial-bus.c
> > > +++ b/hw/char/virtio-serial-bus.c
> > > @@ -654,6 +654,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > > if (!s->post_load) {
> > > return;
> > > }
> > > + QEMU_LOG("gonglei: virtio_serial_post_load_timer_cb\n");
> > > for (i = 0 ; i < s->post_load->nr_active_ports; ++i) {
> > > port = s->post_load->connected[i].port;
> > > host_connected = s->post_load->connected[i].host_connected;
> > > @@ -664,6 +665,7 @@ static void virtio_serial_post_load_timer_cb(void
> > *opaque)
> > > */
> > > send_control_event(s, port->id,
> > VIRTIO_CONSOLE_PORT_OPEN,
> > > port->host_connected);
> > > + QEMU_LOG("gonglei: port->host_connected: %u\n",
> > port->host_connected);
> > > }
> > > vsc = VIRTIO_SERIAL_PORT_GET_CLASS(port);
> > > if (vsc->set_guest_connected) {
> > > @@ -729,6 +731,8 @@ static int fetch_active_ports_list(QEMUFile *f, int
> > version_id,
> > > }
> > > }
> > > timer_mod(s->post_load->timer, 1);
> > > + QEMU_LOG("gonglei: ************** begin to inject debug
> > **********\n");
> > > + yield_until_fd_readable(qemu_get_fd(f));
> > > return 0;
> > > }
> > >
> > > @@ -737,7 +741,7 @@ static int virtio_serial_load(QEMUFile *f, void
> > *opaque, int version_id)
> > > if (version_id > 3) {
> > > return -EINVAL;
> > > }
> > > -
> > > + QEMU_LOG("gonglei: virtio_serial_load will init virtqueue\n");
> > > /* The virtio device */
> > > return virtio_load(VIRTIO_DEVICE(opaque), f, version_id);
> > > }
> > > diff --git a/hw/i386/kvm/apic.c b/hw/i386/kvm/apic.c
> > > index 5b47056..2d988cb 100644
> > > --- a/hw/i386/kvm/apic.c
> > > +++ b/hw/i386/kvm/apic.c
> > > @@ -29,7 +29,7 @@ void kvm_put_apic_state(DeviceState *dev, struct
> > kvm_lapic_state *kapic)
> > > {
> > > APICCommonState *s = APIC_COMMON(dev);
> > > int i;
> > > -
> > > + QEMU_LOG("gonglei: kvm_put_apic_state\n");
> > > memset(kapic, 0, sizeof(*kapic));
> > > kvm_apic_set_reg(kapic, 0x2, s->id << 24);
> > > kvm_apic_set_reg(kapic, 0x8, s->tpr);
> > > diff --git a/hw/i386/kvm/ioapic.c b/hw/i386/kvm/ioapic.c
> > > index d2a6c4c..07699ec 100644
> > > --- a/hw/i386/kvm/ioapic.c
> > > +++ b/hw/i386/kvm/ioapic.c
> > > @@ -94,7 +94,7 @@ static void kvm_ioapic_put(IOAPICCommonState *s)
> > >
> > > chip.chip_id = KVM_IRQCHIP_IOAPIC;
> > > kioapic = &chip.chip.ioapic;
> > > -
> > > + QEMU_LOG("gonglei: kvm_ioapic_put\n");
> > > kioapic->id = s->id;
> > > kioapic->ioregsel = s->ioregsel;
> > > kioapic->base_address = s->busdev.mmio[0].addr;
> > > diff --git a/hw/intc/apic_common.c b/hw/intc/apic_common.c
> > > index 042e960..110be20 100644
> > > --- a/hw/intc/apic_common.c
> > > +++ b/hw/intc/apic_common.c
> > > @@ -366,6 +366,7 @@ static int apic_dispatch_post_load(void *opaque, int
> > version_id)
> > > if (info->post_load) {
> > > info->post_load(s);
> > > }
> > > + QEMU_LOG("gonglei: apic_dispatch_post_load\n");
> > > return 0;
> > > }
> > >
> > > diff --git a/migration/migration.c b/migration/migration.c
> > > index 92c1427..d074c0a 100644
> > > --- a/migration/migration.c
> > > +++ b/migration/migration.c
> > > @@ -167,7 +167,7 @@ static void process_incoming_migration_co(void
> > *opaque)
> > > const char *arpGuestCMD =
> > "{\"execute\":\"guest-write-flag-arp\"}\n";
> > > const char *portName = "charchannel1";
> > > size_t uRet;
> > > -
> > > + QEMU_LOG("gonglei: enter process_incoming_migration_co()\n");
> > > ret = qemu_loadvm_state(f);
> > > qemu_fclose(f);
> > > free_xbzrle_decoded_buf();
> > > diff --git a/qemu-timer.c b/qemu-timer.c
> > > index 5741f0d..dd36bc9 100644
> > > --- a/qemu-timer.c
> > > +++ b/qemu-timer.c
> > > @@ -160,6 +160,7 @@ void qemu_clock_enable(QEMUClockType type, bool
> > enabled)
> > > QEMUTimerList *tl;
> > > bool old = clock->enabled;
> > > clock->enabled = enabled;
> > > + QEMU_LOG("gonglei: type: %u, enabled: %u, old: %u\n", type, enabled,
> > old);
> > > if (enabled && !old) {
> > > qemu_clock_notify(type);
> > > } else if (!enabled && old) {
> > > @@ -475,6 +476,7 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > > bool progress = false;
> > > QEMUTimerCB *cb;
> > > void *opaque;
> > > + static int count = 10;
> > >
> > > qemu_event_reset(&timer_list->timers_done_ev);
> > > if (!timer_list->clock->enabled) {
> > > @@ -489,7 +491,10 @@ bool timerlist_run_timers(QEMUTimerList
> > *timer_list)
> > > qemu_mutex_unlock(&timer_list->active_timers_lock);
> > > break;
> > > }
> > > -
> > > + if (timer_list->clock->type == QEMU_CLOCK_VIRTUAL && count-- > 0) {
> > > + QEMU_LOG("gonglei: expire_time: %lld, current_time: %lld\n",
> > > + ts->expire_time, current_time);
> > > + }
> > > /* remove timer from the list before calling the callback */
> > > timer_list->active_timers = ts->next;
> > > ts->next = NULL;
> > > diff --git a/vl.c b/vl.c
> > > index 06f34fe..824b246 100644
> > > --- a/vl.c
> > > +++ b/vl.c
> > > @@ -978,7 +978,7 @@ void vm_start(void)
> > > {
> > > RunState requested;
> > > int64_t start_time, end_time;
> > > -
> > > + QEMU_LOG("gonglei: enter vm_start\n");
> > > qemu_vmstop_requested(&requested);
> > > if (runstate_is_running() && requested == RUN_STATE_MAX) {
> > > return;
> > >
> > >
> > > Can we wait looply the migration process finished when EAGAIN?
> > >
> > > Regards,
> > > -Gonglei
> > >
> > >
> > > > -----Original Message-----
> > > > From: Gonglei (Arei)
> > > > Sent: Friday, August 05, 2016 5:14 PM
> > > > To: Paolo Bonzini; 'jan.kiszka@siemens.com'; 'amit.shah@redhat.com'
> > > > Cc: qemu-devel@nongnu.org; kvm@vger.kernel.org; Huangweidong (C)
> > > > Subject: [Question] virtio-serial misses irq delivery on migration?
> > > >
> > > > Hi Paolo , Jan, Amit
> > > >
> > > > Recently we encountered a problem that the virtio-serial can't work after
> > > > Migration in RH5.5 VM. The bigger problem is, I can't reproduce it. :(
> > > >
> > > > It's phenomenon was much like BZ 867366, the usb-table mouse didn't work
> > > > because
> > > > the uhci and virtio-serial shard the irq line (using IOAPIC, not MSI). The new
> > > > interrupt
> > > > can't be injected to VM because the irq line bit had been set to 1, but the
> > > > frontend
> > > > driver never handle it or never know it.
> > > >
> > > > Bug 867366 - virtio-serial misses irq delivery on migration
> > > > https://bugzilla.redhat.com/show_bug.cgi?id=867366
> > > >
> > > > But my qemu is the newest qemu, Both commit
> > 80dcfb8532"virtio-serial-bus:
> > > > post_load
> > > > send_event when vm is running" and commit bc6b815d9e " virtio-serial:
> > > > propagate
> > > > guest_connected to the port on post_load" are applied.
> > > >
> > > > I noticed that Paolo posted another problem maybe have a pertential
> > problem
> > > > about
> > > > apic in Comment 23. But this patch
> > > > https://bugzilla.redhat.com/attachment.cgi?id=635535&action=diff
> > > > haven't merged into qemu master.
> > > >
> > > > Would you give me some clues please? Thanks!
> > > >
> > > > Regards,
> > > > -Gonglei
> > > >
> > >
> > >
> > --
> > Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
^ permalink raw reply [flat|nested] 5+ messages in thread
end of thread, other threads:[~2016-08-08 14:30 UTC | newest]
Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-08-06 10:19 [Question] virtio-serial misses irq delivery on migration? Gonglei (Arei)
2016-08-08 12:13 ` [Qemu-devel] " Dr. David Alan Gilbert
2016-08-08 13:05 ` Gonglei (Arei)
2016-08-08 14:18 ` Dr. David Alan Gilbert
2016-08-08 14:30 ` Gonglei (Arei)
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).