xen-devel.lists.xenproject.org archive mirror
* XCP: sr driver question wrt vm-migrate
From: YAMAMOTO Takashi @ 2010-06-07  8:26 UTC
  To: xen-devel

hi,

On vm-migrate, xapi attaches a VDI on the migrate-to host
before detaching it on the migrate-from host.
Unfortunately, that doesn't work for our product, which doesn't
provide a way to attach a volume to multiple hosts at the same time.
Is VDI_ACTIVATE something I can use as a workaround?
Or do you have any other suggestions?

YAMAMOTO Takashi


* Re: XCP: sr driver question wrt vm-migrate
From: Jonathan Ludlam @ 2010-06-07 12:29 UTC
  To: YAMAMOTO Takashi; +Cc: xen-devel@lists.xensource.com

Yup, vdi activate is the way forward.

If you advertise VDI_ACTIVATE and VDI_DEACTIVATE in the 'get_driver_info' response, xapi will call the following during the start-migrate-shutdown lifecycle:

VM start:

host A: VDI.attach
host A: VDI.activate

VM migrate:

host B: VDI.attach

  (VM pauses on host A)

host A: VDI.deactivate
host B: VDI.activate

  (VM unpauses on host B)

host A: VDI.detach

VM shutdown:

host B: VDI.deactivate
host B: VDI.detach

So the disk is never activated on both hosts at once, although it does still go through a period when it is attached to both hosts at once. You could, for example, merely check that the disk *could* be attached in the vdi_attach SMAPI call, and only do the real attach in the vdi_activate call.
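
For illustration, here's a rough sketch of that split (schematic Python only, not the real SMAPI plumbing; the backend_* helpers and the ExampleVDI name are placeholders for your product's own volume API):

# Schematic sketch only: this is not the real XCP SMAPI plumbing, and the
# backend_* helpers stand in for the storage product's own volume API.

CAPABILITIES = [
    "VDI_CREATE", "VDI_DELETE",
    "VDI_ATTACH", "VDI_DETACH",
    "VDI_ACTIVATE", "VDI_DEACTIVATE",  # advertise these two
]

def get_driver_info():
    # xapi reads the capability list from the driver; with VDI_ACTIVATE and
    # VDI_DEACTIVATE present it calls activate/deactivate around the
    # pause/unpause point of a migration, as in the lifecycle above.
    return {"name": "examplesr", "capabilities": CAPABILITIES}

# Stand-ins for the storage product's own API.
def backend_volume_exists(vdi_uuid):
    return True

def backend_attach_exclusive(vdi_uuid):   # exclusive, single-host attach
    pass

def backend_release(vdi_uuid):
    pass

class ExampleVDI(object):
    def attach(self, sr_uuid, vdi_uuid):
        # Called on *both* hosts during a migration, so only verify that the
        # volume could be attached; don't take exclusive ownership here.
        if not backend_volume_exists(vdi_uuid):
            raise RuntimeError("VDI %s not found" % vdi_uuid)
        return {"params": "/dev/examplesr/%s" % vdi_uuid}

    def activate(self, sr_uuid, vdi_uuid):
        # Called on at most one host at a time: do the real attach here.
        backend_attach_exclusive(vdi_uuid)

    def deactivate(self, sr_uuid, vdi_uuid):
        # Release the volume so the other host's activate can succeed.
        backend_release(vdi_uuid)

    def detach(self, sr_uuid, vdi_uuid):
        # Nothing left to do; the volume was released in deactivate.
        pass

The point is simply that attach stays side-effect free while activate/deactivate hold the exclusive attachment.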

Hope this helps,

Jon



* Re: XCP: sr driver question wrt vm-migrate
From: YAMAMOTO Takashi @ 2010-06-08  7:11 UTC
  To: Jonathan.Ludlam; +Cc: xen-devel

hi,

I'll try deferring the attach operation to vdi_activate.
Thanks!

YAMAMOTO Takashi


* Re: XCP: sr driver question wrt vm-migrate
From: YAMAMOTO Takashi @ 2010-06-16  6:19 UTC
  To: Jonathan.Ludlam; +Cc: xen-devel

hi,

After making my SR driver defer the attach operation as you suggested,
I got migration working.  Thanks!

However, when repeating live migration between two hosts for testing,
I got the following error.  It doesn't seem easily reproducible.
Do you have any idea?

YAMAMOTO Takashi

+ xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 host=67b8b07b-8c50-4677-a511-beb196ea766f
An error occurred during the migration process.
vm: 23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 (CentOS53x64-1)
source: eea41bdd-d2ce-4a9a-bc51-1ca286320296 (s6)
destination: 67b8b07b-8c50-4677-a511-beb196ea766f (s1)
msg: Caught exception INTERNAL_ERROR: [ Xapi_vm_migrate.Remote_failed("unmarshalling result code from remote") ] at last minute during migration


* Re: XCP: sr driver question wrt vm-migrate
From: Jonathan Ludlam @ 2010-06-16 12:06 UTC
  To: YAMAMOTO Takashi; +Cc: xen-devel@lists.xensource.com

This is usually the result of a failure earlier on. Could you grep through the logs to get the whole trace of what went on? The best thing to do is to grep for VM.pool_migrate, then find the task reference (the hex string beginning with 'R:' immediately after the 'VM.pool_migrate') and grep for that string in the logs on both the source and destination machines.

Have a look through these, and if it's still not obvious what went wrong, post them to the list and we can have a look.
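
If it helps, a rough helper along these lines (run it on both the source and the destination host) pulls out every line logged against each migrate task; it assumes xapi's log is /var/log/xensource.log, so adjust the path if your install differs:

import re
import sys

LOG = "/var/log/xensource.log"   # assumed log location; adjust if needed

def dump_migrate_tasks(path):
    with open(path) as fh:
        lines = fh.readlines()
    # Task references look like "VM.pool_migrate R:832813c0722b".
    refs = set()
    for line in lines:
        m = re.search(r"VM\.pool_migrate (R:[0-9a-f]+)", line)
        if m:
            refs.add(m.group(1))
    for ref in sorted(refs):
        print("==== %s ====" % ref)
        for line in lines:
            if ref in line:
                sys.stdout.write(line)

if __name__ == "__main__":
    dump_migrate_tasks(sys.argv[1] if len(sys.argv) > 1 else LOG)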

Cheers,

Jon



* Re: XCP: sr driver question wrt vm-migrate
From: YAMAMOTO Takashi @ 2010-06-17  9:52 UTC
  To: Jonathan.Ludlam; +Cc: xen-devel

hi,

Thanks.  I'll take a look at the logs if it happens again.

YAMAMOTO Takashi


* XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: YAMAMOTO Takashi @ 2010-06-18  2:45 UTC
  To: Jonathan.Ludlam; +Cc: xen-devel

hi,

I got another error on vm-migrate.
The "signal -7" in the log seems interesting.  Does this ring a bell?

YAMAMOTO Takashi

+ date
Thu Jun 17 21:51:44 JST 2010
+ xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 host=67b8b07b-8c50-4677-a511-beb196ea766f
Lost connection to the server.

/var/log/messages:

Jun 17 21:51:40 s1 ovs-cfg-mod: 00007|cfg_mod|INFO|-port.vif4958.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
Jun 17 21:51:41 s1 xapi: [ warn|s1|2416799 unix-RPC|VM.pool_migrate R:832813c0722b|hotplug] Warning, deleting 'vif' entry from /xapi/4958/hotplug/vif/1
Jun 17 21:51:41 s1 xapi: [error|s1|90 xal_listen|VM (domid: 4958) device_event = ChangeUncooperative false D:0e953bd99071|event] device_event could not be processed because VM record not in database
Jun 17 21:51:47 s1 xapi: [ warn|s1|2417066 inet_rpc|VM.pool_migrate R:fee54e870a4e|xapi] memory_required_bytes = 1080033280 > memory_static_max = 1073741824; clipping
Jun 17 21:51:57 s1 xenguest: Determined the following parameters from xenstore:
Jun 17 21:51:57 s1 xenguest: vcpu/number:1 vcpu/affinity:0 vcpu/weight:0 vcpu/cap:0 nx: 0 viridian: 1 apic: 1 acpi: 1 pae: 1 acpi_s4: 0 acpi_s3: 0
Jun 17 21:52:43 s1 scripts-vif: Called as "add vif" domid:4959 devid:1 mode:vswitch
Jun 17 21:52:44 s1 scripts-vif: Called as "online vif" domid:4959 devid:1 mode:vswitch
Jun 17 21:52:46 s1 scripts-vif: Adding vif4959.1 to xenbr0 with address fe:ff:ff:ff:ff:ff
Jun 17 21:52:46 s1 ovs-vsctl: Called as br-to-vlan xenbr0
Jun 17 21:52:49 s1 ovs-cfg-mod: 00001|cfg|INFO|using "/etc/ovs-vswitchd.conf" as configuration file, "/etc/.ovs-vswitchd.conf.~lock~" as lock file
Jun 17 21:52:49 s1 ovs-cfg-mod: 00002|cfg_mod|INFO|configuration changes:
Jun 17 21:52:49 s1 ovs-cfg-mod: 00003|cfg_mod|INFO|+bridge.xenbr0.port=vif4959.1
Jun 17 21:52:49 s1 ovs-cfg-mod: 00004|cfg_mod|INFO|+port.vif4959.1.net-uuid=9ca059b1-ac1e-8d3f-ff19-e5e74f7b7392
Jun 17 21:52:49 s1 ovs-cfg-mod: 00005|cfg_mod|INFO|+port.vif4959.1.vif-mac=2e:17:01:b0:05:fb
Jun 17 21:52:49 s1 ovs-cfg-mod: 00006|cfg_mod|INFO|+port.vif4959.1.vif-uuid=271f0001-06ca-c9ca-cabc-dc79f412d925
Jun 17 21:52:49 s1 ovs-cfg-mod: 00007|cfg_mod|INFO|+port.vif4959.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] received signal -7
Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] xapi watchdog exiting.
Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] Fatal: xapi died with signal -7: not restarting (watchdog never restarts on this signal)
Jun 17 21:55:11 s1 python: PERFMON: caught IOError: (socket error (111, 'Connection refused')) - restarting XAPI session
Jun 17 22:00:02 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jun 17 22:04:48 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jun 17 22:09:58 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jun 17 22:14:52 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
Jun 17 22:19:38 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session


* Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: Jonathan Ludlam @ 2010-06-18 12:53 UTC
  To: YAMAMOTO Takashi; +Cc: xen-devel@lists.xensource.com

Very strange indeed. -7 is SIGKILL. 

Firstly, is this 0.1.1 or 0.5-RC? If it's 0.1.1, could you retry it on 0.5 just to check whether it's already been fixed?

Secondly, can you check whether xapi has started properly on both machines (i.e. the init script completed successfully)? I believe that if the init script doesn't detect that xapi has started correctly, it might kill it. This is about the only thing we can think of that might cause the problem you described.

Cheers,

Jon



* Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: Jeremy Fitzhardinge @ 2010-06-18 15:21 UTC
  To: Jonathan Ludlam; +Cc: YAMAMOTO Takashi, xen-devel@lists.xensource.com

On 06/18/2010 01:53 PM, Jonathan Ludlam wrote:
> Very strange indeed. -7 is SIGKILL. 
>   

It's actually SIGBUS, which is even more odd.  It generally means you've
truncated a file underneath an mmap mapping, then touched the
overhanging unbacked pages.
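
A tiny self-contained demo of that failure mode, for what it's worth (nothing to do with xapi's own code; on Linux it deliberately dies with a bus error):

import mmap
import os
import tempfile

# Map a one-page file, truncate the file underneath the mapping, then touch
# the now-unbacked page: the kernel delivers SIGBUS and the process dies.
with tempfile.NamedTemporaryFile() as f:
    f.write(b"\0" * 4096)
    f.flush()
    m = mmap.mmap(f.fileno(), 4096)
    os.ftruncate(f.fileno(), 0)   # file shrinks under the live mapping
    m[0]                          # -> Bus error (SIGBUS)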

    J


* Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: Vincent Hanquez @ 2010-06-18 21:54 UTC
  To: Jeremy Fitzhardinge
  Cc: YAMAMOTO Takashi, xen-devel@lists.xensource.com, Jonathan Ludlam

On 18/06/10 16:21, Jeremy Fitzhardinge wrote:
> On 06/18/2010 01:53 PM, Jonathan Ludlam wrote:
>> Very strange indeed. -7 is SIGKILL.
>>
>
> It's actually SIGBUS, which is even more odd.  It generally means you've

OCaml has different signal values for some reason.

$ ocaml
         Objective Caml version 3.11.2
   # Sys.sigkill;;
   - : int = -7


-- 
Vincent


* Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: Jeremy Fitzhardinge @ 2010-06-19  7:54 UTC
  To: Vincent Hanquez
  Cc: YAMAMOTO Takashi, xen-devel@lists.xensource.com, Jonathan Ludlam

On 06/18/2010 10:54 PM, Vincent Hanquez wrote:
> On 18/06/10 16:21, Jeremy Fitzhardinge wrote:
>> On 06/18/2010 01:53 PM, Jonathan Ludlam wrote:
>>> Very strange indeed. -7 is SIGKILL.
>>>
>>
>> It's actually SIGBUS, which is even more odd.  It generally means you've
>
> ocaml has different signal values for unknown reason.
>
> $ ocaml
>         Objective Caml version 3.11.2
>   # Sys.sigkill;;
>   - : int = -7

How... odd.

    J


* Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
From: YAMAMOTO Takashi @ 2010-06-21  2:41 UTC
  To: Jonathan.Ludlam; +Cc: xen-devel

hi,

> Very strange indeed. -7 is SIGKILL. 
> 
> Firstly, is this 0.1.1 or 0.5-RC? If it's 0.1.1 could you retry it on 0.5 just to check if it's already been fixed?

It's 0.1.1.
I'm not sure when I can try 0.5-RC.

> 
> Secondly, can you check whether xapi has started properly on both machines (ie. the init script completed successfully)? I believe that if the init script doesn't detect that xapi has started correctly it might kill it. This is about the only thing we can think of that might cause the problem you described.

I was running a script that repeatedly migrates a VM between the two hosts,
and the error occurred after many successful vm-migrate runs,
so I don't think it's related to the init script.
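
(For reference, a loop of roughly this shape, sketched with the XenAPI Python bindings rather than the actual test script, drives the same kind of test; the VM and host UUIDs are the ones from the xe commands above, while the pool address and credentials are placeholders.)

import XenAPI

VM_UUID = "23ecfa58-aa30-ea6a-f9fe-7cb2a5487592"
HOST_UUIDS = ["67b8b07b-8c50-4677-a511-beb196ea766f",   # s1
              "eea41bdd-d2ce-4a9a-bc51-1ca286320296"]   # s6

session = XenAPI.Session("https://pool-master")          # placeholder address
session.xenapi.login_with_password("root", "password")   # placeholder creds
try:
    vm = session.xenapi.VM.get_by_uuid(VM_UUID)
    hosts = [session.xenapi.host.get_by_uuid(u) for u in HOST_UUIDS]
    for i in range(100):
        # Bounce the VM back and forth with live migration.
        session.xenapi.VM.pool_migrate(vm, hosts[i % 2], {"live": "true"})
finally:
    session.xenapi.session.logout()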

YAMAMOTO Takashi


Thread overview: 12 messages
2010-06-07  8:26 XCP: sr driver question wrt vm-migrate YAMAMOTO Takashi
2010-06-07 12:29 ` Jonathan Ludlam
2010-06-08  7:11   ` YAMAMOTO Takashi
2010-06-16  6:19     ` YAMAMOTO Takashi
2010-06-16 12:06       ` Jonathan Ludlam
2010-06-17  9:52         ` YAMAMOTO Takashi
2010-06-18  2:45           ` XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate) YAMAMOTO Takashi
2010-06-18 12:53             ` Jonathan Ludlam
2010-06-18 15:21               ` Jeremy Fitzhardinge
2010-06-18 21:54                 ` Vincent Hanquez
2010-06-19  7:54                   ` Jeremy Fitzhardinge
2010-06-21  2:41               ` YAMAMOTO Takashi
