From: yamamoto@valinux.co.jp (YAMAMOTO Takashi)
To: Jonathan.Ludlam@eu.citrix.com
Cc: xen-devel@lists.xensource.com
Subject: Re: XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate)
Date: Mon, 21 Jun 2010 11:41:34 +0900 (JST)
Message-ID: <20100621024134.4EA2F7190F@kuma.localdomain>
In-Reply-To: Your message of "Fri, 18 Jun 2010 13:53:49 +0100" <13AE4CCE-1C46-4B1C-A98D-15E11A764319@eu.citrix.com>
hi,
> Very strange indeed. -7 is SIGKILL (xapi is written in OCaml, and OCaml's Sys module uses its own signal numbering: Sys.sigkill is -7, not the POSIX value 9).
>
> Firstly, is this 0.1.1 or 0.5-RC? If it's 0.1.1 could you retry it on 0.5 just to check if it's already been fixed?
It's 0.1.1.
I'm not sure when I'll be able to try 0.5-RC.
>
> Secondly, can you check whether xapi has started properly on both machines (ie. the init script completed successfully)? I believe that if the init script doesn't detect that xapi has started correctly it might kill it. This is about the only thing we can think of that might cause the problem you described.
I was running a script that repeatedly live-migrates a VM back and forth between two hosts.
The error occurred only after many successful vm-migrate runs,
so I don't think it's related to the init script.
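
For illustration, a minimal sketch of such a loop (this is a Python stand-in for the actual test script, which is not shown in the thread; the VM and host UUIDs are the ones appearing in the trace below):

#!/usr/bin/env python
# Minimal sketch of a stress loop that live-migrates one VM back and forth
# between two hosts with "xe vm-migrate" until a run fails.
import subprocess
import sys

VM_UUID = "23ecfa58-aa30-ea6a-f9fe-7cb2a5487592"
HOST_UUIDS = [
    "67b8b07b-8c50-4677-a511-beb196ea766f",   # s1
    "eea41bdd-d2ce-4a9a-bc51-1ca286320296",   # s6
]

i = 0
while True:
    host = HOST_UUIDS[i % 2]
    rc = subprocess.call(["xe", "vm-migrate", "live=true",
                          "uuid=%s" % VM_UUID, "host=%s" % host])
    if rc != 0:
        sys.exit("vm-migrate to %s failed with exit code %d" % (host, rc))
    i += 1
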
YAMAMOTO Takashi
>
> Cheers,
>
> Jon
>
>
> On 18 Jun 2010, at 03:45, YAMAMOTO Takashi wrote:
>
>> hi,
>>
>> I got another error on vm-migrate.
>> The "signal -7" in the log seems interesting. Does this ring a bell?
>>
>> YAMAMOTO Takashi
>>
>> + date
>> Thu Jun 17 21:51:44 JST 2010
>> + xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 host=67b8b07b-8c50-4677-a511-beb196ea766f
>> Lost connection to the server.
>>
>> /var/log/messages:
>>
>> Jun 17 21:51:40 s1 ovs-cfg-mod: 00007|cfg_mod|INFO|-port.vif4958.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
>> Jun 17 21:51:41 s1 xapi: [ warn|s1|2416799 unix-RPC|VM.pool_migrate R:832813c0722b|hotplug] Warning, deleting 'vif' entry from /xapi/4958/hotplug/vif/1
>> Jun 17 21:51:41 s1 xapi: [error|s1|90 xal_listen|VM (domid: 4958) device_event = ChangeUncooperative false D:0e953bd99071|event] device_event could not be processed because VM record not in database
>> Jun 17 21:51:47 s1 xapi: [ warn|s1|2417066 inet_rpc|VM.pool_migrate R:fee54e870a4e|xapi] memory_required_bytes = 1080033280 > memory_static_max = 1073741824; clipping
>> Jun 17 21:51:57 s1 xenguest: Determined the following parameters from xenstore:
>> Jun 17 21:51:57 s1 xenguest: vcpu/number:1 vcpu/affinity:0 vcpu/weight:0 vcpu/cap:0 nx: 0 viridian: 1 apic: 1 acpi: 1 pae: 1 acpi_s4: 0 acpi_s3: 0
>> Jun 17 21:52:43 s1 scripts-vif: Called as "add vif" domid:4959 devid:1 mode:vswitch
>> Jun 17 21:52:44 s1 scripts-vif: Called as "online vif" domid:4959 devid:1 mode:vswitch
>> Jun 17 21:52:46 s1 scripts-vif: Adding vif4959.1 to xenbr0 with address fe:ff:ff:ff:ff:ff
>> Jun 17 21:52:46 s1 ovs-vsctl: Called as br-to-vlan xenbr0
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00001|cfg|INFO|using "/etc/ovs-vswitchd.conf" as configuration file, "/etc/.ovs-vswitchd.conf.~lock~" as lock file
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00002|cfg_mod|INFO|configuration changes:
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00003|cfg_mod|INFO|+bridge.xenbr0.port=vif4959.1
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00004|cfg_mod|INFO|+port.vif4959.1.net-uuid=9ca059b1-ac1e-8d3f-ff19-e5e74f7b7392
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00005|cfg_mod|INFO|+port.vif4959.1.vif-mac=2e:17:01:b0:05:fb
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00006|cfg_mod|INFO|+port.vif4959.1.vif-uuid=271f0001-06ca-c9ca-cabc-dc79f412d925
>> Jun 17 21:52:49 s1 ovs-cfg-mod: 00007|cfg_mod|INFO|+port.vif4959.1.vm-uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592
>> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] received signal -7
>> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] xapi watchdog exiting.
>> Jun 17 21:52:51 s1 xapi: [ info|s1|0 thread_zero||watchdog] Fatal: xapi died with signal -7: not restarting (watchdog never restarts on this signal)
>> Jun 17 21:55:11 s1 python: PERFMON: caught IOError: (socket error (111, 'Connection refused')) - restarting XAPI session
>> Jun 17 22:00:02 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
>> Jun 17 22:04:48 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
>> Jun 17 22:09:58 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
>> Jun 17 22:14:52 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
>> Jun 17 22:19:38 s1 python: PERFMON: caught socket.error: (111 Connection refused) - restarting XAPI session
>>
>>> hi,
>>>
>>> Thanks. I'll take a look at the log if it happens again.
>>>
>>> YAMAMOTO Takashi
>>>
>>>> This is usually the result of a failure earlier on. Could you grep through the logs to get the whole trace of what went on? The best thing to do is to grep for VM.pool_migrate, then find the task reference (the hex string beginning with 'R:' immediately after the 'VM.pool_migrate') and grep for this string in the logs on both the source and destination machines.
>>>>
>>>> Have a look through these, and if it's still not obvious what went wrong, post them to the list and we can have a look.
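
As a concrete version of that grep recipe, a rough Python sketch (the log path and the exact pattern are assumptions; adjust for rotated logs and run it on both the source and destination hosts):

#!/usr/bin/env python
# Collect the task references (the "R:<hex>" token right after
# "VM.pool_migrate") from xapi's lines in /var/log/messages, then print
# every log line that mentions any of them.
import re

LOG = "/var/log/messages"   # adjust, and repeat for rotated logs

task_refs = set()
with open(LOG) as f:
    for line in f:
        m = re.search(r"VM\.pool_migrate (R:[0-9a-f]+)", line)
        if m:
            task_refs.add(m.group(1))

with open(LOG) as f:
    for line in f:
        if any(ref in line for ref in task_refs):
            print(line.rstrip())
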
>>>>
>>>> Cheers,
>>>>
>>>> Jon
>>>>
>>>>
>>>> On 16 Jun 2010, at 07:19, YAMAMOTO Takashi wrote:
>>>>
>>>>> hi,
>>>>>
>>>>> After making my SR driver defer the attach operation as you suggested,
>>>>> I got migration working. Thanks!
>>>>>
>>>>> However, when repeating live migration between two hosts for testing,
>>>>> I got the following error. It doesn't seem very reproducible.
>>>>> Do you have any ideas?
>>>>>
>>>>> YAMAMOTO Takashi
>>>>>
>>>>> + xe vm-migrate live=true uuid=23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 host=67b8b07b-8c50-4677-a511-beb196ea766f
>>>>> An error occurred during the migration process.
>>>>> vm: 23ecfa58-aa30-ea6a-f9fe-7cb2a5487592 (CentOS53x64-1)
>>>>> source: eea41bdd-d2ce-4a9a-bc51-1ca286320296 (s6)
>>>>> destination: 67b8b07b-8c50-4677-a511-beb196ea766f (s1)
>>>>> msg: Caught exception INTERNAL_ERROR: [ Xapi_vm_migrate.Remote_failed("unmarshalling result code from remote") ] at last minute during migration
>>>>>
>>>>>> hi,
>>>>>>
>>>>>> I'll try deferring the attach operation to vdi_activate.
>>>>>> Thanks!
>>>>>>
>>>>>> YAMAMOTO Takashi
>>>>>>
>>>>>>> Yup, vdi activate is the way forward.
>>>>>>>
>>>>>>> If you advertise VDI_ACTIVATE and VDI_DEACTIVATE in the 'get_driver_info' response, xapi will call the following during the start-migrate-shutdown lifecycle:
>>>>>>>
>>>>>>> VM start:
>>>>>>>
>>>>>>> host A: VDI.attach
>>>>>>> host A: VDI.activate
>>>>>>>
>>>>>>> VM migrate:
>>>>>>>
>>>>>>> host B: VDI.attach
>>>>>>>
>>>>>>> (VM pauses on host A)
>>>>>>>
>>>>>>> host A: VDI.deactivate
>>>>>>> host B: VDI.activate
>>>>>>>
>>>>>>> (VM unpauses on host B)
>>>>>>>
>>>>>>> host A: VDI.detach
>>>>>>>
>>>>>>> VM shutdown:
>>>>>>>
>>>>>>> host B: VDI.deactivate
>>>>>>> host B: VDI.detach
>>>>>>>
>>>>>>> so the disk is never activated on both hosts at once, but it does still go through a period when it is attached to both hosts at once. So you could, for example, check that the disk *could* be attached on the vdi_attach SMAPI call, and actually attach it properly on the vdi_activate call.
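
To make the deferred-attach idea above concrete, here is a schematic sketch of a VDI object that only checks feasibility in attach and does the real work in activate/deactivate. This is plain Python, not the real XCP SM plugin boilerplate: SingleAttachBackend and its methods are invented placeholders for whatever single-attach mechanism the storage product actually provides; only the capability names come from the get_driver_info response described above.

# Schematic sketch of "check on attach, really attach on activate".
class SingleAttachBackend(object):
    """Toy stand-in for a volume service that allows at most one attachment."""

    def __init__(self, volumes):
        self.volumes = set(volumes)    # known volume uuids
        self.attached_to = {}          # vdi_uuid -> host

    def exists(self, vdi):
        return vdi in self.volumes

    def attach(self, vdi, host):
        # The product only allows one attachment at a time.
        assert self.attached_to.get(vdi) is None, "already attached elsewhere"
        self.attached_to[vdi] = host

    def detach(self, vdi, host):
        assert self.attached_to.get(vdi) == host
        del self.attached_to[vdi]


class DeferredAttachVDI(object):
    # Advertise VDI_ACTIVATE/VDI_DEACTIVATE (alongside VDI_ATTACH/VDI_DETACH)
    # in the get_driver_info capabilities so that xapi issues
    # activate/deactivate around the pause window, as in the lifecycle above.
    CAPABILITIES = ["VDI_ATTACH", "VDI_DETACH", "VDI_ACTIVATE", "VDI_DEACTIVATE"]

    def __init__(self, vdi_uuid, backend):
        self.vdi_uuid = vdi_uuid
        self.backend = backend

    def attach(self, host):
        # During migration this runs on the destination while the disk is
        # still attached to the source, so only check that a later attach
        # could succeed; do not touch the backend attachment yet.
        if not self.backend.exists(self.vdi_uuid):
            raise Exception("VDI %s not found" % self.vdi_uuid)

    def activate(self, host):
        # By the time activate runs on the destination, the source has been
        # deactivated (the VM is paused), so the single-host rule holds:
        # do the real attach here.
        self.backend.attach(self.vdi_uuid, host)

    def deactivate(self, host):
        # Mirror image of activate(): really detach here ...
        self.backend.detach(self.vdi_uuid, host)

    def detach(self, host):
        # ... so there is nothing left to undo when detach finally arrives.
        pass


if __name__ == "__main__":
    # Walk through the exact call sequence listed above and check that the
    # volume is never attached to two hosts at once.
    vdi_uuid = "23ecfa58-aa30-ea6a-f9fe-7cb2a5487592"
    backend = SingleAttachBackend([vdi_uuid])
    vdi = DeferredAttachVDI(vdi_uuid, backend)

    vdi.attach("hostA"); vdi.activate("hostA")        # VM start on host A
    vdi.attach("hostB")                               # migrate: dest attach
    vdi.deactivate("hostA"); vdi.activate("hostB")    # VM pauses, switches
    vdi.detach("hostA")                               # source detach
    vdi.deactivate("hostB"); vdi.detach("hostB")      # VM shutdown on B
    print("migration sequence completed without a double attach")

The __main__ block walks the start/migrate/shutdown call order quoted above and confirms the backend is never asked to attach the volume to two hosts at once, even though attach() is called on the destination while the source still holds it.
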
>>>>>>>
>>>>>>> Hope this helps,
>>>>>>>
>>>>>>> Jon
>>>>>>>
>>>>>>>
>>>>>>> On 7 Jun 2010, at 09:26, YAMAMOTO Takashi wrote:
>>>>>>>
>>>>>>>> hi,
>>>>>>>>
>>>>>>>> On vm-migrate, xapi attaches a VDI on the destination host
>>>>>>>> before detaching it on the source host.
>>>>>>>> Unfortunately, that doesn't work for our product, which doesn't
>>>>>>>> provide a way to attach a volume to multiple hosts at the same time.
>>>>>>>> Is VDI_ACTIVATE something I can use as a workaround,
>>>>>>>> or do you have any other suggestions?
>>>>>>>>
>>>>>>>> YAMAMOTO Takashi
>>>>>>>>
>
Thread overview: 12+ messages
2010-06-07 8:26 XCP: sr driver question wrt vm-migrate YAMAMOTO Takashi
2010-06-07 12:29 ` Jonathan Ludlam
2010-06-08 7:11 ` YAMAMOTO Takashi
2010-06-16 6:19 ` YAMAMOTO Takashi
2010-06-16 12:06 ` Jonathan Ludlam
2010-06-17 9:52 ` YAMAMOTO Takashi
2010-06-18 2:45 ` XCP: signal -7 (Re: XCP: sr driver question wrt vm-migrate) YAMAMOTO Takashi
2010-06-18 12:53 ` Jonathan Ludlam
2010-06-18 15:21 ` Jeremy Fitzhardinge
2010-06-18 21:54 ` Vincent Hanquez
2010-06-19 7:54 ` Jeremy Fitzhardinge
2010-06-21 2:41 ` YAMAMOTO Takashi [this message]