From: Roger Pau Monné
Subject: Re: Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)
Date: Tue, 4 Aug 2015 10:14:32 +0200
Message-ID: <55C07468.4040909@citrix.com>
References: <20150728194741.GA13430@l.oracle.com> <55B9E614.4040504@citrix.com>
In-Reply-To: <55B9E614.4040504@citrix.com>
To: Konrad Rzeszutek Wilk, george.dunlap@eu.citrix.com, xen-devel@lists.xenproject.org, wei.liu2@citrix.com, Ian Campbell
List-Id: xen-devel@lists.xenproject.org

On 30/07/15 at 10:53, Roger Pau Monné wrote:
> On 28/07/15 at 21:47, Konrad Rzeszutek Wilk wrote:
>> Hey,
>>
>> I launch a bunch of guests at the same time or in parallel and
>> the scripts end up timing out with:
>>
>>
>> Parsing config from //g-vm8.cfg
>> WARNING: you seem to be using "kernel" directive to override HVM guest firmware. Ignore that.
>> Use "firmware_override" instead if you really want a non-default firmware
>> Jul 28 19:20:53 tst036 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/13/5632
>> libxl: error: libxl_aoutils.c:539:async_exec_timeout: killing execution of /etc/xen/scripts/block add because of timeout
>> libxl: error: libxl_create.c:1157:domcreate_launch_dm: unable to add disk devices
>> libxl: error: libxl_dm.c:1955:kill_device_model: unable to find device model pid in /local/domain/13/image/device-model-pid
>> libxl: error: libxl.c:1606:libxl__destroy_domid: libxl__destroy_device_model failed for 13
>> Jul 28 19:21:03 tst036 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/13/5632
>> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/13/5632/hotplug-error xenstore-read backend/vbd/13/5632/node failed. backend/vbd/13/5632/hotplug-status error to xenstore.
>> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: xenstore-read backend/vbd/13/5632/node failed.
>> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/13/5632/hotplug-error /etc/xen/scripts/block failed; error detected. backend/vbd/13/5632/hotplug-status error to xenstore.
>> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: /etc/xen/scripts/block failed; error detected.
>> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [10344] exited with error status 1
>> libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
>> libxl: error: libxl.c:1569:libxl__destroy_domid: non-existant domain 13
>> libxl: error: libxl.c:1527:domain_destroy_callback: unable to destroy guest with domid 13
>> libxl: error: libxl.c:1454:domain_destroy_cb: destruction of domain 13 failed
>>
>> And I cannot start the guest.
>>
>> If I revert the mentioned commit, everything works peachy.
>>
>> What is interesting is that with the revert applied I can see that the
>>
>> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/14/5632/physical-device 7:d to xenstore.
>> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/14/5632/hotplug-status connected to xenstore.
>>
>> are often done much, much later after xl create has started.
>>
>> Attached are the bad log and the good log.
>
> Can you do the same test with xl -vvv and the following patch applied
> (with and without 2ba368 reverted):

Ping?

I've looked into this, and AFAICT, before 2ba368 you were probably using
the udev rules (do you have run_hotplug_scripts=0 in xl.conf?), and now
you are forcibly switched to having libxl launch the hotplug scripts.

The issue is that you have multiple guests all using the same image
file, so the time to execute the block hotplug script is O(n), where n
is the number of times the same image is in use:

    shared_list=$(losetup -a |
          sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p" )
    for dev in $shared_list
    do
      if [ -n "$dev" ]
      then
        check_file_sharing "$file" "$dev" "$mode"
      fi
    done

This was not a problem when using udev, because udev imposes no timeout,
but libxl enforces a hard timeout (10s) on hotplug script execution. The
only ways I see to solve this are to remove the checks done in the block
hotplug script, or to increase the timeout (but since the execution time
is not bounded, this is doomed to fail if enough guests are using the
same image).

Roger.
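To see why the block script's cost grows with the number of guests sharing one image, the shared_list lookup from the block hotplug script can be run on its own against sample "losetup -a" output. This is a minimal sketch, not the block script itself: the helper name find_shared_loops, the loop devices, the device number 801 and the inode 131075 are all made up for illustration. It assumes GNU sed (for the \+ operator) and the older util-linux "losetup -a" line format "/dev/loopN: [DEV]:INODE (FILE)". Every matching loop device costs one check_file_sharing call in the real script, so the loop below runs once per guest attached to the same image.

```shell
#!/bin/sh
# Hypothetical helper: print the loop devices whose backing file has the
# given device number (hex, as printed by `stat -c %D`) and inode.
# The sed expression is the one used by the block hotplug script.
find_shared_loops() {
    dev=$1
    inode=$2
    sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p"
}

# Made-up `losetup -a` output: two loop devices share one image file,
# a third one is backed by a different file (different inode).
sample='/dev/loop0: [0801]:131075 (/var/lib/xen/images/g.img)
/dev/loop1: [0801]:131075 (/var/lib/xen/images/g.img)
/dev/loop2: [0801]:999999 (/var/lib/xen/images/other.img)'

shared_list=$(printf '%s\n' "$sample" | find_shared_loops 801 131075)

# One (simulated) sharing check per loop device backed by the same file.
count=0
for d in $shared_list; do
    count=$((count + 1))
    echo "would run check_file_sharing against $d"
done
echo "checks: $count"
```

With the sample above this reports two checks; each additional guest using the same image adds one more matching loop device and therefore one more check inside every subsequent script run.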