From: Konrad Rzeszutek Wilk
Subject: Re: Second regression due to libxl: Remove linux udev rules (2ba368d13893402b2f1fb3c283ddcc714659dd9b)
Date: Fri, 7 Aug 2015 10:54:40 -0400
Message-ID: <20150807145440.GK29527@l.oracle.com>
References: <20150728194741.GA13430@l.oracle.com> <55B9E614.4040504@citrix.com> <55C07468.4040909@citrix.com>
In-Reply-To: <55C07468.4040909@citrix.com>
To: Roger Pau Monné
Cc: george.dunlap@eu.citrix.com, xen-devel@lists.xenproject.org, wei.liu2@citrix.com, Ian Campbell

On Tue, Aug 04, 2015 at 10:14:32AM +0200, Roger Pau Monné wrote:
> On 30/07/15 at 10:53, Roger Pau Monné wrote:
> > On 28/07/15 at 21:47, Konrad Rzeszutek Wilk wrote:
> >> Hey,
> >>
> >> I launch a bunch of guests at the same time or in parallel and
> >> the scripts end up timing out with:
> >>
> >>
> >> Parsing config from //g-vm8.cfg
> >> WARNING: you seem to be using "kernel" directive to override HVM guest firmware. Ignore that. Use "firmware_override" instead if you really want a non-default firmware
> >> Jul 28 19:20:53 tst036 logger: /etc/xen/scripts/block: add XENBUS_PATH=backend/vbd/13/5632
> >> libxl: error: libxl_aoutils.c:539:async_exec_timeout: killing execution of /etc/xen/scripts/block add because of timeout
> >> libxl: error: libxl_create.c:1157:domcreate_launch_dm: unable to add disk devices
> >> libxl: error: libxl_dm.c:1955:kill_device_model: unable to find device model pid in /local/domain/13/image/device-model-pid
> >> libxl: error: libxl.c:1606:libxl__destroy_domid: libxl__destroy_device_model failed for 13
> >> Jul 28 19:21:03 tst036 logger: /etc/xen/scripts/block: remove XENBUS_PATH=backend/vbd/13/5632
> >> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/13/5632/hotplug-error xenstore-read backend/vbd/13/5632/node failed. backend/vbd/13/5632/hotplug-status error to xenstore.
> >> Jul 28 19:21:04 tst036 logger: /etc/xen/scripts/block: xenstore-read backend/vbd/13/5632/node failed.
> >> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/13/5632/hotplug-error /etc/xen/scripts/block failed; error detected. backend/vbd/13/5632/hotplug-status error to xenstore.
> >> Jul 28 19:21:05 tst036 logger: /etc/xen/scripts/block: /etc/xen/scripts/block failed; error detected.
> >> libxl: error: libxl_exec.c:118:libxl_report_child_exitstatus: /etc/xen/scripts/block remove [10344] exited with error status 1
> >> libxl: error: libxl_device.c:1085:device_hotplug_child_death_cb: script: /etc/xen/scripts/block failed; error detected.
> >> libxl: error: libxl.c:1569:libxl__destroy_domid: non-existant domain 13
> >> libxl: error: libxl.c:1527:domain_destroy_callback: unable to destroy guest with domid 13
> >> libxl: error: libxl.c:1454:domain_destroy_cb: destruction of domain 13 failed
> >>
> >> And I cannot start the guest.
> >>
> >> While if I revert the mentioned commit everything works peachy.
> >>
> >> What is interesting is that if I have the revert I can see that the
> >>
> >> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/14/5632/physical-device 7:d to xenstore.
> >> Jul 28 19:39:03 tst036 logger: /etc/xen/scripts/block: Writing backend/vbd/14/5632/hotplug-status connected to xenstore.
> >>
> >> are often done much, much later after xl create has started.
> >>
> >> Attached is the bad log and the good log.
> >
> > Can you do the same test with xl -vvv and the following patch applied
> > (with and without 2ba368 reverted):
>
> Ping?

Hey!

>
> I've looked into this, and AFAICT you were probably using the udev
> rules (you have run_hotplug_scripts=0 in xl.conf?) before 2ba368, and

Correct. I think I needed that for driver domains and had left it in there.

> now you are forcefully switched to launching hotplug scripts from libxl.

OK.

>
> The issue is that you have multiple guests all using the same image
> file, so the time to execute the block hotplug script is O(n), where n
> is the number of times the same image is used:
>
> shared_list=$(losetup -a |
>       sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p" )
> for dev in $shared_list
> do
>   if [ -n "$dev" ]
>   then
>     check_file_sharing "$file" "$dev" "$mode"
>   fi
> done
>
> This was not a problem when using udev, because there's no timeout, but
> libxl has a hard timeout (10s) regarding hotplug script execution. The
> only way I see to solve this is to remove the checks done in the block
> hotplug script, or to increase the timeout (but since the execution
> time is not bounded this is doomed to fail if enough guests are using
> the same image).

Ok. I hadn't run your patch yet. Do you want me to run the latest staging
instead once more with my test-case?

>
> Roger.
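
For reference, a minimal sketch of why the quoted loop grows with the number of guests sharing one image. It reuses the sed expression from the block script, but feeds it made-up `losetup -a` output instead of querying real loop devices. The sample device number (0803), inode, and image paths are assumptions for illustration only, as are the helper names `sample_losetup_output` and `shared_loop_devices`; the expression relies on GNU sed (`\+`).

```shell
#!/bin/sh
# Hypothetical `losetup -a` output: three loop devices back the same
# image (device 0803, inode 131074), one backs a different file.
sample_losetup_output() {
    cat <<'EOF'
/dev/loop0: [0803]:131074 (/var/lib/xen/images/guest.img)
/dev/loop1: [0803]:131074 (/var/lib/xen/images/guest.img)
/dev/loop2: [0803]:262147 (/var/lib/xen/images/other.img)
/dev/loop3: [0803]:131074 (/var/lib/xen/images/guest.img)
EOF
}

# Print the loop devices whose backing file matches the given
# device/inode pair. Reads `losetup -a` style lines from stdin and
# uses the same sed expression as the quoted block script.
shared_loop_devices() {
    dev=$1
    inode=$2
    sed -n -e "s@^\([^:]\+\)\(:[[:blank:]]\[0*${dev}\]:${inode}[[:blank:]](.*)\)@\1@p"
}

shared_list=$(sample_losetup_output | shared_loop_devices 803 131074)

# One iteration per loop device already attached to the same image;
# the real script calls check_file_sharing on each of them.
count=0
for d in $shared_list
do
    count=$((count + 1))
done
echo "$count loop devices share the image"
```

Each guest started from the same image adds one more entry to `shared_list`, so the per-guest check count rises linearly while the libxl timeout stays fixed.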