From mboxrd@z Thu Jan 1 00:00:00 1970 From: "Daniel P. Berrange" Subject: [PATCH] Error reporting for hotplug scripts Date: Wed, 7 Feb 2007 02:55:17 +0000 Message-ID: <20070207025516.GA29320@redhat.com> Reply-To: "Daniel P. Berrange" Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="+QahgC5+KEYLbs62" Return-path: Content-Disposition: inline List-Unsubscribe: , List-Post: List-Help: List-Subscribe: , Sender: xen-devel-bounces@lists.xensource.com Errors-To: xen-devel-bounces@lists.xensource.com To: xen-devel@lists.xensource.com List-Id: xen-devel@lists.xenproject.org --+QahgC5+KEYLbs62 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline The current way the hotplug scripts report errors is sub-optimal, rarely giving the user enough information to diagnose the problem. This is particularly true of the networking scripts whose typical failure mode is very obsecure. eg, if I configure a guest to use bridge7 which does not exist, I will be told: # xm create demo ....60 seconds passes... Error: Device 0 (vif) could not be connected. Hotplug scripts not working. Having looked at the hotplug scripts & the way things tie into XenD it looks like we can do much better. Currently one of 3 things happens: - The script explicitly calls 'fatal' which sets the 'hotplug-status' field in xenstore to 'error'. The real error message is sent to syslog, which if you're lucky, may end up in /var/log/messages - A command exits with non-zero exit status, resulting in the signal trap being executed. This in turn calls 'fatal', with no useful error messages - For block devices where a sharing mode violation is detected, the 'hotplug-status' is set to 'busy' and a real error message is also set in xenstored under the 'hotplug-error' field. The latter case is the interesting one, because when it sees a 'busy' status code, XenD will extract the actual error message from 'hotplug-error' and feed it all the way back to the end user/tool. Basically I want the first two cases to also behave like the error, so that whenever anything goes wrong in the hotplug scripts, the user always gets a useful error message back. It also avoids having to wait for the 60 second hotplug timeout. The first case was easy to fix in this way - simply change XenD so that it always looks for 'hotplug-error' for 'fatal' codes as well as 'busy' codes. The second case is a little tricker because we have to identify commands in the hotplug scripts which are likely to fail, and add explicit handling to enable meaningful error feedback. I've done such analysis for the vif-bridge script and realized that the most likely cause of failure for any command in this script is a missing bridge device. It is trivial to do an upfront check for existance of the bridge device and immediately feed an error back to the user. I'm using 'ip link show $bridge' to detect whether the bridge exists or not. I expect this is reasonably portable, but as always its worth people double-checking it works on their particular distro. I'm attaching a patch which implements both of these fixups. The net result is that now when creating a guest with a config refering to a non-existant bridge device I see: # xm create demo Error: Device 0 (vif) could not be connected. Could not find bridge device xenbr7 There may well be other places in the hotplug scripts which need fixing up - we can address these as they turn up if we still see people getting the generic 'Hotplug scripts no working' message. Signed-off-by: Daniel P. Berrange Regards, Dan. -- |=- Red Hat, Engineering, Emerging Technologies, Boston. +1 978 392 2496 -=| |=- Perl modules: http://search.cpan.org/~danberr/ -=| |=- Projects: http://freshmeat.net/~danielpb/ -=| |=- GnuPG: 7D3B9505 F3C9 553F A1DA 4AC2 5648 23C1 B3DF F742 7D3B 9505 -=| --+QahgC5+KEYLbs62 Content-Type: text/plain; charset=us-ascii Content-Disposition: attachment; filename="xen-hotplug-error-reporting.patch" diff -r e212203d7d34 tools/python/xen/xend/server/DevController.py --- a/tools/python/xen/xend/server/DevController.py Tue Feb 06 23:01:35 2007 +0000 +++ b/tools/python/xen/xend/server/DevController.py Fri Feb 02 10:32:00 2007 -0500 @@ -153,9 +153,9 @@ class DevController: log.debug("Waiting for %s.", devid) if not self.hotplug: - return - - status = self.waitForBackend(devid) + return + + (status, err) = self.waitForBackend(devid) if status == Timeout: self.destroyDevice(devid, False) @@ -165,25 +165,22 @@ class DevController: elif status == Error: self.destroyDevice(devid, False) - raise VmError("Device %s (%s) could not be connected. " - "Backend device not found." % - (devid, self.deviceClass)) - + if err is None: + raise VmError("Device %s (%s) could not be connected. " + "Backend device not found." % + (devid, self.deviceClass)) + else: + raise VmError("Device %s (%s) could not be connected. " + "%s" % (devid, self.deviceClass, err)) elif status == Missing: # Don't try to destroy the device; it's already gone away. raise VmError("Device %s (%s) could not be connected. " "Device not found." % (devid, self.deviceClass)) elif status == Busy: - err = None - frontpath = self.frontendPath(devid) - backpath = xstransact.Read(frontpath, "backend") - if backpath: - err = xstransact.Read(backpath, HOTPLUG_ERROR_NODE) - if not err: + self.destroyDevice(devid, False) + if err is None: err = "Busy." - - self.destroyDevice(devid, False) raise VmError("Device %s (%s) could not be connected.\n%s" % (devid, self.deviceClass, err)) @@ -478,17 +475,21 @@ class DevController: frontpath = self.frontendPath(devid) backpath = xstransact.Read(frontpath, "backend") + if backpath: statusPath = backpath + '/' + HOTPLUG_STATUS_NODE ev = Event() result = { 'status': Timeout } - + xswatch(statusPath, hotplugStatusCallback, ev, result) ev.wait(DEVICE_CREATE_TIMEOUT) - return result['status'] - else: - return Missing + + err = xstransact.Read(backpath, HOTPLUG_ERROR_NODE) + + return (result['status'], err) + else: + return (Missing, None) def backendPath(self, backdom, devid): diff -r e212203d7d34 tools/examples/vif-bridge --- a/tools/examples/vif-bridge Tue Feb 06 23:01:35 2007 +0000 +++ b/tools/examples/vif-bridge Tue Feb 06 21:04:31 2007 -0500 @@ -46,6 +46,13 @@ then fi fi +RET=0 +ip link show $bridge 1>/dev/null 2>&1 || RET=1 +if [ "$RET" -eq 1 ] +then + fatal "Could not find bridge device $bridge" +fi + case "$command" in online) setup_bridge_port "$vif" diff -r e212203d7d34 tools/examples/xen-hotplug-common.sh --- a/tools/examples/xen-hotplug-common.sh Tue Feb 06 23:01:35 2007 +0000 +++ b/tools/examples/xen-hotplug-common.sh Tue Feb 06 21:03:49 2007 -0500 @@ -28,14 +28,15 @@ unset $(set | grep ^LC_ | cut -d= -f1) unset $(set | grep ^LC_ | cut -d= -f1) fatal() { - xenstore_write "$XENBUS_PATH"/hotplug-status error + xenstore_write "$XENBUS_PATH/hotplug-error" "$*" \ + "$XENBUS_PATH/hotplug-status" error log err "$@" exit 1 } success() { # Tell DevController that backend is "connected" - xenstore_write "$XENBUS_PATH"/hotplug-status connected + xenstore_write "$XENBUS_PATH/hotplug-status" connected } do_or_die() { --+QahgC5+KEYLbs62 Content-Type: text/plain; charset="us-ascii" MIME-Version: 1.0 Content-Transfer-Encoding: 7bit Content-Disposition: inline _______________________________________________ Xen-devel mailing list Xen-devel@lists.xensource.com http://lists.xensource.com/xen-devel --+QahgC5+KEYLbs62--