From: Matt Wilson
Subject: Re: Is: linux, xenbus mutex hangs when rebooting dom0 and guests hung." Was:Re: test report for Xen 4.3 RC1
Date: Sun, 10 Nov 2013 12:20:18 -0800
Message-ID: <20131110202018.GA20536@u109add4315675089e695.ant.amazon.com>
References: <1B4B44D9196EFF41AE41FDA404FC0A1001AE1B1D@SHSMSX102.ccr.corp.intel.com>
 <20130528151537.GS724@phenom.dumpdata.com>
 <20130528152156.GB3027@phenom.dumpdata.com>
 <20131108162121.GA25007@phenom.dumpdata.com>
In-Reply-To: <20131108162121.GA25007@phenom.dumpdata.com>
To: Konrad Rzeszutek Wilk
Cc: "Ren, Yongjie", "Tian, Yongxue", george.dunlap@eu.citrix.com,
 xen@bugs.xenproject.org, "Xu, YongweiX",
 "xen-devel@lists.xen.org", "Liu, SongtaoX"
List-Id: xen-devel@lists.xenproject.org

On Fri, Nov 08, 2013 at 11:21:21AM -0500, Konrad Rzeszutek Wilk wrote:
[...]
> This patch should solve it:
> From 228bb2fcde1267ed2a0b0d386f54d79ecacd0eb4 Mon Sep 17 00:00:00 2001
> From: Konrad Rzeszutek Wilk
> Date: Fri, 8 Nov 2013 10:48:58 -0500
> Subject: [PATCH] xen/xenbus: Avoid synchronous wait on XenBus stalling
>  shutdown/restart.
>
> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> 'process_msg' is running from within the 'xenbus' thread. Whenever
> a message shows up in XenBus it is put on a xs_state.reply_list list
> and 'read_reply' picks it up.
>
> The problem is if the backend domain or the xenstored process is killed.
> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> stuck forever waiting for the reply_list to have some contents.
>
> This is normally not a problem - as the backend domain can come back
> or the xenstored process can be restarted. However if the domain
> is in process of being powered off/restarted/halted - there is no
> point of waiting on it coming back - as we are effectively being
> terminated and should not impede the progress.
>
> This patch solves this problem by checking the 'system_state' value
> to see if we are in heading towards death. We also make the wait
> mechanism a bit more asynchronous.
>
> Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
> Signed-off-by: Konrad Rzeszutek Wilk

Makes sense to me.

Acked-by: Matt Wilson

> ---
>  drivers/xen/xenbus/xenbus_xs.c | 24 +++++++++++++++++++++---
>  1 files changed, 21 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
> index b6d5fff..177fb19 100644
> --- a/drivers/xen/xenbus/xenbus_xs.c
> +++ b/drivers/xen/xenbus/xenbus_xs.c
> @@ -148,9 +148,24 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
>
>  	while (list_empty(&xs_state.reply_list)) {
>  		spin_unlock(&xs_state.reply_lock);
> -		/* XXX FIXME: Avoid synchronous wait for response here. */
> -		wait_event(xs_state.reply_waitq,
> -			   !list_empty(&xs_state.reply_list));
> +		wait_event_timeout(xs_state.reply_waitq,
> +				   !list_empty(&xs_state.reply_list),
> +				   msecs_to_jiffies(500));
> +
> +		/*
> +		 * If we are in the process of being shut-down there is
> +		 * no point of trying to contact XenBus - it is either
> +		 * killed (xenstored application) or the other domain
> +		 * has been killed or is unreachable.
> +		 */
> +		switch (system_state) {
> +			case SYSTEM_POWER_OFF:
> +			case SYSTEM_RESTART:
> +			case SYSTEM_HALT:
> +				return ERR_PTR(-EIO);
> +			default:
> +				break;
> +		}
>  		spin_lock(&xs_state.reply_lock);
>  	}
>
> @@ -215,6 +230,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
>
>  	mutex_unlock(&xs_state.request_mutex);
>
> +	if (IS_ERR(ret))
> +		return ret;
> +
>  	if ((msg->type == XS_TRANSACTION_END) ||
>  	    ((req_msg.type == XS_TRANSACTION_START) &&
>  	     (msg->type == XS_ERROR)))