From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
To: David Vrabel <david.vrabel@citrix.com>
Cc: Ian.Campbell@citrix.com, xen-devel@lists.xenproject.org,
	linux-kernel@vger.kernel.org, JBeulich@suse.com,
	boris.ostrovsky@oracle.com
Subject: Re: [Xen-devel] [PATCH 4/4] xen/xenbus: Avoid synchronous wait on XenBus stalling shutdown/restart.
Date: Mon, 31 Mar 2014 16:33:13 -0400
Message-ID: <20140331203313.GA6756@phenom.dumpdata.com>
In-Reply-To: <529C7205.3060406@citrix.com>

On Mon, Dec 02, 2013 at 11:41:57AM +0000, David Vrabel wrote:
> On 26/11/13 16:50, Konrad Rzeszutek Wilk wrote:
> > On Thu, Nov 21, 2013 at 05:52:28PM +0000, David Vrabel wrote:
> >> On 08/11/13 17:38, Konrad Rzeszutek Wilk wrote:
> >>> The 'read_reply' works with 'process_msg' to read of a reply in XenBus.
> >>> 'process_msg' is running from within the 'xenbus' thread. Whenever
> >>> a message shows up in XenBus it is put on a xs_state.reply_list list
> >>> and 'read_reply' picks it up.
> >>>
> >>> The problem is if the backend domain or the xenstored process is killed.
> >>> In which case 'xenbus' is still awaiting - and 'read_reply' if called -
> >>> stuck forever waiting for the reply_list to have some contents.
> >>>
> >>> This is normally not a problem - as the backend domain can come back
> >>> or the xenstored process can be restarted. However if the domain
> >>> is in process of being powered off/restarted/halted - there is no
> >>> point of waiting on it coming back - as we are effectively being
> >>> terminated and should not impede the progress.
> >>>
> >>> This patch solves this problem by checking the 'system_state' value
> >>> to see if we are in heading towards death. We also make the wait
> >>> mechanism a bit more asynchronous.
> >>
> >> This seems to be checking the wrong thing conceptually.  We should abort
> >> the wait if xenstored is dead not if our domain is dying.
> >>
> >> I think you can consider xenstored as dead if:
> >>
> >> a) it's local and we're dying.
> > 
> > OK. Not sure exactly how to do that but that should be possible.
> 
> xen_store_domain_type == XS_LOCAL and looking at system_state?
> 
> >> b) it's remote and the remote domain is dead.
> > 
> > OK, any idea how to do that? As in check if a remote domain is dead?
> 
> Let someone who cares about xenstore domains fix this -- this is not the
> most common use case.
> 
> I'd be happy to have some thing like:
> 
> bool xenbus_ok(void)
> {
>     switch (xen_store_domain_type) {
>     case XS_LOCAL:
>          return system_state != dying;
>     case XS_PV:
>     case XS_HVM:
>          /* FIXME: could check remote domain is alive, but it's
>             normally dom0. */
>          return true;
>     // ...
>     default:
>          return true;
>     }
> }

From 227d72806311694ced6cdedfd61a05f5bb1893f7 Mon Sep 17 00:00:00 2001
From: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
Date: Fri, 8 Nov 2013 10:48:58 -0500
Subject: [PATCH] xen/xenbus: Avoid synchronous wait on XenBus stalling
 shutdown/restart.

'read_reply' works with 'process_msg' to read a reply from XenBus.
'process_msg' runs from within the 'xenbus' thread. Whenever
a message shows up in XenBus it is put on the xs_state.reply_list list
and 'read_reply' picks it up.

The problem arises if the backend domain or the xenstored process is
killed. In that case 'xenbus' keeps waiting, and 'read_reply', if
called, is stuck forever waiting for the reply_list to have some
contents.

This is normally not a problem, as the backend domain can come back
or the xenstored process can be restarted. However, if the domain
is in the process of being powered off/restarted/halted, there is no
point in waiting for it to come back, as we are effectively being
terminated and should not impede the progress.

This patch solves this problem by checking which type of domain the
guest is. If it is the initial domain and hurtling towards
death, there is no point in continuing the wait. All other types
of guests keep their existing behavior. We also make the wait
mechanism a bit more asynchronous.

Fixes-Bug: http://bugs.xenproject.org/xen/bug/8
Signed-off-by: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com>
[v2: Fixed it up per David's suggestions]
---
 drivers/xen/xenbus/xenbus_xs.c | 44 +++++++++++++++++++++++++++++++++++++++---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/drivers/xen/xenbus/xenbus_xs.c b/drivers/xen/xenbus/xenbus_xs.c
index b6d5fff..ba804f3 100644
--- a/drivers/xen/xenbus/xenbus_xs.c
+++ b/drivers/xen/xenbus/xenbus_xs.c
@@ -50,6 +50,7 @@
 #include <xen/xenbus.h>
 #include <xen/xen.h>
 #include "xenbus_comms.h"
+#include "xenbus_probe.h"
 
 struct xs_stored_msg {
 	struct list_head list;
@@ -139,6 +140,29 @@ static int get_error(const char *errorstring)
 	return xsd_errors[i].errnum;
 }
 
+static bool xenbus_ok(void)
+{
+	switch (xen_store_domain_type) {
+	case XS_LOCAL:
+		switch (system_state) {
+		case SYSTEM_POWER_OFF:
+		case SYSTEM_RESTART:
+		case SYSTEM_HALT:
+			return false;
+		default:
+			break;
+		}
+		return true;
+	case XS_PV:
+	case XS_HVM:
+		/* FIXME: Could check that the remote domain is alive,
+		 * but it is normally initial domain. */
+		return true;
+	default:
+		break;
+	}
+	return false;
+}
 static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
 {
 	struct xs_stored_msg *msg;
@@ -148,9 +172,20 @@ static void *read_reply(enum xsd_sockmsg_type *type, unsigned int *len)
 
 	while (list_empty(&xs_state.reply_list)) {
 		spin_unlock(&xs_state.reply_lock);
-		/* XXX FIXME: Avoid synchronous wait for response here. */
-		wait_event(xs_state.reply_waitq,
-			   !list_empty(&xs_state.reply_list));
+		if (xenbus_ok())
+			/* XXX FIXME: Avoid synchronous wait for response here. */
+			wait_event_timeout(xs_state.reply_waitq,
+					   !list_empty(&xs_state.reply_list),
+					   msecs_to_jiffies(500));
+		else {
+			/*
+			 * If we are in the process of being shut-down there is
+			 * no point of trying to contact XenBus - it is either
+			 * killed (xenstored application) or the other domain
+			 * has been killed or is unreachable.
+			 */
+			return ERR_PTR(-EIO);
+		}
 		spin_lock(&xs_state.reply_lock);
 	}
 
@@ -215,6 +250,9 @@ void *xenbus_dev_request_and_reply(struct xsd_sockmsg *msg)
 
 	mutex_unlock(&xs_state.request_mutex);
 
+	if (IS_ERR(ret))
+		return ret;
+
 	if ((msg->type == XS_TRANSACTION_END) ||
 	    ((req_msg.type == XS_TRANSACTION_START) &&
 	     (msg->type == XS_ERROR)))
-- 
1.8.5.3
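
For readers who want to experiment with the control flow outside the
kernel, here is a minimal userspace sketch of the loop the patch puts in
'read_reply': check whether the peer is still worth waiting for, wait at
most 500 ms, then re-check. It is not the kernel code. The names
reply_queue, peer_ok, post_reply, read_reply_model, err_ptr, is_err and
ptr_err are hypothetical stand-ins; a pthread condition variable takes
the place of the wait queue, a single slot takes the place of
xs_state.reply_list, and the peer_ok flag takes the place of xenbus_ok().

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Userspace stand-ins for the kernel's ERR_PTR()/IS_ERR()/PTR_ERR(). */
#define MAX_ERRNO 4095

static inline void *err_ptr(long err)
{
	assert(err < 0 && err >= -MAX_ERRNO);
	return (void *)(intptr_t)err;
}

static inline bool is_err(const void *p)
{
	/* Error codes occupy the top MAX_ERRNO addresses. */
	return (uintptr_t)p >= (uintptr_t)-MAX_ERRNO;
}

static inline long ptr_err(const void *p)
{
	return (long)(intptr_t)p;
}

/* Hypothetical one-slot reply queue; NULL in 'reply' means "empty". */
struct reply_queue {
	pthread_mutex_t lock;
	pthread_cond_t waitq;
	void *reply;
	bool peer_ok;	/* models the result of xenbus_ok() */
};

#define REPLY_QUEUE_INIT \
	{ PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER, NULL, true }

/* Producer side: roughly what 'process_msg' does for a reply. */
static void post_reply(struct reply_queue *q, void *reply)
{
	pthread_mutex_lock(&q->lock);
	q->reply = reply;
	pthread_cond_signal(&q->waitq);
	pthread_mutex_unlock(&q->lock);
}

/* Consumer side: bounded wait, give up with -EIO once the peer is gone. */
static void *read_reply_model(struct reply_queue *q)
{
	void *reply;

	pthread_mutex_lock(&q->lock);
	while (q->reply == NULL) {
		struct timespec deadline;

		if (!q->peer_ok) {
			pthread_mutex_unlock(&q->lock);
			return err_ptr(-EIO);
		}

		/* Absolute deadline 500 ms from now. */
		clock_gettime(CLOCK_REALTIME, &deadline);
		deadline.tv_nsec += 500 * 1000 * 1000L;
		if (deadline.tv_nsec >= 1000000000L) {
			deadline.tv_sec++;
			deadline.tv_nsec -= 1000000000L;
		}

		/* Wakeup or timeout: either way, loop and re-check. */
		pthread_cond_timedwait(&q->waitq, &q->lock, &deadline);
	}
	reply = q->reply;
	q->reply = NULL;
	pthread_mutex_unlock(&q->lock);
	return reply;
}
```

A caller would test the result with is_err() before touching it, much
as the last hunk of the patch does with IS_ERR() in
xenbus_dev_request_and_reply. Because the wait is bounded, a change in
peer_ok is noticed within roughly one timeout period even if nobody
signals the condition variable.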

