* libxl: error handling before xenstored runs @ 2011-02-09 11:13 Christoph Egger 2011-02-09 14:29 ` Kamala Narasimhan 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-09 11:13 UTC (permalink / raw) To: xen-devel Hi! When I start a guest *before* xenstored runs then I get this list of error messages: libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 0: Bad file descriptor libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction failed: Bad file descriptor libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath for 1: Bad file descriptor libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 IMO a simple message like "xenstored is not running." would be enough. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 11:13 libxl: error handling before xenstored runs Christoph Egger @ 2011-02-09 14:29 ` Kamala Narasimhan 2011-02-09 14:42 ` Christoph Egger 0 siblings, 1 reply; 22+ messages in thread From: Kamala Narasimhan @ 2011-02-09 14:29 UTC (permalink / raw) To: Christoph Egger; +Cc: xen-devel Christoph Egger wrote: > Hi! > > When I start a guest *before* xenstored runs > then I get this list of error messages: > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 0: Bad file descriptor > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > failed: Bad file descriptor > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath > for 1: Bad file descriptor > libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > > > IMO a simple message like "xenstored is not running." would be enough. > xl now has a check and newer versions of the toolstack should display similar message when you invoke an xl command. Is it possible you are using a slightly older version of the toolstack or directly invoking libxl library elsewhere? Kamala ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 14:29 ` Kamala Narasimhan @ 2011-02-09 14:42 ` Christoph Egger 2011-02-09 14:46 ` Kamala Narasimhan 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-09 14:42 UTC (permalink / raw) To: xen-devel; +Cc: Kamala Narasimhan On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote: > Christoph Egger wrote: > > Hi! > > > > When I start a guest *before* xenstored runs > > then I get this list of error messages: > > > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 0: Bad file descriptor > > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction > > failed: Bad file descriptor > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > > dompath for 1: Bad file descriptor > > libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > > > > > > IMO a simple message like "xenstored is not running." would be enough. > > xl now has a check and newer versions of the toolstack should display > similar message when you invoke an xl command. Is it possible you are > using a slightly older version of the toolstack or directly invoking libxl > library elsewhere? I'm currently on c/s 22834. Which c/s added the check you are talking about? Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 14:42 ` Christoph Egger @ 2011-02-09 14:46 ` Kamala Narasimhan 2011-02-09 15:32 ` Christoph Egger 0 siblings, 1 reply; 22+ messages in thread From: Kamala Narasimhan @ 2011-02-09 14:46 UTC (permalink / raw) To: Christoph Egger; +Cc: xen-devel Christoph Egger wrote: > On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote: >> Christoph Egger wrote: >>> Hi! >>> >>> When I start a guest *before* xenstored runs >>> then I get this list of error messages: >>> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 0: Bad file descriptor >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction >>> failed: Bad file descriptor >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get >>> dompath for 1: Bad file descriptor >>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 >>> >>> >>> IMO a simple message like "xenstored is not running." would be enough. >> xl now has a check and newer versions of the toolstack should display >> similar message when you invoke an xl command. Is it possible you are >> using a slightly older version of the toolstack or directly invoking libxl >> library elsewhere? > > I'm currently on c/s 22834. Which c/s added the check you are talking about? > http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 Kamala ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 14:46 ` Kamala Narasimhan @ 2011-02-09 15:32 ` Christoph Egger 2011-02-09 15:42 ` Kamala Narasimhan 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-09 15:32 UTC (permalink / raw) To: Kamala Narasimhan; +Cc: xen-devel@lists.xensource.com On Wednesday 09 February 2011 15:46:56 Kamala Narasimhan wrote: > Christoph Egger wrote: > > On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote: > >> Christoph Egger wrote: > >>> Hi! > >>> > >>> When I start a guest *before* xenstored runs > >>> then I get this list of error messages: > >>> > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 0: Bad file descriptor > >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs > >>> transaction failed: Bad file descriptor > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model > >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get > >>> dompath for 1: Bad file descriptor > >>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1 > >>> > >>> > >>> IMO a simple message like "xenstored is not running." would be enough. > >> > >> xl now has a check and newer versions of the toolstack should display > >> similar message when you invoke an xl command. Is it possible you are > >> using a slightly older version of the toolstack or directly invoking > >> libxl library elsewhere? > > > > I'm currently on c/s 22834. Which c/s added the check you are talking > > about? > > http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 This is c/s 22806. So my tree is new enough. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 15:32 ` Christoph Egger @ 2011-02-09 15:42 ` Kamala Narasimhan 2011-02-09 15:52 ` Christoph Egger 0 siblings, 1 reply; 22+ messages in thread From: Kamala Narasimhan @ 2011-02-09 15:42 UTC (permalink / raw) To: Christoph Egger; +Cc: xen-devel@lists.xensource.com >>> I'm currently on c/s 22834. Which c/s added the check you are talking >>> about? >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > This is c/s 22806. So my tree is new enough. > Right, but did you happen to check how you got past the check done by that patch for the case in question? Kamala ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 15:42 ` Kamala Narasimhan @ 2011-02-09 15:52 ` Christoph Egger 2011-02-09 15:54 ` Christoph Egger 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-09 15:52 UTC (permalink / raw) To: Kamala Narasimhan; +Cc: xen-devel@lists.xensource.com On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > >>> I'm currently on c/s 22834. Which c/s added the check you are talking > >>> about? > >> > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > This is c/s 22806. So my tree is new enough. > > Right, but did you happen to check how you got past the check done by that > patch for the case in question? The pid file simply doesn't exist. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 15:52 ` Christoph Egger @ 2011-02-09 15:54 ` Christoph Egger 2011-02-09 17:39 ` Gianni Tedesco 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-09 15:54 UTC (permalink / raw) To: xen-devel; +Cc: Kamala Narasimhan On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking > > >>> about? > > >> > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > This is c/s 22806. So my tree is new enough. > > > > Right, but did you happen to check how you got past the check done by > > that patch for the case in question? > > The pid file simply doesn't exist. Oh wait. Hit the 'send' button too fast. The pid file does exist from previous boot. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 15:54 ` Christoph Egger @ 2011-02-09 17:39 ` Gianni Tedesco 2011-02-10 8:55 ` Ian Campbell 0 siblings, 1 reply; 22+ messages in thread From: Gianni Tedesco @ 2011-02-09 17:39 UTC (permalink / raw) To: Christoph Egger; +Cc: Narasimhan, xen-devel@lists.xensource.com, Kamala On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote: > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking > > > >>> about? > > > >> > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > Right, but did you happen to check how you got past the check done by > > > that patch for the case in question? > > > > The pid file simply doesn't exist. > > Oh wait. Hit the 'send' button too fast. > > The pid file does exist from previous boot. Bleh, precisely my problem with these heuristic checks. It's worse on my box because if this happens I end up with unkillable xl processes due to libxenstore wanting to open /dev/xen/xenbus or whatever it is. Gianni ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-09 17:39 ` Gianni Tedesco @ 2011-02-10 8:55 ` Ian Campbell 2011-02-10 9:26 ` Vincent Hanquez 2011-02-10 18:30 ` Gianni Tedesco 0 siblings, 2 replies; 22+ messages in thread From: Ian Campbell @ 2011-02-10 8:55 UTC (permalink / raw) To: Gianni Tedesco Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote: > On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote: > > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking > > > > >>> about? > > > > >> > > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > > > Right, but did you happen to check how you got past the check done by > > > > that patch for the case in question? > > > > > > The pid file simply doesn't exist. > > > > Oh wait. Hit the 'send' button too fast. > > > > The pid file does exist from previous boot. > > Bleh, precisely my problem with these heuristic checks. It's worse on my > box because if this happens I end up with unkillable xl processes due to > libxenstore wanting to open /dev/xen/xenbus or whatever it is. That's the underlying bug which the heuristic is trying to avoid... Fundamentally the xs ring protocol is missing any way to tell if someone is listening on the other end so you have no choice but to try communicating and see if anyone responds. It's a pretty straightforward bug that the kernel does the waiting to see if anyone responds bit with an uninterruptible sleep. I took a quick look a little while ago but unfortunately it didn't look straightforward to fix on the kernel side :-( I can't remember why though. It might be simpler to support allowing the userspace client to explicitly specify a timeout. I'm not sure what the impact on the ring is of leaving unconsumed requests on the ring when the other end does show up. Presumably the kernel driver just needs to be prepared to swallow responses whose target has given up and gone home. Maybe we should add an explicit ping/pong ring message to the xs ring protocol? Ian. > > Gianni > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 8:55 ` Ian Campbell @ 2011-02-10 9:26 ` Vincent Hanquez 2011-02-10 11:24 ` Ian Campbell 2011-02-10 18:30 ` Gianni Tedesco 1 sibling, 1 reply; 22+ messages in thread From: Vincent Hanquez @ 2011-02-10 9:26 UTC (permalink / raw) To: Ian Campbell Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com, Gianni Tedesco On 10/02/11 08:55, Ian Campbell wrote: > That's the underlying bug which the heuristic is trying to avoid... > > Fundamentally the xs ring protocol is missing any way to tell if someone > is listening on the other end so you have no choice but to try > communicating and see if anyone responds. > > It's a pretty straightforward bug that the kernel does the waiting to > see if anyone responds bit with an uninterruptible sleep. I took a quick > look a little while ago but unfortunately it didn't look straightforward > to fix on the kernel side :-( I can't remember why though. For starter, the protocol requires the messages to sit on the ring for a underdetermined amount of time (boot watches). > It might be simpler to support allowing the userspace client to > explicitly specify a timeout. I'm not sure what the impact on the ring > is of leaving unconsumed requests on the ring when the other end does > show up. Presumably the kernel driver just needs to be prepared to > swallow responses whose target has given up and gone home. No, the simplest thing to do is to use the socket connection exclusively. Just how we're doing it in XCP and XCI. The protocol is not design to do async either, so leaving unconsumed request, could be pretty disastrous if the other end show up. Providing the kernel doesn't detect it (i don't think it does [1]), it would imply spurious reply, for example the previous waiting read on "/abc/def" could reply to a next read on "/xyz/123". > Maybe we should add an explicit ping/pong ring message to the xs ring > protocol? And who's going to reply to this if xenstored is missing ? you would require the kernel to introspect the messages and reply by itself. [1] the kernel would be happy to read the previous reply on the ring after xenstored has put the actual reply after it, and trigger the eventchn. (the kernel could actually check the requestid and see if they match, but it doesn't.) -- Vincent ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 9:26 ` Vincent Hanquez @ 2011-02-10 11:24 ` Ian Campbell 2011-02-10 11:32 ` Christoph Egger 2011-02-10 21:55 ` Vincent Hanquez 0 siblings, 2 replies; 22+ messages in thread From: Ian Campbell @ 2011-02-10 11:24 UTC (permalink / raw) To: Vincent Hanquez Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com, Gianni Tedesco On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > On 10/02/11 08:55, Ian Campbell wrote: > > That's the underlying bug which the heuristic is trying to avoid... > > > > Fundamentally the xs ring protocol is missing any way to tell if someone > > is listening on the other end so you have no choice but to try > > communicating and see if anyone responds. > > > > It's a pretty straightforward bug that the kernel does the waiting to > > see if anyone responds bit with an uninterruptible sleep. I took a quick > > look a little while ago but unfortunately it didn't look straightforward > > to fix on the kernel side :-( I can't remember why though. > > For starter, the protocol requires the messages to sit on the ring for a > underdetermined amount of time (boot watches). > > > It might be simpler to support allowing the userspace client to > > explicitly specify a timeout. I'm not sure what the impact on the ring > > is of leaving unconsumed requests on the ring when the other end does > > show up. Presumably the kernel driver just needs to be prepared to > > swallow responses whose target has given up and gone home. > > No, the simplest thing to do is to use the socket connection > exclusively. Just how we're doing it in XCP and XCI. Right but this approach doesn't work with xenstored in a stubdomain. Part of the point of using the ring protocol even when this isn't the case is to help ensure that it is possible and help avoid regressions etc. > The protocol is not design to do async either, so leaving unconsumed > request, could be pretty disastrous if the other end show up. Providing > the kernel doesn't detect it (i don't think it does [1]), it would imply > spurious reply, for example the previous waiting read on "/abc/def" > could reply to a next read on "/xyz/123". The wire protocol includes a req_id which is echoed in the response which sh/could facilitate multiplexing this sort of thing. The pvops kernel currently always sets it to zero but that's just an implementation detail ;-) Currently the kernel does (roughly): take_lock write_request wait_for_reply release_lock instead it should/could be: take_lock(timeout) write_request (++req_id) while read_reply.req_id != req_id && not (timeout) wait some more release lock OK, so may be this is not in the "might be simpler" bucket any more, but it sounds like plausibly the right direction to take. Properly handling multiple userspace clients asynchronously a demuxes the responses etc would be even better but I don't think necessary to solve this particular issue. > > Maybe we should add an explicit ping/pong ring message to the xs ring > > protocol? > > And who's going to reply to this if xenstored is missing ? you would > require the kernel to introspect the messages and reply by itself. The reason I suggested new messages was that I would solve that by declaring that these new messages have whatever magic semantics I need to make this work ;-) Ian. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 11:24 ` Ian Campbell @ 2011-02-10 11:32 ` Christoph Egger 2011-02-10 11:43 ` Ian Campbell 2011-02-10 21:55 ` Vincent Hanquez 1 sibling, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-10 11:32 UTC (permalink / raw) To: Ian Campbell Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez, Gianni Tedesco On Thursday 10 February 2011 12:24:41 Ian Campbell wrote: > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > On 10/02/11 08:55, Ian Campbell wrote: > > > That's the underlying bug which the heuristic is trying to avoid... > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > someone is listening on the other end so you have no choice but to try > > > communicating and see if anyone responds. > > > > > > It's a pretty straightforward bug that the kernel does the waiting to > > > see if anyone responds bit with an uninterruptible sleep. I took a > > > quick look a little while ago but unfortunately it didn't look > > > straightforward to fix on the kernel side :-( I can't remember why > > > though. > > > > For starter, the protocol requires the messages to sit on the ring for a > > underdetermined amount of time (boot watches). > > > > > It might be simpler to support allowing the userspace client to > > > explicitly specify a timeout. I'm not sure what the impact on the ring > > > is of leaving unconsumed requests on the ring when the other end does > > > show up. Presumably the kernel driver just needs to be prepared to > > > swallow responses whose target has given up and gone home. > > > > No, the simplest thing to do is to use the socket connection > > exclusively. Just how we're doing it in XCP and XCI. > > Right but this approach doesn't work with xenstored in a stubdomain. > Part of the point of using the ring protocol even when this isn't the > case is to help ensure that it is possible and help avoid regressions > etc. > > > The protocol is not design to do async either, so leaving unconsumed > > request, could be pretty disastrous if the other end show up. Providing > > the kernel doesn't detect it (i don't think it does [1]), it would imply > > spurious reply, for example the previous waiting read on "/abc/def" > > could reply to a next read on "/xyz/123". > > The wire protocol includes a req_id which is echoed in the response > which sh/could facilitate multiplexing this sort of thing. The pvops > kernel currently always sets it to zero but that's just an > implementation detail ;-) Currently the kernel does (roughly): > take_lock > write_request > wait_for_reply > release_lock > instead it should/could be: > take_lock(timeout) > write_request (++req_id) > while read_reply.req_id != req_id && not (timeout) > wait some more > release lock I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0. Christoph > OK, so may be this is not in the "might be simpler" bucket any more, but > it sounds like plausibly the right direction to take. > > Properly handling multiple userspace clients asynchronously a demuxes > the responses etc would be even better but I don't think necessary to > solve this particular issue. > > > > Maybe we should add an explicit ping/pong ring message to the xs ring > > > protocol? > > > > And who's going to reply to this if xenstored is missing ? you would > > require the kernel to introspect the messages and reply by itself. > > The reason I suggested new messages was that I would solve that by > declaring that these new messages have whatever magic semantics I need > to make this work ;-) > > Ian. -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 11:32 ` Christoph Egger @ 2011-02-10 11:43 ` Ian Campbell 2011-02-10 12:23 ` Christoph Egger 0 siblings, 1 reply; 22+ messages in thread From: Ian Campbell @ 2011-02-10 11:43 UTC (permalink / raw) To: Christoph Egger Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez, Gianni Tedesco On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote: > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote: > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > > On 10/02/11 08:55, Ian Campbell wrote: > > > > That's the underlying bug which the heuristic is trying to avoid... > > > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > > someone is listening on the other end so you have no choice but to try > > > > communicating and see if anyone responds. > > > > > > > > It's a pretty straightforward bug that the kernel does the waiting to > > > > see if anyone responds bit with an uninterruptible sleep. I took a > > > > quick look a little while ago but unfortunately it didn't look > > > > straightforward to fix on the kernel side :-( I can't remember why > > > > though. > > > > > > For starter, the protocol requires the messages to sit on the ring for a > > > underdetermined amount of time (boot watches). > > > > > > > It might be simpler to support allowing the userspace client to > > > > explicitly specify a timeout. I'm not sure what the impact on the ring > > > > is of leaving unconsumed requests on the ring when the other end does > > > > show up. Presumably the kernel driver just needs to be prepared to > > > > swallow responses whose target has given up and gone home. > > > > > > No, the simplest thing to do is to use the socket connection > > > exclusively. Just how we're doing it in XCP and XCI. > > > > Right but this approach doesn't work with xenstored in a stubdomain. > > Part of the point of using the ring protocol even when this isn't the > > case is to help ensure that it is possible and help avoid regressions > > etc. > > > > > The protocol is not design to do async either, so leaving unconsumed > > > request, could be pretty disastrous if the other end show up. Providing > > > the kernel doesn't detect it (i don't think it does [1]), it would imply > > > spurious reply, for example the previous waiting read on "/abc/def" > > > could reply to a next read on "/xyz/123". > > > > The wire protocol includes a req_id which is echoed in the response > > which sh/could facilitate multiplexing this sort of thing. The pvops > > kernel currently always sets it to zero but that's just an > > implementation detail ;-) Currently the kernel does (roughly): > > take_lock > > write_request > > wait_for_reply > > release_lock > > instead it should/could be: > > take_lock(timeout) > > write_request (++req_id) > > while read_reply.req_id != req_id && not (timeout) > > wait some more > > release lock > > I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0. Fixing the NetBSD dom0 does though. Seriously, if kernels are lacking in functionality needed to make the system work smoothly and correctly we should fix them, not just default to adding hacks in userspace because it seems easier in the short term. (Obviously if the userspace solution is the right thing to do and/or more correct in its own right then fine lets do that). Ian. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 11:43 ` Ian Campbell @ 2011-02-10 12:23 ` Christoph Egger 2011-02-10 12:42 ` Ian Jackson 0 siblings, 1 reply; 22+ messages in thread From: Christoph Egger @ 2011-02-10 12:23 UTC (permalink / raw) To: Ian Campbell Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez, Gianni Tedesco On Thursday 10 February 2011 12:43:47 Ian Campbell wrote: > On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote: > > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote: > > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: > > > > On 10/02/11 08:55, Ian Campbell wrote: > > > > > That's the underlying bug which the heuristic is trying to avoid... > > > > > > > > > > Fundamentally the xs ring protocol is missing any way to tell if > > > > > someone is listening on the other end so you have no choice but to > > > > > try communicating and see if anyone responds. > > > > > > > > > > It's a pretty straightforward bug that the kernel does the waiting > > > > > to see if anyone responds bit with an uninterruptible sleep. I took > > > > > a quick look a little while ago but unfortunately it didn't look > > > > > straightforward to fix on the kernel side :-( I can't remember why > > > > > though. > > > > > > > > For starter, the protocol requires the messages to sit on the ring > > > > for a underdetermined amount of time (boot watches). > > > > > > > > > It might be simpler to support allowing the userspace client to > > > > > explicitly specify a timeout. I'm not sure what the impact on the > > > > > ring is of leaving unconsumed requests on the ring when the other > > > > > end does show up. Presumably the kernel driver just needs to be > > > > > prepared to swallow responses whose target has given up and gone > > > > > home. > > > > > > > > No, the simplest thing to do is to use the socket connection > > > > exclusively. Just how we're doing it in XCP and XCI. > > > > > > Right but this approach doesn't work with xenstored in a stubdomain. > > > Part of the point of using the ring protocol even when this isn't the > > > case is to help ensure that it is possible and help avoid regressions > > > etc. > > > > > > > The protocol is not design to do async either, so leaving unconsumed > > > > request, could be pretty disastrous if the other end show up. > > > > Providing the kernel doesn't detect it (i don't think it does [1]), > > > > it would imply spurious reply, for example the previous waiting read > > > > on "/abc/def" could reply to a next read on "/xyz/123". > > > > > > The wire protocol includes a req_id which is echoed in the response > > > which sh/could facilitate multiplexing this sort of thing. The pvops > > > kernel currently always sets it to zero but that's just an > > > implementation detail ;-) Currently the kernel does (roughly): > > > take_lock > > > write_request > > > wait_for_reply > > > release_lock > > > instead it should/could be: > > > take_lock(timeout) > > > write_request (++req_id) > > > while read_reply.req_id != req_id && not (timeout) > > > wait some more > > > release lock > > > > I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0. > > Fixing the NetBSD dom0 does though. > > Seriously, if kernels are lacking in functionality needed to make the > system work smoothly and correctly we should fix them, not just default > to adding hacks in userspace because it seems easier in the short term. > (Obviously if the userspace solution is the right thing to do and/or > more correct in its own right then fine lets do that). Does xl communicate with xenstored through a named socket ? If yes then 'connect()' should check for ECONNREFUSED. Christoph -- ---to satisfy European Law for business letters: Advanced Micro Devices GmbH Einsteinring 24, 85609 Dornach b. Muenchen Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen Registergericht Muenchen, HRB Nr. 43632 ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 12:23 ` Christoph Egger @ 2011-02-10 12:42 ` Ian Jackson 0 siblings, 0 replies; 22+ messages in thread From: Ian Jackson @ 2011-02-10 12:42 UTC (permalink / raw) To: Christoph Egger Cc: Ian Campbell, Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez, Gianni Tedesco Christoph Egger writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"): > Does xl communicate with xenstored through a named socket ? Sometimes; or it can use the shared ring. Please see the earlier parts of the thread where Kamala explained why she wants it to use the ring in the usual case. I'm not sure that's necessary, but _some_ arrangement for making it work with the ring _is_ necessary. Ian. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 11:24 ` Ian Campbell 2011-02-10 11:32 ` Christoph Egger @ 2011-02-10 21:55 ` Vincent Hanquez 2011-02-11 8:03 ` Ian Campbell 2011-02-11 9:49 ` Tim Deegan 1 sibling, 2 replies; 22+ messages in thread From: Vincent Hanquez @ 2011-02-10 21:55 UTC (permalink / raw) To: Ian Campbell Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com, Gianni Tedesco On 10/02/11 11:24, Ian Campbell wrote: > Right but this approach doesn't work with xenstored in a stubdomain. yeah I know. xenstored in a stubdom is just an experiment, when it become a serious feature, this argument would hold. however it's not going to be use in 4.1, and in any production settings. > Part of the point of using the ring protocol even when this isn't the > case is to help ensure that it is possible and help avoid regressions > etc. > >> The protocol is not design to do async either, so leaving unconsumed >> request, could be pretty disastrous if the other end show up. Providing >> the kernel doesn't detect it (i don't think it does [1]), it would imply >> spurious reply, for example the previous waiting read on "/abc/def" >> could reply to a next read on "/xyz/123". > > The wire protocol includes a req_id which is echoed in the response > which sh/could facilitate multiplexing this sort of thing. The pvops > kernel currently always sets it to zero but that's just an > implementation detail ;-) Currently the kernel does (roughly): The kernel is not the one exclusively setting the rid. this is a client initialized value. any xs implementation can use it any way they want (including the kernel implementation). Turns out that most of the implementations are actually putting rid to 0 anyway (the ocaml and C implementation are, the windows one isn't). Even then, if you could initialize it to some value, what value is that going to be ? there's just no way to know if someone else is not using this rid already globally (since the ring is a global OS thing). Which basically would means tracking pid (the kernel meaning) along with the rid ? >>> Maybe we should add an explicit ping/pong ring message to the xs ring >>> protocol? >> >> And who's going to reply to this if xenstored is missing ? you would >> require the kernel to introspect the messages and reply by itself. > > The reason I suggested new messages was that I would solve that by > declaring that these new messages have whatever magic semantics I need > to make this work ;-) ah right, the famous DeusExMachina message type then :-) -- Vincent ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 21:55 ` Vincent Hanquez @ 2011-02-11 8:03 ` Ian Campbell 2011-02-11 9:49 ` Tim Deegan 1 sibling, 0 replies; 22+ messages in thread From: Ian Campbell @ 2011-02-11 8:03 UTC (permalink / raw) To: Vincent Hanquez Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com, Gianni Tedesco On Thu, 2011-02-10 at 21:55 +0000, Vincent Hanquez wrote: > On 10/02/11 11:24, Ian Campbell wrote: > > Right but this approach doesn't work with xenstored in a stubdomain. > > yeah I know. xenstored in a stubdom is just an experiment, when it > become a serious feature, this argument would hold. however it's not > going to be use in 4.1, and in any production settings. Accepted. As I understand it people are actively using stub-xenstored in disaggregation research today. Reinvigorating the stub domain approach for Xen is also (going to be) one of our GSoC proposals this year. > > Part of the point of using the ring protocol even when this isn't the > > case is to help ensure that it is possible and help avoid regressions > > etc. > > > >> The protocol is not design to do async either, so leaving unconsumed > >> request, could be pretty disastrous if the other end show up. Providing > >> the kernel doesn't detect it (i don't think it does [1]), it would imply > >> spurious reply, for example the previous waiting read on "/abc/def" > >> could reply to a next read on "/xyz/123". > > > > The wire protocol includes a req_id which is echoed in the response > > which sh/could facilitate multiplexing this sort of thing. The pvops > > kernel currently always sets it to zero but that's just an > > implementation detail ;-) Currently the kernel does (roughly): > > The kernel is not the one exclusively setting the rid. this is a client > initialized value. any xs implementation can use it any way they want > (including the kernel implementation). > > Turns out that most of the implementations are actually putting rid to 0 > anyway (the ocaml and C implementation are, the windows one isn't). > > Even then, if you could initialize it to some value, what value is that > going to be ? there's just no way to know if someone else is not using > this rid already globally (since the ring is a global OS thing). Which > basically would means tracking pid (the kernel meaning) along with the rid ? Since the kernel mediates all access to the actual ring it can handle the req_id with a single incrementing integer and fake out whatever is necessary to its users. It's trivial to fixup the in-kernel xs users. Most likely it only involves changing a single function in the core xs kernel code which everyone else must use anyway. For userspace users it's a little trickier but the kernel just needs to remember the userspace supplied req_id before inserting its own and to reverse the substitution in the reply. If you were to support multiple outstanding active requests then it would be natural to stash the id in whatever data structure you were using for that. If you only want to simplify by only supporting a single active request (by throwing away responses to aborted/timed out requests as I suggested earlier) you only need to remember the user provided req_id for the one request which is trivial. AFAIK we don't have any kernel code which does clever things such as using a pointer to a datastructure as the req_id but if we did (and we were unwilling to simply change it) then the userspace solution would work there too. > >>> Maybe we should add an explicit ping/pong ring message to the xs ring > >>> protocol? > >> > >> And who's going to reply to this if xenstored is missing ? you would > >> require the kernel to introspect the messages and reply by itself. > > > > The reason I suggested new messages was that I would solve that by > > declaring that these new messages have whatever magic semantics I need > > to make this work ;-) > > ah right, the famous DeusExMachina message type then :-) :-) Ian ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 21:55 ` Vincent Hanquez 2011-02-11 8:03 ` Ian Campbell @ 2011-02-11 9:49 ` Tim Deegan 2011-02-11 11:16 ` Vincent Hanquez 1 sibling, 1 reply; 22+ messages in thread From: Tim Deegan @ 2011-02-11 9:49 UTC (permalink / raw) To: Vincent Hanquez Cc: Christoph Egger, xen-devel@lists.xensource.com, Kamala Narasimhan, Ian Campbell, Gianni, Tedesco At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote: > On 10/02/11 11:24, Ian Campbell wrote: > > Right but this approach doesn't work with xenstored in a stubdomain. > > yeah I know. xenstored in a stubdom is just an experiment, when it > become a serious feature, this argument would hold. however it's not > going to be use in 4.1, and in any production settings. You seem to be arguing that we shouldn't fix a bug in the kernel. I don't understand that. How is it going to become a "serious feature" if we don't fix the bugs that affect it? In any case, I can think of three projects off the top of my head that are using stub domains aggressively, one of which I know is using xenstore stub domains in particular. I'm sure there are others. Tim. -- Tim Deegan <Tim.Deegan@citrix.com> Principal Software Engineer, Xen Platform Team Citrix Systems UK Ltd. (Company #02937203, SL9 0BG) ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-11 9:49 ` Tim Deegan @ 2011-02-11 11:16 ` Vincent Hanquez 0 siblings, 0 replies; 22+ messages in thread From: Vincent Hanquez @ 2011-02-11 11:16 UTC (permalink / raw) To: Tim Deegan Cc: Christoph Egger, xen-devel@lists.xensource.com, Kamala Narasimhan, Ian Campbell, Gianni, Tedesco On 11/02/11 09:49, Tim Deegan wrote: > At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote: >> On 10/02/11 11:24, Ian Campbell wrote: >>> Right but this approach doesn't work with xenstored in a stubdomain. >> >> yeah I know. xenstored in a stubdom is just an experiment, when it >> become a serious feature, this argument would hold. however it's not >> going to be use in 4.1, and in any production settings. > > You seem to be arguing that we shouldn't fix a bug in the kernel. I > don't understand that. How is it going to become a "serious feature" if > we don't fix the bugs that affect it? If you want to fix the behaviour which is present since xen 3.0 (you can understand why i'm not holding my breath anymore), all the best. What I'm arguing is that behaviour should not exists in the next stable version of xen, specially for a feature that is not serious yet (at least upstream). > In any case, I can think of three projects off the top of my head that > are using stub domains aggressively, one of which I know is using > xenstore stub domains in particular. I'm sure there are others. Good for them. I'm sure they can carry a 2 liner patch to connect to the ring instead of the unix socket, until they actually submit upstream their stuff that make xenstored stubdomain great. -- Vincent ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 8:55 ` Ian Campbell 2011-02-10 9:26 ` Vincent Hanquez @ 2011-02-10 18:30 ` Gianni Tedesco 2011-02-10 19:33 ` Ian Jackson 1 sibling, 1 reply; 22+ messages in thread From: Gianni Tedesco @ 2011-02-10 18:30 UTC (permalink / raw) To: Ian Campbell Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com On Thu, 2011-02-10 at 08:55 +0000, Ian Campbell wrote: > On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote: > > On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote: > > > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote: > > > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote: > > > > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking > > > > > >>> about? > > > > > >> > > > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5 > > > > > > > > > > > > This is c/s 22806. So my tree is new enough. > > > > > > > > > > Right, but did you happen to check how you got past the check done by > > > > > that patch for the case in question? > > > > > > > > The pid file simply doesn't exist. > > > > > > Oh wait. Hit the 'send' button too fast. > > > > > > The pid file does exist from previous boot. > > > > Bleh, precisely my problem with these heuristic checks. It's worse on my > > box because if this happens I end up with unkillable xl processes due to > > libxenstore wanting to open /dev/xen/xenbus or whatever it is. > > That's the underlying bug which the heuristic is trying to avoid... > > Fundamentally the xs ring protocol is missing any way to tell if someone > is listening on the other end so you have no choice but to try > communicating and see if anyone responds. > > It's a pretty straightforward bug that the kernel does the waiting to > see if anyone responds bit with an uninterruptible sleep. I took a quick > look a little while ago but unfortunately it didn't look straightforward > to fix on the kernel side :-( I can't remember why though. I suppose it's because we don't want to be killable after sending the message but before receiving the reply, since the ring is going to get jammed up due to nobody consuming the reply. The reply that in this case never comes, but the kernel can't know that it won't eventually come, right? Gianni ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: libxl: error handling before xenstored runs 2011-02-10 18:30 ` Gianni Tedesco @ 2011-02-10 19:33 ` Ian Jackson 0 siblings, 0 replies; 22+ messages in thread From: Ian Jackson @ 2011-02-10 19:33 UTC (permalink / raw) To: Gianni Tedesco Cc: Ian Campbell, Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com Gianni Tedesco writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"): > I suppose it's because we don't want to be killable after sending the > message but before receiving the reply, since the ring is going to get > jammed up due to nobody consuming the reply. The reply that in this case > never comes, but the kernel can't know that it won't eventually come, > right? The bookkeeping for the fact that there is a command outstanding should take care of that problem - when matching up the replies with requests it will find that the caller for the request has gone away and discard the reply. (Obviously it doesn't atm because the whole thing is synchronous.) Ian. ^ permalink raw reply [flat|nested] 22+ messages in thread
end of thread, other threads:[~2011-02-11 11:16 UTC | newest] Thread overview: 22+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2011-02-09 11:13 libxl: error handling before xenstored runs Christoph Egger 2011-02-09 14:29 ` Kamala Narasimhan 2011-02-09 14:42 ` Christoph Egger 2011-02-09 14:46 ` Kamala Narasimhan 2011-02-09 15:32 ` Christoph Egger 2011-02-09 15:42 ` Kamala Narasimhan 2011-02-09 15:52 ` Christoph Egger 2011-02-09 15:54 ` Christoph Egger 2011-02-09 17:39 ` Gianni Tedesco 2011-02-10 8:55 ` Ian Campbell 2011-02-10 9:26 ` Vincent Hanquez 2011-02-10 11:24 ` Ian Campbell 2011-02-10 11:32 ` Christoph Egger 2011-02-10 11:43 ` Ian Campbell 2011-02-10 12:23 ` Christoph Egger 2011-02-10 12:42 ` Ian Jackson 2011-02-10 21:55 ` Vincent Hanquez 2011-02-11 8:03 ` Ian Campbell 2011-02-11 9:49 ` Tim Deegan 2011-02-11 11:16 ` Vincent Hanquez 2011-02-10 18:30 ` Gianni Tedesco 2011-02-10 19:33 ` Ian Jackson
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.