libxl: error handling before xenstored runs

* libxl: error handling before xenstored runs
@ 2011-02-09 11:13 Christoph Egger
  2011-02-09 14:29 ` Kamala Narasimhan
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-09 11:13 UTC (permalink / raw)
  To: xen-devel


Hi!

When I start a guest *before* xenstored runs
then I get this list of error messages:

libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 0: Bad file descriptor
libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
failed: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 0: Bad file descriptor
libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
failed: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 0: Bad file descriptor
libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
failed: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 0: Bad file descriptor
libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
failed: Bad file descriptor
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model
libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
for 1: Bad file descriptor
libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1


IMO a simple message like "xenstored is not running." would be enough.

Christoph

-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632

* Re: libxl: error handling before xenstored runs
  2011-02-09 11:13 libxl: error handling before xenstored runs Christoph Egger
@ 2011-02-09 14:29 ` Kamala Narasimhan
  2011-02-09 14:42   ` Christoph Egger
  0 siblings, 1 reply; 22+ messages in thread
From: Kamala Narasimhan @ 2011-02-09 14:29 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

Christoph Egger wrote:
> Hi!
> 
> When I start a guest *before* xenstored runs
> then I get this list of error messages:
> 
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 0: Bad file descriptor
> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
> failed: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 0: Bad file descriptor
> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
> failed: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 0: Bad file descriptor
> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
> failed: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 0: Bad file descriptor
> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction 
> failed: Bad file descriptor
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model
> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get dompath 
> for 1: Bad file descriptor
> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1
> 
> 
> IMO a simple message like "xenstored is not running." would be enough.
> 

xl now has a check, and newer versions of the toolstack should display a similar
message when you invoke an xl command.  Is it possible you are using a slightly
older version of the toolstack, or invoking the libxl library directly elsewhere?

Kamala

* Re: libxl: error handling before xenstored runs
  2011-02-09 14:29 ` Kamala Narasimhan
@ 2011-02-09 14:42   ` Christoph Egger
  2011-02-09 14:46     ` Kamala Narasimhan
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-09 14:42 UTC (permalink / raw)
  To: xen-devel; +Cc: Kamala Narasimhan

On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote:
> Christoph Egger wrote:
> > Hi!
> >
> > When I start a guest *before* xenstored runs
> > then I get this list of error messages:
> >
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 0: Bad file descriptor
> > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
> > failed: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 0: Bad file descriptor
> > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
> > failed: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 0: Bad file descriptor
> > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
> > failed: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 0: Bad file descriptor
> > libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
> > failed: Bad file descriptor
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model
> > libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> > dompath for 1: Bad file descriptor
> > libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1
> >
> >
> > IMO a simple message like "xenstored is not running." would be enough.
>
> xl now has a check and newer versions of the toolstack should display
> similar message when you invoke an xl command.  Is it possible you are
> using a slightly older version of the toolstack or directly invoking libxl
> library elsewhere?

I'm currently on c/s 22834. Which c/s added the check you are talking about?

Christoph




* Re: libxl: error handling before xenstored runs
  2011-02-09 14:42   ` Christoph Egger
@ 2011-02-09 14:46     ` Kamala Narasimhan
  2011-02-09 15:32       ` Christoph Egger
  0 siblings, 1 reply; 22+ messages in thread
From: Kamala Narasimhan @ 2011-02-09 14:46 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel

Christoph Egger wrote:
> On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote:
>> Christoph Egger wrote:
>>> Hi!
>>>
>>> When I start a guest *before* xenstored runs
>>> then I get this list of error messages:
>>>
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 0: Bad file descriptor
>>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
>>> failed: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 0: Bad file descriptor
>>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
>>> failed: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 0: Bad file descriptor
>>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
>>> failed: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 0: Bad file descriptor
>>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs transaction
>>> failed: Bad file descriptor
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model
>>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
>>> dompath for 1: Bad file descriptor
>>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1
>>>
>>>
>>> IMO a simple message like "xenstored is not running." would be enough.
>> xl now has a check and newer versions of the toolstack should display
>> similar message when you invoke an xl command.  Is it possible you are
>> using a slightly older version of the toolstack or directly invoking libxl
>> library elsewhere?
> 
> I'm currently on c/s 22834. Which c/s added the check you are talking about?
> 

http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5

Kamala

* Re: libxl: error handling before xenstored runs
  2011-02-09 14:46     ` Kamala Narasimhan
@ 2011-02-09 15:32       ` Christoph Egger
  2011-02-09 15:42         ` Kamala Narasimhan
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-09 15:32 UTC (permalink / raw)
  To: Kamala Narasimhan; +Cc: xen-devel@lists.xensource.com

On Wednesday 09 February 2011 15:46:56 Kamala Narasimhan wrote:
> Christoph Egger wrote:
> > On Wednesday 09 February 2011 15:29:55 Kamala Narasimhan wrote:
> >> Christoph Egger wrote:
> >>> Hi!
> >>>
> >>> When I start a guest *before* xenstored runs
> >>> then I get this list of error messages:
> >>>
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 0: Bad file descriptor
> >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs
> >>> transaction failed: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 0: Bad file descriptor
> >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs
> >>> transaction failed: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 0: Bad file descriptor
> >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs
> >>> transaction failed: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 0: Bad file descriptor
> >>> libxl: error: libxl_device.c:116:libxl__device_generic_add xs
> >>> transaction failed: Bad file descriptor
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> xl: fatal error: libxl_create.c:487, rc=-3: libxl__create_device_model
> >>> libxl: error: libxl_xshelp.c:109:libxl__xs_get_dompath failed to get
> >>> dompath for 1: Bad file descriptor
> >>> libxl: error: libxl.c:675:libxl_domain_destroy non-existant domain -1
> >>>
> >>>
> >>> IMO a simple message like "xenstored is not running." would be enough.
> >>
> >> xl now has a check and newer versions of the toolstack should display
> >> similar message when you invoke an xl command.  Is it possible you are
> >> using a slightly older version of the toolstack or directly invoking
> >> libxl library elsewhere?
> >
> > I'm currently on c/s 22834. Which c/s added the check you are talking
> > about?
>
> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5

This is c/s 22806. So my tree is new enough.

Christoph




* Re: libxl: error handling before xenstored runs
  2011-02-09 15:32       ` Christoph Egger
@ 2011-02-09 15:42         ` Kamala Narasimhan
  2011-02-09 15:52           ` Christoph Egger
  0 siblings, 1 reply; 22+ messages in thread
From: Kamala Narasimhan @ 2011-02-09 15:42 UTC (permalink / raw)
  To: Christoph Egger; +Cc: xen-devel@lists.xensource.com

>>> I'm currently on c/s 22834. Which c/s added the check you are talking
>>> about?
>> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> 
> This is c/s 22806. So my tree is new enough.
> 
Right, but did you happen to check how you got past the check done by that patch
for the case in question?

Kamala

* Re: libxl: error handling before xenstored runs
  2011-02-09 15:42         ` Kamala Narasimhan
@ 2011-02-09 15:52           ` Christoph Egger
  2011-02-09 15:54             ` Christoph Egger
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-09 15:52 UTC (permalink / raw)
  To: Kamala Narasimhan; +Cc: xen-devel@lists.xensource.com

On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:
> >>> I'm currently on c/s 22834. Which c/s added the check you are talking
> >>> about?
> >>
> >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> >
> > This is c/s 22806. So my tree is new enough.
>
> Right, but did you happen to check how you got past the check done by that
> patch for the case in question?

The pid file simply doesn't exist.

Christoph


* Re: libxl: error handling before xenstored runs
  2011-02-09 15:52           ` Christoph Egger
@ 2011-02-09 15:54             ` Christoph Egger
  2011-02-09 17:39               ` Gianni Tedesco
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-09 15:54 UTC (permalink / raw)
  To: xen-devel; +Cc: Kamala Narasimhan

On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote:
> On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:
> > >>> I'm currently on c/s 22834. Which c/s added the check you are talking
> > >>> about?
> > >>
> > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> > >
> > > This is c/s 22806. So my tree is new enough.
> >
> > Right, but did you happen to check how you got past the check done by
> > that patch for the case in question?
>
> The pid file simply doesn't exist.

Oh wait. Hit the 'send' button too fast.

The pid file does exist from previous boot.


Christoph



* Re: libxl: error handling before xenstored runs
  2011-02-09 15:54             ` Christoph Egger
@ 2011-02-09 17:39               ` Gianni Tedesco
  2011-02-10  8:55                 ` Ian Campbell
  0 siblings, 1 reply; 22+ messages in thread
From: Gianni Tedesco @ 2011-02-09 17:39 UTC (permalink / raw)
  To: Christoph Egger; +Cc: Narasimhan, xen-devel@lists.xensource.com, Kamala

On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote:
> On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote:
> > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:
> > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking
> > > >>> about?
> > > >>
> > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> > > >
> > > > This is c/s 22806. So my tree is new enough.
> > >
> > > Right, but did you happen to check how you got past the check done by
> > > that patch for the case in question?
> >
> > The pid file simply doesn't exist.
> 
> Oh wait. Hit the 'send' button too fast.
> 
> The pid file does exist from previous boot.

Bleh, precisely my problem with these heuristic checks. It's worse on my
box because if this happens I end up with unkillable xl processes due to
libxenstore wanting to open /dev/xen/xenbus or whatever it is.
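
To illustrate why a pid-file check alone is only a heuristic: a file left over
from a previous boot still exists even though no daemon is running. Below is a
minimal sketch of a slightly stricter probe, in Python for brevity. The function
name and the signal-0 probe are illustrative assumptions, not what xl actually
does, and the result is still a heuristic because pids can be reused.

```python
import os


def daemon_seems_alive(pidfile_path):
    """Return True only if the pid file names a process that currently exists.

    Hypothetical helper for illustration; POSIX only.
    """
    try:
        with open(pidfile_path) as f:
            pid = int(f.read().strip())
    except (OSError, ValueError):
        return False              # missing, unreadable or malformed pid file
    if pid <= 0:
        return False              # 0 and negatives address process groups
    try:
        os.kill(pid, 0)           # signal 0 delivers nothing, only checks
    except ProcessLookupError:
        return False              # stale pid file: no such process
    except PermissionError:
        return True               # process exists but is owned by someone else
    return True
```

A stale file from a previous boot makes a plain "does the file exist" test pass,
whereas this probe would normally report False for it.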

Gianni 

* Re: libxl: error handling before xenstored runs
  2011-02-09 17:39               ` Gianni Tedesco
@ 2011-02-10  8:55                 ` Ian Campbell
  2011-02-10  9:26                   ` Vincent Hanquez
  2011-02-10 18:30                   ` Gianni Tedesco
  0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2011-02-10  8:55 UTC (permalink / raw)
  To: Gianni Tedesco
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com

On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote: 
> On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote:
> > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote:
> > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:
> > > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking
> > > > >>> about?
> > > > >>
> > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> > > > >
> > > > > This is c/s 22806. So my tree is new enough.
> > > >
> > > > Right, but did you happen to check how you got past the check done by
> > > > that patch for the case in question?
> > >
> > > The pid file simply doesn't exist.
> > 
> > Oh wait. Hit the 'send' button too fast.
> > 
> > The pid file does exist from previous boot.
> 
> Bleh, precisely my problem with these heuristic checks. It's worse on my
> box because if this happens I end up with unkillable xl processes due to
> libxenstore wanting to open /dev/xen/xenbus or whatever it is.

That's the underlying bug which the heuristic is trying to avoid...

Fundamentally the xs ring protocol is missing any way to tell if someone
is listening on the other end so you have no choice but to try
communicating and see if anyone responds.

It's a pretty straightforward bug that the kernel does the "wait to see
if anyone responds" part in an uninterruptible sleep. I took a quick
look a little while ago but unfortunately it didn't look straightforward
to fix on the kernel side :-( I can't remember why though.

It might be simpler to support allowing the userspace client to
explicitly specify a timeout. I'm not sure what the impact on the ring
is of leaving unconsumed requests on the ring when the other end does
show up. Presumably the kernel driver just needs to be prepared to
swallow responses whose target has given up and gone home.

Maybe we should add an explicit ping/pong ring message to the xs ring
protocol?

Ian.

> 
> Gianni 
> 

* Re: libxl: error handling before xenstored runs
  2011-02-10  8:55                 ` Ian Campbell
@ 2011-02-10  9:26                   ` Vincent Hanquez
  2011-02-10 11:24                     ` Ian Campbell
  2011-02-10 18:30                   ` Gianni Tedesco
  1 sibling, 1 reply; 22+ messages in thread
From: Vincent Hanquez @ 2011-02-10  9:26 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com,
	Gianni Tedesco

On 10/02/11 08:55, Ian Campbell wrote:
> That's the underlying bug which the heuristic is trying to avoid...
>
> Fundamentally the xs ring protocol is missing any way to tell if someone
> is listening on the other end so you have no choice but to try
> communicating and see if anyone responds.
>
> It's a pretty straightforward bug that the kernel does the waiting to
> see if anyone responds bit with an uninterruptible sleep. I took a quick
> look a little while ago but unfortunately it didn't look straightforward
> to fix on the kernel side :-( I can't remember why though.

For starters, the protocol requires messages to sit on the ring for an 
undetermined amount of time (boot watches).

> It might be simpler to support allowing the userspace client to
> explicitly specify a timeout. I'm not sure what the impact on the ring
> is of leaving unconsumed requests on the ring when the other end does
> show up. Presumably the kernel driver just needs to be prepared to
> swallow responses whose target has given up and gone home.

No, the simplest thing to do is to use the socket connection 
exclusively. Just how we're doing it in XCP and XCI.
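
As a rough illustration of why the socket route sidesteps the hang: a connect
on a Unix domain socket fails straight away when nobody is listening, so the
client can report "xenstored is not running" instead of blocking. This is a
sketch only; the function name is hypothetical and the socket path is left as a
parameter because it depends on how xenstored is configured.

```python
import socket


def xenstored_socket_reachable(path, timeout=1.0):
    """Return True if something accepts connections on the Unix socket at path.

    Hypothetical helper for illustration only.
    """
    s = socket.socket(socket.AF_UNIX, socket.SOCK_STREAM)
    s.settimeout(timeout)
    try:
        s.connect(path)
        return True
    except OSError:
        # Typically ENOENT (no socket file) or ECONNREFUSED (file exists
        # but no process is listening on it any more).
        return False
    finally:
        s.close()
```

Unlike a pid file, a socket file left behind by a dead daemon does not fool
this check, since the connect itself is refused.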

The protocol is not designed to do async either, so leaving unconsumed 
requests on the ring could be pretty disastrous if the other end shows up. 
Provided the kernel doesn't detect it (I don't think it does [1]), it would 
mean spurious replies: for example, the reply to an earlier pending read on 
"/abc/def" could be delivered as the answer to a later read on "/xyz/123".

> Maybe we should add an explicit ping/pong ring message to the xs ring
> protocol?

And who's going to reply to this if xenstored is missing? You would 
require the kernel to introspect the messages and reply by itself.

[1] the kernel would be happy to read the previous reply on the ring 
after xenstored has put the actual reply after it, and trigger the 
eventchn. (the kernel could actually check the requestid and see if they 
match, but it doesn't.)
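
The failure mode described above can be shown with a toy model in which the
ring is just two FIFO queues. Everything here (names, values, the queue model)
is a hypothetical simplification for illustration, not the real ring layout.

```python
from collections import deque

# Toy model: requests and replies are plain FIFO queues.
store = {"/abc/def": "old-value", "/xyz/123": "new-value"}
requests, replies = deque(), deque()


def send_read(path):
    """Queue a read request; nobody may be listening yet."""
    requests.append(path)


def xenstored_runs():
    """The daemon finally shows up and answers every queued request in order."""
    while requests:
        replies.append(store[requests.popleft()])


def take_reply():
    """Accept whatever reply comes next, without checking any request id."""
    return replies.popleft()


send_read("/abc/def")   # first caller gives up and abandons this request
send_read("/xyz/123")   # second caller issues its own read
xenstored_runs()
answer = take_reply()   # second caller consumes the first reply on the ring
print(answer)           # prints "old-value", the reply meant for /abc/def
```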

-- 
Vincent

* Re: libxl: error handling before xenstored runs
  2011-02-10  9:26                   ` Vincent Hanquez
@ 2011-02-10 11:24                     ` Ian Campbell
  2011-02-10 11:32                       ` Christoph Egger
  2011-02-10 21:55                       ` Vincent Hanquez
  0 siblings, 2 replies; 22+ messages in thread
From: Ian Campbell @ 2011-02-10 11:24 UTC (permalink / raw)
  To: Vincent Hanquez
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com,
	Gianni Tedesco

On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote: 
> On 10/02/11 08:55, Ian Campbell wrote:
> > That's the underlying bug which the heuristic is trying to avoid...
> >
> > Fundamentally the xs ring protocol is missing any way to tell if someone
> > is listening on the other end so you have no choice but to try
> > communicating and see if anyone responds.
> >
> > It's a pretty straightforward bug that the kernel does the waiting to
> > see if anyone responds bit with an uninterruptible sleep. I took a quick
> > look a little while ago but unfortunately it didn't look straightforward
> > to fix on the kernel side :-( I can't remember why though.
> 
> For starter, the protocol requires the messages to sit on the ring for a 
> underdetermined amount of time (boot watches).
> 
> > It might be simpler to support allowing the userspace client to
> > explicitly specify a timeout. I'm not sure what the impact on the ring
> > is of leaving unconsumed requests on the ring when the other end does
> > show up. Presumably the kernel driver just needs to be prepared to
> > swallow responses whose target has given up and gone home.
> 
> No, the simplest thing to do is to use the socket connection 
> exclusively. Just how we're doing it in XCP and XCI.

Right but this approach doesn't work with xenstored in a stubdomain.
Part of the point of using the ring protocol even when this isn't the
case is to help ensure that it is possible and help avoid regressions
etc.

> The protocol is not design to do async either, so leaving unconsumed 
> request, could be pretty disastrous if the other end show up. Providing
> the kernel doesn't detect it (i don't think it does [1]), it would imply 
> spurious reply, for example the previous waiting read on "/abc/def" 
> could reply to a next read on "/xyz/123".

The wire protocol includes a req_id which is echoed in the response
which sh/could facilitate multiplexing this sort of thing. The pvops
kernel currently always sets it to zero but that's just an
implementation detail ;-) Currently the kernel does (roughly):
	take_lock
	write_request
	wait_for_reply
	release_lock
instead it should/could be:
	take_lock(timeout)
	write_request (++req_id)
	while read_reply.req_id != req_id && not (timeout)
		wait some more
	release_lock

OK, so maybe this is not in the "might be simpler" bucket any more, but
it sounds like plausibly the right direction to take.

Properly handling multiple userspace clients asynchronously and demuxing
the responses etc. would be even better, but I don't think it is necessary
to solve this particular issue.
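
For concreteness, the request-id matching loop sketched in the pseudocode
above might look roughly like this as a userspace toy. It is Python rather
than kernel C, the class and queue objects are hypothetical stand-ins for the
ring, and it is a sketch under those assumptions rather than a proposed patch.

```python
import itertools
import queue
import threading
import time


class XsClient:
    """Toy client: tags each request and swallows replies with other ids."""

    def __init__(self, request_q, reply_q):
        self.request_q = request_q
        self.reply_q = reply_q
        self._ids = itertools.count(1)
        self._lock = threading.Lock()

    def request(self, payload, timeout):
        deadline = time.monotonic() + timeout
        if not self._lock.acquire(timeout=timeout):       # take_lock(timeout)
            raise TimeoutError("ring busy")
        try:
            req_id = next(self._ids)                      # ++req_id
            self.request_q.put((req_id, payload))         # write_request
            while True:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("no reply; is xenstored running?")
                try:
                    rid, body = self.reply_q.get(timeout=remaining)
                except queue.Empty:
                    raise TimeoutError("no reply; is xenstored running?") from None
                if rid == req_id:
                    return body
                # Otherwise: a reply for a caller that gave up; drop it.
        finally:
            self._lock.release()                          # release_lock
```

With this shape a caller gets a timeout instead of hanging when nothing
answers, and a late reply to an abandoned request is discarded rather than
handed to the next caller.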

> > Maybe we should add an explicit ping/pong ring message to the xs ring
> > protocol?
> 
> And who's going to reply to this if xenstored is missing ? you would 
> require the kernel to introspect the messages and reply by itself.

The reason I suggested new messages was that I would solve that by
declaring that these new messages have whatever magic semantics I need
to make this work ;-)

Ian.

* Re: libxl: error handling before xenstored runs
  2011-02-10 11:24                     ` Ian Campbell
@ 2011-02-10 11:32                       ` Christoph Egger
  2011-02-10 11:43                         ` Ian Campbell
  2011-02-10 21:55                       ` Vincent Hanquez
  1 sibling, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-10 11:32 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez,
	Gianni Tedesco

On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > On 10/02/11 08:55, Ian Campbell wrote:
> > > That's the underlying bug which the heuristic is trying to avoid...
> > >
> > > Fundamentally the xs ring protocol is missing any way to tell if
> > > someone is listening on the other end so you have no choice but to try
> > > communicating and see if anyone responds.
> > >
> > > It's a pretty straightforward bug that the kernel does the waiting to
> > > see if anyone responds bit with an uninterruptible sleep. I took a
> > > quick look a little while ago but unfortunately it didn't look
> > > straightforward to fix on the kernel side :-( I can't remember why
> > > though.
> >
> > For starter, the protocol requires the messages to sit on the ring for a
> > underdetermined amount of time (boot watches).
> >
> > > It might be simpler to support allowing the userspace client to
> > > explicitly specify a timeout. I'm not sure what the impact on the ring
> > > is of leaving unconsumed requests on the ring when the other end does
> > > show up. Presumably the kernel driver just needs to be prepared to
> > > swallow responses whose target has given up and gone home.
> >
> > No, the simplest thing to do is to use the socket connection
> > exclusively. Just how we're doing it in XCP and XCI.
>
> Right but this approach doesn't work with xenstored in a stubdomain.
> Part of the point of using the ring protocol even when this isn't the
> case is to help ensure that it is possible and help avoid regressions
> etc.
>
> > The protocol is not design to do async either, so leaving unconsumed
> > request, could be pretty disastrous if the other end show up. Providing
> > the kernel doesn't detect it (i don't think it does [1]), it would imply
> > spurious reply, for example the previous waiting read on "/abc/def"
> > could reply to a next read on "/xyz/123".
>
> The wire protocol includes a req_id which is echoed in the response
> which sh/could facilitate multiplexing this sort of thing. The pvops
> kernel currently always sets it to zero but that's just an
> implementation detail ;-) Currently the kernel does (roughly):
> 	take_lock
> 	write_request
> 	wait_for_reply
> 	release_lock
> instead it should/could be:
> 	take_lock(timeout)
> 	write_request (++req_id)
> 	while read_reply.req_id != req_id && not (timeout)
> 		wait some more
> 	release lock

I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.

Christoph


> OK, so may be this is not in the "might be simpler" bucket any more, but
> it sounds like plausibly the right direction to take.
>
> Properly handling multiple userspace clients asynchronously a demuxes
> the responses etc would be even better but I don't think necessary to
> solve this particular issue.
>
> > > Maybe we should add an explicit ping/pong ring message to the xs ring
> > > protocol?
> >
> > And who's going to reply to this if xenstored is missing ? you would
> > require the kernel to introspect the messages and reply by itself.
>
> The reason I suggested new messages was that I would solve that by
> declaring that these new messages have whatever magic semantics I need
> to make this work ;-)
>
> Ian.




* Re: libxl: error handling before xenstored runs
  2011-02-10 11:32                       ` Christoph Egger
@ 2011-02-10 11:43                         ` Ian Campbell
  2011-02-10 12:23                           ` Christoph Egger
  0 siblings, 1 reply; 22+ messages in thread
From: Ian Campbell @ 2011-02-10 11:43 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez,
	Gianni Tedesco

On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote:
> On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > > On 10/02/11 08:55, Ian Campbell wrote:
> > > > That's the underlying bug which the heuristic is trying to avoid...
> > > >
> > > > Fundamentally the xs ring protocol is missing any way to tell if
> > > > someone is listening on the other end so you have no choice but to try
> > > > communicating and see if anyone responds.
> > > >
> > > > It's a pretty straightforward bug that the kernel does the waiting to
> > > > see if anyone responds bit with an uninterruptible sleep. I took a
> > > > quick look a little while ago but unfortunately it didn't look
> > > > straightforward to fix on the kernel side :-( I can't remember why
> > > > though.
> > >
> > > For starter, the protocol requires the messages to sit on the ring for a
> > > underdetermined amount of time (boot watches).
> > >
> > > > It might be simpler to support allowing the userspace client to
> > > > explicitly specify a timeout. I'm not sure what the impact on the ring
> > > > is of leaving unconsumed requests on the ring when the other end does
> > > > show up. Presumably the kernel driver just needs to be prepared to
> > > > swallow responses whose target has given up and gone home.
> > >
> > > No, the simplest thing to do is to use the socket connection
> > > exclusively. Just how we're doing it in XCP and XCI.
> >
> > Right but this approach doesn't work with xenstored in a stubdomain.
> > Part of the point of using the ring protocol even when this isn't the
> > case is to help ensure that it is possible and help avoid regressions
> > etc.
> >
> > > The protocol is not design to do async either, so leaving unconsumed
> > > request, could be pretty disastrous if the other end show up. Providing
> > > the kernel doesn't detect it (i don't think it does [1]), it would imply
> > > spurious reply, for example the previous waiting read on "/abc/def"
> > > could reply to a next read on "/xyz/123".
> >
> > The wire protocol includes a req_id which is echoed in the response
> > which sh/could facilitate multiplexing this sort of thing. The pvops
> > kernel currently always sets it to zero but that's just an
> > implementation detail ;-) Currently the kernel does (roughly):
> > 	take_lock
> > 	write_request
> > 	wait_for_reply
> > 	release_lock
> > instead it should/could be:
> > 	take_lock(timeout)
> > 	write_request (++req_id)
> > 	while read_reply.req_id != req_id && not (timeout)
> > 		wait some more
> > 	release lock
> 
> I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.

Fixing the NetBSD dom0 does though.

Seriously, if kernels lack functionality needed to make the system
work smoothly and correctly, we should fix them, not just default to
adding hacks in userspace because it seems easier in the short term.
(Obviously, if the userspace solution is the right thing to do and/or
more correct in its own right, then fine, let's do that.)

Ian.


* Re: libxl: error handling before xenstored runs
  2011-02-10 11:43                         ` Ian Campbell
@ 2011-02-10 12:23                           ` Christoph Egger
  2011-02-10 12:42                             ` Ian Jackson
  0 siblings, 1 reply; 22+ messages in thread
From: Christoph Egger @ 2011-02-10 12:23 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kamala Narasimhan, xen-devel@lists.xensource.com, Vincent Hanquez,
	Gianni Tedesco

On Thursday 10 February 2011 12:43:47 Ian Campbell wrote:
> On Thu, 2011-02-10 at 11:32 +0000, Christoph Egger wrote:
> > On Thursday 10 February 2011 12:24:41 Ian Campbell wrote:
> > > On Thu, 2011-02-10 at 09:26 +0000, Vincent Hanquez wrote:
> > > > On 10/02/11 08:55, Ian Campbell wrote:
> > > > > That's the underlying bug which the heuristic is trying to avoid...
> > > > >
> > > > > Fundamentally the xs ring protocol is missing any way to tell if
> > > > > someone is listening on the other end so you have no choice but to
> > > > > try communicating and see if anyone responds.
> > > > >
> > > > > It's a pretty straightforward bug that the kernel does the waiting
> > > > > to see if anyone responds bit with an uninterruptible sleep. I took
> > > > > a quick look a little while ago but unfortunately it didn't look
> > > > > straightforward to fix on the kernel side :-( I can't remember why
> > > > > though.
> > > >
> > > > For starter, the protocol requires the messages to sit on the ring
> > > > for a underdetermined amount of time (boot watches).
> > > >
> > > > > It might be simpler to support allowing the userspace client to
> > > > > explicitly specify a timeout. I'm not sure what the impact on the
> > > > > ring is of leaving unconsumed requests on the ring when the other
> > > > > end does show up. Presumably the kernel driver just needs to be
> > > > > prepared to swallow responses whose target has given up and gone
> > > > > home.
> > > >
> > > > No, the simplest thing to do is to use the socket connection
> > > > exclusively. Just how we're doing it in XCP and XCI.
> > >
> > > Right but this approach doesn't work with xenstored in a stubdomain.
> > > Part of the point of using the ring protocol even when this isn't the
> > > case is to help ensure that it is possible and help avoid regressions
> > > etc.
> > >
> > > > The protocol is not design to do async either, so leaving unconsumed
> > > > request, could be pretty disastrous if the other end show up.
> > > > Providing the kernel doesn't detect it (i don't think it does [1]),
> > > > it would imply spurious reply, for example the previous waiting read
> > > > on "/abc/def" could reply to a next read on "/xyz/123".
> > >
> > > The wire protocol includes a req_id which is echoed in the response
> > > which sh/could facilitate multiplexing this sort of thing. The pvops
> > > kernel currently always sets it to zero but that's just an
> > > implementation detail ;-) Currently the kernel does (roughly):
> > > 	take_lock
> > > 	write_request
> > > 	wait_for_reply
> > > 	release_lock
> > > instead it should/could be:
> > > 	take_lock(timeout)
> > > 	write_request (++req_id)
> > > 	while read_reply.req_id != req_id && not (timeout)
> > > 		wait some more
> > > 	release lock
> >
> > I prefer a userland solution. Fixing Linux Dom0 doesn't help NetBSD Dom0.
>
> Fixing the NetBSD dom0 does though.
>
> Seriously, if kernels are lacking in functionality needed to make the
> system work smoothly and correctly we should fix them, not just default
> to adding hacks in userspace because it seems easier in the short term.
> (Obviously if the userspace solution is the right thing to do and/or
> more correct in its own right then fine lets do that).

Does xl communicate with xenstored through a named socket?
If so, the code calling connect() should check for ECONNREFUSED.
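
For the socket transport only, such a check might look roughly like the
sketch below. It is an illustration, not existing libxl or libxenstore
code: try_connect_unix() and daemon_absent() are hypothetical helper
names, and the socket path is left to the caller. It does nothing for
the shared ring case.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to a Unix-domain stream socket.
 * Returns a connected fd on success, or -1 with errno preserved from
 * the failing call.
 */
static int try_connect_unix(const char *path)
{
    struct sockaddr_un addr;
    size_t len;
    int fd, saved;

    assert(path != NULL);
    len = strlen(path);
    if (len >= sizeof(addr.sun_path)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    memcpy(addr.sun_path, path, len + 1);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
        saved = errno;      /* close() may overwrite errno */
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/*
 * Hypothetical policy: treat these errors as "no daemon is listening".
 * ECONNREFUSED: the socket file exists but nothing accepts on it
 *               (for example, left over from a previous boot).
 * ENOENT:       the socket file does not exist at all.
 */
static int daemon_absent(int err)
{
    return err == ECONNREFUSED || err == ENOENT;
}
```

A caller would then fail early with a clear message when
daemon_absent(errno) is true, instead of carrying on and producing a
cascade of "Bad file descriptor" errors.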

Christoph


-- 
---to satisfy European Law for business letters:
Advanced Micro Devices GmbH
Einsteinring 24, 85609 Dornach b. Muenchen
Geschaeftsfuehrer: Alberto Bozzo, Andrew Bowd
Sitz: Dornach, Gemeinde Aschheim, Landkreis Muenchen
Registergericht Muenchen, HRB Nr. 43632


* Re: libxl: error handling before xenstored runs
  2011-02-10 12:23                           ` Christoph Egger
@ 2011-02-10 12:42                             ` Ian Jackson
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2011-02-10 12:42 UTC (permalink / raw)
  To: Christoph Egger
  Cc: Ian Campbell, Kamala Narasimhan, xen-devel@lists.xensource.com,
	Vincent Hanquez, Gianni Tedesco

Christoph Egger writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"):
> Does xl communicate with xenstored through a named socket ?

Sometimes; or it can use the shared ring.  Please see the earlier
parts of the thread where Kamala explained why she wants it to use the
ring in the usual case.

I'm not sure that's necessary, but _some_ arrangement for making it
work with the ring _is_ necessary.

Ian.


* Re: libxl: error handling before xenstored runs
  2011-02-10 11:24                     ` Ian Campbell
  2011-02-10 11:32                       ` Christoph Egger
@ 2011-02-10 21:55                       ` Vincent Hanquez
  2011-02-11  8:03                         ` Ian Campbell
  2011-02-11  9:49                         ` Tim Deegan
  1 sibling, 2 replies; 22+ messages in thread
From: Vincent Hanquez @ 2011-02-10 21:55 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com,
	Gianni Tedesco

On 10/02/11 11:24, Ian Campbell wrote:
> Right but this approach doesn't work with xenstored in a stubdomain.

Yeah, I know. xenstored in a stubdom is just an experiment; when it
becomes a serious feature, this argument will hold. However, it's not
going to be used in 4.1, or in any production setting.

> Part of the point of using the ring protocol even when this isn't the
> case is to help ensure that it is possible and help avoid regressions
> etc.
>
>> The protocol is not design to do async either, so leaving unconsumed
>> request, could be pretty disastrous if the other end show up. Providing
>> the kernel doesn't detect it (i don't think it does [1]), it would imply
>> spurious reply, for example the previous waiting read on "/abc/def"
>> could reply to a next read on "/xyz/123".
>
> The wire protocol includes a req_id which is echoed in the response
> which sh/could facilitate multiplexing this sort of thing. The pvops
> kernel currently always sets it to zero but that's just an
> implementation detail ;-) Currently the kernel does (roughly):

The kernel is not the only one setting the rid. It is a
client-initialized value; any xs implementation can use it any way it
wants (including the kernel implementation).

It turns out that most implementations set the rid to 0 anyway (the
OCaml and C implementations do; the Windows one doesn't).

Even then, if you could initialize it to some value, what value would
that be? There's just no way to know whether someone else is already
using this rid globally (since the ring is a global OS thing). Which
basically would mean tracking the pid (in the kernel sense) along with
the rid?

>>> Maybe we should add an explicit ping/pong ring message to the xs ring
>>> protocol?
>>
>> And who's going to reply to this if xenstored is missing ? you would
>> require the kernel to introspect the messages and reply by itself.
>
> The reason I suggested new messages was that I would solve that by
> declaring that these new messages have whatever magic semantics I need
> to make this work ;-)

ah right, the famous DeusExMachina message type then :-)

-- 
Vincent


* Re: libxl: error handling before xenstored runs
  2011-02-10 21:55                       ` Vincent Hanquez
@ 2011-02-11  8:03                         ` Ian Campbell
  2011-02-11  9:49                         ` Tim Deegan
  1 sibling, 0 replies; 22+ messages in thread
From: Ian Campbell @ 2011-02-11  8:03 UTC (permalink / raw)
  To: Vincent Hanquez
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com,
	Gianni Tedesco

On Thu, 2011-02-10 at 21:55 +0000, Vincent Hanquez wrote: 
> On 10/02/11 11:24, Ian Campbell wrote:
> > Right but this approach doesn't work with xenstored in a stubdomain.
> 
> yeah I know. xenstored in a stubdom is just an experiment, when it 
> become a serious feature, this argument would hold. however it's not 
> going to be use in 4.1, and in any production settings.

Accepted.

As I understand it people are actively using stub-xenstored in
disaggregation research today. Reinvigorating the stub domain approach
for Xen is also (going to be) one of our GSoC proposals this year.

> > Part of the point of using the ring protocol even when this isn't the
> > case is to help ensure that it is possible and help avoid regressions
> > etc.
> >
> >> The protocol is not design to do async either, so leaving unconsumed
> >> request, could be pretty disastrous if the other end show up. Providing
> >> the kernel doesn't detect it (i don't think it does [1]), it would imply
> >> spurious reply, for example the previous waiting read on "/abc/def"
> >> could reply to a next read on "/xyz/123".
> >
> > The wire protocol includes a req_id which is echoed in the response
> > which sh/could facilitate multiplexing this sort of thing. The pvops
> > kernel currently always sets it to zero but that's just an
> > implementation detail ;-) Currently the kernel does (roughly):
> 
> The kernel is not the one exclusively setting the rid. this is a client 
> initialized value. any xs implementation can use it any way they want 
> (including the kernel implementation).
> 
> Turns out that most of the implementations are actually putting rid to 0 
> anyway (the ocaml and C implementation are, the windows one isn't).
> 
> Even then, if you could initialize it to some value, what value is that 
> going to be ? there's just no way to know if someone else is not using 
> this rid already globally (since the ring is a global OS thing). Which 
> basically would means tracking pid (the kernel meaning) along with the rid ?

Since the kernel mediates all access to the actual ring it can handle
the req_id with a single incrementing integer and fake out whatever is
necessary to its users.

It's trivial to fixup the in-kernel xs users. Most likely it only
involves changing a single function in the core xs kernel code which
everyone else must use anyway.

For userspace users it's a little trickier but the kernel just needs to
remember the userspace supplied req_id before inserting its own and to
reverse the substitution in the reply. If you were to support multiple
outstanding active requests then it would be natural to stash the id in
whatever data structure you were using for that. If you want to
simplify by supporting only a single active request (by throwing away
responses to aborted/timed-out requests, as I suggested earlier), you
only need to remember the user-provided req_id for that one request,
which is trivial.
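
A minimal sketch of that single-active-request variant, under stated
assumptions: struct msg_hdr is a simplified stand-in for the real
message header, struct mux and the mux_*() functions are hypothetical
names, and locking, timeouts and unsolicited watch events are left
out entirely.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in for the message header; only req_id matters here. */
struct msg_hdr {
    uint32_t req_id;
};

/* Hypothetical per-ring state for one active request at a time. */
struct mux {
    uint32_t next_id;   /* single incrementing id owned by the kernel */
    uint32_t wire_id;   /* id actually placed on the ring */
    uint32_t user_id;   /* id the caller originally supplied */
    bool     active;    /* is a caller still waiting for the reply? */
};

/* Before writing to the ring: remember the caller's id, insert our own. */
static void mux_send(struct mux *m, struct msg_hdr *h)
{
    assert(m != NULL && h != NULL);
    m->user_id = h->req_id;
    m->wire_id = ++m->next_id;
    m->active = true;
    h->req_id = m->wire_id;
}

/* The caller timed out or aborted: its reply, if any, is now unwanted. */
static void mux_abandon(struct mux *m)
{
    assert(m != NULL);
    m->active = false;
}

/*
 * On reading a reply from the ring: returns true and restores the
 * caller's id if the reply belongs to the active request; returns
 * false if the reply is stale and should be thrown away.
 */
static bool mux_recv(struct mux *m, struct msg_hdr *h)
{
    assert(m != NULL && h != NULL);
    if (!m->active || h->req_id != m->wire_id)
        return false;
    h->req_id = m->user_id;
    m->active = false;
    return true;
}
```

With this shape, a late reply to an abandoned request carries an old
wire id and is swallowed, so it cannot be mistaken for the answer to
the next request.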

AFAIK we don't have any kernel code which does clever things such as
using a pointer to a data structure as the req_id, but if we did (and
we were unwilling to simply change it) then the substitution described
above for userspace users would work there too.

> >>> Maybe we should add an explicit ping/pong ring message to the xs ring
> >>> protocol?
> >>
> >> And who's going to reply to this if xenstored is missing ? you would
> >> require the kernel to introspect the messages and reply by itself.
> >
> > The reason I suggested new messages was that I would solve that by
> > declaring that these new messages have whatever magic semantics I need
> > to make this work ;-)
> 
> ah right, the famous DeusExMachina message type then :-)

:-)

Ian


* Re: libxl: error handling before xenstored runs
  2011-02-10 21:55                       ` Vincent Hanquez
  2011-02-11  8:03                         ` Ian Campbell
@ 2011-02-11  9:49                         ` Tim Deegan
  2011-02-11 11:16                           ` Vincent Hanquez
  1 sibling, 1 reply; 22+ messages in thread
From: Tim Deegan @ 2011-02-11  9:49 UTC (permalink / raw)
  To: Vincent Hanquez
  Cc: Christoph Egger, xen-devel@lists.xensource.com, Kamala Narasimhan,
	Ian Campbell, Gianni Tedesco

At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote:
> On 10/02/11 11:24, Ian Campbell wrote:
> > Right but this approach doesn't work with xenstored in a stubdomain.
> 
> yeah I know. xenstored in a stubdom is just an experiment, when it 
> become a serious feature, this argument would hold. however it's not 
> going to be use in 4.1, and in any production settings.

You seem to be arguing that we shouldn't fix a bug in the kernel.  I
don't understand that.  How is it going to become a "serious feature" if
we don't fix the bugs that affect it?

In any case, I can think of three projects off the top of my head that
are using stub domains aggressively, one of which I know is using
xenstore stub domains in particular.  I'm sure there are others.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, Xen Platform Team
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)


* Re: libxl: error handling before xenstored runs
  2011-02-11  9:49                         ` Tim Deegan
@ 2011-02-11 11:16                           ` Vincent Hanquez
  0 siblings, 0 replies; 22+ messages in thread
From: Vincent Hanquez @ 2011-02-11 11:16 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Christoph Egger, xen-devel@lists.xensource.com, Kamala Narasimhan,
	Ian Campbell, Gianni Tedesco

On 11/02/11 09:49, Tim Deegan wrote:
> At 21:55 +0000 on 10 Feb (1297374910), Vincent Hanquez wrote:
>> On 10/02/11 11:24, Ian Campbell wrote:
>>> Right but this approach doesn't work with xenstored in a stubdomain.
>>
>> yeah I know. xenstored in a stubdom is just an experiment, when it
>> become a serious feature, this argument would hold. however it's not
>> going to be use in 4.1, and in any production settings.
>
> You seem to be arguing that we shouldn't fix a bug in the kernel.  I
> don't understand that.  How is it going to become a "serious feature" if
> we don't fix the bugs that affect it?

If you want to fix behaviour which has been present since Xen 3.0 (you
can understand why I'm not holding my breath anymore), all the best.

What I'm arguing is that this behaviour should not exist in the next
stable version of Xen, especially for a feature that is not serious
yet (at least upstream).

> In any case, I can think of three projects off the top of my head that
> are using stub domains aggressively, one of which I know is using
> xenstore stub domains in particular.  I'm sure there are others.

Good for them. I'm sure they can carry a two-line patch to connect to
the ring instead of the unix socket, until they actually submit
upstream the work that makes the xenstored stubdomain great.

-- 
Vincent


* Re: libxl: error handling before xenstored runs
  2011-02-10  8:55                 ` Ian Campbell
  2011-02-10  9:26                   ` Vincent Hanquez
@ 2011-02-10 18:30                   ` Gianni Tedesco
  2011-02-10 19:33                     ` Ian Jackson
  1 sibling, 1 reply; 22+ messages in thread
From: Gianni Tedesco @ 2011-02-10 18:30 UTC (permalink / raw)
  To: Ian Campbell
  Cc: Kamala Narasimhan, Christoph Egger, xen-devel@lists.xensource.com

On Thu, 2011-02-10 at 08:55 +0000, Ian Campbell wrote:
> On Wed, 2011-02-09 at 17:39 +0000, Gianni Tedesco wrote: 
> > On Wed, 2011-02-09 at 15:54 +0000, Christoph Egger wrote:
> > > On Wednesday 09 February 2011 16:52:08 Christoph Egger wrote:
> > > > On Wednesday 09 February 2011 16:42:21 Kamala Narasimhan wrote:
> > > > > >>> I'm currently on c/s 22834. Which c/s added the check you are talking
> > > > > >>> about?
> > > > > >>
> > > > > >> http://xenbits.xen.org/staging/xen-unstable.hg?rev/eefb8e971be5
> > > > > >
> > > > > > This is c/s 22806. So my tree is new enough.
> > > > >
> > > > > Right, but did you happen to check how you got past the check done by
> > > > > that patch for the case in question?
> > > >
> > > > The pid file simply doesn't exist.
> > > 
> > > Oh wait. Hit the 'send' button too fast.
> > > 
> > > The pid file does exist from previous boot.
> > 
> > Bleh, precisely my problem with these heuristic checks. It's worse on my
> > box because if this happens I end up with unkillable xl processes due to
> > libxenstore wanting to open /dev/xen/xenbus or whatever it is.
> 
> That's the underlying bug which the heuristic is trying to avoid...
> 
> Fundamentally the xs ring protocol is missing any way to tell if someone
> is listening on the other end so you have no choice but to try
> communicating and see if anyone responds.
> 
> It's a pretty straightforward bug that the kernel does the waiting to
> see if anyone responds bit with an uninterruptible sleep. I took a quick
> look a little while ago but unfortunately it didn't look straightforward
> to fix on the kernel side :-( I can't remember why though.

I suppose it's because we don't want to be killable after sending the
message but before receiving the reply, since the ring is going to get
jammed up due to nobody consuming the reply. In this case the reply
never comes, but the kernel can't know that it won't eventually come,
right?

Gianni


* Re: libxl: error handling before xenstored runs
  2011-02-10 18:30                   ` Gianni Tedesco
@ 2011-02-10 19:33                     ` Ian Jackson
  0 siblings, 0 replies; 22+ messages in thread
From: Ian Jackson @ 2011-02-10 19:33 UTC (permalink / raw)
  To: Gianni Tedesco
  Cc: Ian Campbell, Kamala Narasimhan, Christoph Egger,
	xen-devel@lists.xensource.com

Gianni Tedesco writes ("Re: [Xen-devel] libxl: error handling before xenstored runs"):
> I suppose it's because we don't want to be killable after sending the
> message but before receiving the reply, since the ring is going to get
> jammed up due to nobody consuming the reply. The reply that in this case
> never comes, but the kernel can't know that it won't eventually come,
> right?

The bookkeeping for the fact that there is a command outstanding
should take care of that problem - when matching up the replies with
requests it will find that the caller for the request has gone away
and discard the reply.  (Obviously it doesn't atm because the whole
thing is synchronous.)
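
As a rough illustration of that bookkeeping with several requests
outstanding (all names here are hypothetical, the table size is
arbitrary, and locking and id wraparound collisions are ignored):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_PENDING 8   /* arbitrary size for this sketch */

enum slot_state { SLOT_FREE = 0, SLOT_WAITING, SLOT_ORPHANED };

struct pending {
    uint32_t        req_id;
    enum slot_state state;
};

struct tracker {
    struct pending slots[MAX_PENDING];
    uint32_t       next_id;
};

static struct pending *find_slot(struct tracker *t, uint32_t req_id)
{
    for (size_t i = 0; i < MAX_PENDING; i++)
        if (t->slots[i].state != SLOT_FREE && t->slots[i].req_id == req_id)
            return &t->slots[i];
    return NULL;
}

/* Record a new outstanding request; returns its id, or 0 if the table is full. */
static uint32_t track_request(struct tracker *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_PENDING; i++) {
        if (t->slots[i].state == SLOT_FREE) {
            if (++t->next_id == 0)      /* keep 0 as the "no id" value */
                ++t->next_id;
            t->slots[i].req_id = t->next_id;
            t->slots[i].state = SLOT_WAITING;
            return t->next_id;
        }
    }
    return 0;
}

/* The caller was killed or gave up: keep the slot so the reply can be matched. */
static void orphan_request(struct tracker *t, uint32_t req_id)
{
    struct pending *p;

    assert(t != NULL);
    p = find_slot(t, req_id);
    if (p != NULL && p->state == SLOT_WAITING)
        p->state = SLOT_ORPHANED;
}

/*
 * A reply arrived. Returns true if a caller is still waiting for it,
 * false if it must be discarded. The slot is freed in both cases.
 */
static bool complete_request(struct tracker *t, uint32_t req_id)
{
    struct pending *p;
    bool deliver;

    assert(t != NULL);
    p = find_slot(t, req_id);
    if (p == NULL)
        return false;
    deliver = (p->state == SLOT_WAITING);
    p->state = SLOT_FREE;
    return deliver;
}
```

Note that in this sketch orphaned slots are only freed when their
reply eventually arrives, so a real implementation would also need
some policy for the case where the other end never shows up.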

Ian.


