cs:21768 causes guest spend more time on boot up

xen-devel.lists.xenproject.org archive mirror
 help / color / mirror / Atom feed

* cs:21768 causes guest spend more time on boot up
@ 2010-07-15  8:07 Zhang, Yang Z
  2010-07-15  9:48 ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang, Yang Z @ 2010-07-15  8:07 UTC (permalink / raw)
  To: Tim.Deegan@citrix.com
  Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com


[-- Attachment #1.1: Type: text/plain, Size: 316 bytes --]

Hi Tim
         In our recently nightly test, we find guest will cost more time to boot up. After our investigation, we find that for rhel5u3 and rh5u5 guest it will stop at "start udev " for long time when boot up. And we find cs:21768 will cause this issue. Do you meet the same problem?

best regards
yang


[-- Attachment #1.2: Type: text/html, Size: 5225 bytes --]

[-- Attachment #2: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cs:21768 causes guest spend more time on boot up
  2010-07-15  8:07 cs:21768 causes guest spend more time on boot up Zhang, Yang Z
@ 2010-07-15  9:48 ` Tim Deegan
  2010-07-15 12:22   ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-15  9:48 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

Hi, 

At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
>          In our recently nightly test, we find guest will cost more
> time to boot up. After our investigation, we find that for rhel5u3 and
> rh5u5 guest it will stop at ?start udev ? for long time when boot
> up. And we find cs:21768 will cause this issue. Do you meet the same
> problem?

Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
seconds, but that's not unusual.  I'll try RHEL 5.3.

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cs:21768 causes guest spend more time on boot up
  2010-07-15  9:48 ` Tim Deegan
@ 2010-07-15 12:22   ` Tim Deegan
  2010-07-15 15:14     ` Zhang, Yang Z
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-15 12:22 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> >          In our recently nightly test, we find guest will cost more
> > time to boot up. After our investigation, we find that for rhel5u3 and
> > rh5u5 guest it will stop at ?start udev ? for long time when boot
> > up. And we find cs:21768 will cause this issue. Do you meet the same
> > problem?
> 
> Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> seconds, but that's not unusual.  I'll try RHEL 5.3.

I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
by the size of the SMBIOS tables - reverting the part of this cset that
adds a type 11 object fixes the boot time; then just making some of the
other SMBIOS strings longer causes it to hang again. I wonder whether
we're running into some other BIOS datastructure around 0xEB180

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: cs:21768 causes guest spend more time on boot up
  2010-07-15 12:22   ` Tim Deegan
@ 2010-07-15 15:14     ` Zhang, Yang Z
  2010-07-15 15:20       ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang, Yang Z @ 2010-07-15 15:14 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

Is type 11 data structure cover 0xEB180 ? where is the start address of type 11 structure? and the max length ?


best regards
yang


-----Original Message-----
From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
Sent: Thursday, July 15, 2010 8:23 PM
To: Zhang, Yang Z
Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
Subject: Re: cs:21768 causes guest spend more time on boot up

At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> >          In our recently nightly test, we find guest will cost more
> > time to boot up. After our investigation, we find that for rhel5u3 and
> > rh5u5 guest it will stop at ?start udev ? for long time when boot
> > up. And we find cs:21768 will cause this issue. Do you meet the same
> > problem?
> 
> Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> seconds, but that's not unusual.  I'll try RHEL 5.3.

I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
by the size of the SMBIOS tables - reverting the part of this cset that
adds a type 11 object fixes the boot time; then just making some of the
other SMBIOS strings longer causes it to hang again. I wonder whether
we're running into some other BIOS datastructure around 0xEB180

Tim.

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cs:21768 causes guest spend more time on boot up
  2010-07-15 15:14     ` Zhang, Yang Z
@ 2010-07-15 15:20       ` Tim Deegan
  2010-07-15 15:30         ` Zhang, Yang Z
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-15 15:20 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

At 16:14 +0100 on 15 Jul (1279210449), Zhang, Yang Z wrote:
> Is type 11 data structure cover 0xEB180 ? where is the start address
> of type 11 structure? and the max length ?

No, the type-11 table is in the middle of the other SMBIOS tables.  The
SMBIOS tables cover 0xEB000 -- 0xEB171 before my patch; after the patch
they go a little further.  Extending them to cover 0xEB000 -- 0xEB187 
(just by making existing strings a little longer, not adding a type-11
table) makes CentOS 5.5 x64 hang up on boot.

Cheers,

Tim.

> 
> best regards
> yang
> 
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
> Sent: Thursday, July 15, 2010 8:23 PM
> To: Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: cs:21768 causes guest spend more time on boot up
> 
> At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> > At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> > >          In our recently nightly test, we find guest will cost more
> > > time to boot up. After our investigation, we find that for rhel5u3 and
> > > rh5u5 guest it will stop at ?start udev ? for long time when boot
> > > up. And we find cs:21768 will cause this issue. Do you meet the same
> > > problem?
> > 
> > Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> > seconds, but that's not unusual.  I'll try RHEL 5.3.
> 
> I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
> by the size of the SMBIOS tables - reverting the part of this cset that
> adds a type 11 object fixes the boot time; then just making some of the
> other SMBIOS strings longer causes it to hang again. I wonder whether
> we're running into some other BIOS datastructure around 0xEB180
> 
> Tim.
> 
> -- 
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: cs:21768 causes guest spend more time on boot up
  2010-07-15 15:20       ` Tim Deegan
@ 2010-07-15 15:30         ` Zhang, Yang Z
  2010-07-15 15:33           ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang, Yang Z @ 2010-07-15 15:30 UTC (permalink / raw)
  To: Tim Deegan; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

ok, I get it. Will you work on this issue ? it has block our nightly test. Hope will be fixed as soon as possible.

best regards
yang


-----Original Message-----
From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
Sent: Thursday, July 15, 2010 11:20 PM
To: Zhang, Yang Z
Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
Subject: Re: cs:21768 causes guest spend more time on boot up

At 16:14 +0100 on 15 Jul (1279210449), Zhang, Yang Z wrote:
> Is type 11 data structure cover 0xEB180 ? where is the start address
> of type 11 structure? and the max length ?

No, the type-11 table is in the middle of the other SMBIOS tables.  The
SMBIOS tables cover 0xEB000 -- 0xEB171 before my patch; after the patch
they go a little further.  Extending them to cover 0xEB000 -- 0xEB187 
(just by making existing strings a little longer, not adding a type-11
table) makes CentOS 5.5 x64 hang up on boot.

Cheers,

Tim.

> 
> best regards
> yang
> 
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
> Sent: Thursday, July 15, 2010 8:23 PM
> To: Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: cs:21768 causes guest spend more time on boot up
> 
> At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> > At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> > >          In our recently nightly test, we find guest will cost more
> > > time to boot up. After our investigation, we find that for rhel5u3 and
> > > rh5u5 guest it will stop at ?start udev ? for long time when boot
> > > up. And we find cs:21768 will cause this issue. Do you meet the same
> > > problem?
> > 
> > Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> > seconds, but that's not unusual.  I'll try RHEL 5.3.
> 
> I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
> by the size of the SMBIOS tables - reverting the part of this cset that
> adds a type 11 object fixes the boot time; then just making some of the
> other SMBIOS strings longer causes it to hang again. I wonder whether
> we're running into some other BIOS datastructure around 0xEB180
> 
> Tim.
> 
> -- 
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: cs:21768 causes guest spend more time on boot up
  2010-07-15 15:30         ` Zhang, Yang Z
@ 2010-07-15 15:33           ` Tim Deegan
  2010-07-19 10:55             ` [PATCH] " Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-15 15:33 UTC (permalink / raw)
  To: Zhang, Yang Z; +Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

At 16:30 +0100 on 15 Jul (1279211440), Zhang, Yang Z wrote:
> ok, I get it. Will you work on this issue ? it has block our nightly
> test. Hope will be fixed as soon as possible.

Yes, I am still looking into it.  I think the address of the table is
not causing it, though, so it might take some time to find the real
cause.

Cheers,

Tim.

> best regards
> yang
> 
> 
> -----Original Message-----
> From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
> Sent: Thursday, July 15, 2010 11:20 PM
> To: Zhang, Yang Z
> Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: cs:21768 causes guest spend more time on boot up
> 
> At 16:14 +0100 on 15 Jul (1279210449), Zhang, Yang Z wrote:
> > Is type 11 data structure cover 0xEB180 ? where is the start address
> > of type 11 structure? and the max length ?
> 
> No, the type-11 table is in the middle of the other SMBIOS tables.  The
> SMBIOS tables cover 0xEB000 -- 0xEB171 before my patch; after the patch
> they go a little further.  Extending them to cover 0xEB000 -- 0xEB187 
> (just by making existing strings a little longer, not adding a type-11
> table) makes CentOS 5.5 x64 hang up on boot.
> 
> Cheers,
> 
> Tim.
> 
> > 
> > best regards
> > yang
> > 
> > 
> > -----Original Message-----
> > From: Tim Deegan [mailto:Tim.Deegan@citrix.com] 
> > Sent: Thursday, July 15, 2010 8:23 PM
> > To: Zhang, Yang Z
> > Cc: xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> > Subject: Re: cs:21768 causes guest spend more time on boot up
> > 
> > At 10:48 +0100 on 15 Jul (1279190937), Tim Deegan wrote:
> > > At 09:07 +0100 on 15 Jul (1279184841), Zhang, Yang Z wrote:
> > > >          In our recently nightly test, we find guest will cost more
> > > > time to boot up. After our investigation, we find that for rhel5u3 and
> > > > rh5u5 guest it will stop at ?start udev ? for long time when boot
> > > > up. And we find cs:21768 will cause this issue. Do you meet the same
> > > > problem?
> > > 
> > > Nope, works fine for me[tm].  RHEL 5.5 stops at start_udev for about 5
> > > seconds, but that's not unusual.  I'll try RHEL 5.3.
> > 
> > I've reproduced this slowdown on CentOS 5.5 x64.  It seems to be caused
> > by the size of the SMBIOS tables - reverting the part of this cset that
> > adds a type 11 object fixes the boot time; then just making some of the
> > other SMBIOS strings longer causes it to hang again. I wonder whether
> > we're running into some other BIOS datastructure around 0xEB180
> > 
> > Tim.
> > 
> > -- 
> > Tim Deegan <Tim.Deegan@citrix.com>
> > Principal Software Engineer, XenServer Engineering
> > Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)
> 
> -- 
> Tim Deegan <Tim.Deegan@citrix.com>
> Principal Software Engineer, XenServer Engineering
> Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] Re: cs:21768 causes guest spend more time on boot up
  2010-07-15 15:33           ` Tim Deegan
@ 2010-07-19 10:55             ` Tim Deegan
  2010-07-19 11:46               ` Keir Fraser
  0 siblings, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-19 10:55 UTC (permalink / raw)
  To: Zhang, Yang Z
  Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com, Fraser


At 16:33 +0100 on 15 Jul (1279211635), Tim Deegan wrote:
> At 16:30 +0100 on 15 Jul (1279211440), Zhang, Yang Z wrote:
> > ok, I get it. Will you work on this issue ? it has block our nightly
> > test. Hope will be fixed as soon as possible.
> 
> Yes, I am still looking into it.  I think the address of the table is
> not causing it, though, so it might take some time to find the real
> cause.

The hang turned out to be entirely unrelated to the SMBIOS tables; the 
xenbus client zeroes out teh xenstore ring entirely, and it looks like
newer dom0 xenbus backends can't handle that, so:


hvmloader: don't zero out the xenbus page.  
Not all xenbus backends accept that gracefully.  Instead rely on the
xenbus frontend in the guest being able to start up with non-zero ring
offsets. 

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Mon Jul 19 11:51:11 2010 +0100
@@ -53,14 +53,14 @@
            (unsigned long) rings, (unsigned long) event);
 }
 
-/* Reset the xenbus connection so the next kernel can start again. 
- * We zero out the whole ring -- the backend can handle this, and it's 
- * not going to surprise any frontends since it's equivalent to never 
- * having used the rings. */
+/* Drop the Xenbus connection, leaving it ready for the next user. 
+ * There should be no messages on the ring but make sure the rsp 
+ * consumer is up to date just in case. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
-    memset(rings, 0, sizeof *rings);
+    ASSERT(rings->req_cons == rings->req_prod);
+    rings->rsp_cons = rings->rsp_prod;
     rings = NULL;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
  2010-07-19 10:55             ` [PATCH] " Tim Deegan
@ 2010-07-19 11:46               ` Keir Fraser
  2010-07-19 12:19                 ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-07-19 11:46 UTC (permalink / raw)
  To: Tim Deegan, Zhang, Yang Z
  Cc: Xu, Jiajun, Zhang, Jianwu, xen-devel@lists.xensource.com

On 19/07/2010 11:55, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

>> Yes, I am still looking into it.  I think the address of the table is
>> not causing it, though, so it might take some time to find the real
>> cause.
> 
> The hang turned out to be entirely unrelated to the SMBIOS tables; the
> xenbus client zeroes out teh xenstore ring entirely, and it looks like
> newer dom0 xenbus backends can't handle that, so:

What would it have to do with an in-kernel driver? Doesn't the comms page
only get looked at by [o]xenstored? In which case we could fix them.

 -- Keir

> hvmloader: don't zero out the xenbus page.
> Not all xenbus backends accept that gracefully.  Instead rely on the
> xenbus frontend in the guest being able to start up with non-zero ring
> offsets. 
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
  2010-07-19 11:46               ` Keir Fraser
@ 2010-07-19 12:19                 ` Tim Deegan
  2010-07-19 12:24                   ` Keir Fraser
  2010-07-20 10:38                   ` [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up) Tim Deegan
  0 siblings, 2 replies; 17+ messages in thread
From: Tim Deegan @ 2010-07-19 12:19 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

At 12:46 +0100 on 19 Jul (1279543585), Keir Fraser wrote:
> On 19/07/2010 11:55, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> 
> >> Yes, I am still looking into it.  I think the address of the table is
> >> not causing it, though, so it might take some time to find the real
> >> cause.
> > 
> > The hang turned out to be entirely unrelated to the SMBIOS tables; the
> > xenbus client zeroes out teh xenstore ring entirely, and it looks like
> > newer dom0 xenbus backends can't handle that, so:
> 
> What would it have to do with an in-kernel driver? Doesn't the comms page
> only get looked at by [o]xenstored? In which case we could fix them.

Ah, so it does.  I assumed it was the kernel because that's all that
changed on my test box since I last tested this stuff.  I'm using the C
xenstored and its code looks like it should work fine with the page
getting zeroed under its feet.  I'll dig further.

Cheers,

Tim.

> 
>  -- Keir
> 
> > hvmloader: don't zero out the xenbus page.
> > Not all xenbus backends accept that gracefully.  Instead rely on the
> > xenbus frontend in the guest being able to start up with non-zero ring
> > offsets. 
> > 
> > Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
> 
> 

-- 
Tim Deegan <Tim.Deegan@citrix.com>
Principal Software Engineer, XenServer Engineering
Citrix Systems UK Ltd.  (Company #02937203, SL9 0BG)

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
  2010-07-19 12:19                 ` Tim Deegan
@ 2010-07-19 12:24                   ` Keir Fraser
  2010-07-19 12:30                     ` Keir Fraser
  2010-07-20 10:38                   ` [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up) Tim Deegan
  1 sibling, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-07-19 12:24 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

On 19/07/2010 13:19, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

>>> The hang turned out to be entirely unrelated to the SMBIOS tables; the
>>> xenbus client zeroes out teh xenstore ring entirely, and it looks like
>>> newer dom0 xenbus backends can't handle that, so:
>> 
>> What would it have to do with an in-kernel driver? Doesn't the comms page
>> only get looked at by [o]xenstored? In which case we could fix them.
> 
> Ah, so it does.  I assumed it was the kernel because that's all that
> changed on my test box since I last tested this stuff.  I'm using the C
> xenstored and its code looks like it should work fine with the page
> getting zeroed under its feet.  I'll dig further.

Thanks. The revised patch might be acceptable, but if possible we're better
off relying on undocumented xenstored behaviour than domU frontend behaviour
since the former we have full control over (albeit we now have two daemons
to consider). Really nice would be some kind of explicit reset command or
protocol, but in this context since there are no watches or pending requests
or anything, I guess whacking the xenstore page is sufficient engineering
effort.

 -- Keir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: Re: [PATCH] Re: cs:21768 causes guest spend more time on boot up
  2010-07-19 12:24                   ` Keir Fraser
@ 2010-07-19 12:30                     ` Keir Fraser
  0 siblings, 0 replies; 17+ messages in thread
From: Keir Fraser @ 2010-07-19 12:30 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

On 19/07/2010 13:24, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:

>> Ah, so it does.  I assumed it was the kernel because that's all that
>> changed on my test box since I last tested this stuff.  I'm using the C
>> xenstored and its code looks like it should work fine with the page
>> getting zeroed under its feet.  I'll dig further.
> 
> Thanks. The revised patch might be acceptable, but if possible we're better
> off relying on undocumented xenstored behaviour than domU frontend behaviour
> since the former we have full control over (albeit we now have two daemons
> to consider). Really nice would be some kind of explicit reset command or
> protocol, but in this context since there are no watches or pending requests
> or anything, I guess whacking the xenstore page is sufficient engineering
> effort.

And, yes, I agree it looks like the existing code should just work. The C
xenstored, at least, doesn't cache ring indexes.

 -- Keir

^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
  2010-07-19 12:19                 ` Tim Deegan
  2010-07-19 12:24                   ` Keir Fraser
@ 2010-07-20 10:38                   ` Tim Deegan
  2010-07-20 11:16                     ` Keir Fraser
  1 sibling, 1 reply; 17+ messages in thread
From: Tim Deegan @ 2010-07-20 10:38 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

Here's another patch that also unsticks CentOS 5.5 boot for me, and
seems safer and saner (even if it turns out that the bug is somewhere
else and I'm just perturbing the inputs to some more complex system...)

Cheers,

Tim.

hvmloader: clear the xenbus event-channel when we're done with it.
Otherwise a later xenbus client that naively waits for the rising edge
could get stuck.

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
--- a/tools/firmware/hvmloader/util.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.c	Tue Jul 20 11:34:06 2010 +0100
@@ -587,10 +587,28 @@
     return table;
 }
 
+struct shared_info *get_shared_info(void) 
+{
+    static struct shared_info *shared_info = NULL;
+    struct xen_add_to_physmap xatp;
+
+    if ( shared_info == NULL )
+    {
+        /* Map shared-info page. */
+        xatp.domid = DOMID_SELF;
+        xatp.space = XENMAPSPACE_shared_info;
+        xatp.idx   = 0;
+        xatp.gpfn  = 0xfffff;
+        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
+            BUG();
+        shared_info = (struct shared_info *) (xatp.gpfn << 12);
+    }
+    return shared_info;
+}
+
 uint16_t get_cpu_mhz(void)
 {
-    struct xen_add_to_physmap xatp;
-    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
+    struct shared_info *shared_info = get_shared_info();
     struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
     uint64_t cpu_khz;
     uint32_t tsc_to_nsec_mul, version;
@@ -599,14 +617,6 @@
     static uint16_t cpu_mhz;
     if ( cpu_mhz != 0 )
         return cpu_mhz;
-
-    /* Map shared-info page. */
-    xatp.domid = DOMID_SELF;
-    xatp.space = XENMAPSPACE_shared_info;
-    xatp.idx   = 0;
-    xatp.gpfn  = (unsigned long)shared_info >> 12;
-    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
-        BUG();
 
     /* Get a consistent snapshot of scale factor (multiplier and shift). */
     do {
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
--- a/tools/firmware/hvmloader/util.h	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/util.h	Tue Jul 20 11:34:06 2010 +0100
@@ -68,6 +68,9 @@
 #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t) val))
 #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2, (uint16_t)val))
 #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4, (uint32_t)val))
+
+/* Get a pointer to the shared-info page */
+struct shared_info *get_shared_info(void);
 
 /* Get CPU speed in MHz. */
 uint16_t get_cpu_mhz(void);
diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Thu Jul 15 18:18:16 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Tue Jul 20 11:34:06 2010 +0100
@@ -53,14 +53,20 @@
            (unsigned long) rings, (unsigned long) event);
 }
 
-/* Reset the xenbus connection so the next kernel can start again. 
- * We zero out the whole ring -- the backend can handle this, and it's 
- * not going to surprise any frontends since it's equivalent to never 
- * having used the rings. */
+/* Reset the xenbus connection so the next kernel can start again. */
 void xenbus_shutdown(void)
 {
     ASSERT(rings != NULL);
+
+    /* We zero out the whole ring -- the backend can handle this, and it's 
+     * not going to surprise any frontends since it's equivalent to never 
+     * having used the rings. */
     memset(rings, 0, sizeof *rings);
+
+    /* Clear the xenbus event-channel too */
+    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
+        &= ~(1UL << ((event % sizeof (unsigned long))));    
+
     rings = NULL;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
  2010-07-20 10:38                   ` [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up) Tim Deegan
@ 2010-07-20 11:16                     ` Keir Fraser
  2010-07-22  9:01                       ` Zhang, Jianwu
  0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-07-20 11:16 UTC (permalink / raw)
  To: Tim Deegan
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

Ah, this is because you poll the ring and never actually use let alone clear
the event channel rx port? This looks like a good fix, thanks.

 -- Keir

On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

> Here's another patch that also unsticks CentOS 5.5 boot for me, and
> seems safer and saner (even if it turns out that the bug is somewhere
> else and I'm just perturbing the inputs to some more complex system...)
> 
> Cheers,
> 
> Tim.
> 
> hvmloader: clear the xenbus event-channel when we're done with it.
> Otherwise a later xenbus client that naively waits for the rising edge
> could get stuck.
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
> 
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
> --- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
> @@ -587,10 +587,28 @@
>      return table;
>  }
>  
> +struct shared_info *get_shared_info(void)
> +{
> +    static struct shared_info *shared_info = NULL;
> +    struct xen_add_to_physmap xatp;
> +
> +    if ( shared_info == NULL )
> +    {
> +        /* Map shared-info page. */
> +        xatp.domid = DOMID_SELF;
> +        xatp.space = XENMAPSPACE_shared_info;
> +        xatp.idx   = 0;
> +        xatp.gpfn  = 0xfffff;
> +        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> +            BUG();
> +        shared_info = (struct shared_info *) (xatp.gpfn << 12);
> +    }
> +    return shared_info;
> +}
> +
>  uint16_t get_cpu_mhz(void)
>  {
> -    struct xen_add_to_physmap xatp;
> -    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
> +    struct shared_info *shared_info = get_shared_info();
>      struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
>      uint64_t cpu_khz;
>      uint32_t tsc_to_nsec_mul, version;
> @@ -599,14 +617,6 @@
>      static uint16_t cpu_mhz;
>      if ( cpu_mhz != 0 )
>          return cpu_mhz;
> -
> -    /* Map shared-info page. */
> -    xatp.domid = DOMID_SELF;
> -    xatp.space = XENMAPSPACE_shared_info;
> -    xatp.idx   = 0;
> -    xatp.gpfn  = (unsigned long)shared_info >> 12;
> -    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> -        BUG();
>  
>      /* Get a consistent snapshot of scale factor (multiplier and shift). */
>      do {
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
> --- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
> @@ -68,6 +68,9 @@
>  #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t) val))
>  #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2, (uint16_t)val))
>  #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4, (uint32_t)val))
> +
> +/* Get a pointer to the shared-info page */
> +struct shared_info *get_shared_info(void);
>  
>  /* Get CPU speed in MHz. */
>  uint16_t get_cpu_mhz(void);
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
> --- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
> @@ -53,14 +53,20 @@
>             (unsigned long) rings, (unsigned long) event);
>  }
>  
> -/* Reset the xenbus connection so the next kernel can start again.
> - * We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> +/* Reset the xenbus connection so the next kernel can start again. */
>  void xenbus_shutdown(void)
>  {
>      ASSERT(rings != NULL);
> +
> +    /* We zero out the whole ring -- the backend can handle this, and it's
> +     * not going to surprise any frontends since it's equivalent to never
> +     * having used the rings. */
>      memset(rings, 0, sizeof *rings);
> +
> +    /* Clear the xenbus event-channel too */
> +    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
> +        &= ~(1UL << ((event % sizeof (unsigned long))));
> +
>      rings = NULL;
>  }
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* RE: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
  2010-07-20 11:16                     ` Keir Fraser
@ 2010-07-22  9:01                       ` Zhang, Jianwu
  2010-07-22 13:20                         ` Keir Fraser
  0 siblings, 1 reply; 17+ messages in thread
From: Zhang, Jianwu @ 2010-07-22  9:01 UTC (permalink / raw)
  To: Keir Fraser, Tim Deegan
  Cc: Zhang, Yang Z, Xu, Jiajun, xen-devel@lists.xensource.com

Hi Tim
	When we set the VCPUS parameter more than 2 in guest configure file, We meet this issue again . We try it on cs21837, and the guest os is rhel5u3. 

Thanks 
Jianwu

-----Original Message-----
From: Keir Fraser [mailto:keir.fraser@eu.citrix.com] 
Sent: Tuesday, July 20, 2010 7:17 PM
To: Tim Deegan
Cc: Zhang, Yang Z; xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
Subject: Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)

Ah, this is because you poll the ring and never actually use let alone clear
the event channel rx port? This looks like a good fix, thanks.

 -- Keir

On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:

> Here's another patch that also unsticks CentOS 5.5 boot for me, and
> seems safer and saner (even if it turns out that the bug is somewhere
> else and I'm just perturbing the inputs to some more complex system...)
> 
> Cheers,
> 
> Tim.
> 
> hvmloader: clear the xenbus event-channel when we're done with it.
> Otherwise a later xenbus client that naively waits for the rising edge
> could get stuck.
> 
> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
> 
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
> --- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
> @@ -587,10 +587,28 @@
>      return table;
>  }
>  
> +struct shared_info *get_shared_info(void)
> +{
> +    static struct shared_info *shared_info = NULL;
> +    struct xen_add_to_physmap xatp;
> +
> +    if ( shared_info == NULL )
> +    {
> +        /* Map shared-info page. */
> +        xatp.domid = DOMID_SELF;
> +        xatp.space = XENMAPSPACE_shared_info;
> +        xatp.idx   = 0;
> +        xatp.gpfn  = 0xfffff;
> +        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> +            BUG();
> +        shared_info = (struct shared_info *) (xatp.gpfn << 12);
> +    }
> +    return shared_info;
> +}
> +
>  uint16_t get_cpu_mhz(void)
>  {
> -    struct xen_add_to_physmap xatp;
> -    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
> +    struct shared_info *shared_info = get_shared_info();
>      struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
>      uint64_t cpu_khz;
>      uint32_t tsc_to_nsec_mul, version;
> @@ -599,14 +617,6 @@
>      static uint16_t cpu_mhz;
>      if ( cpu_mhz != 0 )
>          return cpu_mhz;
> -
> -    /* Map shared-info page. */
> -    xatp.domid = DOMID_SELF;
> -    xatp.space = XENMAPSPACE_shared_info;
> -    xatp.idx   = 0;
> -    xatp.gpfn  = (unsigned long)shared_info >> 12;
> -    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
> -        BUG();
>  
>      /* Get a consistent snapshot of scale factor (multiplier and shift). */
>      do {
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
> --- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
> @@ -68,6 +68,9 @@
>  #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t) val))
>  #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2, (uint16_t)val))
>  #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4, (uint32_t)val))
> +
> +/* Get a pointer to the shared-info page */
> +struct shared_info *get_shared_info(void);
>  
>  /* Get CPU speed in MHz. */
>  uint16_t get_cpu_mhz(void);
> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
> --- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
> +++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
> @@ -53,14 +53,20 @@
>             (unsigned long) rings, (unsigned long) event);
>  }
>  
> -/* Reset the xenbus connection so the next kernel can start again.
> - * We zero out the whole ring -- the backend can handle this, and it's
> - * not going to surprise any frontends since it's equivalent to never
> - * having used the rings. */
> +/* Reset the xenbus connection so the next kernel can start again. */
>  void xenbus_shutdown(void)
>  {
>      ASSERT(rings != NULL);
> +
> +    /* We zero out the whole ring -- the backend can handle this, and it's
> +     * not going to surprise any frontends since it's equivalent to never
> +     * having used the rings. */
>      memset(rings, 0, sizeof *rings);
> +
> +    /* Clear the xenbus event-channel too */
> +    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
> +        &= ~(1UL << ((event % sizeof (unsigned long))));
> +
>      rings = NULL;
>  }
>  

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
  2010-07-22  9:01                       ` Zhang, Jianwu
@ 2010-07-22 13:20                         ` Keir Fraser
  2010-07-22 13:41                           ` Tim Deegan
  0 siblings, 1 reply; 17+ messages in thread
From: Keir Fraser @ 2010-07-22 13:20 UTC (permalink / raw)
  To: Zhang, Jianwu, Tim Deegan
  Cc: Zhang, Yang Z, Xu, Jiajun, xen-devel@lists.xensource.com

I suggest to try zapping the entire shared-info page when hvmloader
finishes. There is nothing in there that is useful to keep across hvmloader
and guest OS; zapping will ensure that other flags with rising-edge
semantics such as per-vcpu evtchn selector words get reset; and doing
anything more than zeroing is pointless since e.g., the evtchn_mask array
offset and size is dependent on whether the guest OS is 32-bit or 64-bit. If
hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
rearranged shared_info would actually mean that hvmloader has set 1s in part
of the 64-bit extended evtchn_pending array!

Just a thought...

 -- Keir

On 22/07/2010 10:01, "Zhang, Jianwu" <jianwu.zhang@intel.com> wrote:

> Hi Tim
> When we set the VCPUS parameter more than 2 in guest configure file, We meet
> this issue again . We try it on cs21837, and the guest os is rhel5u3.
> 
> Thanks 
> Jianwu
> 
> -----Original Message-----
> From: Keir Fraser [mailto:keir.fraser@eu.citrix.com]
> Sent: Tuesday, July 20, 2010 7:17 PM
> To: Tim Deegan
> Cc: Zhang, Yang Z; xen-devel@lists.xensource.com; Zhang, Jianwu; Xu, Jiajun
> Subject: Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
> 
> Ah, this is because you poll the ring and never actually use let alone clear
> the event channel rx port? This looks like a good fix, thanks.
> 
>  -- Keir
> 
> On 20/07/2010 11:38, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> 
>> Here's another patch that also unsticks CentOS 5.5 boot for me, and
>> seems safer and saner (even if it turns out that the bug is somewhere
>> else and I'm just perturbing the inputs to some more complex system...)
>> 
>> Cheers,
>> 
>> Tim.
>> 
>> hvmloader: clear the xenbus event-channel when we're done with it.
>> Otherwise a later xenbus client that naively waits for the rising edge
>> could get stuck.
>> 
>> Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>
>> 
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.c
>> --- a/tools/firmware/hvmloader/util.c Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/util.c Tue Jul 20 11:34:06 2010 +0100
>> @@ -587,10 +587,28 @@
>>      return table;
>>  }
>>  
>> +struct shared_info *get_shared_info(void)
>> +{
>> +    static struct shared_info *shared_info = NULL;
>> +    struct xen_add_to_physmap xatp;
>> +
>> +    if ( shared_info == NULL )
>> +    {
>> +        /* Map shared-info page. */
>> +        xatp.domid = DOMID_SELF;
>> +        xatp.space = XENMAPSPACE_shared_info;
>> +        xatp.idx   = 0;
>> +        xatp.gpfn  = 0xfffff;
>> +        if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>> +            BUG();
>> +        shared_info = (struct shared_info *) (xatp.gpfn << 12);
>> +    }
>> +    return shared_info;
>> +}
>> +
>>  uint16_t get_cpu_mhz(void)
>>  {
>> -    struct xen_add_to_physmap xatp;
>> -    struct shared_info *shared_info = (struct shared_info *)0xfffff000;
>> +    struct shared_info *shared_info = get_shared_info();
>>      struct vcpu_time_info *info = &shared_info->vcpu_info[0].time;
>>      uint64_t cpu_khz;
>>      uint32_t tsc_to_nsec_mul, version;
>> @@ -599,14 +617,6 @@
>>      static uint16_t cpu_mhz;
>>      if ( cpu_mhz != 0 )
>>          return cpu_mhz;
>> -
>> -    /* Map shared-info page. */
>> -    xatp.domid = DOMID_SELF;
>> -    xatp.space = XENMAPSPACE_shared_info;
>> -    xatp.idx   = 0;
>> -    xatp.gpfn  = (unsigned long)shared_info >> 12;
>> -    if ( hypercall_memory_op(XENMEM_add_to_physmap, &xatp) != 0 )
>> -        BUG();
>>  
>>      /* Get a consistent snapshot of scale factor (multiplier and shift). */
>>      do {
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/util.h
>> --- a/tools/firmware/hvmloader/util.h Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/util.h Tue Jul 20 11:34:06 2010 +0100
>> @@ -68,6 +68,9 @@
>>  #define pci_writeb(devfn, reg, val) (pci_write(devfn, reg, 1, (uint8_t)
>> val))
>>  #define pci_writew(devfn, reg, val) (pci_write(devfn, reg, 2,
>> (uint16_t)val))
>>  #define pci_writel(devfn, reg, val) (pci_write(devfn, reg, 4,
>> (uint32_t)val))
>> +
>> +/* Get a pointer to the shared-info page */
>> +struct shared_info *get_shared_info(void);
>>  
>>  /* Get CPU speed in MHz. */
>>  uint16_t get_cpu_mhz(void);
>> diff -r c4a83e3cc6b4 tools/firmware/hvmloader/xenbus.c
>> --- a/tools/firmware/hvmloader/xenbus.c Thu Jul 15 18:18:16 2010 +0100
>> +++ b/tools/firmware/hvmloader/xenbus.c Tue Jul 20 11:34:06 2010 +0100
>> @@ -53,14 +53,20 @@
>>             (unsigned long) rings, (unsigned long) event);
>>  }
>>  
>> -/* Reset the xenbus connection so the next kernel can start again.
>> - * We zero out the whole ring -- the backend can handle this, and it's
>> - * not going to surprise any frontends since it's equivalent to never
>> - * having used the rings. */
>> +/* Reset the xenbus connection so the next kernel can start again. */
>>  void xenbus_shutdown(void)
>>  {
>>      ASSERT(rings != NULL);
>> +
>> +    /* We zero out the whole ring -- the backend can handle this, and it's
>> +     * not going to surprise any frontends since it's equivalent to never
>> +     * having used the rings. */
>>      memset(rings, 0, sizeof *rings);
>> +
>> +    /* Clear the xenbus event-channel too */
>> +    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
>> +        &= ~(1UL << ((event % sizeof (unsigned long))));
>> +
>>      rings = NULL;
>>  }
>>  
> 
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up)
  2010-07-22 13:20                         ` Keir Fraser
@ 2010-07-22 13:41                           ` Tim Deegan
  0 siblings, 0 replies; 17+ messages in thread
From: Tim Deegan @ 2010-07-22 13:41 UTC (permalink / raw)
  To: Keir Fraser
  Cc: Zhang, Yang Z, Zhang, Jianwu, xen-devel@lists.xensource.com,
	Xu, Jiajun

At 14:20 +0100 on 22 Jul (1279808457), Keir Fraser wrote:
> I suggest to try zapping the entire shared-info page when hvmloader
> finishes. There is nothing in there that is useful to keep across hvmloader
> and guest OS; zapping will ensure that other flags with rising-edge
> semantics such as per-vcpu evtchn selector words get reset; and doing
> anything more than zeroing is pointless since e.g., the evtchn_mask array
> offset and size is dependent on whether the guest OS is 32-bit or 64-bit. If
> hvmloader were to set the mask to all 1s and then boot a 64-bit guest, the
> rearranged shared_info would actually mean that hvmloader has set 1s in part
> of the 64-bit extended evtchn_pending array!

Good point.  That seems to do the trick.

hvmloader: clear the whole shared-info page when shutting down xenbus
since the contents might be in the wrong word-size for later users. 

Signed-off-by: Tim Deegan <Tim.Deegan@citrix.com>

diff -r e8dbc1262f52 tools/firmware/hvmloader/xenbus.c
--- a/tools/firmware/hvmloader/xenbus.c	Wed Jul 21 09:02:10 2010 +0100
+++ b/tools/firmware/hvmloader/xenbus.c	Thu Jul 22 14:39:28 2010 +0100
@@ -63,9 +63,8 @@ void xenbus_shutdown(void)
      * having used the rings. */
     memset(rings, 0, sizeof *rings);
 
-    /* Clear the xenbus event-channel too */
-    get_shared_info()->evtchn_pending[event / sizeof (unsigned long)]
-        &= ~(1UL << ((event % sizeof (unsigned long))));    
+    /* Clear the event-channel state too. */
+    memset(get_shared_info(), 0, PAGE_SIZE);
 
     rings = NULL;
 }

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2010-07-22 13:41 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-07-15  8:07 cs:21768 causes guest spend more time on boot up Zhang, Yang Z
2010-07-15  9:48 ` Tim Deegan
2010-07-15 12:22   ` Tim Deegan
2010-07-15 15:14     ` Zhang, Yang Z
2010-07-15 15:20       ` Tim Deegan
2010-07-15 15:30         ` Zhang, Yang Z
2010-07-15 15:33           ` Tim Deegan
2010-07-19 10:55             ` [PATCH] " Tim Deegan
2010-07-19 11:46               ` Keir Fraser
2010-07-19 12:19                 ` Tim Deegan
2010-07-19 12:24                   ` Keir Fraser
2010-07-19 12:30                     ` Keir Fraser
2010-07-20 10:38                   ` [PATCH] v2 (Re: cs:21768 causes guest spend more time on boot up) Tim Deegan
2010-07-20 11:16                     ` Keir Fraser
2010-07-22  9:01                       ` Zhang, Jianwu
2010-07-22 13:20                         ` Keir Fraser
2010-07-22 13:41                           ` Tim Deegan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).