* Re: Re: PoD issue
2010-02-04 8:17 ` Jan Beulich
@ 2010-02-04 19:12 ` George Dunlap
2010-02-19 0:03 ` Keith Coleman
0 siblings, 1 reply; 9+ messages in thread
From: George Dunlap @ 2010-02-04 19:12 UTC (permalink / raw)
To: Jan Beulich; +Cc: xen-devel@lists.xensource.com
Yeah, the OSS tree doesn't get the kind of regression testing it
really needs at the moment. I was using the OSS balloon drivers when
I implemented and submitted the PoD code last year. I didn't have any
trouble then, and I was definitely using up all of the memory. But I
haven't done any testing on OSS since then, basically.
-George
On Thu, Feb 4, 2010 at 12:17 AM, Jan Beulich <JBeulich@novell.com> wrote:
> It was in the balloon driver's interaction with xenstore - see 2.6.18 c/s
> 989.
>
> I have to admit that I cannot see how this issue could slip attention
> when the PoD code was introduced - any guest with PoD in use and
> an unfixed balloon driver is set to crash sooner or later (implying the
> unfortunate effect of requiring an update of the pv drivers in HVM
> guests when upgrading Xen from a PoD-incapable to a PoD-capable
> version).
>
> Jan
>
>>>> George Dunlap <george.dunlap@eu.citrix.com> 03.02.10 19:42 >>>
> So did you track down where the math error is? Do we have a plan to fix
> this going forward?
> -George
>
> Jan Beulich wrote:
>>>>> George Dunlap 01/29/10 7:30 PM >>>
>>>>>
>>> PoD is not critical to balloon out guest memory. You can boot with mem
>>> == maxmem and then balloon down afterwards just as you could before,
>>> without involving PoD. (Or at least, you should be able to; if you
>>> can't then it's a bug.) It's just that with PoD you can do something
>>> you've always wanted to do but never knew it: boot with 1GiB with the
>>> option of expanding up to 2GiB later. :-)
>>>
>>
>> Oh, no, that's not what I meant. What I really wanted to say is that
>> with PoD, a properly functioning balloon driver in the guest is crucial
>> for it to stay alive long enough.
>>
>>
>>> With the 54 megabyte difference: It's not like a GiB vs GB thing, is
>>> it? (i.e., 2^30 vs 10^9?) The difference between 1GiB (2^30) and 1 GB
>>> (10^9) is about 74 megs, or 18,000 pages.
>>>
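(As a cross-check on those figures: 2^30 - 10^9 = 73,741,824 bytes,
which is about 74 decimal megabytes and, at 4 KiB per page, roughly
18,000 pages.)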
>>
>> No, that's not the problem. As I understand it now, the problem is
>> that totalram_pages (which the balloon driver bases its calculations
>> on) reflects all memory available after all bootmem allocations were
>> done (i.e. includes neither the static kernel image nor any memory
>> allocated before or from the bootmem allocator).
>>
>>
>>> I guess that is a weakness of PoD in general: we can't control the guest
>>> balloon driver, but we rely on it to have the same model of how to
>>> translate "target" into # pages in the balloon as the PoD code.
>>>
>>
>> I think this isn't a weakness of PoD, but a design issue in the balloon
>> driver's xenstore interface: While a target value shown in or obtained
>> from the /proc and /sys interfaces naturally can be based on (and
>> reflect) any internal kernel state, the xenstore interface should only
>> use numbers in terms of full memory amount given to the guest.
>> Hence a target value read from the memory/target node should be
>> adjusted before being put in relation to totalram_pages. And I think this
>> is a general misconception in the current implementation (i.e. it
>> should be corrected not only for the HVM case, but for the pv one
>> as well).
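A minimal sketch, in C, of the rebasing described above (an
illustration with hypothetical names, not the actual 2.6.18 c/s 989
patch): the value read from memory/target is expressed in terms of the
full memory given to the guest, so the pages that never show up in
totalram_pages would have to be subtracted before the driver compares
the two.

/* Illustration only, not the real driver code.  totalram_pages is the
 * kernel's global page count; "boot_total_pages" is a hypothetical
 * name for the total number of pages the guest was given. */
extern unsigned long totalram_pages;

static unsigned long rebase_target(unsigned long xenstore_target_pages,
                                   unsigned long boot_total_pages)
{
	/* Pages never accounted in totalram_pages: the static kernel
	 * image plus anything allocated before or from bootmem. */
	unsigned long unaccounted = boot_total_pages - totalram_pages;

	if (xenstore_target_pages <= unaccounted)
		return 0;

	return xenstore_target_pages - unaccounted;
}

The balloon size would then be derived from totalram_pages and this
rebased target rather than from the raw xenstore value.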
>>
>> The bad aspect of this is that it will require a fixed balloon driver
>> in any HVM guest that has maxmem>mem when the underlying Xen
>> gets updated to a version that supports PoD. I cannot, however,
>> see an OS and OS-version independent alternative (i.e. something
>> to be done in the PoD code or the tools).
>>
>> Jan
>>
>>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
* Re: Re: PoD issue
2010-02-04 19:12 ` George Dunlap
@ 2010-02-19 0:03 ` Keith Coleman
2010-02-19 6:53 ` Ian Pratt
2010-02-19 8:19 ` Jan Beulich
0 siblings, 2 replies; 9+ messages in thread
From: Keith Coleman @ 2010-02-19 0:03 UTC (permalink / raw)
To: George Dunlap; +Cc: xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich
On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
<George.Dunlap@eu.citrix.com> wrote:
> Yeah, the OSS tree doesn't get the kind of regression testing it
> really needs at the moment. I was using the OSS balloon drivers when
> I implemented and submitted the PoD code last year. I didn't have any
> trouble then, and I was definitely using up all of the memory. But I
> haven't done any testing on OSS since then, basically.
>
Is it expected that booting HVM guests with maxmem > memory is
unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
crash the guest and occasionally the entire server.
Keith Coleman
* RE: Re: PoD issue
2010-02-19 0:03 ` Keith Coleman
@ 2010-02-19 6:53 ` Ian Pratt
2010-02-19 21:28 ` Keith Coleman
2010-02-19 8:19 ` Jan Beulich
1 sibling, 1 reply; 9+ messages in thread
From: Ian Pratt @ 2010-02-19 6:53 UTC (permalink / raw)
To: Keith Coleman, George Dunlap
Cc: Ian Pratt, Jan Beulich, xen-devel@lists.xensource.com, Keir Fraser
> On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
> <George.Dunlap@eu.citrix.com> wrote:
> > Yeah, the OSS tree doesn't get the kind of regression testing it
> > really needs at the moment. I was using the OSS balloon drivers when
> > I implemented and submitted the PoD code last year. I didn't have any
> > trouble then, and I was definitely using up all of the memory. But I
> > haven't done any testing on OSS since then, basically.
> >
>
> Is it expected that booting HVM guests with maxmem > memory is
> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
> crash the guest and occasionally the entire server.
Obviously the platform should never crash, and that's very concerning.
Are you running a balloon driver in the guest? It's essential that you do, because it needs to get in fairly early in the guest boot and allocate the difference between maxmem and target memory. The populate-on-demand code exists just to cope with things like the memory scrubber running ahead of the balloon driver. If you're not running a balloon driver the guest is doomed to crash as soon as it tries using more than target memory.
All of this requires coordination between the tool stack, PoD code, and PV drivers so that sufficient memory gets ballooned out. I expect the combination that has had most testing is the XCP toolstack and Citrix PV windows drivers.
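To make the sequencing concrete, here is a rough sketch (hypothetical
helper names, not the actual driver code) of that early-boot step for a
guest configured with, say, memory=1024 and maxmem=2048: the driver
reads the target and immediately balloons out the difference, so the
guest never touches more pages than PoD has populated.

/* Sketch only; the helpers below are hypothetical stand-ins for the
 * driver's real xenstore and balloon primitives. */
extern unsigned long guest_maximum_pages(void);   /* static maximum (maxmem) */
extern unsigned long xenstore_read_target(void);  /* memory/target, in pages */
extern void balloon_out_pages(unsigned long nr);  /* hand pages back to Xen  */

static int __init early_balloon_to_target(void)
{
	unsigned long max_pages    = guest_maximum_pages();
	unsigned long target_pages = xenstore_read_target();

	/* Release (maxmem - target) pages before they are ever touched,
	 * so the PoD cache can cover the pages the guest does use. */
	if (target_pages < max_pages)
		balloon_out_pages(max_pages - target_pages);

	return 0;
}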
Ian
* Re: Re: PoD issue
2010-02-19 0:03 ` Keith Coleman
2010-02-19 6:53 ` Ian Pratt
@ 2010-02-19 8:19 ` Jan Beulich
2010-06-04 15:03 ` Pasi Kärkkäinen
1 sibling, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2010-02-19 8:19 UTC (permalink / raw)
To: Keith Coleman; +Cc: George Dunlap, xen-devel@lists.xensource.com, Keir Fraser
>>> Keith Coleman <list.keith@scaltro.com> 19.02.10 01:03 >>>
>On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
><George.Dunlap@eu.citrix.com> wrote:
>> Yeah, the OSS tree doesn't get the kind of regression testing it
>> really needs at the moment. I was using the OSS balloon drivers when
>> I implemented and submitted the PoD code last year. I didn't have any
>> trouble then, and I was definitely using up all of the memory. But I
>> haven't done any testing on OSS since then, basically.
>>
>
>Is it expected that booting HVM guests with maxmem > memory is
>unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
>crash the guest and occasionally the entire server.
Crashing the guest is expected if the guest doesn't have a fixed
balloon driver (i.e. the mentioned c/s would need to be in the
sources the pv drivers for the guest were built from).
Crashing the host is certainly unacceptable - please provide logs
thereof.
Jan
* Re: Re: PoD issue
2010-02-19 6:53 ` Ian Pratt
@ 2010-02-19 21:28 ` Keith Coleman
0 siblings, 0 replies; 9+ messages in thread
From: Keith Coleman @ 2010-02-19 21:28 UTC (permalink / raw)
To: Ian Pratt
Cc: George Dunlap, xen-devel@lists.xensource.com, Keir Fraser,
Jan Beulich
On Fri, Feb 19, 2010 at 1:53 AM, Ian Pratt <Ian.Pratt@eu.citrix.com> wrote:
>> On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
>> <George.Dunlap@eu.citrix.com> wrote:
>> > Yeah, the OSS tree doesn't get the kind of regression testing it
>> > really needs at the moment. I was using the OSS balloon drivers when
>> > I implemented and submitted the PoD code last year. I didn't have any
>> > trouble then, and I was definitely using up all of the memory. But I
>> > haven't done any testing on OSS since then, basically.
>> >
>>
>> Is it expected that booting HVM guests with maxmem > memory is
>> unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
>> crash the guest and occasionally the entire server.
>
> Obviously the platform should never crash, and that's very concerning.
>
> Are you running a balloon driver in the guest? It's essential that you do, because it needs to get in fairly early in the guest boot and allocate the difference between maxmem and target memory. The populate-on-demand code exists just to cope with things like the memory scrubber running ahead of the balloon driver. If you're not running a balloon driver the guest is doomed to crash as soon as it tries using more than target memory.
>
> All of this requires coordination between the tool stack, PoD code, and PV drivers so that sufficient memory gets ballooned out. I expect the combination that has had most testing is the XCP toolstack and Citrix PV windows drivers.
>
Initially I was using the XCP 0.1.1 WinPV drivers (win server 2003
sp2) and the guest crashed when I tried to install software via
emulated cdrom. Nothing about the crash was reported in the qemu log
file and xend.log wasn't very helpful either but here's the relevant
portion:
[2010-02-17 20:42:49 4253] DEBUG (DevController:139) Waiting for devices vtpm.
[2010-02-17 20:42:49 4253] INFO (XendDomain:1182) Domain win2 (30) unpaused.
[2010-02-17 20:48:05 4253] WARNING (XendDomainInfo:1888) Domain has crashed: name=win2 id=30.
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2734) XendDomainInfo.destroy: domid=30
[2010-02-17 20:48:06 4253] DEBUG (XendDomainInfo:2209) Destroying device model
I unsuccessfully attempted the install several more times, then tried
copying files from the emulated cd, which also crashed the guest each
time. I wasn't even thinking about the fact that I had set maxmem/PoD,
so I blamed the XCP WinPV drivers and switched to GPLPV (0.10.0.138).
Same crashes with GPLPV. At this point I hadn't checked 'xm dmesg',
which was the only place the PoD/p2m error is reported, so I changed
to pure HVM mode and tried to copy the files from the emulated cd.
That's when the real trouble started.
The RDP and VNC connections to the guest froze, as did the SSH session
to the dom0. This server was also hosting 7 Linux PV guests. I could
ping those guests and partially load some of their websites, but
couldn't log in via SSH. I suspected that the HDDs were overloaded,
causing disk I/O to block the guests. I was on site, so I went to
check the server and was shocked to find no disk activity. The monitor
output was blank and I couldn't wake it up. Maybe the USB keyboard
couldn't even be enumerated, because I couldn't toggle num lock etc.
after several reconnections.
I power-cycled the host and checked the logs, but there was no
evidence of a crash other than one of the software RAID devices being
unclean on startup. Perhaps there was interesting data logged to
'xm dmesg', or waiting to be written to disk, at the time of the
crash. I'm afraid this server/motherboard is incapable of logging data
to the serial port; I've attempted to do so several times, both before
and after this crash.
Of course, the simple fix for the time being is to remove maxmem from
the domU config file. But eventually people will use PoD on production
systems, and relying on the guest to have a solid balloon driver is
unacceptable: a guest could accidentally (or otherwise) remove the PV
drivers and bring down an entire host.
When I can free up a server with serial logging for testing I will try
to reproduce this crash.
Keith Coleman
* Re: Re: PoD issue
2010-02-19 8:19 ` Jan Beulich
@ 2010-06-04 15:03 ` Pasi Kärkkäinen
0 siblings, 0 replies; 9+ messages in thread
From: Pasi Kärkkäinen @ 2010-06-04 15:03 UTC (permalink / raw)
To: Jan Beulich
Cc: George Dunlap, xen-devel@lists.xensource.com, Keir Fraser,
Keith Coleman
On Fri, Feb 19, 2010 at 08:19:15AM +0000, Jan Beulich wrote:
> >>> Keith Coleman <list.keith@scaltro.com> 19.02.10 01:03 >>>
> >On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
> ><George.Dunlap@eu.citrix.com> wrote:
> >> Yeah, the OSS tree doesn't get the kind of regression testing it
> >> really needs at the moment. I was using the OSS balloon drivers when
> >> I implemented and submitted the PoD code last year. I didn't have any
> >> trouble then, and I was definitely using up all of the memory. But I
> >> haven't done any testing on OSS since then, basically.
> >>
> >
> >Is it expected that booting HVM guests with maxmem > memory is
> >unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
> >crash the guest and occasionally the entire server.
>
> Crashing the guest is expected if the guest doesn't have a fixed
> balloon driver (i.e. the mentioned c/s would need to be in the
> sources the pv drivers for the guest were built from).
>
> Crashing the host is certainly unacceptable - please provide logs
> thereof.
>
Was this resolved? Someone was complaining recently that maxmem != memory
crashes his Xen host..
-- Pasi
* Re: Re: PoD issue
@ 2010-06-05 16:15 Jan Beulich
2010-06-07 9:28 ` George Dunlap
0 siblings, 1 reply; 9+ messages in thread
From: Jan Beulich @ 2010-06-05 16:15 UTC (permalink / raw)
To: pasik; +Cc: George.Dunlap, xen-devel, Keir.Fraser, list.keith
>>> Pasi Kärkkäinen 06/04/10 5:03 PM >>>
>On Fri, Feb 19, 2010 at 08:19:15AM +0000, Jan Beulich wrote:
>> >>> Keith Coleman 19.02.10 01:03 >>>
>> >On Thu, Feb 4, 2010 at 2:12 PM, George Dunlap
>> > wrote:
>> >> Yeah, the OSS tree doesn't get the kind of regression testing it
>> >> really needs at the moment. I was using the OSS balloon drivers when
>> >> I implemented and submitted the PoD code last year. I didn't have any
>> >> trouble then, and I was definitely using up all of the memory. But I
>> >> haven't done any testing on OSS since then, basically.
>> >>
>> >
>> >Is it expected that booting HVM guests with maxmem > memory is
>> >unstable? In testing 3.4.3-rc2 (kernel 2.6.18 c/s 993) I can easily
>> >crash the guest and occasionally the entire server.
>>
>> Crashing the guest is expected if the guest doesn't have a fixed
>> balloon driver (i.e. the mentioned c/s would need to be in the
>> sources the pv drivers for the guest were built from).
>>
>> Crashing the host is certainly unacceptable - please provide logs
>> thereof.
>>
>
>Was this resolved? Someone was complaining recently that maxmem != memory
>crashes his Xen host..
I don't recall ever having seen logs of a host crash of this sort,
so if this ever was the case and no-one else fixed it, I would
believe it still to be an issue.
Jan
* Re: Re: PoD issue
2010-06-05 16:15 Re: PoD issue Jan Beulich
@ 2010-06-07 9:28 ` George Dunlap
2010-06-07 9:51 ` Pasi Kärkkäinen
0 siblings, 1 reply; 9+ messages in thread
From: George Dunlap @ 2010-06-07 9:28 UTC (permalink / raw)
To: Jan Beulich
Cc: list.keith@scaltro.com, xen-devel@lists.xensource.com,
Keir Fraser
Jan Beulich wrote:
>> Was this resolved? Someone was complaining recently that maxmem != memory
>> crashes his Xen host..
>>
>
> I don't recall ever having seen logs of a host crash of this sort,
> so if this ever was the case and no-one else fixed it, I would
> believe it still to be an issue.
>
>
There have been a number of fixes to the PoD code, so it's possible that
it has been fixed. I'll see if our testing team has time to add "Boot
memory < maxmem w/o balloon driver" to our testing matrix and see if we
can get a host crash.
-George
* Re: Re: PoD issue
2010-06-07 9:28 ` George Dunlap
@ 2010-06-07 9:51 ` Pasi Kärkkäinen
0 siblings, 0 replies; 9+ messages in thread
From: Pasi Kärkkäinen @ 2010-06-07 9:51 UTC (permalink / raw)
To: George Dunlap
Cc: list.keith@scaltro.com, xen-devel@lists.xensource.com,
Keir Fraser
On Mon, Jun 07, 2010 at 10:28:11AM +0100, George Dunlap wrote:
> Jan Beulich wrote:
>>> Was this resolved? Someone was complaining recently that maxmem != memory
>>> crashes his Xen host..
>>>
>>
>> I don't recall ever having seen logs of a host crash of this sort,
>> so if this ever was the case and no-one else fixed it, I would
>> believe it still to be an issue.
>>
>>
> There have been a number of fixes to the PoD code, so it's possible that
> it has been fixed. I'll see if our testing team has time to add "Boot
> memory < maxmem w/o balloon driver" to our testing matrix and see if we
> can get a host crash.
>
Ok, good. There have been many queries/problems about PoD lately,
so it's good to get that tested.
-- Pasi
Thread overview: 9+ messages
2010-06-05 16:15 Re: PoD issue Jan Beulich
2010-06-07 9:28 ` George Dunlap
2010-06-07 9:51 ` Pasi Kärkkäinen
-- strict thread matches above, loose matches on Subject: below --
2010-01-31 17:48 Jan Beulich
2010-02-03 18:42 ` George Dunlap
2010-02-04 8:17 ` Jan Beulich
2010-02-04 19:12 ` George Dunlap
2010-02-19 0:03 ` Keith Coleman
2010-02-19 6:53 ` Ian Pratt
2010-02-19 21:28 ` Keith Coleman
2010-02-19 8:19 ` Jan Beulich
2010-06-04 15:03 ` Pasi Kärkkäinen