From: Bill Burns <bburns@redhat.com>
To: Keir Fraser <Keir.Fraser@cl.cam.ac.uk>
Cc: Ian Pratt <Ian.Pratt@eu.citrix.com>,
    xen-devel@lists.xensource.com,
    "Carb, Brian A" <Brian.Carb@unisys.com>
Subject: Re: Test results on Unisys ES7000 64x 256gb using unstablec/s 16693 on 3.2.0 Release Candidate
Date: Wed, 30 Jan 2008 11:20:40 -0500
Message-ID: <47A0A3D8.4080103@redhat.com>
In-Reply-To: <C3C39267.1B7C8%Keir.Fraser@cl.cam.ac.uk>
Keir Fraser wrote:
> On 28/1/08 14:02, "Bill Burns" <bburns@redhat.com> wrote:
>
>> Ok, some progress. Background is that 3.1.2 (and 3.1.3 at least
>> as it was a week or two ago) fails to boot on a 64 CPU es7000 with
>> over 112GB of memory. This is with both HV & dom0 being x86_64.
>> The symptom is that the dom0 kernel gets time went backwards
>> error during init.
>>
>> The patch at which this first fails is 15137, which is the patch
>> that introduces using the ACPI PM timer as the clock
>> source. If I include the next patch (that allows for clock
>> selection) and choose pit as clock source the system boots
>> fine. Without the arg the ACPI timer is used and I get the hang.
>
> The obvious question then is what happens to the ACPI PM timer when dom0
> gets more than 112GB of memory. Perhaps it's worth adding some tracing to
> Xen and see whether e.g., the platform timer stops running?
>
> -- Keir
>
>
I enabled the printk in local_time_calibration in Xen's time.c
and added a similar one to init_cpu_khz in time-xen.c in the
dom0 kernel.
The hypervisor outputs many lines like:
(XEN) ---10: 00000000 9697086f -1
where the key value (the second hex field) is always 969xxxxx...
Until we get to scrubbing free RAM:
(XEN) Initrd len 0x894600, start at 0xffffffff80702000
(XEN) Scrubbing Free RAM: ---0: 80000000 498c0b61 -2
(XEN) .done.
The bogus 498c0b61 value is seen by the dom0 kernel and is
used to improperly calculate the CPU speed. A further printk
in the dom0's get_time_values_from_xen shows that all the CPUs
except the first have good values, leading right into the
first time went backwards message...
ACPI: Core revision 20060707
get_time_values_from_xen tsc_to_nsec_mul 498c0b61 ver 2
Initializing CPU#1
get_time_values_from_xen tsc_to_nsec_mul 969703ce ver 2
Initializing CPU#2
get_time_values_from_xen tsc_to_nsec_mul 9697099e ver 2
Initializing CPU#3
get_time_values_from_xen tsc_to_nsec_mul 9697068c ver 2
Initializing CPU#4
get_time_values_from_xen tsc_to_nsec_mul 96970374 ver 2
Initializing CPU#5
get_time_values_from_xen tsc_to_nsec_mul 96970a55 ver 2
Initializing CPU#6
get_time_values_from_xen tsc_to_nsec_mul 96970a7c ver 2
Initializing CPU#7
get_time_values_from_xen tsc_to_nsec_mul 96970952 ver 2
Initializing CPU#8
get_time_values_from_xen tsc_to_nsec_mul 969708f3 ver 2
Initializing CPU#9
get_time_values_from_xen tsc_to_nsec_mul 969708f8 ver 2
Initializing CPU#10
get_time_values_from_xen tsc_to_nsec_mul 96970a55 ver 2
Initializing CPU#11
get_time_values_from_xen tsc_to_nsec_mul 969706f6 ver 2
Initializing CPU#12
get_time_values_from_xen tsc_to_nsec_mul 96970bdc ver 2
Initializing CPU#13
get_time_values_from_xen tsc_to_nsec_mul 9697069b ver 2
Initializing CPU#14
get_time_values_from_xen tsc_to_nsec_mul 96970997 ver 2
Initializing CPU#15
get_time_values_from_xen tsc_to_nsec_mul 969707c5 ver 2
Initializing CPU#16
get_time_values_from_xen tsc_to_nsec_mul 969707ff ver 2
Initializing CPU#17
get_time_values_from_xen tsc_to_nsec_mul 969707aa ver 2
Initializing CPU#18
get_time_values_from_xen tsc_to_nsec_mul 9697062a ver 2
Initializing CPU#19
get_time_values_from_xen tsc_to_nsec_mul 969707d7 ver 2
Initializing CPU#20
get_time_values_from_xen tsc_to_nsec_mul 969709be ver 2
Initializing CPU#21
get_time_values_from_xen tsc_to_nsec_mul 9697096f ver 2
Initializing CPU#22
get_time_values_from_xen tsc_to_nsec_mul 96970902 ver 2
Initializing CPU#23
get_time_values_from_xen tsc_to_nsec_mul 969709a8 ver 2
Initializing CPU#24
get_time_values_from_xen tsc_to_nsec_mul 96970778 ver 2
Initializing CPU#25
get_time_values_from_xen tsc_to_nsec_mul 969705ad ver 2
Initializing CPU#26
get_time_values_from_xen tsc_to_nsec_mul 96970b44 ver 2
Initializing CPU#27
get_time_values_from_xen tsc_to_nsec_mul 96970974 ver 2
Initializing CPU#28
get_time_values_from_xen tsc_to_nsec_mul 96970bb4 ver 2
Initializing CPU#29
get_time_values_from_xen tsc_to_nsec_mul 969708c1 ver 2
Initializing CPU#30
get_time_values_from_xen tsc_to_nsec_mul 96970c23 ver 2
Brought up 32 CPUs
Initializing CPU#31
get_time_values_from_xen tsc_to_nsec_mul 9697085b ver 2
get_time_values_from_xen tsc_to_nsec_mul 969703ce ver 2
get_time_values_from_xen tsc_to_nsec_mul 498c0b61 ver 2
Timer ISR/0: Time went backwards: delta=-35017583219 delta_cpu=10416781
shadow=9160708347 off=11318417225 processed=55496708347 cpu_processed=20468708347
Note that this Hypervisor was only built for 32 CPUs,
not all 64.
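To put numbers on the difference between the good and bogus multipliers, here is a minimal sketch of the TSC-to-nanosecond scaling. It assumes the formula given in the Xen public headers for vcpu_time_info, ((delta << tsc_shift) * tsc_to_system_mul) >> 32, and it assumes the last field of the hypervisor printk (-1 and -2 above) is the matching shift. The helper names good_ns and bogus_ns are made up for this illustration and do not exist in Xen or the dom0 kernel.

```c
#include <stdint.h>

/* Scale a TSC delta to nanoseconds:
 *   ns = ((delta << shift) * mul_frac) >> 32
 * A negative shift means a right shift. */
static uint64_t scale_delta(uint64_t delta, uint32_t mul_frac, int shift)
{
    if (shift < 0)
        delta >>= -shift;
    else
        delta <<= shift;

    /* 64x32-bit multiply followed by >> 32, done in two halves so the
     * intermediate product cannot overflow 64 bits. */
    return (delta >> 32) * mul_frac +
           (((delta & 0xffffffffULL) * mul_frac) >> 32);
}

/* Hypothetical helpers using the values from the logs above.
 * Assumption: shift -1 goes with 9697086f, shift -2 with 498c0b61. */
static uint64_t good_ns(uint64_t ticks)
{
    return scale_delta(ticks, 0x9697086fu, -1);
}

static uint64_t bogus_ns(uint64_t ticks)
{
    return scale_delta(ticks, 0x498c0b61u, -2);
}
```

Under those assumptions the good pair works out to about 0.294 ns per tick (a TSC of roughly 3.4 GHz, which is my inference and not a figure from the logs), while the bogus pair gives about 0.072 ns per tick. The same tick count therefore converts to roughly a quarter of the elapsed time on the CPU holding the bogus value, which would fit the disagreement between CPU 0 and the others seen above.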
So the problem seems to occur in the HV itself when it
scrubs the free memory. Funny that when it has lots to
scrub (dom0 restricted to less memory) there is no issue.
But when there is little to scrub and a lot of memory
for dom0, things go wrong.
Bill