Large system boot problems

From: Bill Burns <bburns@redhat.com>
To: xen-devel@lists.xensource.com
Cc: Bill Burns <bburns@redhat.com>,
	Ian Pratt <Ian.Pratt@eu.citrix.com>,
	"Carb, Brian A" <Brian.Carb@unisys.com>
Subject: Large system boot problems
Date: Fri, 08 Feb 2008 08:49:46 -0500
Message-ID: <47AC5DFA.4070609@redhat.com>
In-Reply-To: <47A37CBE.4070009@redhat.com>

Here is some debug output on the large memory / pmtimer issue.

(For background, see "[Xen-devel] Test results on Unisys ES7000 64x 256gb using
unstable c/s 16693 on 3.2.0 Release Candidate" from Jan 9, 2008.)

Symptom: A system with many CPUs and a lot of memory can fail
to boot properly. Dom0 reports time-going-backwards
errors and effectively hangs during initialization.
Dom0's init failure is caused by it using bogus
values for CPU0's speed, while the other
CPUs have correct speed info.

Workarounds: Increasing the memory retained by the
hypervisor, via either a dom0_mem or a xenheap arg, delays
the start of dom0 long enough (while that memory is scrubbed)
that the HV's CPU speed calculation self-corrects.
Changing the clock source can also work (clocksource=pit works for me),
but fundamentally this is a race, and I expect that
with the right hardware it could fail too.

Details: With either pmtimer or pit, the initial calculation
of CPU0's speed is bad (at least on a large system). If
dom0 starts quickly enough that it reads the bad CPU speed
data from the hypervisor shared area before the hypervisor
corrects it, dom0 is in trouble.

Debug details:
When Xen has sized memory, detected and
booted all the CPUs, and reached the point of

	(XEN) ENABLING IO-APIC IRQs

init_percpu_time is called for CPU0, and the
cpu_time values recorded are:

	(XEN) dump_cpu_time cpu0 addr ffff828c801ca520
	(XEN) local_tsc_stamp 1691332805
	(XEN) stime_master_stamp 0
	(XEN) stime_local_stamp 0
	(XEN) Platform timer overflows in 234 jiffies.
	(XEN) Platform timer is 3.579MHz ACPI PM Timer

Then domain 0 is loaded, and local_time_calibration for CPU0
is called and actually does something. The "out count" below
shows that it had already been called 315 times and, because of

    if ( ((s64)stime_elapsed64 < (EPOCH / 2)) )

had taken the early "goto out" path, doing nothing, on each of those calls.
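
To make the gating concrete, here is a minimal C model of that early-exit
test. It is a sketch, not Xen's code: EPOCH is assumed to be one second of
system time in nanoseconds, and calibration_skipped() is a hypothetical
helper standing in for the check in local_time_calibration.

```c
#include <stdint.h>

/* Assumption: one calibration epoch is 1 s of system time, in ns. */
#define EPOCH 1000000000LL

/*
 * Hypothetical model of the early-exit test quoted above. Returns 1 when
 * calibration would be skipped ("goto out"), 0 when the scale would be
 * recomputed. The unsigned difference is reinterpreted as signed, so a
 * master clock that appears to step backwards also counts as "too little
 * elapsed" instead of a huge positive interval.
 */
static int calibration_skipped(uint64_t prev_master_stime,
                               uint64_t curr_master_stime)
{
    uint64_t stime_elapsed64 = curr_master_stime - prev_master_stime;
    return (int64_t)stime_elapsed64 < (EPOCH / 2);
}
```

For example, calibration_skipped(0, 400000000) returns 1 (0.4 s is under
half an epoch), while calibration_skipped(0, 4641208868) returns 0.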

Given the huge difference in the TSC value, the results of the
calculations in local_time_calibration are badly wrong:

	(XEN) local_time_calibration error factor cpu0 is 0x80000000. out count 315
	(XEN) PRE0: tsc=1691332805 stime=0 master=0
	(XEN) CUR0: tsc=33466953185 stime=9345455787 master=4641208868 -> -4704246919
	(XEN) calibration_mul_frac 4ac8a18d tsc_shift -2
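
As a cross-check on the PRE0/CUR0 lines, the deltas can be worked out
directly. This sketch assumes tsc, stime and master are the values sampled
at the previous and current calibration, with stime (CPU0's local time)
and master (platform timer time) in nanoseconds; the struct and helper
names are made up for illustration.

```c
#include <stdint.h>

/* One sample of CPU0's timing state, as printed in the PRE0/CUR0 lines.
 * Assumption: stime and master are in nanoseconds. */
struct time_sample {
    int64_t tsc;     /* raw TSC count */
    int64_t stime;   /* local system time */
    int64_t master;  /* master (platform timer) system time */
};

/* Values copied from the pmtimer debug output above. */
static const struct time_sample pre0 = { 1691332805LL, 0, 0 };
static const struct time_sample cur0 = { 33466953185LL, 9345455787LL, 4641208868LL };

/* The "->" value: master elapsed time minus local elapsed time, in ns. */
static int64_t master_minus_local(const struct time_sample *a,
                                  const struct time_sample *b)
{
    return (b->master - a->master) - (b->stime - a->stime);
}

/* TSC ticks per microsecond (i.e. MHz) implied by an elapsed time in ns. */
static int64_t implied_mhz(int64_t tsc_delta, int64_t elapsed_ns)
{
    return tsc_delta * 1000 / elapsed_ns;
}
```

The "->" value in the log is exactly master minus stime (-4704246919).
Over the same TSC interval, local time implies a TSC rate of about 3400 MHz,
while the master clock advanced only about half as far and so implies
about 6846 MHz.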

The bogus values here are then used by dom0 to incorrectly determine
the frequency of CPU0, while all other CPUs have correct values.

	Xen reported: 13692.820 MHz processor.
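
One way to see how the scale becomes a CPU speed: assuming dom0 converts
the published scale into a frequency the way Linux's pvclock code does
(kHz = (10^6 << 32) / mul_frac, then shifted left by -tsc_shift when
tsc_shift is negative and right otherwise), the logged scales reproduce
the reported speeds. tsc_khz_from_scale() is an illustrative sketch, not
the dom0 kernel's actual function.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Convert a (mul_frac, shift) TSC scale into a TSC frequency in kHz.
 * Assumed convention: ns = ((ticks << shift) * mul_frac) >> 32, where a
 * negative shift means ticks >> -shift. One tick is then
 * 2^shift * mul_frac / 2^32 ns, and the frequency is its inverse.
 */
static uint64_t tsc_khz_from_scale(uint32_t mul_frac, int shift)
{
    assert(mul_frac != 0);
    uint64_t khz = (1000000ULL << 32) / mul_frac;
    if (shift < 0)
        khz <<= -shift;
    else
        khz >>= shift;
    return khz;
}
```

tsc_khz_from_scale(0x4ac8a18d, -2) yields 13692820 kHz, the bogus
13692.820 MHz above, and the corrected scale from the PIT run
(0x969714d2, tsc_shift -1) yields 3399956 kHz, i.e. 3399.956 MHz.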

For the HV this self-corrects, as the next time local_time_calibration
is called the data in cpu_time is properly set. But the damage
has already been done: dom0 struggles to make progress and reports
time going backwards, etc.
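
To illustrate why a bad scale hurts, here is a sketch of how a per-CPU
system time is typically extrapolated from the TSC: the last local stamp
plus the scaled TSC delta, with ns = ((delta shifted by tsc_shift) *
mul_frac) >> 32. The struct and function names are hypothetical; this
follows the general Xen/pvclock scheme rather than a specific source file.

```c
#include <stdint.h>

/* Assumed TSC-to-ns scale, mirroring calibration_mul_frac / tsc_shift. */
struct tsc_scale {
    uint32_t mul_frac;  /* 0.32 fixed-point multiplier */
    int shift;          /* pre-shift applied to the TSC delta */
};

/* ((delta shifted by shift) * mul_frac) >> 32, computed without
 * overflowing 64 bits by splitting delta into high and low halves. */
static uint64_t scale_delta(uint64_t delta, const struct tsc_scale *s)
{
    if (s->shift < 0)
        delta >>= -s->shift;
    else
        delta <<= s->shift;
    uint64_t hi = delta >> 32, lo = delta & 0xffffffffULL;
    return hi * s->mul_frac + ((lo * s->mul_frac) >> 32);
}

/* Extrapolated local system time: last stamp plus scaled TSC delta. */
static uint64_t local_stime(uint64_t stime_local_stamp,
                            uint64_t local_tsc_stamp, uint64_t tsc_now,
                            const struct tsc_scale *s)
{
    return stime_local_stamp + scale_delta(tsc_now - local_tsc_stamp, s);
}
```

Feeding in 3,399,956,000 ticks (one second at 3399.956 MHz) gives just
under 10^9 ns with the corrected scale (0x969714d2, -1), but only about
0.248 s with CPU0's bogus scale (0x4ac8a18d, -2). A clock built on that
scale runs roughly four times too slowly on CPU0 relative to the other
CPUs, so a reading taken on CPU0 can land behind one taken earlier on
another CPU.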

The reason that limiting the memory given to dom0 fixes
the problem is that the loop that scrubs the memory
the HV keeps (scrub_heap_pages) periodically
calls process_pending_timers, and if there is enough memory
to scrub, the correction happens before dom0 starts.
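
The mechanism can be modeled roughly as follows. This is a simplified
simulation, not Xen's scrub_heap_pages(): the 100 MiB timer-polling
interval, the one-second calibration period and the per-page scrub cost
are all assumptions chosen for illustration, and
calibrations_during_scrub() is a hypothetical helper.

```c
#include <stdint.h>

#define PAGE_SIZE          4096ULL
#define EPOCH_NS           1000000000ULL               /* assumed 1 s period */
#define TIMER_POLL_PAGES   ((100ULL << 20) / PAGE_SIZE) /* assumed: every 100 MiB */
#define SCRUB_NS_PER_PAGE  1000ULL                     /* hypothetical cost */

static uint64_t now_ns;         /* simulated system time */
static uint64_t next_calib_ns;  /* deadline of the next calibration timer */
static unsigned calibrations;   /* calibration runs serviced while scrubbing */

/* Stand-in for process_pending_timers(): run every calibration timer
 * whose deadline has passed, re-arming it one epoch later. */
static void process_pending_timers(void)
{
    while (now_ns >= next_calib_ns) {
        calibrations++;
        next_calib_ns += EPOCH_NS;
    }
}

/* Scrub nr_bytes of free memory, polling timers periodically, and return
 * how many calibration runs were serviced before scrubbing finished. */
static unsigned calibrations_during_scrub(uint64_t nr_bytes)
{
    uint64_t nr_pages = nr_bytes / PAGE_SIZE;

    now_ns = 0;
    next_calib_ns = EPOCH_NS;
    calibrations = 0;

    for (uint64_t mfn = 0; mfn < nr_pages; mfn++) {
        now_ns += SCRUB_NS_PER_PAGE;   /* "clear" one page */
        if (mfn % TIMER_POLL_PAGES == 0)
            process_pending_timers();
    }
    return calibrations;
}
```

With these made-up numbers, scrubbing 1 GiB services no calibration at
all, 4 GiB services one, and 64 GiB services 16. That is the effect the
workaround relies on: the more memory the hypervisor keeps and scrubs,
the more likely a corrective calibration runs before dom0 starts.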

This recalls a comment from a vendor a few months ago,
who said you needed to add a xenheap arg to make
large memory work.

With clocksource=pit a similar thing happens: the
initial calculation is bad, but it gets fixed before dom0
gets going (debug from PIT):

	(XEN) dump_cpu_time cpu0 addr ffff828c801ca520
	(XEN) local_tsc_stamp 226384274
	(XEN) stime_master_stamp 0
	(XEN) stime_local_stamp 0
	(XEN) Platform timer overflows in 2 jiffies.
	(XEN) Platform timer is 1.193MHz PIT

No "goto out" paths are taken; the next call to local_time_calibration
does the bad calculation:

	(XEN) Scrubbing Free RAM: .local_time_calibration error factor cpu0 is 0x80000000. out count 0
	(XEN) PRE0: tsc=226384274 stime=0 master=0
	(XEN) CUR0: tsc=35424564759 stime=10351900878 master=1052517641 -> -9299383237
	(XEN) calibration_mul_frac 7a7b2a1a tsc_shift -5

The next call to local_time_calibration fixes it:

	(XEN) calibration_mul_frac 969714d2 tsc_shift -1

and dom0 gets the correct values:

	Xen reported: 3399.956 MHz processor.

Looking for ideas or suggestions on how to solve this issue.
Ideally we'd be able to prevent the bogus calculation in the
first place.

 Bill
