* [timer/ticks related] dom0 hang during boot on large 1TB system
@ 2009-12-18 4:36 Mukesh Rathor
From: Mukesh Rathor @ 2009-12-18 4:36 UTC
To: Xen-devel@lists.xensource.com; +Cc: Kurt Hackel, Dan Magenheimer
[-- Attachment #1: Type: text/plain, Size: 1414 bytes --]
Hi,
I finally solved a hang I'd been working on, seen during dom0 boot on a 1TB
box running xen 3.4.0. The hang comes from:
calibrate_delay_direct():
....
	for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
		pre_start = 0;
		start_jiffies = jiffies;
		while (jiffies <= (start_jiffies + tick_divider)) {
			pre_start = start;
			read_current_timer(&start);
		}
		read_current_timer(&post_start);
		...
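start_jiffies is set to INITIAL_JIFFIES, which is 0xfffedb08 here, since no
timer tick has been processed yet. Then the first timer interrupt comes in,
finds the delta to be rather huge (thanks to the page scrubbing of 1TB in
xen), and makes jiffies wrap around. This hangs the loop: the wrapped jiffies
is far below start_jiffies + tick_divider, so the loop would only exit after
jiffies counted all the way back up, close to 2^32 ticks later.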
delta: 940b7d68a4, jiffies:00009f8b
I came up with this fix (is there a reason it doesn't use 64-bit values?):
		while (jiffies <= (start_jiffies + tick_divider)) {
			pre_start = start;
			read_current_timer(&start);
+			if (jiffies < start_jiffies) /* jiffies wrapped */
+				start_jiffies = jiffies;
		}
The other fix I thought of was to change INITIAL_JIFFIES to something
sooner.
Would appreciate any help, I don't understand xen time management well.
thanks,
Mukesh
PS: I'm attaching the output of 'xm debug-key t'.
[-- Attachment #2: skew.out --]
[-- Type: application/octet-stream, Size: 16359 bytes --]
(XEN) Synced cycles skew: max=128755 avg=128755 samples=1 current=128755
(XEN) Synced stime skew: max=63373ns avg=61935ns samples=2 current=63373ns
(XEN) Synced cycles skew: max=129906 avg=129330 samples=2 current=129906
(XEN) Synced stime skew: max=63373ns avg=60583ns samples=3 current=57881ns
(XEN) Synced cycles skew: max=133764 avg=130808 samples=3 current=133764
(XEN) Synced stime skew: max=63373ns avg=60431ns samples=4 current=59976ns
(XEN) Synced cycles skew: max=133764 avg=130397 samples=4 current=129165
(XEN) Synced stime skew: max=64212ns avg=61187ns samples=5 current=64212ns
(XEN) Synced cycles skew: max=133764 avg=129990 samples=5 current=128363
(XEN) Synced stime skew: max=77171ns avg=63851ns samples=6 current=77171ns
(XEN) Synced cycles skew: max=133764 avg=129874 samples=6 current=129293
(XEN) Synced stime skew: max=80539ns avg=66235ns samples=7 current=80539ns
(XEN) Synced cycles skew: max=133764 avg=129796 samples=7 current=129329
(XEN) Synced stime skew: max=86713ns avg=68795ns samples=8 current=86713ns
(XEN) Synced cycles skew: max=133764 avg=128902 samples=8 current=122645
(XEN) Synced stime skew: max=86713ns avg=67529ns samples=9 current=57401ns
(XEN) Synced cycles skew: max=133764 avg=128793 samples=9 current=127920
(XEN) Synced stime skew: max=86713ns avg=68076ns samples=10 current=73000ns
(XEN) Synced cycles skew: max=133764 avg=128779 samples=10 current=128656
(XEN) Synced stime skew: max=86713ns avg=67276ns samples=11 current=59282ns
(XEN) Synced cycles skew: max=133764 avg=128856 samples=11 current=129622
(XEN) Synced stime skew: max=86713ns avg=67577ns samples=12 current=70889ns
(XEN) Synced cycles skew: max=133764 avg=128868 samples=12 current=129006
(XEN) Synced stime skew: max=86713ns avg=67438ns samples=13 current=65768ns
(XEN) Synced cycles skew: max=133764 avg=128891 samples=13 current=129166
(XEN) Synced stime skew: max=86713ns avg=68475ns samples=14 current=81951ns
(XEN) Synced cycles skew: max=133764 avg=128910 samples=14 current=129159
(XEN) Synced stime skew: max=86713ns avg=68557ns samples=15 current=69716ns
(XEN) Synced cycles skew: max=140030 avg=129651 samples=15 current=140030
(XEN) Synced stime skew: max=86713ns avg=68187ns samples=16 current=62623ns
(XEN) Synced cycles skew: max=140030 avg=130028 samples=16 current=135669
(XEN) Synced stime skew: max=86713ns avg=67589ns samples=17 current=58036ns
(XEN) Synced cycles skew: max=140030 avg=130010 samples=17 current=129726
(XEN) Synced stime skew: max=86713ns avg=67628ns samples=18 current=68276ns
(XEN) Synced cycles skew: max=140702 avg=130604 samples=18 current=140702
(XEN) Synced stime skew: max=86713ns avg=67436ns samples=19 current=63995ns
(XEN) Synced cycles skew: max=140702 avg=130851 samples=19 current=135296
(XEN) Synced stime skew: max=86713ns avg=66869ns samples=20 current=56100ns
(XEN) Synced cycles skew: max=140702 avg=130728 samples=20 current=128388
(XEN) Synced stime skew: max=86713ns avg=67756ns samples=21 current=85495ns
(XEN) Synced cycles skew: max=141906 avg=131260 samples=21 current=141906
(XEN) Synced stime skew: max=86713ns avg=68000ns samples=22 current=73117ns
(XEN) Synced cycles skew: max=141906 avg=131701 samples=22 current=140977
(XEN) Synced stime skew: max=86713ns avg=66884ns samples=23 current=42339ns
(XEN) Synced cycles skew: max=141906 avg=131575 samples=23 current=128793
(XEN) Synced stime skew: max=86713ns avg=66076ns samples=24 current=47488ns
(XEN) Synced cycles skew: max=141906 avg=131472 samples=24 current=129115
(XEN) Synced stime skew: max=86713ns avg=65573ns samples=25 current=53510ns
(XEN) Synced cycles skew: max=141906 avg=131363 samples=25 current=128730
(XEN) Synced stime skew: max=86713ns avg=65448ns samples=26 current=62315ns
(XEN) Synced cycles skew: max=141906 avg=131535 samples=26 current=135841
(XEN) Synced stime skew: max=86713ns avg=66054ns samples=27 current=81795ns
(XEN) Synced cycles skew: max=141906 avg=131689 samples=27 current=135698
(XEN) Synced stime skew: max=89642ns avg=66896ns samples=28 current=89642ns
(XEN) Synced cycles skew: max=141906 avg=131823 samples=28 current=135425
(XEN) Synced stime skew: max=114074ns avg=68523ns samples=29 current=114074ns
(XEN) Synced cycles skew: max=141906 avg=131940 samples=29 current=135243
(XEN) Synced stime skew: max=114074ns avg=68299ns samples=30 current=61804ns
(XEN) Synced cycles skew: max=141906 avg=131864 samples=30 current=129658
(XEN) Synced stime skew: max=114074ns avg=67987ns samples=31 current=58635ns
(XEN) Synced cycles skew: max=141906 avg=131777 samples=31 current=129171
(XEN) Synced stime skew: max=114074ns avg=67455ns samples=32 current=50977ns
(XEN) Synced cycles skew: max=141906 avg=131705 samples=32 current=129449
(XEN) Synced stime skew: max=114074ns avg=67072ns samples=33 current=54801ns
(XEN) Synced cycles skew: max=141906 avg=131814 samples=33 current=135323
(XEN) Synced stime skew: max=114074ns avg=67185ns samples=34 current=70930ns
(XEN) Synced cycles skew: max=147335 avg=132271 samples=34 current=147335
(XEN) Synced stime skew: max=114074ns avg=67510ns samples=35 current=78543ns
(XEN) Synced cycles skew: max=147335 avg=132369 samples=35 current=135705
(XEN) Synced stime skew: max=114074ns avg=67665ns samples=36 current=73100ns
(XEN) Synced cycles skew: max=147335 avg=132436 samples=36 current=134786
(XEN) Synced stime skew: max=114074ns avg=66970ns samples=37 current=41961ns
(XEN) Synced cycles skew: max=147335 avg=132326 samples=37 current=128380
(XEN) Synced stime skew: max=114074ns avg=67081ns samples=38 current=71167ns
(XEN) Synced cycles skew: max=147335 avg=132433 samples=38 current=136364
(XEN) Synced stime skew: max=114074ns avg=66538ns samples=39 current=45891ns
(XEN) Synced cycles skew: max=147335 avg=132376 samples=39 current=130216
(XEN) Synced stime skew: max=114074ns avg=66474ns samples=40 current=63994ns
(XEN) Synced cycles skew: max=147335 avg=132464 samples=40 current=135885
(XEN) Synced stime skew: max=114074ns avg=66308ns samples=41 current=59661ns
(XEN) Synced cycles skew: max=147335 avg=132522 samples=41 current=134858
(XEN) Synced stime skew: max=114074ns avg=66824ns samples=42 current=87998ns
(XEN) Synced cycles skew: max=147335 avg=132719 samples=42 current=140812
(XEN) Synced stime skew: max=114074ns avg=67367ns samples=43 current=90148ns
(XEN) Synced cycles skew: max=147335 avg=132625 samples=43 current=128661
(XEN) Synced stime skew: max=114074ns avg=67279ns samples=44 current=63503ns
(XEN) Synced cycles skew: max=147335 avg=132533 samples=44 current=128561
(XEN) Synced stime skew: max=114074ns avg=67156ns samples=45 current=61761ns
(XEN) Synced cycles skew: max=147335 avg=132447 samples=45 current=128703
(XEN) Synced stime skew: max=114074ns avg=67049ns samples=46 current=62220ns
(XEN) Synced cycles skew: max=147335 avg=132532 samples=46 current=136336
(XEN) Synced stime skew: max=114074ns avg=66937ns samples=47 current=61780ns
(XEN) Synced cycles skew: max=147335 avg=132458 samples=47 current=129036
(XEN) Synced stime skew: max=114074ns avg=66972ns samples=48 current=68615ns
(XEN) Synced cycles skew: max=147335 avg=132399 samples=48 current=129646
(XEN) Synced stime skew: max=114074ns avg=67036ns samples=49 current=70121ns
(XEN) Synced cycles skew: max=147335 avg=132618 samples=49 current=143138
(XEN) Synced stime skew: max=114074ns avg=67235ns samples=50 current=76987ns
(XEN) Synced cycles skew: max=147335 avg=132554 samples=50 current=129434
(XEN) Synced stime skew: max=114074ns avg=67088ns samples=51 current=59725ns
(XEN) Synced cycles skew: max=147335 avg=132502 samples=51 current=129876
(XEN) Synced stime skew: max=114074ns avg=66717ns samples=52 current=47798ns
(XEN) Synced cycles skew: max=147335 avg=132458 samples=52 current=130224
(XEN) Synced stime skew: max=114074ns avg=66424ns samples=53 current=51182ns
(XEN) Synced cycles skew: max=147335 avg=132391 samples=53 current=128909
(XEN) Synced stime skew: max=114074ns avg=66357ns samples=54 current=62804ns
(XEN) Synced cycles skew: max=147335 avg=132544 samples=54 current=140644
(XEN) Synced stime skew: max=114074ns avg=66269ns samples=55 current=61534ns
(XEN) Synced cycles skew: max=147335 avg=132460 samples=55 current=127940
(XEN) Synced stime skew: max=114074ns avg=66123ns samples=56 current=58106ns
(XEN) Synced cycles skew: max=147335 avg=132387 samples=56 current=128369
(XEN) Synced stime skew: max=114074ns avg=66275ns samples=57 current=74757ns
(XEN) Synced cycles skew: max=147335 avg=132333 samples=57 current=129325
(XEN) Synced stime skew: max=114074ns avg=66295ns samples=58 current=67448ns
(XEN) Synced cycles skew: max=147335 avg=132288 samples=58 current=129711
(XEN) Synced stime skew: max=114074ns avg=66062ns samples=59 current=52570ns
(XEN) Synced cycles skew: max=147335 avg=132360 samples=59 current=136536
(XEN) Synced stime skew: max=114074ns avg=65885ns samples=60 current=55437ns
(XEN) Synced cycles skew: max=147335 avg=132320 samples=60 current=129941
(XEN) Synced stime skew: max=114074ns avg=65619ns samples=61 current=49685ns
(XEN) Synced cycles skew: max=147335 avg=132374 samples=61 current=135631
(XEN) Synced stime skew: max=114074ns avg=65465ns samples=62 current=56050ns
(XEN) Synced cycles skew: max=147335 avg=132302 samples=62 current=127925
(XEN) Synced stime skew: max=114074ns avg=65952ns samples=63 current=96125ns
(XEN) Synced cycles skew: max=147335 avg=132378 samples=63 current=137096
(XEN) Synced stime skew: max=114074ns avg=65938ns samples=64 current=65091ns
(XEN) Synced cycles skew: max=147335 avg=132336 samples=64 current=129665
(XEN) Synced stime skew: max=114074ns avg=66090ns samples=65 current=75796ns
(XEN) Synced cycles skew: max=147335 avg=132391 samples=65 current=135939
(XEN) Synced stime skew: max=114074ns avg=65911ns samples=66 current=54300ns
(XEN) Synced cycles skew: max=147335 avg=132321 samples=66 current=127769
(XEN) Synced stime skew: max=114074ns avg=65943ns samples=67 current=68057ns
(XEN) Synced cycles skew: max=147335 avg=132461 samples=67 current=141686
(XEN) Synced stime skew: max=114074ns avg=66035ns samples=68 current=72154ns
(XEN) Synced cycles skew: max=148713 avg=132700 samples=68 current=148713
(XEN) Synced stime skew: max=114074ns avg=65906ns samples=69 current=57182ns
(XEN) Synced cycles skew: max=148713 avg=132655 samples=69 current=129605
(XEN) Synced stime skew: max=114074ns avg=66096ns samples=70 current=79181ns
(XEN) Synced cycles skew: max=148713 avg=132600 samples=70 current=128797
(XEN) Synced stime skew: max=114074ns avg=66121ns samples=71 current=67886ns
(XEN) Synced cycles skew: max=148713 avg=132452 samples=71 current=122099
(XEN) Synced stime skew: max=114074ns avg=66355ns samples=72 current=82956ns
(XEN) Synced cycles skew: max=148713 avg=132486 samples=72 current=134861
(XEN) Synced stime skew: max=114074ns avg=66301ns samples=73 current=62392ns
(XEN) Synced cycles skew: max=148713 avg=132448 samples=73 current=129702
(XEN) Synced stime skew: max=114074ns avg=66148ns samples=74 current=54990ns
(XEN) Synced cycles skew: max=148713 avg=132396 samples=74 current=128615
(XEN) Synced stime skew: max=114074ns avg=65986ns samples=75 current=54038ns
(XEN) Synced cycles skew: max=148713 avg=132365 samples=75 current=130113
(XEN) Synced stime skew: max=114074ns avg=66055ns samples=76 current=71208ns
(XEN) Synced cycles skew: max=148713 avg=132322 samples=76 current=129063
(XEN) Synced stime skew: max=114074ns avg=66353ns samples=77 current=89023ns
(XEN) Synced cycles skew: max=148713 avg=132374 samples=77 current=136354
(XEN) Synced stime skew: max=114074ns avg=66410ns samples=78 current=70776ns
(XEN) Synced cycles skew: max=148713 avg=132525 samples=78 current=144157
(XEN) Synced stime skew: max=114074ns avg=66671ns samples=79 current=87005ns
(XEN) Synced cycles skew: max=148713 avg=132549 samples=79 current=134402
(XEN) Synced stime skew: max=114074ns avg=66643ns samples=80 current=64455ns
(XEN) Synced cycles skew: max=148713 avg=132594 samples=80 current=136110
(XEN) Synced stime skew: max=114074ns avg=66541ns samples=81 current=58358ns
(XEN) Synced cycles skew: max=148713 avg=132682 samples=81 current=139775
(XEN) Synced stime skew: max=114074ns avg=66389ns samples=82 current=54126ns
(XEN) Synced cycles skew: max=148713 avg=132722 samples=82 current=135982
(XEN) Synced stime skew: max=114074ns avg=66359ns samples=83 current=63867ns
(XEN) Synced cycles skew: max=148713 avg=132679 samples=83 current=129097
(XEN) Synced stime skew: max=114074ns avg=66396ns samples=84 current=69508ns
(XEN) Synced cycles skew: max=148713 avg=132623 samples=84 current=128036
(XEN) Synced stime skew: max=114074ns avg=66701ns samples=85 current=92294ns
(XEN) Synced cycles skew: max=148713 avg=132700 samples=85 current=139133
(XEN) Synced stime skew: max=114074ns avg=67039ns samples=86 current=95791ns
(XEN) Synced cycles skew: max=148713 avg=132666 samples=86 current=129790
(XEN) Synced stime skew: max=114074ns avg=66894ns samples=87 current=54424ns
(XEN) Synced cycles skew: max=148713 avg=132722 samples=87 current=137499
(XEN) Synced stime skew: max=114074ns avg=66797ns samples=88 current=58290ns
(XEN) Synced cycles skew: max=148713 avg=132756 samples=88 current=135705
(XEN) Synced stime skew: max=114074ns avg=66711ns samples=89 current=59225ns
(XEN) Synced cycles skew: max=148713 avg=132773 samples=89 current=134259
(XEN) Synced stime skew: max=114074ns avg=66694ns samples=90 current=65127ns
(XEN) Synced cycles skew: max=148713 avg=132945 samples=90 current=148336
(XEN) Synced stime skew: max=114074ns avg=66703ns samples=91 current=67514ns
(XEN) Synced cycles skew: max=148713 avg=133027 samples=91 current=140348
(XEN) Synced stime skew: max=114074ns avg=66860ns samples=92 current=81205ns
(XEN) Synced cycles skew: max=148713 avg=132980 samples=92 current=128738
(XEN) Synced stime skew: max=114074ns avg=67012ns samples=93 current=80923ns
(XEN) Synced cycles skew: max=148713 avg=133064 samples=93 current=140746
(XEN) Synced stime skew: max=114074ns avg=66937ns samples=94 current=59997ns
(XEN) Synced cycles skew: max=148713 avg=133017 samples=94 current=128660
(XEN) Synced stime skew: max=114074ns avg=66893ns samples=95 current=62789ns
(XEN) Synced cycles skew: max=148713 avg=133148 samples=95 current=145500
(XEN) Synced stime skew: max=114074ns avg=66788ns samples=96 current=56783ns
(XEN) Synced cycles skew: max=148713 avg=133169 samples=96 current=135154
(XEN) Synced stime skew: max=114074ns avg=66842ns samples=97 current=72061ns
(XEN) Synced cycles skew: max=148713 avg=133311 samples=97 current=146910
(XEN) Synced stime skew: max=114074ns avg=66872ns samples=98 current=69710ns
(XEN) Synced cycles skew: max=148713 avg=133417 samples=98 current=143715
(XEN) Synced stime skew: max=114074ns avg=67128ns samples=99 current=92284ns
(XEN) Synced cycles skew: max=148713 avg=133452 samples=99 current=136916
(XEN) Synced stime skew: max=114074ns avg=67447ns samples=100 current=98950ns
(XEN) Synced cycles skew: max=148713 avg=133445 samples=100 current=132776
(XEN) Synced stime skew: max=114074ns avg=67331ns samples=101 current=55763ns
(XEN) Synced cycles skew: max=148713 avg=133451 samples=101 current=133971
(XEN) Synced stime skew: max=114074ns avg=67234ns samples=102 current=57429ns
(XEN) Synced cycles skew: max=148713 avg=133413 samples=102 current=129592
(XEN) Synced stime skew: max=114074ns avg=67117ns samples=103 current=55250ns
(XEN) Synced cycles skew: max=148713 avg=133437 samples=103 current=135877
(XEN) Synced stime skew: max=114074ns avg=67104ns samples=104 current=65668ns
(XEN) Synced cycles skew: max=148713 avg=133391 samples=104 current=128627
(XEN) Synced stime skew: max=114074ns avg=67143ns samples=105 current=71277ns
(XEN) Synced cycles skew: max=149119 avg=133540 samples=105 current=149119
(XEN) Synced stime skew: max=114074ns avg=67119ns samples=106 current=64521ns
(XEN) Synced cycles skew: max=149119 avg=133654 samples=106 current=145602
(XEN) Synced stime skew: max=114074ns avg=67119ns samples=107 current=67195ns
(XEN) Synced cycles skew: max=149119 avg=133606 samples=107 current=128547
(XEN) Synced stime skew: max=114074ns avg=67200ns samples=108 current=75854ns
(XEN) Synced cycles skew: max=149119 avg=133678 samples=108 current=141399
(XEN) Synced stime skew: max=114074ns avg=67132ns samples=109 current=59729ns
(XEN) Synced cycles skew: max=149119 avg=133750 samples=109 current=141474
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-18 7:02 UTC
To: Mukesh Rathor, Xen-devel@lists.xensource.com; +Cc: Kurt Hackel, Dan Magenheimer
On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> The other fix I thought of was to change INITIAL_JIFFIES to something
> sooner.
>
> Would appreciate any help, I don't understand xen time management well.
This isn't really Xen time code, but unchanged Linux time code. I don't know
which tree you quoted the code from -- 2.6.18 has similar but not identical.
Anyway, I suggest try using the jiffy-comparison macros from
<linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
work even when jiffies wraps. Feel free to send patch(es) for that, if you
test that out and it works okay.
-- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-18 8:42 UTC
To: Keir Fraser; +Cc: Kurt Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com
>>> Keir Fraser <keir.fraser@eu.citrix.com> 18.12.09 08:02 >>>
>On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
>> The other fix I thought of was to change INITIAL_JIFFIES to something
>> sooner.
>>
>> Would appreciate any help, I don't understand xen time management well.
>
>This isn't really Xen time code, but unchanged Linux time code. I don't know
>which tree you quoted the code from -- 2.6.18 has similar but not identical.
>Anyway, I suggest try using the jiffy-comparison macros from
><linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
>work even when jiffies wraps. Feel free to send patch(es) for that, if you
>test that out and it works okay.
But regardless of that - shouldn't the page scrubbing really be a
background operation these days, and as such be (relatively)
performance neutral to the booting of Dom0?
Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-18 9:13 UTC
To: Jan Beulich; +Cc: Kurt Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com
On 18/12/2009 08:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>> This isn't really Xen time code, but unchanged Linux time code. I don't know
>> which tree you quoted the code from -- 2.6.18 has similar but not identical.
>> Anyway, I suggest try using the jiffy-comparison macros from
>> <linux/jiffies.h>: time_before(), time_after(), etc. These are designed to
>> work even when jiffies wraps. Feel free to send patch(es) for that, if you
>> test that out and it works okay.
>
> But regardless of that - shouldn't the page scrubbing really be a
> background operation these days, and as such be (relatively)
> performance neutral to the booting of Dom0?
We synchronously scrub free memory before starting dom0, and then
subsequently scrub memory only for dying domains. So I don't know what
scrubbing would be going on during dom0's boot-time calibrations, on any
version of Xen, actually.
-- Keir
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer @ 2009-12-18 16:35 UTC
To: Keir Fraser, Jan Beulich; +Cc: kurt.hackel, Xen-devel
> So I don't know what
> scrubbing would be going on during dom0's boot-time
> calibrations, on any
> version of Xen, actually.
Wasn't the async page scrubbing removed post 3.4.0?
(I think Mukesh's bug was seen on 3.4.0.) I see
c/s 19886 in July 2009 is "Remove page-scrub lists
and async scrubbing"... if that patch were not
applied, would Mukesh's observed bug make more sense?
Thanks,
Dan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-18 17:15 UTC
To: Dan Magenheimer, Jan Beulich
Cc: kurt.hackel@oracle.com, Xen-devel@lists.xensource.com
On 18/12/2009 16:35, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:
>> So I don't know what
>> scrubbing would be going on during dom0's boot-time
>> calibrations, on any
>> version of Xen, actually.
>
> Wasn't the async page scrubbing removed post 3.4.0?
> (I think Mukesh's bug was seen on 3.4.0.) I see
> c/s 19886 in July 2009 is "Remove page-scrub lists
> and async scrubbing"... if that patch were not
> applied, would Mukesh's observed bug make more sense?
Async page scrubbing was for scrubbing pages of dying domains. No domains are
dying while dom0 is still booting.
-- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor @ 2009-12-18 19:28 UTC
To: Keir Fraser
Cc: Kurt Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com, Jan Beulich
On Fri, 18 Dec 2009 09:13:32 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 18/12/2009 08:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> >> This isn't really Xen time code, but unchanged Linux time code. I
> >> don't know which tree you quoted the code from -- 2.6.18 has
> >> similar but not identical. Anyway, I suggest try using the
> >> jiffy-comparison macros from <linux/jiffies.h>: time_before(),
> >> time_after(), etc. These are designed to work even when jiffies
> >> wraps. Feel free to send patch(es) for that, if you test that out
> >> and it works okay.
> >
> > But regardless of that - shouldn't the page scrubbing really be a
> > background operation these days, and as such be (relatively)
> > performance neutral to the booting of Dom0?
>
> We synchronously scrub free memory before starting dom0, and then
> subsequently scrub memory only for dying domains. So I don't know what
> scrubbing would be going on during dom0's boot-time calibrations, on
> any version of Xen, actually.
>
> -- Keir
>
Scrubbing has nothing to do with the bug itself; the timing just happens to be
right to expose it. The system boots fine with less memory. The hypervisor
creates dom0, scrubs pages, and then unpauses dom0, and it appears that with a
lot of memory to scrub, the delta in dom0's timer_interrupt() becomes large
enough to make jiffies wrap.
Hope that makes sense.
thanks,
Mukesh
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor @ 2009-12-18 19:25 UTC
To: Keir Fraser; +Cc: Hackel, Kurt, Xen-devel@lists.xensource.com, Dan Magenheimer
On Fri, 18 Dec 2009 07:02:55 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
> > The other fix I thought of was to change INITIAL_JIFFIES to
> > something sooner.
> >
> > Would appreciate any help, I don't understand xen time management
> > well.
>
> This isn't really Xen time code, but unchanged Linux time code. I
> don't know which tree you quoted the code from -- 2.6.18 has similar
> but not identical. Anyway, I suggest try using the jiffy-comparison
> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
> These are designed to work even when jiffies wraps. Feel free to send
> patch(es) for that, if you test that out and it works okay.
>
> -- Keir
>
It's from the unstable 2.6.18 tree at
http://xenbits.xensource.com/linux-2.6.18-xen.hg, file init/calibrate.c,
function calibrate_delay_direct(). The code there is exactly what I quoted.
Anyway, I'm testing the patch now, trying to reproduce the hang and make sure
the fix works.
thanks,
Mukesh
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system 2009-12-18 7:02 ` Keir Fraser 2009-12-18 8:42 ` Jan Beulich 2009-12-18 19:25 ` Mukesh Rathor @ 2009-12-19 4:43 ` Mukesh Rathor 2009-12-21 9:55 ` Jan Beulich ` (2 more replies) 2 siblings, 3 replies; 51+ messages in thread From: Mukesh Rathor @ 2009-12-19 4:43 UTC (permalink / raw) To: Keir Fraser Cc: Hackel, Kurt, Xen-devel@lists.xensource.com, Dan Magenheimer, jeremy [-- Attachment #1: Type: text/plain, Size: 2966 bytes --] On Fri, 18 Dec 2009 07:02:55 +0000 Keir Fraser <keir.fraser@eu.citrix.com> wrote: > On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote: > > > The other fix I thought of was to change INITIAL_JIFFIES to > > something sooner. > > > > Would appreciate any help, I don't understand xen time management > > well. > > This isn't really Xen time code, but unchanged Linux time code. I > don't know which tree you quoted the code from -- 2.6.18 has similar > but not identical. Anyway, I suggest try using the jiffy-comparison > macros from <linux/jiffies.h>: time_before(), time_after(), etc. > These are designed to work even when jiffies wraps. Feel free to send > patch(es) for that, if you test that out and it works okay. > > -- Keir > Ok, I came up with the following patch. Jeremy, can you please take a look also, and comment on my fix since I noticed you've got the same issue in your tree. Here's a summary for your benefit: init/calibrate.c : calibrate_delay_direct(): start_jiffies = get_jiffies_64(); while (get_jiffies_64() <= (start_jiffies + tick_divider)) { pre_start = start; read_current_timer(&start); } if first ever timer interrupt comes after start_jiffies is set, dom0 boot may hang if delta in timer_interrupt() is so huge that it causes jiffies to wrap. It appears delta is very large when memory is more than 512GB on certain boxes causing wrap around. why is delta in dom0->timer_interrupt() related to memory on system? 
Because the hypervisor creates dom0, then scrubs pages, then unpauses the
vcpu. So it appears that a lot of page scrubbing results in a huge delta
on the first tick.

thanks,
Mukesh

Signed-off-by: Mukesh Rathor <mukesh.rathor@oracle.com>

diff --git a/init/calibrate.c b/init/calibrate.c
index 06066a6..14f62c8 100644
--- a/init/calibrate.c
+++ b/init/calibrate.c
@@ -32,7 +32,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
 {
     unsigned long pre_start, start, post_start;
     unsigned long pre_end, end, post_end;
-    unsigned long start_jiffies;
+    u64 start_jiffies;
     unsigned long tsc_rate_min, tsc_rate_max;
     unsigned long good_tsc_sum = 0;
     unsigned long good_tsc_count = 0;
@@ -64,8 +64,8 @@ static unsigned long __devinit calibrate_delay_direct(void)
     for (i = 0; i < MAX_DIRECT_CALIBRATION_RETRIES; i++) {
         pre_start = 0;
         read_current_timer(&start);
-        start_jiffies = jiffies;
-        while (jiffies <= (start_jiffies + tick_divider)) {
+        start_jiffies = get_jiffies_64();
+        while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
             pre_start = start;
             read_current_timer(&start);
         }
@@ -73,7 +73,7 @@ static unsigned long __devinit calibrate_delay_direct(void)
 
         pre_end = 0;
         end = post_start;
-        while (jiffies <=
+        while (get_jiffies_64() <=
                (start_jiffies + tick_divider * (1 + delay_calibration_ticks))) {
             pre_end = end;
             read_current_timer(&end);

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
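The effect of the patch can be seen in isolation with a small user-space model of the wait loop's exit test (a sketch, not kernel code). It assumes jiffies_64 starts at the INITIAL_JIFFIES value quoted at the top of the thread, 0xfffedb08, and that one large first tick carries it past 2^32, leaving the low 32 bits at the 0x9f8b reported there. The function names and the tick_divider value of 1 are hypothetical.

```c
#include <stdint.h>

/* Hypothetical model of the exit test of
 *     while (jiffies <= (start_jiffies + tick_divider))
 * when jiffies is a 32-bit unsigned long: returns 1 if the loop exits. */
int loop_exits_32(uint32_t start, uint32_t now, uint32_t tick_divider)
{
    return !(now <= start + tick_divider);
}

/* The same test on the full 64-bit count, as returned by get_jiffies_64(). */
int loop_exits_64(uint64_t start, uint64_t now, uint64_t tick_divider)
{
    return !(now <= start + tick_divider);
}
```

With start = 0xfffedb08 and tick_divider = 1, the 32-bit test keeps the loop spinning when it reads the wrapped value 0x9f8b (which compares below 0xfffedb09), while the 64-bit test sees 0x100009f8b and exits. That is the behaviour the switch to get_jiffies_64() is meant to restore.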
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-21 9:55 UTC
To: Mukesh Rathor
Cc: jeremy, Xen-devel@lists.xensource.com, Kurt Hackel, Dan Magenheimer, Keir Fraser

>>> Mukesh Rathor <mukesh.rathor@oracle.com> 19.12.09 05:43 >>>
>if first ever timer interrupt comes after start_jiffies is set, dom0 boot
>may hang if delta in timer_interrupt() is so huge that it causes jiffies
>to wrap. It appears delta is very large when memory is more than 512GB on
>certain boxes causing wrap around.
>
>why is delta in dom0->timer_interrupt() related to memory on system?
>Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
>appears lot of page scrubbing results in huge delta on first tick.

Based on prior analysis of similar problems, I'm not convinced this is
the right solution: kernel code should not need changing here. Instead,
I'd recommend trying to insert a call to process_pending_timers() every
so many pages scrubbed (just as is done, e.g., in the P2M/M2P table
population code).

Jan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer @ 2009-12-21 18:20 UTC
To: Jan Beulich, mukesh.rathor; +Cc: kurt.hackel, jeremy, Xen-devel, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Mukesh Rathor <mukesh.rathor@oracle.com> 19.12.09 05:43 >>>
> >if first ever timer interrupt comes after start_jiffies is set, dom0 boot
> >may hang if delta in timer_interrupt() is so huge that it causes jiffies
> >to wrap. It appears delta is very large when memory is more than 512GB on
> >certain boxes causing wrap around.
> >
> >why is delta in dom0->timer_interrupt() related to memory on system?
> >Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
> >appears lot of page scrubbing results in huge delta on first tick.
>
> Based on prior analysis of similar problems, I'm not convinced this is the
> right solution: Kernel code should not need changing here. Instead, I'd
> recommend trying to insert a call to process_pending_timers() every so
> many pages scrubbed (just like is e.g. being done in the P2M/M2P table
> population code).

Mukesh has dug into this a lot deeper than I have, but I think
process_pending_timers() is irrelevant here. When dom0 is constructed,
its data space is initialized in memory, and jiffies is initialized in
the data section with a fixed value of -300 * HZ. At this point dom0
lives in memory but has not executed a single instruction, so it is not
capable of receiving any interrupts. I *think* Xen also initializes a
clocksource (pvclock?) here. Then scrub_heap_pages() runs, which eats up
a lot of time. THEN dom0 is started and receives a timer interrupt, and,
I guess, the clocksource code updates jiffies based on the time elapsed;
since jiffies is unsigned, it wraps around.

So (admitting I don't understand this fully), I think the problem is
that the kernel has hardcoded into it the assumption that 300 seconds
cannot elapse between the time it is put in memory and the time the
first interrupt occurs. That seems like a kernel bug to me, maybe in the
pvclock code, but still in the kernel.

Not to say the problem can't or shouldn't be fixed in Xen. Keir, would
bad things happen if construct_dom0 is done after scrub_heap_pages()?
Other than some time wastage because dom0's memory would get scrubbed
just before it gets overwritten (which is admittedly a much bigger
problem when dom0_mem is not specified on the Xen boot line on a machine
with ginormous memory).

Thanks,
Dan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-21 19:07 UTC
To: Dan Magenheimer, Jan Beulich, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com

On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com> wrote:

> Not to say the problem can't or shouldn't be fixed in Xen.
> Keir, would bad things happen if construct_dom0 is done after
> scrub_heap_pages()? Other than some time wastage because
> dom0's memory would get scrubbed just before it gets
> overwritten (which is admittedly a much bigger problem
> when dom0_mem is not specified in the Xen boot line
> on a machine with ginormous memory).

The problem is more likely that Xen system time started ticking some
time earlier during the boot process. I doubt it has to do with the
ordering of construct_dom0 versus boot-time scrubbing.

 -- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor @ 2009-12-21 19:52 UTC
To: Keir Fraser
Cc: kurt.hackel@oracle.com, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge, Jan Beulich

On Mon, 21 Dec 2009 19:07:39 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:

> On 21/12/2009 18:20, "Dan Magenheimer" <dan.magenheimer@oracle.com>
> wrote:
>
> > Not to say the problem can't or shouldn't be fixed in Xen.
> > Keir, would bad things happen if construct_dom0 is done after
> > scrub_heap_pages()? Other than some time wastage because
> > dom0's memory would get scrubbed just before it gets
> > overwritten (which is admittedly a much bigger problem
> > when dom0_mem is not specified in the Xen boot line
> > on a machine with ginormous memory).
>
> The problem is more likely that Xen system time started ticking some
> time earlier during boot process. I doubt it is to do with ordering of
> construct_dom0 versus boot-time scrubbing.
>
> -- Keir

The problem is exactly as Dan described it. The 'delta' for the first
interrupt in dom0->timer_interrupt() grows in proportion to the amount
of memory on the system. On this box, it appears that more than 600GB
makes delta large enough to wrap jiffies.

1TB  delta: 940b7d68a4
32GB delta: 02ae56eadb

xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() -> timer_interrupt()

timer_interrupt() will call do_timer() delta/NS_PER_TICK times.

Linux initializes jiffies to -5 minutes to catch problems from jiffies
wrap early on. But, as Dan said, dom0->calibrate_delay_direct() on bare
metal starts running right away and is guaranteed to run less than
5 minutes after boot.
We could keep that assumption true by moving the page scrub before
xen->construct_dom0(), in which case the first timer interrupt in dom0
would come in a lot sooner, or just fix the loop to account for the
wrap. Since jiffies is just the lower 32 bits of jiffies_64, and
get_jiffies_64() is provided for the purpose of reading the 64-bit
version, I just use that.

Thanks,
Mukesh
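As a rough check of Mukesh's numbers, the sketch below converts the two reported deltas (nanoseconds, in hex) into timer ticks, following the delta/NS_PER_TICK description, and compares them with the 300-second head start that INITIAL_JIFFIES gives a 32-bit jiffies. HZ=250 is an assumption, inferred from 0xfffedb08 being -75000, i.e. -300*250, as a 32-bit value; the helper names are illustrative.

```c
#include <stdint.h>

#define HZ 250u                           /* assumption: 0xfffedb08 == -300*HZ */
#define NS_PER_TICK (1000000000ull / HZ)  /* 4,000,000 ns per tick at HZ=250 */
#define TICKS_BEFORE_WRAP (300ull * HZ)   /* jiffies starts 300 s below zero */

/* Deltas reported in this thread, in nanoseconds. */
static const uint64_t delta_1tb  = 0x940b7d68a4ull;
static const uint64_t delta_32gb = 0x02ae56eadbull;

/* Number of do_timer() calls one timer interrupt would make for delta_ns. */
uint64_t ticks_for_delta(uint64_t delta_ns)
{
    return delta_ns / NS_PER_TICK;
}

/* Nonzero if that burst of ticks carries a 32-bit jiffies that started at
 * -300*HZ past zero, i.e. the counter wraps. */
int wraps_jiffies(uint64_t delta_ns)
{
    return ticks_for_delta(delta_ns) >= TICKS_BEFORE_WRAP;
}
```

By this arithmetic the 1TB delta is roughly 636 seconds (158,961 ticks, well beyond the 75,000-tick head start) and the 32GB delta roughly 11.5 seconds (2,878 ticks), which is consistent with only the large-memory box wrapping.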
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jeremy Fitzhardinge @ 2009-12-21 19:55 UTC
To: Mukesh Rathor
Cc: kurt.hackel@oracle.com, Dan Magenheimer, Xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich

On 12/21/2009 11:52 AM, Mukesh Rathor wrote:
> On Mon, 21 Dec 2009 19:07:39 +0000
> Keir Fraser<keir.fraser@eu.citrix.com> wrote:
>
>> On 21/12/2009 18:20, "Dan Magenheimer"<dan.magenheimer@oracle.com>
>> wrote:
>>
>>> Not to say the problem can't or shouldn't be fixed in Xen.
>>> Keir, would bad things happen if construct_dom0 is done after
>>> scrub_heap_pages()? Other than some time wastage because
>>> dom0's memory would get scrubbed just before it gets
>>> overwritten (which is admittedly a much bigger problem
>>> when dom0_mem is not specified in the Xen boot line
>>> on a machine with ginormous memory).
>>>
>> The problem is more likely that Xen system time started ticking some
>> time earlier during boot process. I doubt it is to do with ordering of
>> construct_dom0 versus boot-time scrubbing.
>>
>> -- Keir
>>
> The problem is exactly how Dan described it. 'delta' for first interrupt
> in dom0->timer_interrupt() goes up proportionately with amount of memory
> on system. On this box, it appears more than 600GB causes delta to be
> large enough to wrap jiffies.
>
> 1TB delta: 940b7d68a4
> 32GB delta: 02ae56eadb
>
> xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() -> timer_interrupt()
>
> timer_interrupt will call do_timer delta/NS_PER_TICK number of times.
>

How is it computing that delta?

Anyway, I'm not at all sure this will apply to a pvops dom0 kernel, as
it does timekeeping quite differently from 2.6.18-xen.

    J
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor @ 2009-12-21 22:47 UTC
To: Jeremy Fitzhardinge
Cc: kurt.hackel@oracle.com, Dan Magenheimer, Xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich

On Mon, 21 Dec 2009 11:55:09 -0800
Jeremy Fitzhardinge <jeremy@goop.org> wrote:

> On 12/21/2009 11:52 AM, Mukesh Rathor wrote:
> > On Mon, 21 Dec 2009 19:07:39 +0000
> > Keir Fraser<keir.fraser@eu.citrix.com> wrote:
> >
> >> On 21/12/2009 18:20, "Dan Magenheimer"<dan.magenheimer@oracle.com>
> >> wrote:
> >>
> >>> Not to say the problem can't or shouldn't be fixed in Xen.
> >>> Keir, would bad things happen if construct_dom0 is done after
> >>> scrub_heap_pages()? Other than some time wastage because
> >>> dom0's memory would get scrubbed just before it gets
> >>> overwritten (which is admittedly a much bigger problem
> >>> when dom0_mem is not specified in the Xen boot line
> >>> on a machine with ginormous memory).
> >>>
> >> The problem is more likely that Xen system time started ticking
> >> some time earlier during boot process. I doubt it is to do with
> >> ordering of construct_dom0 versus boot-time scrubbing.
> >>
> >> -- Keir
> >>
> > The problem is exactly how Dan described it. 'delta' for first
> > interrupt in dom0->timer_interrupt() goes up proportionately with
> > amount of memory on system. On this box, it appears more than 600GB
> > causes delta to be large enough to wrap jiffies.
> >
> > 1TB delta: 940b7d68a4
> > 32GB delta: 02ae56eadb
> >
> > xen->send_guest_vcpu_virq() ----> dom0->handle_IRQ() ->
> > timer_interrupt()
> >
> > timer_interrupt will call do_timer delta/NS_PER_TICK number of
> > times.
>
> How is it computing that delta?
>
> Anyway, I'm not at all sure this will apply to a pvops dom0 kernel as
> it does timekeeping quite differently from 2.6.18-xen.
>
> J

delta comes from timer_interrupt() in time-xen.c:

    ...
    do {
        get_time_values_from_xen(cpu);

        /* Obtain a consistent snapshot of elapsed wallclock cycles. */
--->    delta = delta_cpu =
            shadow->system_timestamp + get_nsec_offset(shadow);
--->    delta -= processed_system_time;
        delta_cpu -= per_cpu(processed_system_time, cpu);

        /*
         * Obtain a consistent snapshot of stolen/blocked cycles. We
         * can use state_entry_time to detect if we get preempted here.
         */
        do {
            sched_time = runstate->state_entry_time;
            barrier();
            stolen = runstate->time[RUNSTATE_runnable] +
                runstate->time[RUNSTATE_offline] -
                per_cpu(processed_stolen_time, cpu);
            blocked = runstate->time[RUNSTATE_blocked] -
                per_cpu(processed_blocked_time, cpu);
            barrier();
        } while (sched_time != runstate->state_entry_time);
    } while (!time_values_up_to_date(cpu));
    ...

At first glance, I don't understand the above algorithm. Since you have
the same code, I assumed you could also compute delta to be a large
value when dom0 starts, in which case you may observe the dom0 hang.

thanks,
Mukesh
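The inner loop quoted above is a lockless snapshot: read a field the writer changes on every update (state_entry_time), read the data, then re-read that field and retry if it moved, since the values read in between may mix two updates. A minimal sketch of the pattern follows. The struct layout is hypothetical, the per-CPU processed_* subtraction is left out, and C11 atomic_signal_fence() stands in for the kernel's compiler-only barrier().

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical mirror of the runstate fields the loop reads; in the kernel
 * these live in memory that Xen updates on the guest's behalf. */
struct runstate_info {
    volatile uint64_t state_entry_time;
    volatile uint64_t time_runnable;
    volatile uint64_t time_offline;
    volatile uint64_t time_blocked;
};

struct runstate_sample {
    uint64_t stolen;   /* runnable + offline time */
    uint64_t blocked;  /* blocked time */
};

/* Re-read until state_entry_time is unchanged across the data reads, so
 * all returned values come from the same update by the writer. */
struct runstate_sample read_runstate(const struct runstate_info *rs)
{
    struct runstate_sample s;
    uint64_t before;

    do {
        before = rs->state_entry_time;
        atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier */
        s.stolen  = rs->time_runnable + rs->time_offline;
        s.blocked = rs->time_blocked;
        atomic_signal_fence(memory_order_seq_cst);  /* compiler barrier */
    } while (before != rs->state_entry_time);

    return s;
}
```

The outer do/while around time_values_up_to_date() appears to apply the same idea to the shared time values: if Xen updated them while delta was being computed, the whole computation is repeated.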
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jeremy Fitzhardinge @ 2009-12-21 23:13 UTC
To: Mukesh Rathor
Cc: kurt.hackel@oracle.com, Dan Magenheimer, Xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich

On 12/21/2009 02:47 PM, Mukesh Rathor wrote:
> delta comes from:
>
> timer_interrupt() in time-xen.c :
> ...
>     do {
>         get_time_values_from_xen(cpu);
>
>         /* Obtain a consistent snapshot of elapsed wallclock cycles. */
> --->    delta = delta_cpu =
>             shadow->system_timestamp + get_nsec_offset(shadow);
> --->    delta -= processed_system_time;
>         delta_cpu -= per_cpu(processed_system_time, cpu);
>
>         /*
>          * Obtain a consistent snapshot of stolen/blocked cycles. We
>          * can use state_entry_time to detect if we get preempted here.
>          */
>         do {
>             sched_time = runstate->state_entry_time;
>             barrier();
>             stolen = runstate->time[RUNSTATE_runnable] +
>                 runstate->time[RUNSTATE_offline] -
>                 per_cpu(processed_stolen_time, cpu);
>             blocked = runstate->time[RUNSTATE_blocked] -
>                 per_cpu(processed_blocked_time, cpu);
>             barrier();
>         } while (sched_time != runstate->state_entry_time);
>     } while (!time_values_up_to_date(cpu));
> ...
>
> At first glance, i don't understand the above algorithm. Since you've
> the same code, I assumed you could also compute delta to be a large
> value when dom0 starts, in which case you may observe dom0 hang.
>

There's some code in the pvops kernel which looks vaguely like that, but
it has nothing to do with timer interrupts. Could you be more specific
about what you're referring to?

    J
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer @ 2009-12-21 23:57 UTC
To: Jeremy Fitzhardinge, mukesh.rathor
Cc: kurt.hackel, Xen-devel, Keir Fraser, Jan Beulich

> From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
>
> There's some code in the pvops kernel which looks vaguely like that, but
> it has nothing to do with timer interrupts. Could you be more specific
> about what you're referring to?

I spent some time rooting through the 2.6.32 code and ended up with my
head spinning. I think the bottom line is that if there is code that may
cause jiffies to increment by a large amount from a single "tick"
delivered by Xen, the same problem can likely occur in a 2.6.32 dom0
running on Xen.
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Mukesh Rathor @ 2009-12-22 4:31 UTC
To: Dan Magenheimer
Cc: Jeremy Fitzhardinge, Xen-devel, kurt.hackel, Jan Beulich, Keir Fraser

On Mon, 21 Dec 2009 15:57:33 -0800 (PST)
Dan Magenheimer <dan.magenheimer@oracle.com> wrote:

> > From: Jeremy Fitzhardinge [mailto:jeremy@goop.org]
> >
> > There's some code in the pvops kernel which looks vaguely like that,
> > but it has nothing to do with timer interrupts. Could you be more
> > specific about what you're referring to?
>
> I spent some time rooting through the 2.6.32 code and
> ended up with my head spinning. I think the bottom line
> is if there is code that may cause jiffies to increment
> by a large amount from a single "tick" delivered by
> Xen, it's likely the same problem can occur in 2.6.32
> dom0 when running on Xen.

Right. Your calibrate_delay_direct() is the same, so in there:

    start_jiffies = jiffies;
    ... timer interrupt ...
    while (jiffies <= (start_jiffies + tick_divider)) {
        pre_start = start;
        read_current_timer(&start);
    }

If the timer tick comes in after start_jiffies is set, and on return
from the timer interrupt the while loop finds jiffies wrapped, it will
hang.

I was looking at the wrong "jeremy's pvops tree", but now that I am
looking at the correct one, I see that your timer_interrupt() is pretty
different. So if you believe that you could also increment jiffies by
more than one in timer_interrupt(), you should consider my new patch
when I submit it. I'm testing it right now.

thanks,
mukesh
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-22 8:51 UTC
To: Dan Magenheimer, mukesh.rathor
Cc: kurt.hackel, jeremy, Xen-devel, Keir Fraser

>>> Dan Magenheimer <dan.magenheimer@oracle.com> 21.12.09 19:20 >>>
> From: Jan Beulich [mailto:JBeulich@novell.com]
>> Based on prior analysis of similar problems, I'm not convinced this is
>> the right solution: Kernel code should not need changing here. Instead,
>> I'd recommend trying to insert a call to process_pending_timers() every
>> so many pages scrubbed (just like is e.g. being done in the P2M/M2P
>> table population code).
>
>Mukesh has dug into this a lot deeper than I, but I think
>process_pending_timers() is irrelevant here. When dom0

Why would this be any different from a lot of time being consumed
populating large p2m/m2p tables? All of this happens when Dom0 already
exists but isn't running yet.

>is constructed, its data space is initialized in memory
>and jiffies has been initialized in the data section with
>a fixed value of -300 * HZ. At this point, dom0 lives in
>memory but has not executed a single instruction, so is
>not capable of receiving any interrupts. I *think* Xen
>also initializes a clocksource (pvclock?) here.

... and updates it each time local_time_calibration() is run, which is
the missing piece: process_pending_timers() causes time_calibration() to
run as needed, which in turn raises TIME_CALIBRATE_SOFTIRQ as needed
(running it at the latest immediately before Dom0 gets passed control),
which in turn causes local_time_calibration() to run, updating
dom0:vcpu0's system time.

>Then scrub_heap_pages() occurs which eats up a lot of time.

... and confuses Xen's own timekeeping, because, depending on the
platform timer used and its wrap-around interval, a wrap may be missed
if process_pending_timers() isn't executed frequently enough.

But from the other mail on this subject I conclude that this suggestion
wasn't even tried, despite my knowing that it fixed similar problems on
1TB systems. And be assured, I spent hours (if not days) analyzing the
problem until I finally understood that it is entirely unrelated to the
kernel.

>THEN dom0 is started and receives a timer interrupt and,
>I guess, the clocksource code updates jiffies based on
>the time elapsed and, since jiffies is unsigned, it
>wraps around.
>
>So (admitting I don't understand this fully), I think the
>problem is that the kernel has hardcoded into it that it's
>impossible for 300 seconds to expire between the time it
>is put in memory and the time the first interrupt occurs.
>That seems like a kernel bug to me, maybe in the pvclock
>code, but still in the kernel.

No, the time the kernel gets put in memory doesn't matter at all.
Counting starts when the kernel begins initializing its time subsystem,
and with timer interrupts initially disabled I can't even see how
several of them could pile up.

>Not to say the problem can't or shouldn't be fixed in Xen.
>Keir, would bad things happen if construct_dom0 is done after
>scrub_heap_pages()? Other than some time wastage because
>dom0's memory would get scrubbed just before it gets
>overwritten (which is admittedly a much bigger problem
>when dom0_mem is not specified in the Xen boot line
>on a machine with ginormous memory).

Jan
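Jan's remark about missed platform-timer wraps describes a general hazard of extending a narrow free-running hardware counter in software: each update adds the masked difference since the previous reading, which is only correct if updates happen at least once per wrap period. The sketch below is a generic illustration, not Xen's platform-timer code; the 24-bit width and all names are assumptions.

```c
#include <stdint.h>

/* Hypothetical hardware counter that is only COUNTER_BITS wide. */
#define COUNTER_BITS 24
#define COUNTER_MASK ((UINT64_C(1) << COUNTER_BITS) - 1)

struct extended_counter {
    uint64_t total;     /* accumulated 64-bit count */
    uint32_t last_raw;  /* raw hardware value at the previous update */
};

/* Fold a new raw reading into the 64-bit total. The masked subtraction
 * handles a single wrap between updates correctly, but if the hardware
 * counter wrapped more than once since the last update, whole multiples
 * of 2^COUNTER_BITS are silently lost. */
uint64_t counter_update(struct extended_counter *c, uint32_t raw)
{
    uint64_t delta = ((uint64_t)raw - c->last_raw) & COUNTER_MASK;

    c->total += delta;
    c->last_raw = raw;
    return c->total;
}
```

If two readings are more than one wrap period apart, for example a true elapsed count of 2^24 + 5 that shows up as a raw reading of 5, the update adds only 5, and a whole wrap period of time disappears.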
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-22 10:20 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com

On 22/12/2009 08:51, "Jan Beulich" <JBeulich@novell.com> wrote:

>> Then scrub_heap_pages() occurs which eats up a lot of time.
>
> ... and confuses Xen's own time keeping (because, depending on
> the platform timer used and it's wrap-around interval, a wrap may
> be missed if process_pending_timers() isn't being executed
> frequently enough.

Process_pending_timers() has been called on every iteration of the scrub
loop for as long as I can remember. I believe it was even you who added it.

 -- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-22 11:10 UTC
To: Keir Fraser, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com

>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 11:20 >>>
>On 22/12/2009 08:51, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>>> Then scrub_heap_pages() occurs which eats up a lot of time.
>>
>> ... and confuses Xen's own time keeping (because, depending on
>> the platform timer used and it's wrap-around interval, a wrap may
>> be missed if process_pending_timers() isn't being executed
>> frequently enough.
>
>Process_pending_timers() has been called on every iteration of the scrub
>loop for as long as I can remember. I believe it was even you who added it.

Could I have overlooked it? Indeed, I did (I looked at the end of the
loop, while it sits at the beginning). I'm really sorry for the noise,
then.

Nevertheless I remain convinced that the problem ought not to be fixed
by a kernel change (and even less by one that modifies Xen-unspecific
code). Any patch to this effect, unless I am convinced otherwise, has my
explicit up-front NAK (in case this counts for anything).

And then it should be possible to simulate the problem quite easily on a
system with much less memory, by slowing down the scrub loop
artificially. If I find time before the holiday break I'll try to do
that and see whether I can convince myself otherwise (as per above).

Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-22 13:35 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com

On 22/12/2009 11:10, "Jan Beulich" <JBeulich@novell.com> wrote:

> Nevertheless I remain convinced that the problem ought not to be fixed
> by a kernel change (and even less by one that modifies Xen-unspecific
> code). Any patch to this effect, unless I should be convinced otherwise,
> has my explicit up front NAK (in case this counts anything).

Well, I must say the kernel patch looked quite sensible to me, if for no
other reason than that it reinforces the fact that jiffy values should
always be compared using the provided macros. But I'm happy to have a
hypervisor patch as well, if we can work out what it should be. I'm
still unclear on why slow page scrubbing causes this problem; Oracle's
explanation hasn't convinced me yet.

> And then it should be possible to simulate the problem quite easily on
> a system with much less memory, by slowing down the scrub loop
> artificially. If I find time before the holiday break I'll try to do that
> and see if I can convince myself otherwise (as per above).

That would be helpful, thanks. I'm particularly intrigued by how this
could be seen for dom0 but not be a similar or worse issue for domU.

 -- Keir
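Keir's point about comparing jiffies with the provided macros refers to the kernel's time_after()/time_before() family, which compares through the sign of the difference rather than the raw values. The sketch below re-implements that idea for a 32-bit counter in user space; it is a simplified illustration, not the kernel's macros, and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Wrap-safe "a is after b" for a 32-bit tick counter: look at the sign of
 * the modular difference instead of comparing raw values. Only valid while
 * the two values are less than 2^31 ticks apart; the signed conversion
 * relies on two's-complement behaviour, as the kernel's macros do. */
bool ticks_after(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

bool ticks_before(uint32_t a, uint32_t b)
{
    return ticks_after(b, a);
}

/* The calibrate_delay_direct() wait condition written wrap-safely:
 * keep spinning while `now` has not yet passed start + divider. */
bool still_waiting(uint32_t now, uint32_t start, uint32_t divider)
{
    return !ticks_after(now, start + divider);
}
```

Written this way, the wait loop stops once it sees the wrapped value 0x9f8b, because in modular arithmetic 0x9f8b is only 115,842 ticks past 0xfffedb09, whereas a plain `<=` treats it as far in the past.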
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-22 14:17 UTC
To: Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com, Keir Fraser

>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 14:35 >>>
>On 22/12/2009 11:10, "Jan Beulich" <JBeulich@novell.com> wrote:
>
>> Nevertheless I remain convinced that the problem ought not to be fixed
>> by a kernel change (and even less by one that modifies Xen-unspecific
>> code). Any patch to this effect, unless I should be convinced otherwise,
>> has my explicit up front NAK (in case this counts anything).
>
>Well, I must say the kernel patch looked quite sensible to me. If no other
>reason than reinforcing the fact that jiffy values should always be compared
>using the provided macros. But I'm happy to have a hypervisor patch as well,
>if we can work out what it should be. I'm still unclear on the reason why
>slow page scrubbing causes this problem - Oracle's explanation hasn't
>convinced me yet.

There's another thing that seems inconsistent with this report: jiffies
itself, as well as all the arithmetic in calibrate_delay_direct(), uses
"unsigned long", so it is done in 64 bits on x86-64 (which is what we're
talking about here). Hence I can see even less how an overflow could
have happened, or how using explicit 64-bit types (or get_jiffies_64())
here could help.

Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-22 14:23 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com, Keir Fraser

>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
>There's another thing that seems inconsistent with this report: jiffies
>itself as well as all the arithmetic in calibrate_delay_direct() is using
>"unsigned long", so is being done 64-bit on x86-64 (which we're
>talking about here). Hence I can see even less how an overflow could
>have happened, or how using explicit 64-bit types (or get_jiffies_64())
>here can help.

Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't
recall this having been mentioned anywhere, but maybe I just overlooked
it.

Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Keir Fraser @ 2009-12-22 15:19 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com

On 22/12/2009 14:23, "Jan Beulich" <JBeulich@novell.com> wrote:

>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
>> There's another thing that seems inconsistent with this report: jiffies
>> itself as well as all the arithmetic in calibrate_delay_direct() is using
>> "unsigned long", so is being done 64-bit on x86-64 (which we're
>> talking about here). Hence I can see even less how an overflow could
>> have happened, or how using explicit 64-bit types (or get_jiffies_64())
>> here can help.
>
> Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't
> recall this having been mentioned anywhere, but maybe I just
> overlooked it.

I'd assumed this must be the case. As you say, the issue couldn't happen
as described on 64-bit Linux.

 -- Keir
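The 32-bit versus 64-bit difference can be shown directly. This sketch assumes that the kernel defines INITIAL_JIFFIES by truncating -300*HZ to 32 bits before widening it to unsigned long, and that HZ=250 (both assumptions for the example); fixed-width types stand in for unsigned long on a 32-bit and a 64-bit kernel, and the function names are hypothetical.

```c
#include <stdint.h>

#define HZ 250  /* assumption, consistent with INITIAL_JIFFIES == 0xfffedb08 */

/* -300*HZ truncated to 32 bits: 0xfffedb08 on both 32-bit and 64-bit
 * kernels under the assumed definition. */
static const uint32_t initial_jiffies = (uint32_t)(-300 * HZ);

/* jiffies after `ticks` timer ticks when unsigned long is 32 bits wide:
 * the sum wraps modulo 2^32. */
uint32_t jiffies_after_32(uint32_t ticks)
{
    return initial_jiffies + ticks;
}

/* The same on a 64-bit kernel: crossing 2^32 is just a carry, not a wrap. */
uint64_t jiffies_after_64(uint64_t ticks)
{
    return (uint64_t)initial_jiffies + ticks;
}
```

Advancing by 75,000 + 0x9f8b ticks gives 0x9f8b with 32-bit arithmetic but 0x100009f8b with 64-bit arithmetic, so on a 64-bit kernel the calibration loop's plain `<=` comparison keeps working, consistent with Keir's reply.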
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer @ 2009-12-22 15:30 UTC
To: Jan Beulich, mukesh.rathor
Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser

> >>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 15:17 >>>
> >There's another thing that seems inconsistent with this report: jiffies
> >itself as well as all the arithmetic in calibrate_delay_direct() is using
> >"unsigned long", so is being done 64-bit on x86-64 (which we're
> >talking about here). Hence I can see even less how an overflow could
> >have happened, or how using explicit 64-bit types (or get_jiffies_64())
> >here can help.
>
> Oh, or are we talking about 32-bit Dom0 on 64-bit Xen here? I don't
> recall this having been mentioned anywhere, but maybe I just
> overlooked it.

Mukesh's work has been on a 32-bit dom0.
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Jan Beulich @ 2009-12-22 15:36 UTC
To: Dan Magenheimer, mukesh.rathor
Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser

>>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
>Mukesh's work has been on a 32-bit dom0.

Which seems quite an odd combination (1TB of memory but a 32-bit Dom0),
which is why I initially didn't even consider the possibility. I'm
afraid you're asking for more trouble with this.

Jan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
From: Dan Magenheimer @ 2009-12-22 16:05 UTC
To: Jan Beulich, mukesh.rathor
Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser

> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
> >Mukesh's work has been on a 32-bit dom0.
>
> Which seems quite odd a combination - 1Tb of memory, but a 32-bit
> Dom0 -, which is why initially I didn't even consider the possibility.
> I'm afraid you're asking for more trouble with this.

Indeed. Oracle expects to move to a 64-bit dom0 in a "future" release,
but we have to make the 32-bit dom0 work until then.

Our default configuration specifies dom0_mem=xxx, which I would assume
eliminates most or all of the "trouble" you are referring to, but,
Mukesh, could you confirm what dom0_mem is set to?
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 16:05 ` Dan Magenheimer
@ 2009-12-22 17:02 ` Jan Beulich
2009-12-22 18:03 ` Jeremy Fitzhardinge
0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2009-12-22 17:02 UTC
To: Dan Magenheimer, mukesh.rathor
Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 17:05 >>>
>> From: Jan Beulich [mailto:JBeulich@novell.com]
>>
>> >>> Dan Magenheimer <dan.magenheimer@oracle.com> 22.12.09 16:30 >>>
>> >Mukesh's work has been on a 32-bit dom0.
>>
>> Which seems quite an odd combination (1TB of memory, but a 32-bit
>> Dom0), which is why initially I didn't even consider the possibility.
>> I'm afraid you're asking for more trouble with this.
>
>Indeed. Oracle expects to move to a 64-bit dom0 in a "future"
>release, but we have to make the 32-bit dom0 work until then.
>
>Our default configuration specifies dom0_mem=xxx, which I would
>assume would eliminate most or all of the "trouble" you are
>referring to, but, Mukesh, could you confirm what dom0_mem
>is set to?
No, that won't help. I'm referring to things like Dom0 accesses to the
M2P table it sees, which covers nowhere near all of memory. I can't
say whether that can go without problem, but without closely looking
at it I don't think you can assume this would work. Likewise I would
suspect tools issues (if you use the tools from xen-unstable et al),
though I have no precise pointer right now at specific issues.
Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 17:02 ` Jan Beulich
@ 2009-12-22 18:03 ` Jeremy Fitzhardinge
2010-01-04 8:23 ` Jan Beulich
0 siblings, 1 reply; 51+ messages in thread
From: Jeremy Fitzhardinge @ 2009-12-22 18:03 UTC
To: Jan Beulich; +Cc: kurt.hackel, Dan Magenheimer, Xen-devel, Keir Fraser
On 12/22/2009 09:02 AM, Jan Beulich wrote:
> No, that won't help. I'm referring to things like Dom0 accesses to the
> M2P table it sees, which covers nowhere near all of memory. I can't
> say whether that can go without problem, but without closely looking
> at it I don't think you can assume this would work. Likewise I would
> suspect tools issues (if you use the tools from xen-unstable et al),
> though I have no precise pointer right now at specific issues.
>
32-bit dom0 is the standard use model for the Citrix product, and I think
people tend to run it even with xen-unstable. It's a fairly well-tested
combination.
J
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 18:03 ` Jeremy Fitzhardinge
@ 2010-01-04 8:23 ` Jan Beulich
2010-01-04 22:07 ` Dan Magenheimer
0 siblings, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2010-01-04 8:23 UTC
To: Jeremy Fitzhardinge; +Cc: kurt.hackel, Dan Magenheimer, Xen-devel, Keir Fraser
>>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
>On 12/22/2009 09:02 AM, Jan Beulich wrote:
>> No, that won't help. I'm referring to things like Dom0 accesses to the
>> M2P table it sees, which covers nowhere near all of memory. I can't
>> say whether that can go without problem, but without closely looking
>> at it I don't think you can assume this would work. Likewise I would
>> suspect tools issues (if you use the tools from xen-unstable et al),
>> though I have no precise pointer right now at specific issues.
>>
>
>32-bit dom0 is the standard use model for the Citrix product, and I think
>people tend to run it even with xen-unstable. It's a fairly well-tested
>combination.
But very unlikely with 1TB of memory, don't you think?
Jan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-04 8:23 ` Jan Beulich
@ 2010-01-04 22:07 ` Dan Magenheimer
2010-01-04 22:21 ` Ian Campbell
2010-01-05 8:33 ` Jan Beulich
0 siblings, 2 replies; 51+ messages in thread
From: Dan Magenheimer @ 2010-01-04 22:07 UTC
To: Jan Beulich, Jeremy Fitzhardinge; +Cc: kurt.hackel, Xen-devel, Keir Fraser
> From: Jan Beulich [mailto:JBeulich@novell.com]
>
> >>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
> >On 12/22/2009 09:02 AM, Jan Beulich wrote:
> >> No, that won't help. I'm referring to things like Dom0 accesses to the
> >> M2P table it sees, which covers nowhere near all of memory. I can't
> >> say whether that can go without problem, but without closely looking
> >> at it I don't think you can assume this would work. Likewise I would
> >> suspect tools issues (if you use the tools from xen-unstable et al),
> >> though I have no precise pointer right now at specific issues.
> >>
> >
> >32-bit dom0 is the standard use model for the Citrix product, and I think
> >people tend to run it even with xen-unstable. It's a fairly well-tested
> >combination.
>
> But very unlikely with 1TB of memory, don't you think?
>
> Jan
Only because machines with 1TB are rare/unlikely. I can't
speak for the Citrix product but there is NO supported
configuration (yet) of the Oracle VM product with a 64-bit dom0.
In other words, if a customer is using a released Oracle VM
product and the machine on which they are running it has 1TB
of physical memory, they ARE using a 32-bit dom0.
However, Oracle VM always specifies a dom0_mem= Xen boot parameter
(which is always much smaller than 1TB).
If there are known issues with 1TB of memory in this
configuration, we'd like to understand them. If 1TB with
32-bit dom0 is rife with hidden unresolvable problems,
we'd like to make a clear support statement as to what the
physical memory limit is.
Thanks,
Dan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-04 22:07 ` Dan Magenheimer
@ 2010-01-04 22:21 ` Ian Campbell
2010-01-05 8:33 ` Jan Beulich
1 sibling, 0 replies; 51+ messages in thread
From: Ian Campbell @ 2010-01-04 22:21 UTC
To: Dan Magenheimer
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich
On Mon, 2010-01-04 at 22:07 +0000, Dan Magenheimer wrote:
> > From: Jan Beulich [mailto:JBeulich@novell.com]
> >
> > >>> Jeremy Fitzhardinge <jeremy@goop.org> 22.12.09 19:03 >>>
> > >On 12/22/2009 09:02 AM, Jan Beulich wrote:
> > >> No, that won't help. I'm referring to things like Dom0 accesses to the
> > >> M2P table it sees, which covers nowhere near all of memory. I can't
> > >> say whether that can go without problem, but without closely looking
> > >> at it I don't think you can assume this would work. Likewise I would
> > >> suspect tools issues (if you use the tools from xen-unstable et al),
> > >> though I have no precise pointer right now at specific issues.
> > >>
> > >
> > >32-bit dom0 is the standard use model for the Citrix product, and I think
> > >people tend to run it even with xen-unstable. It's a fairly well-tested
> > >combination.
> >
> > But very unlikely with 1TB of memory, don't you think?
> >
> > Jan
>
> Only because machines with 1TB are rare/unlikely. I can't
> speak for the Citrix product but there is NO supported
> configuration (yet) of the Oracle VM product with a 64-bit dom0.
> In other words, if a customer is using a released Oracle VM
> product and the machine on which they are running it has 1TB
> of physical memory, they ARE using a 32-bit dom0.
>
> However, Oracle VM always specifies a dom0_mem= Xen boot parameter
> (which is always much smaller than 1TB).
So do XenServer and XCP, FWIW.
Ian.
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-04 22:07 ` Dan Magenheimer
2010-01-04 22:21 ` Ian Campbell
@ 2010-01-05 8:33 ` Jan Beulich
2010-01-05 15:46 ` Dan Magenheimer
1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2010-01-05 8:33 UTC
To: Dan Magenheimer; +Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 04.01.10 23:07 >>>
>If there are known issues with 1TB of memory in this
>configuration, we'd like to understand them. If 1TB with
>32-bit dom0 is rife with hidden unresolvable problems,
>we'd like to make a clear support statement as to what the
>physical memory limit is.
I can't say there are known problems, but I'm convinced not everything
can work properly above the boundary of 168G. Nevertheless it is quite
possible that most or all of the normal (not error handling) code paths
work well. Page table walks, e.g. during exceptions or kexec, would be
problem candidates. And while my knowledge of the tools is rather
limited, libxc also has - IIRC - several hard-coded assumptions that
might not hold.
What is clear though is that you also depend on the memory distribution
across the (physical) address space: contiguous (apart from the below-4G
hole) memory will likely present few problems, but sparse memory
crossing the 44-bit boundary can't work in any case (since MFNs are
represented as 32-bit quantities in 32-bit Dom0).
Jan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-05 8:33 ` Jan Beulich
@ 2010-01-05 15:46 ` Dan Magenheimer
2010-01-05 15:54 ` Ian Campbell
2010-01-05 16:08 ` Jan Beulich
0 siblings, 2 replies; 51+ messages in thread
From: Dan Magenheimer @ 2010-01-05 15:46 UTC
To: Jan Beulich; +Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser
> What is clear though is that you also depend on the memory distribution
> across the (physical) address space: contiguous (apart from the below-4G
> hole) memory will likely present few problems, but sparse memory
> crossing the 44-bit boundary can't work in any case (since MFNs are
> represented as 32-bit quantities in 32-bit Dom0).
Urk. Yes, I had forgotten about the sparse problem.
> I can't say there are known problems, but I'm convinced not everything
> can work properly above the boundary of 168G. Nevertheless it is quite
> possible that most or all of the normal (not error handling) code paths
> work well. Page table walks, e.g. during exceptions or kexec, would be
> problem candidates. And while my knowledge of the tools is rather
> limited, libxc also has - IIRC - several hard-coded assumptions that
> might not hold.
What is special about 168GB? Or is that a typo? (And if it
is supposed to be 128GB, what is special about 128GB?)
Thanks,
Dan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-05 15:46 ` Dan Magenheimer
@ 2010-01-05 15:54 ` Ian Campbell
2010-01-05 16:08 ` Jan Beulich
1 sibling, 0 replies; 51+ messages in thread
From: Ian Campbell @ 2010-01-05 15:54 UTC
To: Dan Magenheimer
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com, Keir Fraser, Jan Beulich
On Tue, 2010-01-05 at 15:46 +0000, Dan Magenheimer wrote:
> > What is clear though is that you also depend on the memory distribution
> > across the (physical) address space: contiguous (apart from the below-4G
> > hole) memory will likely present few problems, but sparse memory
> > crossing the 44-bit boundary can't work in any case (since MFNs are
> > represented as 32-bit quantities in 32-bit Dom0).
>
> Urk. Yes, I had forgotten about the sparse problem.
>
> > I can't say there are known problems, but I'm convinced not everything
> > can work properly above the boundary of 168G. Nevertheless it is quite
> > possible that most or all of the normal (not error handling) code paths
> > work well. Page table walks, e.g. during exceptions or kexec, would be
> > problem candidates. And while my knowledge of the tools is rather
> > limited, libxc also has - IIRC - several hard-coded assumptions that
> > might not hold.
>
> What is special about 168GB? Or is that a typo? (And if it
> is supposed to be 128GB, what is special about 128GB?)
It's the size of M2P you can fit into the hypervisor hole of a PAE
guest running on a 64-bit hypervisor. Since the hypervisor itself no
longer needs to reside in there, the M2P can be bigger than with a PAE
guest on a PAE hypervisor.
The size of the hypervisor hole is runtime-settable for many guests,
but I'm not sure that is plumbed through in the tools, so who knows
how well it works.
Increasing the size of the hypervisor hole eats into kernel low memory,
though, so you would be trading off maximum per-guest RAM against
maximum host RAM to some degree.
Ian.
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2010-01-05 15:46 ` Dan Magenheimer
2010-01-05 15:54 ` Ian Campbell
@ 2010-01-05 16:08 ` Jan Beulich
1 sibling, 0 replies; 51+ messages in thread
From: Jan Beulich @ 2010-01-05 16:08 UTC
To: Dan Magenheimer; +Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel, Keir Fraser
>>> Dan Magenheimer <dan.magenheimer@oracle.com> 05.01.10 16:46 >>>
>What is special about 168GB? Or is that a typo? (And if it
>is supposed to be 128GB, what is special about 128GB?)
No, it's not a typo. The maximum hole Xen can reserve for itself is
168M, and that is what allows it to accommodate the M2P table for 168G.
Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 13:35 ` Keir Fraser
2009-12-22 14:17 ` Jan Beulich
@ 2009-12-22 16:33 ` Jan Beulich
2009-12-22 16:42 ` Jan Beulich
1 sibling, 1 reply; 51+ messages in thread
From: Jan Beulich @ 2009-12-22 16:33 UTC
To: Keir Fraser, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com
>>> Keir Fraser <keir.fraser@eu.citrix.com> 22.12.09 14:35 >>>
>> And then it should be possible to simulate the problem quite easily on
>> a system with much less memory, by slowing down the scrub loop
>> artificially. If I find time before the holiday break I'll try to do that and
>> see if I can convince myself otherwise (as per above).
>
>That would be helpful, thanks. I'm particularly intrigued by how this could
>be seen for dom0 but not be a similar or worse issue for domU.
Simulating the issue was indeed straightforward. What I'm seeing is a
Xen problem, though (as expected): right around when scrubbing starts
there is a run through time_calibration(). During and after the scrub,
however, none happen until Dom0 has proceeded quite a bit into its
initialization (namely, past its delay loop calibration). Only then do
regular (1-second interval) time_calibration() invocations resume.
One other thing that looks irregular at first glance is that the
mentioned very first run through time_calibration() does not seem to
result in running local_time_calibration() on CPU0. One invocation
(apparently independent of time_calibration()) happens right before
Dom0 starts executing.
Jan
(XEN) *** LOADING DOMAIN 0 ***
(XEN) Xen kernel: 64-bit, lsb, compat32
(XEN) Dom0 kernel: 32-bit, PAE, lsb, paddr 0x2000 -> 0x496000
(XEN) PHYSICAL MEMORY ARRANGEMENT:
(XEN) Dom0 alloc.: 0000000229000000->000000022a000000 (1936909 pages to be allocated)
(XEN) VIRTUAL MEMORY ARRANGEMENT:
(XEN) Loaded kernel: 00000000c0002000->00000000c0496000
(XEN) Init. ramdisk: 00000000c0496000->00000000c0759193
(XEN) Phys-Mach map: 00000000c075a000->00000000c0ec1834
(XEN) Start info: 00000000c0ec2000->00000000c0ec24b4
(XEN) Page tables: 00000000c0ec3000->00000000c0ed1000
(XEN) Boot stack: 00000000c0ed1000->00000000c0ed2000
(XEN) TOTAL: 00000000c0000000->00000000c1000000
(XEN) ENTRY ADDRESS: 00000000c0002000
(XEN) Dom0 has maximum 8 VCPUs
(XEN) tc
(XEN) ltc@4[32767:4]
(XEN) ltc@3[32767:3]
(XEN) ltc@7[32767:7]
(XEN) ltc@6[32767:6]
(XEN) Scrubbing Free RAM: ltc@2[32767:2]
(XEN) ltc@5[32767:5]
(XEN) ltc@1[32767:1]
(XEN) .....done.
(XEN) Xen trace buffers: disabled
(XEN) Std. Loglevel: All
(XEN) Guest Loglevel: All
(XEN) Xen is relinquishing VGA console.
(XEN) *** Serial input -> Xen (type 'CTRL-q' three times to switch input to DOM0)
(XEN) Freed 160kB init memory.
(XEN) ltc@0[0:0]
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 16:33 ` Jan Beulich
@ 2009-12-22 16:42 ` Jan Beulich
2009-12-22 17:27 ` Dan Magenheimer
` (2 more replies)
0 siblings, 3 replies; 51+ messages in thread
From: Jan Beulich @ 2009-12-22 16:42 UTC
To: Keir Fraser, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com
>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>One other thing that looks irregular at first glance is that the
>mentioned very first run through time_calibration() does not seem to
>result in running local_time_calibration() on CPU0. One invocation
>(apparently independent of time_calibration()) happens right before
>Dom0 starts executing.
And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
get serviced until entry to Dom0, but CPU0 is responsible for re-arming
calibration_timer. Hence there's a gap in calibrations, resulting in an
excessive delta observed during the first timer interrupt in Dom0.
Jan
* RE: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 16:42 ` Jan Beulich
@ 2009-12-22 17:27 ` Dan Magenheimer
2009-12-22 17:48 ` Keir Fraser
2009-12-22 18:42 ` Keir Fraser
2 siblings, 0 replies; 51+ messages in thread
From: Dan Magenheimer @ 2009-12-22 17:27 UTC
To: Jan Beulich, Keir Fraser, mukesh.rathor
Cc: kurt.hackel, Jeremy Fitzhardinge, Xen-devel
So, checking my understanding, the underlying problem is that
shadow->tsc_timestamp has essentially stopped but the hardware TSC
has continued moving forward? Thus in timer_interrupt() (in time-xen.c)
shadow->system_timestamp will be stale, so get_nsec_offset() returns
a large number, resulting in a large delta, which in turn causes
jiffies to be incremented by a large amount. If the interrupt happens
by coincidence in the middle of the first while loop in
calibrate_delay_direct() (in init/calibrate.c), and the large jiffies
increment happens to be enough to wrap, the while loop will run for
weeks.
If this is right, I'm still not clear on how it can be fixed in Xen.
> -----Original Message-----
> From: Jan Beulich [mailto:JBeulich@novell.com]
> Sent: Tuesday, December 22, 2009 9:43 AM
> To: Keir Fraser; Dan Magenheimer; Mukesh Rathor
> Cc: Jeremy Fitzhardinge; Xen-devel@lists.xensource.com; Kurt Hackel
> Subject: Re: [Xen-devel] [timer/ticks related] dom0 hang during boot on large 1TB system
>
> >>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
> >One other thing that looks irregular at first glance is that the
> >mentioned very first run through time_calibration() does not seem to
> >result in running local_time_calibration() on CPU0. One invocation
> >(apparently independent of time_calibration()) happens right before
> >Dom0 starts executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap in calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.
>
> Jan
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 16:42 ` Jan Beulich
2009-12-22 17:27 ` Dan Magenheimer
@ 2009-12-22 17:48 ` Keir Fraser
2009-12-22 18:42 ` Keir Fraser
2 siblings, 0 replies; 51+ messages in thread
From: Keir Fraser @ 2009-12-22 17:48 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com
On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>> One other thing that looks irregular at first glance is that the
>> mentioned very first run through time_calibration() does not seem to
>> result in running local_time_calibration() on CPU0. One invocation
>> (apparently independent of time_calibration()) happens right before
>> Dom0 starts executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap in calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.
Arbitrarily delaying softirq work is probably inherently fragile. All we
have to defer is SCHEDULE_SOFTIRQ, as that can preempt the current
context. So I will look into making a patch that changes
process_pending_timers() to process_pending_softirqs().
Thanks,
Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 16:42 ` Jan Beulich
2009-12-22 17:27 ` Dan Magenheimer
2009-12-22 17:48 ` Keir Fraser
@ 2009-12-22 18:42 ` Keir Fraser
2009-12-22 23:00 ` Mukesh Rathor
2 siblings, 1 reply; 51+ messages in thread
From: Keir Fraser @ 2009-12-22 18:42 UTC
To: Jan Beulich, Dan Magenheimer, mukesh.rathor@oracle.com
Cc: kurt.hackel@oracle.com, Jeremy Fitzhardinge, Xen-devel@lists.xensource.com
On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
>> One other thing that looks irregular at first glance is that the
>> mentioned very first run through time_calibration() does not seem to
>> result in running local_time_calibration() on CPU0. One invocation
>> (apparently independent of time_calibration()) happens right before
>> Dom0 starts executing.
>
> And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ can't
> get serviced until entry to Dom0, but CPU0 is responsible for re-arming
> calibration_timer. Hence there's a gap in calibrations, resulting in an
> excessive delta observed during the first timer interrupt in Dom0.
Please give xen-unstable:20714 a look. If that fixes the apparent
problems, I think it is also a good candidate for backport to the 3.4
branch.
-- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 18:42 ` Keir Fraser
@ 2009-12-22 23:00 ` Mukesh Rathor
0 siblings, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2009-12-22 23:00 UTC
To: Keir Fraser
Cc: kurt.hackel@oracle.com, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge, Jan Beulich
On Tue, 22 Dec 2009 18:42:01 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 22/12/2009 16:42, "Jan Beulich" <JBeulich@novell.com> wrote:
>
> >>>> "Jan Beulich" <JBeulich@novell.com> 22.12.09 17:33 >>>
> >> One other thing that looks irregular at first glance is that the
> >> mentioned very first run through time_calibration() does not seem to
> >> result in running local_time_calibration() on CPU0. One invocation
> >> (apparently independent of time_calibration()) happens right before
> >> Dom0 starts executing.
> >
> > And that's of course the problem: CPU0's TIME_CALIBRATE_SOFTIRQ
> > can't get serviced until entry to Dom0, but CPU0 is responsible for
> > re-arming calibration_timer. Hence there's a gap in calibrations,
> > resulting in an excessive delta observed during the first timer
> > interrupt in Dom0.
>
> Please give xen-unstable:20714 a look. If that fixes the apparent
> problems, I think it is also a good candidate for backport to the 3.4
> branch.
>
> -- Keir
>
Yup, that fixed it. Jiffies now jumps from 0xfffedb08 to 0xfffedb56, as
opposed to 0x0000102c or similar. BTW, my test was limited to just
booting the box.
I'm glad it got resolved before I leave on holidays for a while. Many
thanks to all.
Mukesh
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-19 4:43 ` Mukesh Rathor
2009-12-21 9:55 ` Jan Beulich
@ 2009-12-21 10:44 ` Keir Fraser
2009-12-21 23:40 ` Mukesh Rathor
2009-12-21 19:17 ` Steve Ofsthun
2 siblings, 1 reply; 51+ messages in thread
From: Keir Fraser @ 2009-12-21 10:44 UTC
To: Mukesh Rathor
Cc: Kurt Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge
On 19/12/2009 04:43, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same
> issue in your tree. Here's a summary for your benefit:
This patch doesn't apply to
http://xenbits.xensource.com/linux-2.6.18-xen.hg, by the way. The code
is different there. So I'm dropping this patch, as I have nowhere to
put it.
-- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-21 10:44 ` Keir Fraser
@ 2009-12-21 23:40 ` Mukesh Rathor
2009-12-22 7:35 ` Keir Fraser
0 siblings, 1 reply; 51+ messages in thread
From: Mukesh Rathor @ 2009-12-21 23:40 UTC
To: Keir Fraser
Cc: Hackel, Kurt, Xen-devel@lists.xensource.com, Dan Magenheimer, Jeremy Fitzhardinge
On Mon, 21 Dec 2009 10:44:51 +0000
Keir Fraser <keir.fraser@eu.citrix.com> wrote:
> On 19/12/2009 04:43, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>
> > Ok, I came up with the following patch. Jeremy, can you please take
> > a look also, and comment on my fix since I noticed you've got the
> > same issue in your tree. Here's a summary for your benefit:
>
> This patch doesn't apply to
> http://xenbits.xensource.com/linux-2.6.18-xen.hg, by the way. The code
> is different there. So I'm dropping this patch, as I have nowhere to
> put it.
>
> -- Keir
>
Actually, INITIAL_JIFFIES appears to be buggy on 64-bit Linux:
#define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
The cast to unsigned int leaves it at 0xfffedb08 instead of
0xfffffffffffedb08, which is what the intention is: that jiffies
should wrap within a few minutes. So, if they fix it in Linux in the
future, my patch will still have the same problem.
OK, I'll come up with another patch.
thanks,
Mukesh
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-21 23:40 ` Mukesh Rathor
@ 2009-12-22 7:35 ` Keir Fraser
0 siblings, 0 replies; 51+ messages in thread
From: Keir Fraser @ 2009-12-22 7:35 UTC
To: Mukesh Rathor
Cc: Kurt Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge
On 21/12/2009 23:40, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> Actually, INITIAL_JIFFIES appears to be buggy on 64-bit Linux:
>
> #define INITIAL_JIFFIES ((unsigned long)(unsigned int) (-300*HZ))
>
> The cast to unsigned int leaves it at 0xfffedb08 instead of
> 0xfffffffffffedb08, which is what the intention is: that jiffies
> should wrap within a few minutes. So, if they fix it in Linux in the
> future, my patch will still have the same problem.
Actually the cast to unsigned int is deliberate. They want jiffies to
wrap 32 bits soon after boot, but it should pretty much never wrap 64 bits.
-- Keir
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-19 4:43 ` Mukesh Rathor
2009-12-21 9:55 ` Jan Beulich
2009-12-21 10:44 ` Keir Fraser
@ 2009-12-21 19:17 ` Steve Ofsthun
2009-12-22 4:00 ` Mukesh Rathor
2 siblings, 1 reply; 51+ messages in thread
From: Steve Ofsthun @ 2009-12-21 19:17 UTC
To: Mukesh Rathor
Cc: Dan Magenheimer, Xen-devel@lists.xensource.com, Hackel, jeremy, Keir Fraser, Kurt
Mukesh Rathor wrote:
> On Fri, 18 Dec 2009 07:02:55 +0000
> Keir Fraser <keir.fraser@eu.citrix.com> wrote:
>
>> On 18/12/2009 04:36, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
>>
>>> The other fix I thought of was to change INITIAL_JIFFIES to
>>> something sooner.
>>>
>>> Would appreciate any help, I don't understand xen time management
>>> well.
>> This isn't really Xen time code, but unchanged Linux time code. I
>> don't know which tree you quoted the code from -- 2.6.18 has similar
>> but not identical. Anyway, I suggest try using the jiffy-comparison
>> macros from <linux/jiffies.h>: time_before(), time_after(), etc.
>> These are designed to work even when jiffies wraps. Feel free to send
>> patch(es) for that, if you test that out and it works okay.
>>
>> -- Keir
>>
>
> Ok, I came up with the following patch. Jeremy, can you please take a
> look also, and comment on my fix since I noticed you've got the same
> issue in your tree. Here's a summary for your benefit:
>
> init/calibrate.c : calibrate_delay_direct():
>
> start_jiffies = get_jiffies_64();
> while (get_jiffies_64() <= (start_jiffies + tick_divider)) {
> pre_start = start;
> read_current_timer(&start);
> }
>
Linux time code explicitly forces jiffies (32-bit) to wrap soon after
boot to prevent other kernel code from making assumptions about jiffies
wrap. In your case, I'm guessing that the scrubbing delay is causing a
sufficient number of timer interrupts to be delayed (queued up) that it
is forcing the jiffies to wrap earlier in the boot path than expected.
As Keir suggests, the correct solution is probably to use the
time_before/after macros appropriately.
The proposed code avoids the problem by accessing jiffies_64 instead.
> if first ever timer interrupt comes after start_jiffies is set, dom0 boot
> may hang if delta in timer_interrupt() is so huge that it causes jiffies
> to wrap. It appears delta is very large when memory is more than 512GB on
> certain boxes causing wrap around.
>
> why is delta in dom0->timer_interrupt() related to memory on system?
> Because hyp creates dom0, then page scrubs, then unpauses vcpu. so it
> appears lot of page scrubbing results in huge delta on first tick.
The problem here may be that timers are running in the domain while the
vcpu is not.
Steve
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-21 19:17 ` Steve Ofsthun
@ 2009-12-22 4:00 ` Mukesh Rathor
2009-12-22 4:18 ` Mukesh Rathor
2009-12-22 7:59 ` Keir Fraser
0 siblings, 2 replies; 51+ messages in thread
From: Mukesh Rathor @ 2009-12-22 4:00 UTC
To: Steve Ofsthun
Cc: Magenheimer, Xen-devel@lists.xensource.com, Hackel, Dan, jeremy, Keir Fraser, Kurt
On Mon, 21 Dec 2009 14:17:57 -0500
Steve Ofsthun <steve.ofsthun@oracle.com> wrote:
> As Keir suggests, the correct solution is probably to use the
> time_before/after macros appropriately.
>
> The proposed code avoids the problem by accessing jiffies_64 instead.
Can't use time_after/before, as they do signed comparisons:
time_after(a,b): ((long)(b) - (long)(a) < 0)
Thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true, as will
time_after(0x1020, 0xFFFEDB08), as they are both after 0xFFFEDB08.
For wrapping, an unsigned comparison must be done, which is also the
jiffies data type.
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 4:00 ` Mukesh Rathor
@ 2009-12-22 4:18 ` Mukesh Rathor
2009-12-22 7:59 ` Keir Fraser
1 sibling, 0 replies; 51+ messages in thread
From: Mukesh Rathor @ 2009-12-22 4:18 UTC
To: Mukesh Rathor
Cc: Dan Magenheimer, Xen-devel@lists.xensource.com, Hackel, jeremy, Keir Fraser, Kurt, Steve Ofsthun
On Mon, 21 Dec 2009 20:00:25 -0800
Mukesh Rathor <mukesh.rathor@oracle.com> wrote:
> On Mon, 21 Dec 2009 14:17:57 -0500
> Steve Ofsthun <steve.ofsthun@oracle.com> wrote:
>
> > As Keir suggests, the correct solution is probably to use the
> > time_before/after macros appropriately.
> >
> > The proposed code avoids the problem by accessing jiffies_64
> > instead.
>
> Can't use time_after/before, as they do signed comparisons:
> time_after(a,b): ((long)(b) - (long)(a) < 0)
>
> Thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true, as will
> time_after(0x1020, 0xFFFEDB08), as they are both after 0xFFFEDB08.
>
> For wrapping, an unsigned comparison must be done, which is also the
> jiffies data type.
>
Actually, my bad. It can't be used in an if statement to check for
wrapping, but I can use it in the while loop here, as the loop seems to
care only whether jiffies has gone up.
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 4:00 ` Mukesh Rathor
2009-12-22 4:18 ` Mukesh Rathor
@ 2009-12-22 7:59 ` Keir Fraser
2009-12-22 8:05 ` Keir Fraser
1 sibling, 1 reply; 51+ messages in thread
From: Keir Fraser @ 2009-12-22 7:59 UTC
To: Mukesh Rathor, Steve Ofsthun
Cc: Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge, Kurt@acsinet11.oracle.com
I'll try and make this *really* clear...
On 22/12/2009 04:00, "Mukesh Rathor" <mukesh.rathor@oracle.com> wrote:
> Can't use time_after/before as they do signed comparisons.
> time_after(a,b): ((long)(b) - (long)(a) < 0)
The whole point is to do a signed comparison. This gives you BITS_PER_LONG-1
bits of reliable range in either direction: with 32-bit Linux, that means jiffy
values which differ by less than 2^31 can be compared reliably, regardless of
wrapping. Bear in mind that even at HZ=1000, it'll take about 3.5 *weeks* for
jiffies to increase by 2^31.
> Thus, time_after(0xFFFEDB09, 0xFFFEDB08) will return true, as will
> time_after(0x1020, 0xFFFEDB08), as they are both after 0xFFFEDB08.
Well yeah: anything in the ranges a=0xFFFEDB09-0xFFFFFFFF and a=0x0-0x7FFEDB08
will return true for time_after(a,0xFFFEDB08). That's how a signed 32-bit
comparison works. The assumption here is that 0x1020 is derived from
jiffies_64=0x100001020: in general, the assumption is that the arguments to
time_after() were taken within seconds/minutes/hours of each other, not
days/weeks. That precludes a jiffies_64 difference of more than 0x7FFFFFFF,
which is what would invalidate the use of time_after().
> For wrapping, an unsigned comparison must be done, which also matches
> the jiffies data type.
If you do 32-bit unsigned comparisons, they are broken by jiffies wrapping,
and fixing that was the whole point of the comparison macros.
--
Keir
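The arithmetic in this reply can be checked with a short sketch. As before, `uint32_t` stands in for a 32-bit `unsigned long`, and `time_after32`, `low32` and `weeks_for_2pow31_ticks` are hypothetical helper names. `low32` models how, on 32-bit Linux, jiffies is the low word of jiffies_64, so jiffies_64 values 0x100001020 and 0xFFFEDB08 end up compared as 0x1020 and 0xFFFEDB08.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical 32-bit model of time_after(a, b): the sign of the
 * wrapped difference b - a decides which value is later. */
bool time_after32(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}

/* Hypothetical helper: a 32-bit kernel's jiffies is the low 32 bits
 * of jiffies_64. */
uint32_t low32(uint64_t jiffies_64)
{
    return (uint32_t)jiffies_64;
}

/* Hypothetical helper: weeks for jiffies to advance by 2^31 ticks
 * at the given HZ. */
double weeks_for_2pow31_ticks(unsigned int hz)
{
    const double seconds = 2147483648.0 / hz;      /* 2^31 ticks */
    return seconds / (7.0 * 24.0 * 60.0 * 60.0);   /* seconds per week */
}
```

At HZ=1000 the helper gives roughly 3.55 weeks, in line with the estimate in the reply.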
* Re: [timer/ticks related] dom0 hang during boot on large 1TB system
2009-12-22 7:59 ` Keir Fraser
@ 2009-12-22 8:05 ` Keir Fraser
0 siblings, 0 replies; 51+ messages in thread
From: Keir Fraser @ 2009-12-22 8:05 UTC
To: Mukesh Rathor, Steve Ofsthun
Cc: Hackel, Dan Magenheimer, Xen-devel@lists.xensource.com, Jeremy Fitzhardinge, Kurt@acsinet11.oracle.com
On 22/12/2009 07:59, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> For wrapping, an unsigned comparison must be done, which also matches
>> the jiffies data type.
>
> If you do 32-bit unsigned comparisons, they are broken by jiffies wrapping,
> and fixing that was the whole point of the comparison macros.
I'm talking about '(ulong)b<(ulong)a' here, of course. '(ulong)b-(ulong)a<0'
would always be false, which is even less useful.
--
Keir
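The two unsigned forms mentioned here can be contrasted with the signed-difference form in a small sketch. The helper names are hypothetical, and `uint32_t` again stands in for a 32-bit `unsigned long`. Compilers will typically warn that the comparison in `after_unsigned_diff` is always false, which is exactly the point being made.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* '(ulong)b < (ulong)a': right until jiffies wraps, then wrong. */
bool after_unsigned_compare(uint32_t a, uint32_t b)
{
    return b < a;
}

/* '(ulong)b - (ulong)a < 0': an unsigned result is never negative,
 * so this always returns false. */
bool after_unsigned_diff(uint32_t a, uint32_t b)
{
    return (uint32_t)(b - a) < 0;
}

/* Signed difference, as time_after() does it (hypothetical 32-bit
 * model; assumes two's-complement conversion to int32_t). */
bool after_signed_diff(uint32_t a, uint32_t b)
{
    return (int32_t)(b - a) < 0;
}
```

For a=0x1020 and b=0xFFFEDB08, only the signed-difference form reports a as the later value.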
Thread overview: 51+ messages
2009-12-18 4:36 [timer/ticks related] dom0 hang during boot on large 1TB system Mukesh Rathor
2009-12-18 7:02 ` Keir Fraser
2009-12-18 8:42 ` Jan Beulich
2009-12-18 9:13 ` Keir Fraser
2009-12-18 16:35 ` Dan Magenheimer
2009-12-18 17:15 ` Keir Fraser
2009-12-18 19:28 ` Mukesh Rathor
2009-12-18 19:25 ` Mukesh Rathor
2009-12-19 4:43 ` Mukesh Rathor
2009-12-21 9:55 ` Jan Beulich
2009-12-21 18:20 ` Dan Magenheimer
2009-12-21 19:07 ` Keir Fraser
2009-12-21 19:52 ` Mukesh Rathor
2009-12-21 19:55 ` Jeremy Fitzhardinge
2009-12-21 22:47 ` Mukesh Rathor
2009-12-21 23:13 ` Jeremy Fitzhardinge
2009-12-21 23:57 ` Dan Magenheimer
2009-12-22 4:31 ` Mukesh Rathor
2009-12-22 8:51 ` Jan Beulich
2009-12-22 10:20 ` Keir Fraser
2009-12-22 11:10 ` Jan Beulich
2009-12-22 13:35 ` Keir Fraser
2009-12-22 14:17 ` Jan Beulich
2009-12-22 14:23 ` Jan Beulich
2009-12-22 15:19 ` Keir Fraser
2009-12-22 15:30 ` Dan Magenheimer
2009-12-22 15:36 ` Jan Beulich
2009-12-22 16:05 ` Dan Magenheimer
2009-12-22 17:02 ` Jan Beulich
2009-12-22 18:03 ` Jeremy Fitzhardinge
2010-01-04 8:23 ` Jan Beulich
2010-01-04 22:07 ` Dan Magenheimer
2010-01-04 22:21 ` Ian Campbell
2010-01-05 8:33 ` Jan Beulich
2010-01-05 15:46 ` Dan Magenheimer
2010-01-05 15:54 ` Ian Campbell
2010-01-05 16:08 ` Jan Beulich
2009-12-22 16:33 ` Jan Beulich
2009-12-22 16:42 ` Jan Beulich
2009-12-22 17:27 ` Dan Magenheimer
2009-12-22 17:48 ` Keir Fraser
2009-12-22 18:42 ` Keir Fraser
2009-12-22 23:00 ` Mukesh Rathor
2009-12-21 10:44 ` Keir Fraser
2009-12-21 23:40 ` Mukesh Rathor
2009-12-22 7:35 ` Keir Fraser
2009-12-21 19:17 ` Steve Ofsthun
2009-12-22 4:00 ` Mukesh Rathor
2009-12-22 4:18 ` Mukesh Rathor
2009-12-22 7:59 ` Keir Fraser
2009-12-22 8:05 ` Keir Fraser