* Stability GPLPV - new test results
@ 2011-10-12 13:47 Andreas Kinzler
  2011-10-13  1:23 ` James Harper
  0 siblings, 1 reply; 2+ messages in thread
From: Andreas Kinzler @ 2011-10-12 13:47 UTC
  To: James Harper, xen-devel

[-- Attachment #1: Type: text/plain, Size: 694 bytes --]

Hello James,

Something quite interesting happened during my stability tests. GPLPV 
0.11.0.213, which I consider stable, showed the same hang as the newer 
GPLPV versions. I am now trying to find out why even the stable 0.11.0.213 
hangs when it was and still is stable on our production systems. There are 
three possible causes: Xen 4.1.1 vs. Xen 4.0.1, dom0 2.6.32.36 vs. 2.6.32.18, 
and CPU Xeon E3-1230 vs. Xeon X3450 [and board X9SCM-F vs. X8SIL-F].

The attached log shows the debug-key output for the hang. I find lines 64-66 
quite interesting, where it shows that there is an event channel upcall 
pending on the hung VM2, with no problems on VM1 (lines 52-54). Could that be 
a hint to the real problem?
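
The comparison described above (pending event channels on VM2 but not on VM1) could be checked mechanically. The following is only a sketch, assuming the 'e' debug-key output has the layout seen in the attached log, where `[p/m]` is the pending/masked pair for each port. The function name `pending_ports` and the `SAMPLE` excerpt are made up for illustration and are not part of any Xen tool.

```python
import re

# Header line that starts the port list of one domain.
DOMAIN_RE = re.compile(r"Event channel information for domain (\d+):")
# One port line, e.g. "(XEN)        7 [1/0]: s=3 n=0 d=0 p=35 x=0".
PORT_RE = re.compile(r"\(XEN\)\s+(\d+) \[(\d)/(\d)\]: (.*)")


def pending_ports(log_text):
    """Return {domain: [(port, masked, fields), ...]} for ports with the pending bit set."""
    result = {}
    domain = None
    for line in log_text.splitlines():
        header = DOMAIN_RE.search(line)
        if header:
            domain = int(header.group(1))
            continue
        port_line = PORT_RE.match(line.strip())
        if port_line and domain is not None:
            port = int(port_line.group(1))
            pending = int(port_line.group(2))
            masked = int(port_line.group(3))
            if pending:
                # Remaining "key=value" tokens (s, n, d, p, x) are kept as strings.
                fields = dict(kv.split("=", 1) for kv in port_line.group(4).split())
                result.setdefault(domain, []).append((port, bool(masked), fields))
    return result


# Hypothetical excerpt in the same layout as the attached log.
SAMPLE = """\
(XEN) Event channel information for domain 1:
(XEN)     port [p/m]
(XEN)        7 [0/0]: s=3 n=0 d=0 p=33 x=0
(XEN) Event channel information for domain 2:
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=31 x=1
(XEN)        7 [1/0]: s=3 n=0 d=0 p=35 x=0
(XEN)        8 [1/0]: s=3 n=0 d=0 p=36 x=0
(XEN)        9 [1/0]: s=3 n=0 d=0 p=38 x=0
"""

if __name__ == "__main__":
    for dom, ports in pending_ports(SAMPLE).items():
        for port, masked, fields in ports:
            print(f"domain {dom} port {port}: pending, masked={masked}, {fields}")
```

A port that is pending but not masked is one the guest has been told about and has not yet handled, which is the pattern singled out above. On its own it does not say whether that is the cause of the hang or a consequence of it.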

Regards Andreas


[-- Attachment #2: x3450-xen --]
[-- Type: text/plain, Size: 13622 bytes --]

'e' pressed -> dumping event-channel info
(XEN) Event channel information for domain 0:
(XEN) Polling vCPUs: {No periodic timer}
(XEN)     port [p/m]
(XEN)        1 [0/0]: s=5 n=0 v=0 x=0
(XEN)        2 [0/0]: s=6 n=0 x=0
(XEN)        3 [0/0]: s=6 n=0 x=0
(XEN)        4 [0/0]: s=5 n=0 v=1 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
(XEN)        6 [0/0]: s=5 n=1 v=0 x=0
(XEN)        7 [0/0]: s=6 n=1 x=0
(XEN)        8 [0/0]: s=6 n=1 x=0
(XEN)        9 [0/0]: s=5 n=1 v=1 x=0
(XEN)       10 [0/0]: s=6 n=1 x=0
(XEN)       11 [0/0]: s=3 n=0 d=0 p=25 x=0
(XEN)       12 [0/0]: s=5 n=0 v=9 x=0
(XEN)       13 [0/0]: s=4 n=0 p=9 x=0
(XEN)       14 [0/1]: s=5 n=0 v=2 x=0
(XEN)       15 [0/0]: s=4 n=0 p=16 x=0
(XEN)       16 [0/0]: s=4 n=0 p=279 x=0
(XEN)       17 [0/0]: s=4 n=0 p=21 x=0
(XEN)       18 [0/0]: s=4 n=0 p=23 x=0
(XEN)       19 [0/0]: s=4 n=0 p=12 x=0
(XEN)       20 [0/0]: s=4 n=0 p=1 x=0
(XEN)       21 [0/0]: s=4 n=0 p=278 x=0
(XEN)       22 [0/0]: s=4 n=0 p=277 x=0
(XEN)       23 [0/0]: s=4 n=0 p=276 x=0
(XEN)       24 [0/0]: s=4 n=0 p=275 x=0
(XEN)       25 [0/0]: s=3 n=0 d=0 p=11 x=0
(XEN)       26 [0/0]: s=5 n=0 v=3 x=0
(XEN)       27 [0/0]: s=3 n=0 d=1 p=3 x=0
(XEN)       28 [0/0]: s=3 n=0 d=1 p=1 x=0
(XEN)       29 [0/0]: s=3 n=0 d=1 p=2 x=0
(XEN)       30 [0/0]: s=3 n=0 d=2 p=3 x=0
(XEN)       31 [0/0]: s=3 n=0 d=2 p=1 x=0
(XEN)       32 [0/0]: s=3 n=0 d=2 p=2 x=0
(XEN)       33 [0/0]: s=3 n=0 d=1 p=7 x=0
(XEN)       34 [0/0]: s=3 n=0 d=1 p=8 x=0
(XEN)       35 [0/0]: s=3 n=0 d=2 p=7 x=0
(XEN)       36 [0/0]: s=3 n=0 d=2 p=8 x=0
(XEN)       37 [0/0]: s=3 n=0 d=1 p=9 x=0
(XEN)       38 [0/0]: s=3 n=0 d=2 p=9 x=0
(XEN) Event channel information for domain 1:
(XEN) Polling vCPUs: {No periodic timer}
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=28 x=1
(XEN)        2 [0/1]: s=3 n=1 d=0 p=29 x=1
(XEN)        3 [0/0]: s=3 n=0 d=0 p=27 x=0
(XEN)        4 [0/1]: s=2 n=0 d=0 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
(XEN)        6 [0/0]: s=2 n=0 d=0 x=0
(XEN)        7 [0/0]: s=3 n=0 d=0 p=33 x=0
(XEN)        8 [0/0]: s=3 n=0 d=0 p=34 x=0
(XEN)        9 [0/0]: s=3 n=0 d=0 p=37 x=0
(XEN) Event channel information for domain 2:
(XEN) Polling vCPUs: {No periodic timer}
(XEN)     port [p/m]
(XEN)        1 [0/1]: s=3 n=0 d=0 p=31 x=1
(XEN)        2 [0/1]: s=3 n=1 d=0 p=32 x=1
(XEN)        3 [0/0]: s=3 n=0 d=0 p=30 x=0
(XEN)        4 [0/1]: s=2 n=0 d=0 x=0
(XEN)        5 [0/0]: s=6 n=0 x=0
(XEN)        6 [0/0]: s=2 n=0 d=0 x=0
(XEN)        7 [1/0]: s=3 n=0 d=0 p=35 x=0
(XEN)        8 [1/0]: s=3 n=0 d=0 p=36 x=0
(XEN)        9 [1/0]: s=3 n=0 d=0 p=38 x=0

gnttab_usage_print_all [ key 'g' pressed
(XEN)       -------- active --------       -------- shared --------
(XEN) [ref] localdom mfn      pin          localdom gmfn     flags
(XEN) grant-table for remote domain:    0 ... no active grant table entries
(XEN)       -------- active --------       -------- shared --------
(XEN) [ref] localdom mfn      pin          localdom gmfn     flags
(XEN) grant-table for remote domain:    1 (v1)
(XEN) [15453]        0 0x1ac492 0x00000001          0 0x08fe92 0x19
(XEN) [15461]        0 0x19f2a7 0x00000100          0 0x09d0a7 0x09
(XEN) [15480]        0 0x1ab528 0x00000001          0 0x090f28 0x19
(XEN) [15483]        0 0x19fa2a 0x00000100          0 0x09c82a 0x09
(XEN) [15492]        0 0x1ac8a9 0x00000001          0 0x08faa9 0x19
(XEN) [15575]        0 0x19f2a7 0x00000100          0 0x09d0a7 0x09
(XEN) [15637]        0 0x1ab927 0x00000001          0 0x090b27 0x19
(XEN) [15663]        0 0x1ac993 0x00000001          0 0x08fb93 0x19
(XEN) [15755]        0 0x1aca14 0x00000001          0 0x08f814 0x19
(XEN) [15763]        0 0x1ac3a8 0x00000100          0 0x0901a8 0x09
(XEN) [15781]        0 0x1abf26 0x00000001          0 0x090526 0x19
(XEN) [15782]        0 0x1ad115 0x00000001          0 0x08f315 0x19
(XEN) [15801]        0 0x1ac492 0x00000001          0 0x08fe92 0x19
(XEN) [15804]        0 0x1ac029 0x00000100          0 0x090229 0x09
(XEN) [16156]        0 0x180471 0x00000001          0 0x0bbe71 0x19
(XEN) [16262]        0 0x180467 0x00000001          0 0x0bbe67 0x19
(XEN) [16371]        0 0x1800ae 0x00000001          0 0x0bc2ae 0x19
(XEN) [16383]        0 0x180098 0x00000001          0 0x0bc298 0x19
(XEN)       -------- active --------       -------- shared --------
(XEN) [ref] localdom mfn      pin          localdom gmfn     flags
(XEN) grant-table for remote domain:    2 (v1)
(XEN) [16287]        0 0x08bc68 0x00000001          0 0x0bbe68 0x19
(XEN) [16345]        0 0x08bc69 0x00000001          0 0x0bbe69 0x19
(XEN) [16371]        0 0x08b67b 0x00000001          0 0x0bc47b 0x19
(XEN) [16383]        0 0x08b664 0x00000001          0 0x0bc464 0x19
(XEN) gnttab_usage_print_all ] done

'q' pressed -> dumping domain info (now=0x1C578:AC3FDB38)
(XEN) General information for domain 0:
(XEN)     refcnt=3 dying=0 nr_pages=195681 xenheap_pages=5 dirty_cpus={0,2} max_pages=4294967295
(XEN)     handle=00000000-0000-0000-0000-000000000000 vm_assist=0000000d
(XEN) Rangesets belonging to domain 0:
(XEN)     I/O Ports  { 0-1f, 22-3f, 44-60, 62-9f, a2-807, 80c-cfb, d00-ffff }
(XEN)     Interrupts { 0-279 }
(XEN)     I/O Memory { 0-febff, fec01-fedff, fee01-ffffffffffffffff }
(XEN) Memory pages belonging to domain 0:
(XEN)     DomPage list too long to display
(XEN)     XenPage 000000000023f40c: caf=c000000000000002, taf=7400000000000002
(XEN)     XenPage 000000000023f40b: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000023f40a: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000023f409: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000000bf4bd: caf=c000000000000002, taf=7400000000000002
(XEN) VCPU information and callbacks for domain 0:
(XEN)     VCPU0: CPU0 [has=T] flags=0 poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={0} cpu_affinity={0-127}
(XEN)     No periodic timer
(XEN)     VCPU1: CPU2 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={2} cpu_affinity={0-127}
(XEN)     No periodic timer
(XEN) General information for domain 1:
(XEN)     refcnt=3 dying=0 nr_pages=773080 xenheap_pages=34 dirty_cpus={1,3} max_pages=774400
(XEN)     handle=b1726fc7-dd8a-43b7-93b3-e6275d0611db vm_assist=00000000
(XEN)     paging assistance: hap refcounts translate external 
(XEN) Rangesets belonging to domain 1:
(XEN)     I/O Ports  { }
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN) Memory pages belonging to domain 1:
(XEN)     DomPage list too long to display
(XEN)     PoD entries=0 cachesize=0
(XEN)     XenPage 000000000021a269: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000021a268: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000021a267: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000021a266: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000000bf2f0: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000021a120: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050bd: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050bc: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050bb: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050ba: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b9: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b8: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b7: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b6: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b5: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b4: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b3: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b2: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b1: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050b0: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050af: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050ae: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050ad: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050ac: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050ab: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050aa: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a9: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a8: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a7: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a6: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a5: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a4: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a3: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000002050a2: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 1:
(XEN)     VCPU0: CPU1 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={1} cpu_affinity={0-127}
(XEN)     paging assistance: hap, 4 levels
(XEN)     No periodic timer
(XEN)     VCPU1: CPU3 [has=T] flags=0 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={3} cpu_affinity={0-127}
(XEN)     paging assistance: hap, 4 levels
(XEN)     No periodic timer
(XEN) General information for domain 2:
(XEN)     refcnt=3 dying=0 nr_pages=773080 xenheap_pages=34 dirty_cpus={} max_pages=774400
(XEN)     handle=fca5ebe7-0b9d-4bc5-9e35-fad9ba95f9a9 vm_assist=00000000
(XEN)     paging assistance: hap refcounts log_dirty translate external 
(XEN) Rangesets belonging to domain 2:
(XEN)     I/O Ports  { }
(XEN)     Interrupts { }
(XEN)     I/O Memory { }
(XEN) Memory pages belonging to domain 2:
(XEN)     DomPage list too long to display
(XEN)     PoD entries=0 cachesize=0
(XEN)     XenPage 0000000000200c0b: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000200c2a: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000201577: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000020156c: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 00000000000bf473: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 0000000000204847: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f015: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f014: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f013: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f012: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f011: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f010: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00f: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00e: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00d: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00c: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00b: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f00a: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f009: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f008: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f007: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f006: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f005: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f004: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f003: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f002: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f001: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f000: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47f: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47e: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47d: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47c: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47b: caf=c000000000000001, taf=7400000000000001
(XEN)     XenPage 000000000022f47a: caf=c000000000000001, taf=7400000000000001
(XEN) VCPU information and callbacks for domain 2:
(XEN)     VCPU0: CPU0 [has=F] flags=1 poll=0 upcall_pend = 01, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-127}
(XEN)     paging assistance: hap, 4 levels
(XEN)     No periodic timer
(XEN)     VCPU1: CPU2 [has=F] flags=1 poll=0 upcall_pend = 00, upcall_mask = 00 dirty_cpus={} cpu_affinity={0-127}
(XEN)     paging assistance: hap, 4 levels
(XEN)     No periodic timer
(XEN) Notifying guest 0:0 (virq 1, port 4, stat 0/0/-1)
(XEN) Notifying guest 0:1 (virq 1, port 9, stat 0/0/0)
(XEN) Notifying guest 1:0 (virq 1, port 0, stat 0/-1/0)
(XEN) Notifying guest 1:1 (virq 1, port 0, stat 0/-1/0)
(XEN) Notifying guest 2:0 (virq 1, port 0, stat 0/-1/-1)
(XEN) Notifying guest 2:1 (virq 1, port 0, stat 0/-1/0)

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel


* RE: Stability GPLPV - new test results
  2011-10-12 13:47 Stability GPLPV - new test results Andreas Kinzler
@ 2011-10-13  1:23 ` James Harper
  0 siblings, 0 replies; 2+ messages in thread
From: James Harper @ 2011-10-13  1:23 UTC
  To: Andreas Kinzler, xen-devel

> Hello James,
> 
> Something quite interesting happened during my stability tests. GPLPV
> 0.11.0.213, which I consider stable, showed the same hang as the newer
> GPLPV versions. I am now trying to find out why even the stable
> 0.11.0.213 hangs when it was and still is stable on our production
> systems. There are three possible causes: Xen 4.1.1 vs. Xen 4.0.1, dom0
> 2.6.32.36 vs. 2.6.32.18, and CPU Xeon E3-1230 vs. Xeon X3450 [and board
> X9SCM-F vs. X8SIL-F].
> 
> The attached log shows the debug-key output for the hang. I find lines
> 64-66 quite interesting, where it shows that there is an event channel
> upcall pending on the hung VM2, with no problems on VM1 (lines 52-54).
> Could that be a hint to the real problem?
> 

Could be, or it could just be a side effect, e.g. the machine has hung
and can't process any further events that come through.

One thing I thought of: virtualisation gives an interesting
opportunity to exaggerate race conditions. If you have 8 vCPUs in a
DomU but only let one or two physical CPUs service those 8 vCPUs, it
can expose race conditions that would be seen only rarely (or never)
in normal operation. It's awful for performance, but if you could try
that and see whether it gives rise to crashes a bit more frequently,
it might help us track down the problem.
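
One way to set up such an oversubscribed test is to pin every vCPU of the guest to the same one or two physical CPUs. The sketch below only builds the `xm vcpu-pin <domain> <vcpu> <cpus>` command lines and does not run them. It assumes the guest was already started with 8 vCPUs. The domain name `winguest` and the helper `oversubscribe_commands` are hypothetical.

```python
def oversubscribe_commands(domain, vcpus, pcpus, tool="xm"):
    """Build argv lists that pin each vCPU of `domain` onto the small `pcpus` set."""
    if vcpus <= len(pcpus):
        raise ValueError("need more vCPUs than physical CPUs to oversubscribe")
    # vcpu-pin takes the physical CPUs as a comma-separated list, e.g. "0,1".
    cpu_list = ",".join(str(cpu) for cpu in pcpus)
    return [[tool, "vcpu-pin", domain, str(vcpu), cpu_list] for vcpu in range(vcpus)]


if __name__ == "__main__":
    # Hypothetical guest name; 8 vCPUs squeezed onto physical CPUs 0 and 1.
    for argv in oversubscribe_commands("winguest", 8, [0, 1]):
        print(" ".join(argv))
```

Each printed line can be run by hand in dom0, or passed to `subprocess.run` once the guest name and CPU numbers have been checked against the actual host.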

Thanks

James
