* Problems using xl migrate
@ 2014-11-22 19:24 M A Young
2014-11-24 0:07 ` M A Young
` (2 more replies)
0 siblings, 3 replies; 25+ messages in thread
From: M A Young @ 2014-11-22 19:24 UTC (permalink / raw)
To: xen-devel
[-- Attachment #1: Type: TEXT/PLAIN, Size: 987 bytes --]
While investigating a bug reported on Red Hat Bugzilla
https://bugzilla.redhat.com/show_bug.cgi?id=1166461
I discovered the following:
xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
bug report is for Xen 4.3 hvm), whereas xl migrate domid localhost works.
There are actually two issues here:
* the segfault in libxl-save-helper --restore-domain (as reported in the
bug above) occurs if the guest memory is 1024M (on my 4G box) and is
presumably because the allocated memory eventually runs out
* the segfault doesn't occur if the guest memory is 128M, but the
migration still fails. The first attached file contains the log from a run
with xl -v migrate --debug domid localhost (with mfn and duplicated lines
stripped out to make the size manageable).
I then tried Xen 4.5-rc1 to see if the bug was fixed and found that xl
migrate doesn't work for me at all - see the second attached file for the
output of xl -v migrate domid localhost.
Michael Young
[-- Attachment #2: Type: TEXT/PLAIN, Size: 6186 bytes --]
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x0/0x0/1294)
Loading new save file <incoming migration stream> (new xl fmt info 0x0/0x0/1294)
Savefile contains xl domain config
xc: detail: xc_domain_save: starting save of domid 23
xc: detail: Had 0 unexplained entries in p2m table
xc: detail:
xc: progress: Reloading memory pages: 2048/32768 6%
xc: detail:
xc: progress: Reloading memory pages: 6144/32768 18%
xc: detail:
xc: xc: detail:
xc: detail: Reloading memory pages: 8192/32768 25%
xc: detail:
xc: xc: progress: Reloading memory pages: 10240/32768 31%
xc: detail:
xc: xc: progress: Reloading memory pages: 14336/32768 43%
xc: detail:
xc: progress: Reloading memory pages: 16384/32768 50%
xc: detail:
xc: progress: Reloading memory pages: 18432/32768 56%
xc: detail:
xc: xc: progress: Reloading memory pages: 20480/32768 62%
detail:
xc: detail:
xc: progress: Reloading memory pages: 22528/32768 68%
xc: detail:
xc: xc: progress: Reloading memory pages: 24576/32768 75%
detail:
xc: detail:
xc: xc: detail:
Reloading memory pages: 26624/32768 81%xc: detail:
xc:
xc: detail:
xc: xc: progress: Reloading memory pages: 28672/32768 87%
detail:
xc: detail:
xc: progress: Reloading memory pages: 30720/32768 93%
xc: detail:
xc: detail: delta 4599ms, dom0 96%, target 0%, sent 232Mb/s, dirtied 0Mb/s 82 pages
xc: detail:
progress: Reloading memory pages: 32687/32768 99%
xc: detail:
xc: detail: delta 1659ms, dom0 90%, target 0%, sent 1Mb/s, dirtied 1Mb/s 82 pages
xc: detail:
xc: detail: delta 1670ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 81 pages
xc: detail:
xc: detail: delta 1679ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail: delta 1666ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 1Mb/s 81 pages
xc: detail:
xc: detail: delta 1683ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail: delta 1674ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 83 pages
xc: detail:
xc: detail: delta 1678ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 82 pages
xc: detail:
xc: detail: delta 1676ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail: delta 1689ms, dom0 87%, target 0%, sent 1Mb/s, dirtied 1Mb/s 82 pages
xc: detail:
xc: detail: delta 1684ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail: delta 1678ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 1Mb/s 81 pages
xc: detail:
xc: detail: delta 1664ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 1Mb/s 75 pages
xc: detail:
xc: detail: delta 1701ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 1Mb/s 90 pages
xc: detail:
xc: detail: delta 1678ms, dom0 87%, target 0%, sent 1Mb/s, dirtied 1Mb/s 74 pages
xc: detail:
xc: detail: delta 1651ms, dom0 89%, target 0%, sent 1Mb/s, dirtied 1Mb/s 75 pages
xc: detail:
xc: detail: delta 1677ms, dom0 85%, target 0%, sent 1Mb/s, dirtied 1Mb/s 81 pages
xc: detail:
xc: detail: delta 1644ms, dom0 89%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail:
xc: detail: delta 1656ms, dom0 87%, target 0%, sent 1Mb/s, dirtied 1Mb/s 82 pages
xc: detail:
xc: detail: delta 1647ms, dom0 88%, target 0%, sent 1Mb/s, dirtied 1Mb/s 83 pages
xc: detail:
xc: detail: delta 1660ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 1Mb/s 74 pages
xc: detail:
xc: detail: delta 2760ms, dom0 98%, target 0%, sent 0Mb/s, dirtied 0Mb/s 84 pages
xc: detail:
xc: detail: delta 1635ms, dom0 91%, target 0%, sent 1Mb/s, dirtied 1Mb/s 81 pages
xc: detail:
xc: detail: delta 2035ms, dom0 96%, target 0%, sent 1Mb/s, dirtied 1Mb/s 75 pages
xc: detail:
xc: detail: delta 2218ms, dom0 97%, target 0%, sent 1Mb/s, dirtied 1Mb/s 90 pages
xc: detail:
xc: detail: delta 1666ms, dom0 96%, target 0%, sent 1Mb/s, dirtied 1Mb/s 73 pages
xc: detail:
xc: detail: delta 1687ms, dom0 99%, target 0%, sent 1Mb/s, dirtied 1Mb/s 74 pages
xc: detail:
xc: detail: delta 1765ms, dom0 95%, target 0%, sent 1Mb/s, dirtied 1Mb/s 82 pages
xc: detail:
xc: detail: Start last iteration
xc: detail: SUSPEND shinfo 000dc0cb
xc: detail: delta 1906ms, dom0 86%, target 0%, sent 1Mb/s, dirtied 3Mb/s 208 pages
xc: detail:
xc: detail: delta 2417ms, dom0 92%, target 0%, sent 2Mb/s, dirtied 2Mb/s 208 pages
xc: detail: Total pages sent= 35107 (1.07x)
xc: detail: (of which 0 were fixups)
xc: detail: Entering debug resend-all mode
xc: progress: Reloading memory pages: 36131/32768 110%
xc: progress: Reloading memory pages: 38179/32768 116%
xc: progress: Reloading memory pages: 40227/32768 122%
xc: progress: Reloading memory pages: 42275/32768 129%
xc: progress: Reloading memory pages: 44323/32768 135%
xc: progress: Reloading memory pages: 46371/32768 141%
xc: progress: Reloading memory pages: 48419/32768 147%
xc: progress: Reloading memory pages: 50467/32768 154%
xc: progress: Reloading memory pages: 52515/32768 160%
xc: progress: Reloading memory pages: 54563/32768 166%
xc: progress: Reloading memory pages: 56611/32768 172%
xc: progress: Reloading memory pages: 58659/32768 179%
xc: progress: Reloading memory pages: 60707/32768 185%
xc: progress: Reloading memory pages: 62755/32768 191%
xc: progress: Reloading memory pages: 64803/32768 197%
xc: detail: delta 1662ms, dom0 95%, target 0%, sent 646Mb/s, dirtied 4Mb/s 208 pages
xc: detail: Total pages sent= 67875 (2.07x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
xc: progress: Reloading memory pages: 66851/32768 204%
xc: detail: Save exit of domid 23 with rc=0
migration receiver stream contained unexpected data instead of ready message
(command run was: exec ssh localhost xl migrate-receive -d )
migration target: Transfer complete, requesting permission to start domain.
libxl: error: libxl_utils.c:396:libxl_read_exactly: file/stream truncated reading GO message from migration stream
migration target: Failure, destroying our copy.
migration target: Cleanup OK, granting sender permission to resume.
Migration failed, resuming at sender.
[-- Attachment #3: Type: TEXT/PLAIN, Size: 15819 bytes --]
migration target: Ready to receive domain.
Saving to migration stream new xl format (info 0x1/0x0/1182)
Loading new save file <incoming migration stream> (new xl fmt info 0x1/0x0/1182)
Savefile contains xl domain config in JSON format
xc: detail: xc_domain_save: starting save of domid 4
xc: detail: Had 0 unexplained entries in p2m table
xc: detail: xc_domain_restore: starting restore of new domid 6
xc: detail: xc_domain_restore: p2m_size = 40000
xc: detail: Mapping order 0, 1024; first pfn 0
xc: detail: Mapping order 0, 1021; first pfn 400
xc: detail: Mapping order 0, 1024; first pfn 800
xc: detail: Mapping order 0, 950; first pfn c00
xc: detail: Mapping order 0, 1024; first pfn 1000
xc: detail: Mapping order 0, 1024; first pfn 1400
xc: detail: Mapping order 0, 1022; first pfn 1800
xc: detail: Mapping order 0, 1017; first pfn 1c01
xc: detail: Mapping order 0, 1018; first pfn 200d
xc: detail: Mapping order 0, 1024; first pfn 240d
xc: detail: Mapping order 0, 828; first pfn 280d
xc: detail: Mapping order 0, 1006; first pfn 2c0d
xc: detail: Mapping order 0, 1024; first pfn 300d
xc: detail: Mapping order 0, 1016; first pfn 340f
xc: detail: Mapping order 0, 1022; first pfn 3810
xc: detail: Mapping order 0, 901; first pfn 3c10
xc: detail: Mapping order 0, 1024; first pfn 4010
xc: detail: Mapping order 0, 1024; first pfn 4410
xc: detail: Mapping order 0, 1024; first pfn 4810
xc: detail: Mapping order 0, 1024; first pfn 4c10
xc: detail: Mapping order 0, 1024; first pfn 5010
xc: detail: Mapping order 0, 1024; first pfn 5410
xc: detail: Mapping order 0, 1024; first pfn 5810
xc: detail: Mapping order 0, 1024; first pfn 5c10
xc: detail: Mapping order 0, 1024; first pfn 6010
xc: detail: Mapping order 0, 1024; first pfn 6410
xc: detail: Mapping order 0, 1024; first pfn 6810
xc: detail: Mapping order 0, 1024; first pfn 6c10
xc: detail: Mapping order 0, 1024; first pfn 7010
xc: detail: Mapping order 0, 1024; first pfn 7410
xc: detail: Mapping order 0, 1024; first pfn 7810
xc: detail: Mapping order 0, 1024; first pfn 7c10
xc: detail: Mapping order 0, 1024; first pfn 8010
xc: detail: Mapping order 0, 1024; first pfn 8410
xc: detail: Mapping order 0, 1024; first pfn 8810
xc: detail: Mapping order 0, 1024; first pfn 8c10
xc: detail: Mapping order 0, 1024; first pfn 9010
xc: detail: Mapping order 0, 1024; first pfn 9410
xc: detail: Mapping order 0, 1024; first pfn 9810
xc: detail: Mapping order 0, 1024; first pfn 9c10
xc: detail: Mapping order 0, 1024; first pfn a010
xc: detail: Mapping order 0, 1024; first pfn a410
xc: detail: Mapping order 0, 1024; first pfn a810
xc: detail: Mapping order 0, 1024; first pfn ac10
xc: detail: Mapping order 0, 1024; first pfn b010
xc: detail: Mapping order 0, 1024; first pfn b410
xc: detail: Mapping order 0, 1024; first pfn b810
xc: detail: Mapping order 0, 1024; first pfn bc10
xc: detail: Mapping order 0, 1024; first pfn c010
xc: detail: Mapping order 0, 1024; first pfn c410
xc: detail: Mapping order 0, 1024; first pfn c810
xc: detail: Mapping order 0, 1024; first pfn cc10
xc: detail: Mapping order 0, 1024; first pfn d010
xc: detail: Mapping order 0, 1024; first pfn d410
xc: detail: Mapping order 0, 1024; first pfn d810
xc: detail: Mapping order 0, 1024; first pfn dc10
xc: detail: Mapping order 0, 1024; first pfn e010
xc: detail: Mapping order 0, 1024; first pfn e410
xc: detail: Mapping order 0, 1024; first pfn e810
xc: detail: Mapping order 0, 1024; first pfn ec10
xc: detail: Mapping order 0, 1024; first pfn f010
xc: detail: Mapping order 0, 1024; first pfn f410
xc: detail: Mapping order 0, 1024; first pfn f810
xc: detail: Mapping order 0, 1024; first pfn fc10
xc: detail: Mapping order 0, 1024; first pfn 10010
xc: detail: Mapping order 0, 1024; first pfn 10410
xc: detail: Mapping order 0, 1024; first pfn 10810
xc: detail: Mapping order 0, 1024; first pfn 10c10
xc: detail: Mapping order 0, 1024; first pfn 11010
xc: detail: Mapping order 0, 1024; first pfn 11410
xc: detail: Mapping order 0, 1024; first pfn 11810
xc: detail: Mapping order 0, 1024; first pfn 11c10
xc: detail: Mapping order 0, 1024; first pfn 12010
xc: detail: Mapping order 0, 1024; first pfn 12410
xc: detail: Mapping order 0, 1024; first pfn 12810
xc: detail: Mapping order 0, 1024; first pfn 12c10
xc: detail: Mapping order 0, 1024; first pfn 13010
xc: detail: Mapping order 0, 1024; first pfn 13410
xc: detail: Mapping order 0, 1024; first pfn 13810
xc: detail: Mapping order 0, 1024; first pfn 13c10
xc: detail: Mapping order 0, 1024; first pfn 14010
xc: detail: Mapping order 0, 1024; first pfn 14410
xc: detail: Mapping order 0, 1024; first pfn 14810
xc: detail: Mapping order 0, 1024; first pfn 14c10
xc: detail: Mapping order 0, 1024; first pfn 15010
xc: detail: Mapping order 0, 1024; first pfn 15410
xc: detail: Mapping order 0, 1024; first pfn 15810
xc: detail: Mapping order 0, 1024; first pfn 15c10
xc: detail: Mapping order 0, 1024; first pfn 16010
xc: detail: Mapping order 0, 1024; first pfn 16410
xc: detail: Mapping order 0, 1024; first pfn 16810
xc: detail: Mapping order 0, 1024; first pfn 16c10
xc: detail: Mapping order 0, 1024; first pfn 17010
xc: detail: Mapping order 0, 1024; first pfn 17410
xc: detail: Mapping order 0, 1024; first pfn 17810
xc: detail: Mapping order 0, 1024; first pfn 17c10
xc: detail: Mapping order 0, 1024; first pfn 18010
xc: detail: Mapping order 0, 1024; first pfn 18410
xc: detail: Mapping order 0, 1024; first pfn 18810
xc: detail: Mapping order 0, 1024; first pfn 18c10
xc: detail: Mapping order 0, 1024; first pfn 19010
xc: detail: Mapping order 0, 1024; first pfn 19410
xc: detail: Mapping order 0, 1024; first pfn 19810
xc: detail: Mapping order 0, 1024; first pfn 19c10
xc: detail: Mapping order 0, 1024; first pfn 1a010
xc: detail: Mapping order 0, 1024; first pfn 1a410
xc: detail: Mapping order 0, 1024; first pfn 1a810
xc: detail: Mapping order 0, 1024; first pfn 1ac10
xc: detail: Mapping order 0, 1024; first pfn 1b010
xc: detail: Mapping order 0, 1024; first pfn 1b410
xc: detail: Mapping order 0, 1024; first pfn 1b810
xc: detail: Mapping order 0, 1024; first pfn 1bc10
xc: detail: Mapping order 0, 1024; first pfn 1c010
xc: detail: Mapping order 0, 1024; first pfn 1c410
xc: detail: Mapping order 0, 1024; first pfn 1c810
xc: detail: Mapping order 0, 1024; first pfn 1cc10
xc: detail: Mapping order 0, 1024; first pfn 1d010
xc: detail: Mapping order 0, 1024; first pfn 1d410
xc: detail: Mapping order 0, 1024; first pfn 1d810
xc: detail: Mapping order 0, 1024; first pfn 1dc10
xc: detail: Mapping order 0, 1024; first pfn 1e010
xc: detail: Mapping order 0, 1024; first pfn 1e410
xc: detail: Mapping order 0, 1024; first pfn 1e810
xc: detail: Mapping order 0, 1024; first pfn 1ec10
xc: detail: Mapping order 0, 1024; first pfn 1f010
xc: detail: Mapping order 0, 1024; first pfn 1f410
xc: detail: Mapping order 0, 1024; first pfn 1f810
xc: detail: Mapping order 0, 1024; first pfn 1fc10
xc: detail: Mapping order 0, 1024; first pfn 20010
xc: detail: Mapping order 0, 1024; first pfn 20410
xc: detail: Mapping order 0, 1024; first pfn 20810
xc: detail: Mapping order 0, 1024; first pfn 20c10
xc: detail: Mapping order 0, 1024; first pfn 21010
xc: detail: Mapping order 0, 1024; first pfn 21410
xc: detail: Mapping order 0, 1024; first pfn 21810
xc: detail: Mapping order 0, 1024; first pfn 21c10
xc: detail: Mapping order 0, 1024; first pfn 22010
xc: detail: Mapping order 0, 1024; first pfn 22410
xc: detail: Mapping order 0, 1024; first pfn 22810
xc: detail: Mapping order 0, 1024; first pfn 22c10
xc: detail: Mapping order 0, 1024; first pfn 23010
xc: detail: Mapping order 0, 1024; first pfn 23410
xc: detail: Mapping order 0, 1024; first pfn 23810
xc: detail: Mapping order 0, 1024; first pfn 23c10
xc: detail: Mapping order 0, 1024; first pfn 24010
xc: detail: Mapping order 0, 1024; first pfn 24410
xc: detail: Mapping order 0, 1024; first pfn 24810
xc: detail: Mapping order 0, 1024; first pfn 24c10
xc: detail: Mapping order 0, 1024; first pfn 25010
xc: detail: Mapping order 0, 1024; first pfn 25410
xc: detail: Mapping order 0, 1024; first pfn 25810
xc: detail: Mapping order 0, 1024; first pfn 25c10
xc: detail: Mapping order 0, 1024; first pfn 26010
xc: detail: Mapping order 0, 1024; first pfn 26410
xc: detail: Mapping order 0, 1024; first pfn 26810
xc: detail: Mapping order 0, 1024; first pfn 26c10
xc: detail: Mapping order 0, 1024; first pfn 27010
xc: detail: Mapping order 0, 1024; first pfn 27410
xc: detail: Mapping order 0, 1024; first pfn 27810
xc: detail: Mapping order 0, 1024; first pfn 27c10
xc: detail: Mapping order 0, 1024; first pfn 28010
xc: detail: Mapping order 0, 1024; first pfn 28410
xc: detail: Mapping order 0, 1024; first pfn 28810
xc: detail: Mapping order 0, 1024; first pfn 28c10
xc: detail: Mapping order 0, 1024; first pfn 29010
xc: detail: Mapping order 0, 1024; first pfn 29410
xc: detail: Mapping order 0, 1024; first pfn 29810
xc: detail: Mapping order 0, 1024; first pfn 29c10
xc: detail: Mapping order 0, 1024; first pfn 2a010
xc: detail: Mapping order 0, 1024; first pfn 2a410
xc: detail: Mapping order 0, 1024; first pfn 2a810
xc: detail: Mapping order 0, 1024; first pfn 2ac10
xc: detail: Mapping order 0, 1024; first pfn 2b010
xc: detail: Mapping order 0, 1024; first pfn 2b410
xc: detail: Mapping order 0, 1024; first pfn 2b810
xc: detail: Mapping order 0, 1024; first pfn 2bc10
xc: detail: Mapping order 0, 1024; first pfn 2c010
xc: detail: Mapping order 0, 1024; first pfn 2c410
xc: detail: Mapping order 0, 1024; first pfn 2c810
xc: detail: Mapping order 0, 1024; first pfn 2cc10
xc: detail: Mapping order 0, 1024; first pfn 2d010
xc: detail: Mapping order 0, 1024; first pfn 2d410
xc: detail: Mapping order 0, 1024; first pfn 2d810
xc: detail: Mapping order 0, 1021; first pfn 2dc10
xc: detail: Mapping order 0, 1000; first pfn 2e010
xc: detail: Mapping order 0, 165; first pfn 2e413
xc: detail: Mapping order 0, 333; first pfn 2e813
xc: detail: Mapping order 0, 603; first pfn 2ec13
xc: detail: Mapping order 0, 973; first pfn 2f013
xc: detail: Mapping order 0, 866; first pfn 2f413
xc: detail: Mapping order 0, 904; first pfn 2f814
xc: detail: Mapping order 0, 930; first pfn 2fc18
xc: detail: Mapping order 0, 941; first pfn 30022
xc: detail: Mapping order 0, 983; first pfn 30424
xc: detail: Mapping order 0, 976; first pfn 30824
xc: detail: Mapping order 0, 973; first pfn 30c24
xc: detail: Mapping order 0, 977; first pfn 31024
xc: detail: Mapping order 0, 976; first pfn 31424
xc: detail: Mapping order 0, 994; first pfn 31827
xc: detail: Mapping order 0, 917; first pfn 31c27
xc: detail: Mapping order 0, 870; first pfn 32027
xc: detail: Mapping order 0, 968; first pfn 32427
xc: detail: Mapping order 0, 884; first pfn 32827
xc: detail: Mapping order 0, 951; first pfn 32c27
xc: detail: Mapping order 0, 949; first pfn 33027
xc: detail: Mapping order 0, 998; first pfn 33427
xc: detail: Mapping order 0, 1011; first pfn 33827
xc: detail: Mapping order 0, 910; first pfn 33c2b
xc: detail: Mapping order 0, 908; first pfn 3402b
xc: detail: Mapping order 0, 708; first pfn 34434
xc: detail: Mapping order 0, 835; first pfn 34834
xc: detail: Mapping order 0, 711; first pfn 34c34
xc: detail: Mapping order 0, 525; first pfn 35034
xc: detail: Mapping order 0, 637; first pfn 35434
xc: detail: Mapping order 0, 712; first pfn 35834
xc: detail: Mapping order 0, 956; first pfn 35c39
xc: detail: Mapping order 0, 668; first pfn 36035
xc: detail: Mapping order 0, 862; first pfn 36435
xc: detail: Mapping order 0, 927; first pfn 36835
xc: detail: Mapping order 0, 938; first pfn 36c35
xc: detail: Mapping order 0, 908; first pfn 37035
xc: detail: Mapping order 0, 880; first pfn 37435
xc: detail: Mapping order 0, 958; first pfn 37835
xc: detail: Mapping order 0, 920; first pfn 37c35
xc: detail: Mapping order 0, 850; first pfn 38036
xc: detail: Mapping order 0, 966; first pfn 38436
xc: detail: Mapping order 0, 971; first pfn 3883b
xc: detail: Mapping order 0, 991; first pfn 38c3a
xc: detail: Mapping order 0, 951; first pfn 39041
xc: detail: Mapping order 0, 838; first pfn 3943e
xc: detail: Mapping order 0, 813; first pfn 3983e
xc: detail: Mapping order 0, 837; first pfn 39c3e
xc: detail: Mapping order 0, 784; first pfn 3a03e
xc: detail: Mapping order 0, 806; first pfn 3a458
xc: detail: Mapping order 0, 886; first pfn 3a842
xc: detail: Mapping order 0, 1022; first pfn 3ac42
xc: detail: Mapping order 0, 940; first pfn 3b042
xc: detail: Mapping order 0, 958; first pfn 3b442
xc: detail: Mapping order 0, 980; first pfn 3b844
xc: detail: Mapping order 0, 757; first pfn 3bc60
xc: detail: Mapping order 0, 466; first pfn 3ca00
xc: detail: Mapping order 0, 925; first pfn 3cca1
xc: detail: Mapping order 0, 1024; first pfn 3d0ae
xc: detail: Mapping order 0, 961; first pfn 3d4af
xc: detail: Mapping order 0, 1023; first pfn 3d8ba
xc: detail: Mapping order 0, 936; first pfn 3dccc
xc: detail: Mapping order 0, 900; first pfn 3e0d3
xc: detail: Mapping order 0, 823; first pfn 3e4e2
xc: detail: Mapping order 0, 1024; first pfn 3e8d7
xc: detail: Mapping order 0, 1024; first pfn 3ecd7
xc: detail: Mapping order 0, 1024; first pfn 3f0d7
xc: detail: Mapping order 0, 886; first pfn 3f4d7
xc: detail: Mapping order 0, 922; first pfn 3f8f4
xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s, dirtied 0Mb/s 314 pages
xc: detail: Mapping order 0, 268; first pfn 3fcf4
xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied 0Mb/s 0 pages
xc: detail: Start last iteration
xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND shinfo 00082fbc
xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied 1033Mb/s 536 pages
xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied 2195Mb/s 536 pages
xc: detail: Total pages sent= 262749 (1.00x)
xc: detail: (of which 0 were fixups)
xc: detail: All memory is saved
xc: error: Error querying maximum number of MSRs for VCPU0 (1 = Operation not permitted): Internal error
xc: detail: Save exit of domid 4 with errno=1
libxl: error: libxl_dom.c:1864:libxl__xc_domain_save_done: saving domain: domain responded to suspend request: Operation not permitted
libxl: error: libxl_dom.c:2021:remus_teardown_done: Remus: failed to teardown device for guest with domid 4, rc -3
migration sender: libxl_domain_suspend failed (rc=-3)
xc: error: 0-length read: Internal error
xc: error: rdexact failed (read rc: 0, errno: 0): Internal error
xc: error: Error when reading shared info page (0 = Success): Internal error
xc: error: error buffering image tail: Internal error
xc: detail: Restore exit of domid 6 with rc=1
libxl: error: libxl_create.c:1032:libxl__xc_domain_restore_done: restoring domain: Resource temporarily unavailable
libxl: error: libxl_create.c:1104:domcreate_rebuild_done: cannot (re-)build domain: -3
libxl: error: libxl.c:1542:libxl__destroy_domid: non-existant domain 6
libxl: error: libxl.c:1506:domain_destroy_callback: unable to destroy guest with domid 6
libxl: error: libxl_create.c:1462:domcreate_destruction_cb: unable to destroy domain 6 following failed creation
migration target: Domain creation failed (code -3).
libxl: info: libxl_exec.c:118:libxl_report_child_exitstatus: migration transport process [5326] exited with error status 3
Migration failed, resuming at sender.
[-- Attachment #4: Type: text/plain, Size: 126 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-22 19:24 Problems using xl migrate M A Young
@ 2014-11-24 0:07 ` M A Young
2014-11-24 11:50 ` George Dunlap
2014-11-24 12:25 ` George Dunlap
2014-11-24 12:41 ` Wei Liu
2 siblings, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-24 0:07 UTC (permalink / raw)
To: xen-devel
On Sat, 22 Nov 2014, M A Young wrote:
> While investigating a bug reported on Red Hat Bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
> I discovered the following
>
> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the bug
> report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There are
> actually two issues here
>
> * the segfault in libxl-save-helper --restore-domain (as reported in the bug
> above) occurs if the guest memory is 1024M (on my 4G box) and is presumably
> because the allocated memory eventually runs out
I have found out a bit more about this. The segfault is at line 1378 of
tools/libxc/xc_domain_restore.c, which is
DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
"actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
csum_page(region_base + (i + curbatch)*PAGE_SIZE),
csum_page(buf));
and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
array. This occurs in the verification phase.
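As an illustration of the out-of-range read described above, here is a minimal sketch of a bounds-checked lookup for a debug print of this kind. It is not the libxc code: struct type_table, type_lookup() and report_mismatch() are hypothetical names used only for this example, and the proper fix in xc_domain_restore.c may well be to use a different index rather than to add a check.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for a per-batch type array; the field names
 * are illustrative and are not taken from libxc. */
struct type_table {
    const unsigned long *types;  /* one entry per slot */
    size_t nr;                   /* number of valid entries */
};

/* Read types[idx] only if idx lies inside the table.
 * Returns true and stores the value in *out on success. */
static bool type_lookup(const struct type_table *t, unsigned long idx,
                        unsigned long *out)
{
    assert(t != NULL && out != NULL);
    if (t->types == NULL || idx >= t->nr)
        return false;
    *out = t->types[idx];
    return true;
}

/* Debug print that reports a bad index instead of reading past the array. */
static void report_mismatch(const struct type_table *t, unsigned long idx)
{
    unsigned long type;

    if (type_lookup(t, idx, &type))
        fprintf(stderr, "pfn=%lx type=%lx\n", idx, type);
    else
        fprintf(stderr, "pfn=%lx type=<out of range, table has %zu entries>\n",
                idx, t->nr);
}
```

With a guard like this, a verification-phase mismatch would produce a diagnostic line rather than a crash, which would also make it easier to see which index was being used.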
> * the segfault doesn't occur if the guest memory is 128M, but the migration
> still fails. The first attached file contains the log from a run with xl -v
> migrate --debug domid localhost (with mfn and duplicated lines stripped out
> to make the size manageable).
The difference actually seems to be down to how active the VM is rather
than the memory size (my small memory test system was doing very little,
my larger system was a full OS install). In the non-segfault case the
problem was the printf and printf_info calls in the create_domain()
routine in tools/libxl/xl_cmdimpl.c. As xl migrate uses stdout to pass
status messages back from the restoring dom0, the output of these calls
arrives as an unexpected message. If you move them onto stderr then the migration
completes in the non-segfault case.
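To make the stdout problem concrete, here is a minimal sketch of the pattern, assuming a receiver that writes a fixed ready message on its control stream and a sender that compares the first bytes it reads against that message. READY_MSG, announce_ready() and check_ready() are hypothetical names for this example, not the xl ones, and the message text is made up.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical ready message; the real xl banner text differs. */
static const char READY_MSG[] = "ready\n";

/* Receiver side: informational output goes to diag (stderr in practice),
 * so that only protocol messages ever appear on ctrl (stdout in practice). */
static int announce_ready(FILE *ctrl, FILE *diag, const char *domname)
{
    assert(ctrl != NULL && diag != NULL && domname != NULL);
    fprintf(diag, "Parsing config for %s\n", domname);
    if (fputs(READY_MSG, ctrl) == EOF)
        return -1;
    return fflush(ctrl) == EOF ? -1 : 0;
}

/* Sender side: the first bytes on the control stream must be the ready
 * message exactly; anything printed ahead of it makes the check fail. */
static int check_ready(const char *buf, size_t len)
{
    size_t want = strlen(READY_MSG);

    if (len < want || memcmp(buf, READY_MSG, want) != 0)
        return -1;
    return 0;
}
```

If the informational line were written to the control stream instead, the sender would read it where it expects the ready message and give up, which matches the "unexpected data instead of ready message" failure in the first log.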
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 0:07 ` M A Young
@ 2014-11-24 11:50 ` George Dunlap
2014-11-24 12:06 ` M A Young
2014-11-24 13:13 ` Andrew Cooper
0 siblings, 2 replies; 25+ messages in thread
From: George Dunlap @ 2014-11-24 11:50 UTC (permalink / raw)
To: M A Young; +Cc: Ian Jackson, Wei Liu, Ian Campbell, xen-devel@lists.xen.org
On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
> On Sat, 22 Nov 2014, M A Young wrote:
>
>> While investigating a bug reported on Red Hat Bugzilla
>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>> I discovered the following
>>
>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>> are actually two issues here
>>
>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>> presumably because the allocated memory eventually runs out
>
>
> I have found out a bit more about this. The segfault is at line 1378 of
> tools/libxc/xc_domain_restore.c which is
> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
> csum_page(buf));
> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
> array. This occurs in the verification phase.
>
>> * the segfault doesn't occur if the guest memory is 128M, but the
>> migration still fails. The first attached file contains the log from a run
>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>> stripped out to make the size manageable).
>
>
> The difference actually seems to be down to how active the VM is rather than
> the memory size (my small memory test system was doing very little, my
> larger system was a full OS install). In the non-segfault case the problem
> was the printf and printf_info commands in the create_domain() routine in
> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
> back from the restoring dom0, these commands cause an unexpected message. If
> you move them onto stderr then the migration completes in the non-segfault
> case.
Good job tracking those down -- are there patches in the works?
-George
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 11:50 ` George Dunlap
@ 2014-11-24 12:06 ` M A Young
2014-11-24 12:21 ` Ian Campbell
2014-11-24 13:13 ` Andrew Cooper
1 sibling, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-24 12:06 UTC (permalink / raw)
To: George Dunlap; +Cc: Ian Jackson, Wei Liu, Ian Campbell, xen-devel@lists.xen.org
On Mon, 24 Nov 2014, George Dunlap wrote:
> On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
>> On Sat, 22 Nov 2014, M A Young wrote:
>>
>>> While investigating a bug reported on Red Hat Bugzilla
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>> I discovered the following
>>>
>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>> are actually two issues here
>>>
>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>> presumably because the allocated memory eventually runs out
>>
>>
>> I have found out a bit more about this. The segfault is at line 1378 of
>> tools/libxc/xc_domain_restore.c which is
>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>> csum_page(buf));
>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>> array. This occurs in the verification phase.
>>
>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>> migration still fails. The first attached file contains the log from a run
>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>> stripped out to make the size manageable).
>>
>>
>> The difference actually seems to be down to how active the VM is rather than
>> the memory size (my small memory test system was doing very little, my
>> larger system was a full OS install). In the non-segfault case the problem
>> was the printf and printf_info commands in the create_domain() routine in
>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>> back from the restoring dom0, these commands cause an unexpected message. If
>> you move them onto stderr then the migration completes in the non-segfault
>> case.
>
> Good job tracking those down -- are there patches in the works?
I have a partial patch for the printf printf_info problem, which works for
me but doesn't cover printing the info in sxp format. I haven't worked out
what is leading up to the segfault yet.
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 12:06 ` M A Young
@ 2014-11-24 12:21 ` Ian Campbell
2014-11-24 12:29 ` M A Young
0 siblings, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-24 12:21 UTC (permalink / raw)
To: M A Young; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel@lists.xen.org
On Mon, 2014-11-24 at 12:06 +0000, M A Young wrote:
>
> On Mon, 24 Nov 2014, George Dunlap wrote:
>
> > On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
> >> On Sat, 22 Nov 2014, M A Young wrote:
> >>
> >>> While investigating a bug reported on Red Hat Bugzilla
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
> >>> I discovered the following
> >>>
> >>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
> >>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
> >>> are actually two issues here
> >>>
> >>> * the segfault in libxl-save-helper --restore-domain (as reported in the
> >>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
> >>> presumably because the allocated memory eventually runs out
> >>
> >>
> >> I have found out a bit more about this. The segfault is at line 1378 of
> >> tools/libxc/xc_domain_restore.c which is
> >> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
> >> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
> >> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
> >> csum_page(buf));
> >> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
> >> array. This occurs in the verification phase.
> >>
> >>> * the segfault doesn't occur if the guest memory is 128M, but the
> >>> migration still fails. The first attached file contains the log from a run
> >>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
> >>> stripped out to make the size manageable).
> >>
> >>
> >> The difference actually seems to be down to how active the VM is rather than
> >> the memory size (my small memory test system was doing very little, my
> >> larger system was a full OS install). In the non-segfault case the problem
> >> was the printf and printf_info commands in the create_domain() routine in
> >> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
> >> back from the restoring dom0, these commands cause an unexpected message. If
> >> you move them onto stderr then the migration completes in the non-segfault
> >> case.
> >
> > Good job tracking those down -- are there patches in the works?
>
> I have a partial patch for the printf printf_info problem, which works for
> me but doesn't cover printing the info in sxp format.
Am I right that this is all related to the use of --debug and/or -vm, and
that a plain "xl migrate" works OK?
It's still a bug of course, but that changes the severity (though I'm not
sure to what extent).
> I haven't worked out
> what is leading up to the segfault yet.
>
> Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 12:21 ` Ian Campbell
@ 2014-11-24 12:29 ` M A Young
0 siblings, 0 replies; 25+ messages in thread
From: M A Young @ 2014-11-24 12:29 UTC (permalink / raw)
To: Ian Campbell; +Cc: George Dunlap, Ian Jackson, Wei Liu, xen-devel@lists.xen.org
On Mon, 24 Nov 2014, Ian Campbell wrote:
> On Mon, 2014-11-24 at 12:06 +0000, M A Young wrote:
>>
>> On Mon, 24 Nov 2014, George Dunlap wrote:
>>
>>> On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
>>>> On Sat, 22 Nov 2014, M A Young wrote:
>>>>
>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>> I discovered the following
>>>>>
>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>>>> are actually two issues here
>>>>>
>>>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>> presumably because the allocated memory eventually runs out
>>>>
>>>>
>>>> I have found a bit more out about this. The segfault at at line 1378 of
>>>> tools/libxc/xc_domain_restore.c which is
>>>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>>>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>>>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>>>> csum_page(buf));
>>>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>>>> array. This occurs in the verification phase.
>>>>
>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>> migration still fails. The first attached file contains the log from a run
>>>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>>>> stripped out to make the size manageable).
>>>>
>>>>
>>>> The difference actually seems to be down to how active the VM is rather than
>>>> the memory size (my small memory test system was doing very little, my
>>>> larger system was a full OS install). In the non-segfault case the problem
>>>> was the printf and printf_info commands in the create_domain() routine in
>>>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>>>> back from the restoring dom0, these commands cause an unexpected message. If
>>>> you move them onto stderr then the migration completes in the non-segfault
>>>> case.
>>>
>>> Good job tracking those down -- are there patches in the works?
>>
>> I have a partial patch for the printf printf_info problem, which works for
>> me but doesn't cover printing the info in sxp format.
>
> Am I right that is all related to the use of --debug and or -vm? and
> that a plain "xl migrate" works ok?
>
> It's still a bug of course, but changes the severity (somehow, not sure
> to what extent IMHO it does etc).
A plain xl migrate does work on 4.4. I didn't get xl migrate working on
4.5-rc1 but was waiting for rc3 before trying to debug that.
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 11:50 ` George Dunlap
2014-11-24 12:06 ` M A Young
@ 2014-11-24 13:13 ` Andrew Cooper
2014-11-24 14:09 ` Wei Liu
1 sibling, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-24 13:13 UTC (permalink / raw)
To: George Dunlap, M A Young
Cc: Wei Liu, Ian Jackson, Ian Campbell, xen-devel@lists.xen.org
On 24/11/14 11:50, George Dunlap wrote:
> On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
>> On Sat, 22 Nov 2014, M A Young wrote:
>>
>>> While investigating a bug reported on Red Hat Bugzilla
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>> I discovered the following
>>>
>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>> are actually two issues here
>>>
>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>> presumably because the allocated memory eventually runs out
>>
>> I have found a bit more out about this. The segfault at at line 1378 of
>> tools/libxc/xc_domain_restore.c which is
>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>> csum_page(buf));
>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>> array. This occurs in the verification phase.
>>
>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>> migration still fails. The first attached file contains the log from a run
>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>> stripped out to make the size manageable).
>>
>> The difference actually seems to be down to how active the VM is rather than
>> the memory size (my small memory test system was doing very little, my
>> larger system was a full OS install). In the non-segfault case the problem
>> was the printf and printf_info commands in the create_domain() routine in
>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>> back from the restoring dom0, these commands cause an unexpected message. If
>> you move them onto stderr then the migration completes in the non-segfault
>> case.
> Good job tracking those down -- are there patches in the works?
The segfault for "--debug" has already been identified and a patch
posted by Wen Congyang.
The call to csum_page() incorrectly calculates the offset it is supposed
to checksum, and wanders beyond the mapping of guest space.
Patch in 1409908261-18682-3-git-send-email-wency@cn.fujitsu.com
~Andrew
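
A minimal sketch of the offset problem described above, not the actual libxc code. It assumes 4 KiB pages and assumes that region_base maps only the pages of the current batch, so that the page index into the mapping should be i rather than i + curbatch. The function names are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>

#define PAGE_SIZE 4096u /* assumption: 4 KiB pages */

/* Offset of page i when the mapping holds only the current batch. */
static size_t batch_relative_offset(size_t i)
{
    return i * PAGE_SIZE;
}

/* Offset as computed by the Xen 4.4 DPRINTF line quoted above. */
static size_t offset_with_curbatch(size_t i, size_t curbatch)
{
    return (i + curbatch) * PAGE_SIZE;
}

/* Nonzero if a whole page starting at `offset` fits inside a mapping of
 * `pages_mapped` pages. */
static int page_within_mapping(size_t offset, size_t pages_mapped)
{
    assert(pages_mapped > 0);
    return offset + PAGE_SIZE <= pages_mapped * PAGE_SIZE;
}
```

With a 4-page mapping, page_within_mapping(batch_relative_offset(3), 4) holds, while page_within_mapping(offset_with_curbatch(3, 1024), 4) does not: once curbatch is nonzero the second form points past the end of the mapping.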
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 13:13 ` Andrew Cooper
@ 2014-11-24 14:09 ` Wei Liu
2014-11-24 14:13 ` Andrew Cooper
2014-11-25 8:52 ` M A Young
0 siblings, 2 replies; 25+ messages in thread
From: Wei Liu @ 2014-11-24 14:09 UTC (permalink / raw)
To: Andrew Cooper
Cc: Wei Liu, Ian Campbell, George Dunlap, Ian Jackson,
xen-devel@lists.xen.org, M A Young
On Mon, Nov 24, 2014 at 01:13:25PM +0000, Andrew Cooper wrote:
> On 24/11/14 11:50, George Dunlap wrote:
> > On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
> >> On Sat, 22 Nov 2014, M A Young wrote:
> >>
> >>> While investigating a bug reported on Red Hat Bugzilla
> >>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
> >>> I discovered the following
> >>>
> >>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
> >>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
> >>> are actually two issues here
> >>>
> >>> * the segfault in libxl-save-helper --restore-domain (as reported in the
> >>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
> >>> presumably because the allocated memory eventually runs out
> >>
> >> I have found a bit more out about this. The segfault at at line 1378 of
> >> tools/libxc/xc_domain_restore.c which is
> >> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
> >> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
> >> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
> >> csum_page(buf));
> >> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
> >> array. This occurs in the verification phase.
> >>
> >>> * the segfault doesn't occur if the guest memory is 128M, but the
> >>> migration still fails. The first attached file contains the log from a run
> >>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
> >>> stripped out to make the size manageable).
> >>
> >> The difference actually seems to be down to how active the VM is rather than
> >> the memory size (my small memory test system was doing very little, my
> >> larger system was a full OS install). In the non-segfault case the problem
> >> was the printf and printf_info commands in the create_domain() routine in
> >> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
> >> back from the restoring dom0, these commands cause an unexpected message. If
> >> you move them onto stderr then the migration completes in the non-segfault
> >> case.
> > Good job tracking those down -- are there patches in the works?
>
> The segfault for "--debug" has already been identified and a patch
> posted by Wen Congyang
>
> The call to csum_page() incorrectly calculates the offset it is supposed
> to checksum, and wanders beyond the mapping of guest space.
>
> Patch in 1409908261-18682-3-git-send-email-wency@cn.fujitsu.com
>
And the said patch has been applied (3460eeb3fc2) so we're fine.
Wei.
> ~Andrew
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 14:09 ` Wei Liu
@ 2014-11-24 14:13 ` Andrew Cooper
2014-11-25 8:52 ` M A Young
1 sibling, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-11-24 14:13 UTC (permalink / raw)
To: Wei Liu
Cc: George Dunlap, xen-devel@lists.xen.org, Ian Jackson, Ian Campbell,
M A Young
On 24/11/14 14:09, Wei Liu wrote:
> On Mon, Nov 24, 2014 at 01:13:25PM +0000, Andrew Cooper wrote:
>> On 24/11/14 11:50, George Dunlap wrote:
>>> On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
>>>> On Sat, 22 Nov 2014, M A Young wrote:
>>>>
>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>> I discovered the following
>>>>>
>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>>>> are actually two issues here
>>>>>
>>>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>> presumably because the allocated memory eventually runs out
>>>> I have found a bit more out about this. The segfault at at line 1378 of
>>>> tools/libxc/xc_domain_restore.c which is
>>>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>>>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>>>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>>>> csum_page(buf));
>>>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>>>> array. This occurs in the verification phase.
>>>>
>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>> migration still fails. The first attached file contains the log from a run
>>>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>>>> stripped out to make the size manageable).
>>>> The difference actually seems to be down to how active the VM is rather than
>>>> the memory size (my small memory test system was doing very little, my
>>>> larger system was a full OS install). In the non-segfault case the problem
>>>> was the printf and printf_info commands in the create_domain() routine in
>>>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>>>> back from the restoring dom0, these commands cause an unexpected message. If
>>>> you move them onto stderr then the migration completes in the non-segfault
>>>> case.
>>> Good job tracking those down -- are there patches in the works?
>> The segfault for "--debug" has already been identified and a patch
>> posted by Wen Congyang
>>
>> The call to csum_page() incorrectly calculates the offset it is supposed
>> to checksum, and wanders beyond the mapping of guest space.
>>
>> Patch in 1409908261-18682-3-git-send-email-wency@cn.fujitsu.com
>>
> And the said patch has been applied (3460eeb3fc2) so we're fine.
But not backported to 4.4, which is why Michael is falling over it.
~Andrew
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 14:09 ` Wei Liu
2014-11-24 14:13 ` Andrew Cooper
@ 2014-11-25 8:52 ` M A Young
2014-11-25 9:15 ` Wei Liu
1 sibling, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-25 8:52 UTC (permalink / raw)
To: Wei Liu
Cc: George Dunlap, Andrew Cooper, Ian Jackson, Ian Campbell,
xen-devel@lists.xen.org
On Mon, 24 Nov 2014, Wei Liu wrote:
> On Mon, Nov 24, 2014 at 01:13:25PM +0000, Andrew Cooper wrote:
>> On 24/11/14 11:50, George Dunlap wrote:
>>> On Mon, Nov 24, 2014 at 12:07 AM, M A Young <m.a.young@durham.ac.uk> wrote:
>>>> On Sat, 22 Nov 2014, M A Young wrote:
>>>>
>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>> I discovered the following
>>>>>
>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the
>>>>> bug report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There
>>>>> are actually two issues here
>>>>>
>>>>> * the segfault in libxl-save-helper --restore-domain (as reported in the
>>>>> bug above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>> presumably because the allocated memory eventually runs out
>>>>
>>>> I have found a bit more out about this. The segfault at at line 1378 of
>>>> tools/libxc/xc_domain_restore.c which is
>>>> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
>>>> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
>>>> csum_page(region_base + (i + curbatch)*PAGE_SIZE),
>>>> csum_page(buf));
>>>> and is because pfn in pagebuf->pfn_types[pfn] is beyond the end of the
>>>> array. This occurs in the verification phase.
>>>>
>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>> migration still fails. The first attached file contains the log from a run
>>>>> with xl -v migrate --debug domid localhost (with mfn and duplicated lines
>>>>> stripped out to make the size manageable).
>>>>
>>>> The difference actually seems to be down to how active the VM is rather than
>>>> the memory size (my small memory test system was doing very little, my
>>>> larger system was a full OS install). In the non-segfault case the problem
>>>> was the printf and printf_info commands in the create_domain() routine in
>>>> tools/libxl/xl_cmdimpl.c . As xl migrate uses stdout to pass status messages
>>>> back from the restoring dom0, these commands cause an unexpected message. If
>>>> you move them onto stderr then the migration completes in the non-segfault
>>>> case.
>>> Good job tracking those down -- are there patches in the works?
>>
>> The segfault for "--debug" has already been identified and a patch
>> posted by Wen Congyang
>>
>> The call to csum_page() incorrectly calculates the offset it is supposed
>> to checksum, and wanders beyond the mapping of guest space.
>>
>> Patch in 1409908261-18682-3-git-send-email-wency@cn.fujitsu.com
>>
>
> And the said patch has been applied (3460eeb3fc2) so we're fine.
However that doesn't fix my crash. I tried with it applied and still saw
the crash. I also tried 4.5-rc1 (without XSM to avoid my other issue) and
that crashed as well.
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-25 8:52 ` M A Young
@ 2014-11-25 9:15 ` Wei Liu
2014-11-25 22:16 ` M A Young
0 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2014-11-25 9:15 UTC (permalink / raw)
To: M A Young
Cc: Wei Liu, Ian Campbell, George Dunlap, Andrew Cooper, Ian Jackson,
xen-devel@lists.xen.org
On Tue, Nov 25, 2014 at 08:52:00AM +0000, M A Young wrote:
[...]
> >
> >And the said patch has been applied (3460eeb3fc2) so we're fine.
>
> However that doesn't fix my crash. I tried with it applied and still saw the
> crash. I also tried 4.5-rc1 (without XSM to avoid my other issue) and that
> crashed as well.
>
And the log is still the same? If the crash happens in a different
location it might be another bug.
Wei.
> Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-25 9:15 ` Wei Liu
@ 2014-11-25 22:16 ` M A Young
2014-11-25 22:32 ` Andrew Cooper
0 siblings, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-25 22:16 UTC (permalink / raw)
To: Wei Liu
Cc: George Dunlap, Andrew Cooper, Ian Jackson, Ian Campbell,
xen-devel@lists.xen.org
On Tue, 25 Nov 2014, Wei Liu wrote:
> On Tue, Nov 25, 2014 at 08:52:00AM +0000, M A Young wrote:
> [...]
>>>
>>> And the said patch has been applied (3460eeb3fc2) so we're fine.
>>
>> However that doesn't fix my crash. I tried with it applied and still saw the
>> crash. I also tried 4.5-rc1 (without XSM to avoid my other issue) and that
>> crashed as well.
>>
>
> And the log is still the same? If the crash happens in different
> location it might be another bug.
Yes, it is the same crash. Going back to the DPRINTF call,
DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
"actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
csum_page(region_base + i * PAGE_SIZE),
csum_page(buf));
what does pagebuf->pfn_types[pfn] actually mean, and how does it relate to
the type= field it is printed as? I suspect it should be something else, e.g.
pfn_type[pfn], which would give the page type corresponding to pfn.
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-25 22:16 ` M A Young
@ 2014-11-25 22:32 ` Andrew Cooper
0 siblings, 0 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-11-25 22:32 UTC (permalink / raw)
To: M A Young, Wei Liu
Cc: George Dunlap, Ian Jackson, Ian Campbell, xen-devel@lists.xen.org
On 25/11/2014 22:16, M A Young wrote:
>
>
> On Tue, 25 Nov 2014, Wei Liu wrote:
>
>> On Tue, Nov 25, 2014 at 08:52:00AM +0000, M A Young wrote:
>> [...]
>>>>
>>>> And the said patch has been applied (3460eeb3fc2) so we're fine.
>>>
>>> However that doesn't fix my crash. I tried with it applied and still
>>> saw the
>>> crash. I also tried 4.5-rc1 (without XSM to avoid my other issue)
>>> and that
>>> crashed as well.
>>>
>>
>> And the log is still the same? If the crash happens in different
>> location it might be another bug.
>
> Yes, it is the same crash. Going back to the dprintf command,
> DPRINTF("************** pfn=%lx type=%lx gotcs=%08lx "
> "actualcs=%08lx\n", pfn, pagebuf->pfn_types[pfn],
> csum_page(region_base + i * PAGE_SIZE),
> csum_page(buf));
> what does pagebuf->pfn_types[pfn] actually mean and how does it relate
> to the type= it matches. I suspect it should be something else, eg.
> pfn_type[pfn] which would give the page type corresponding to pfn.
>
> Michael Young
Eugh - this code is horrible. (The migration v2 code is so much nicer).
"pagebuf->pfn_types[pfn]" is completely bogus in this context, and
should indeed be "pfn_type[pfn]" instead. The pagebuf->pfn_types[]
array is up to 1024 entries long.
Having said that, it is my firm opinion that verify mode is useless for
anyone who isn't actively developing a migration stream (and even then,
less useful than it would appear to be). I would skip the "--debug", as
I don't believe it will help you at all in tracking down your issue.
~Andrew
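
A minimal sketch of the indexing mistake discussed above, not the libxc source. The only facts taken from the thread are that pagebuf->pfn_types[] holds up to 1024 entries and that pfn_type[] is the array meant to be indexed by pfn. The helper names and the nr_pfns parameter are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define BATCH_ENTRIES 1024 /* per the message above: at most 1024 entries */

/* Nonzero if `index` is a valid subscript for an array of `array_len`. */
static int index_in_bounds(size_t index, size_t array_len)
{
    return index < array_len;
}

/* Look up a page type in a table that has one entry per guest pfn.
 * `nr_pfns` is the number of entries in `pfn_type` (an assumed name). */
static unsigned long type_of_pfn(const unsigned long *pfn_type,
                                 size_t nr_pfns, unsigned long pfn)
{
    assert(index_in_bounds(pfn, nr_pfns));
    return pfn_type[pfn];
}
```

Any pfn of 1024 or more fails index_in_bounds(pfn, BATCH_ENTRIES), which would be consistent with the read past the end of the per-batch array reported earlier in the thread. The same pfn is a valid subscript for a table sized to the whole guest.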
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-22 19:24 Problems using xl migrate M A Young
2014-11-24 0:07 ` M A Young
@ 2014-11-24 12:25 ` George Dunlap
2014-11-24 12:41 ` Wei Liu
2 siblings, 0 replies; 25+ messages in thread
From: George Dunlap @ 2014-11-24 12:25 UTC (permalink / raw)
To: M A Young; +Cc: xen-devel@lists.xen.org
On Sat, Nov 22, 2014 at 7:24 PM, M A Young <m.a.young@durham.ac.uk> wrote:
> While investigating a bug reported on Red Hat Bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
> I discovered the following
>
> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the bug
> report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There are
> actually two issues here
>
> * the segfault in libxl-save-helper --restore-domain (as reported in the bug
> above) occurs if the guest memory is 1024M (on my 4G box) and is presumably
> because the allocated memory eventually runs out
>
> * the segfault doesn't occur if the guest memory is 128M, but the migration
> still fails. The first attached file contains the log from a run with xl -v
> migrate --debug domid localhost (with mfn and duplicated lines stripped out
> to make the size manageable).
>
> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
> migrate doesn't work for me at all - see the second attached file for the
> output of xl -v migrate domid localhost .
Could you resend this as a separate message so we can debug it separately?
-George
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-22 19:24 Problems using xl migrate M A Young
2014-11-24 0:07 ` M A Young
2014-11-24 12:25 ` George Dunlap
@ 2014-11-24 12:41 ` Wei Liu
2014-11-24 13:15 ` Andrew Cooper
2 siblings, 1 reply; 25+ messages in thread
From: Wei Liu @ 2014-11-24 12:41 UTC (permalink / raw)
To: M A Young; +Cc: wei.liu2, xen-devel
On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
> While investigating a bug reported on Red Hat Bugzilla
> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
> I discovered the following
>
> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the bug
> report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There are
> actually two issues here
>
> * the segfault in libxl-save-helper --restore-domain (as reported in the bug
> above) occurs if the guest memory is 1024M (on my 4G box) and is presumably
> because the allocated memory eventually runs out
>
> * the segfault doesn't occur if the guest memory is 128M, but the migration
> still fails. The first attached file contains the log from a run with xl -v
> migrate --debug domid localhost (with mfn and duplicated lines stripped out
> to make the size manageable).
>
> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
> migrate doesn't work for me at all - see the second attached file for the
> output of xl -v migrate domid localhost .
>
> Mchael Young
[...]
> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s, dirtied 0Mb/s 314 pages
> xc: detail: Mapping order 0, 268; first pfn 3fcf4
> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied 0Mb/s 0 pages
> xc: detail: Start last iteration
> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND shinfo 00082fbc
> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied 1033Mb/s 536 pages
> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied 2195Mb/s 536 pages
> xc: detail: Total pages sent= 262749 (1.00x)
> xc: detail: (of which 0 were fixups)
> xc: detail: All memory is saved
> xc: error: Error querying maximum number of MSRs for VCPU0 (1 = Operation not permitted): Internal error
Per your description this is the output of "xl -v migrate domid
localhost", so no "--debug" is involved. (Just to make sure...)
This error message means a domctl failed; shouldn't that be addressed
first?
FWIW I tried "xl -v migrate domid localhost" for a PV guest and it worked
for me. :-(
Is there anything I need to do to trigger this failure?
Wei.
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: Problems using xl migrate
2014-11-24 12:41 ` Wei Liu
@ 2014-11-24 13:15 ` Andrew Cooper
2014-11-24 14:32 ` (4.5-rc1) " M A Young
0 siblings, 1 reply; 25+ messages in thread
From: Andrew Cooper @ 2014-11-24 13:15 UTC (permalink / raw)
To: M A Young; +Cc: Wei Liu, xen-devel
On 24/11/14 12:41, Wei Liu wrote:
> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>> While investigating a bug reported on Red Hat Bugzilla
>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>> I discovered the following
>>
>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the bug
>> report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There are
>> actually two issues here
>>
>> * the segfault in libxl-save-helper --restore-domain (as reported in the bug
>> above) occurs if the guest memory is 1024M (on my 4G box) and is presumably
>> because the allocated memory eventually runs out
>>
>> * the segfault doesn't occur if the guest memory is 128M, but the migration
>> still fails. The first attached file contains the log from a run with xl -v
>> migrate --debug domid localhost (with mfn and duplicated lines stripped out
>> to make the size manageable).
>>
>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>> migrate doesn't work for me at all - see the second attached file for the
>> output of xl -v migrate domid localhost .
>>
>> Mchael Young
> [...]
>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s, dirtied 0Mb/s 314 pages
>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied 0Mb/s 0 pages
>> xc: detail: Start last iteration
>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND shinfo 00082fbc
>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied 1033Mb/s 536 pages
>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied 2195Mb/s 536 pages
>> xc: detail: Total pages sent= 262749 (1.00x)
>> xc: detail: (of which 0 were fixups)
>> xc: detail: All memory is saved
>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 = Operation not permitted): Internal error
> Per your description this is the output of "xl -v migrate domid
> localhost", so no "--debug" is involved. (Just to make sure...)
>
> This error message means a domctl fails, which should be addressed
> first?
>
> FWIW I tried "xl -v migrate domid localhost" for a PV guest it worked
> for me. :-(
>
> Is there anything I need to do to trigger this failure?
Is XSM in use? I can't think of any other reason why that hypercall
would fail with EPERM.
~Andrew
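
For reference, the "(1 = Operation not permitted)" part of the log line is an errno value followed by its strerror() text. The sketch below reproduces that formatting. It is an illustration only, not libxc's logging code, and the function name is hypothetical. It assumes a Linux libc, where EPERM has the value 1.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Format an errno value the way it appears in the log line above,
 * e.g. "(1 = Operation not permitted)". Returns snprintf's result. */
static int format_errno(int err, char *buf, size_t len)
{
    assert(buf != NULL && len > 0);
    return snprintf(buf, len, "(%d = %s)", err, strerror(err));
}
```

Calling format_errno(EPERM, buf, sizeof buf) on Linux fills buf with the same text as in the log, which is why the failure reads as a permission check rejecting the domctl.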
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 13:15 ` Andrew Cooper
@ 2014-11-24 14:32 ` M A Young
2014-11-24 14:43 ` Andrew Cooper
0 siblings, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-24 14:32 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Wei Liu, xen-devel
On Mon, 24 Nov 2014, Andrew Cooper wrote:
> On 24/11/14 12:41, Wei Liu wrote:
>> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>>> While investigating a bug reported on Red Hat Bugzilla
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>> I discovered the following
>>>
>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv (the bug
>>> report is for Xen 4.3 hvm ) when xl migrate domid localhost works. There are
>>> actually two issues here
>>>
>>> * the segfault in libxl-save-helper --restore-domain (as reported in the bug
>>> above) occurs if the guest memory is 1024M (on my 4G box) and is presumably
>>> because the allocated memory eventually runs out
>>>
>>> * the segfault doesn't occur if the guest memory is 128M, but the migration
>>> still fails. The first attached file contains the log from a run with xl -v
>>> migrate --debug domid localhost (with mfn and duplicated lines stripped out
>>> to make the size manageable).
>>>
>>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>>> migrate doesn't work for me at all - see the second attached file for the
>>> output of xl -v migrate domid localhost .
>>>
>>> Mchael Young
>> [...]
>>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s, dirtied 0Mb/s 314 pages
>>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied 0Mb/s 0 pages
>>> xc: detail: Start last iteration
>>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND shinfo 00082fbc
>>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied 1033Mb/s 536 pages
>>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied 2195Mb/s 536 pages
>>> xc: detail: Total pages sent= 262749 (1.00x)
>>> xc: detail: (of which 0 were fixups)
>>> xc: detail: All memory is saved
>>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 = Operation not permitted): Internal error
>> Per your description this is the output of "xl -v migrate domid
>> localhost", so no "--debug" is involved. (Just to make sure...)
>>
>> This error message means a domctl fails, which should be addressed
>> first?
>>
>> FWIW I tried "xl -v migrate domid localhost" for a PV guest it worked
>> for me. :-(
>>
>> Is there anything I need to do to trigger this failure?
>
> Is XSM in use? I can't think of any other reason why that hypercall
> would fail with EPERM.
XSM is built in (I wanted to allow the option of people using it) but I
didn't think it was active.
Michael Young
^ permalink raw reply [flat|nested] 25+ messages in thread
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 14:32 ` (4.5-rc1) " M A Young
@ 2014-11-24 14:43 ` Andrew Cooper
2014-11-24 14:55 ` Ian Campbell
2014-11-24 20:12 ` M A Young
0 siblings, 2 replies; 25+ messages in thread
From: Andrew Cooper @ 2014-11-24 14:43 UTC (permalink / raw)
To: M A Young; +Cc: Wei Liu, xen-devel
On 24/11/14 14:32, M A Young wrote:
> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>
>> On 24/11/14 12:41, Wei Liu wrote:
>>> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>>>> While investigating a bug reported on Red Hat Bugzilla
>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>> I discovered the following
>>>>
>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv
>>>> (the bug
>>>> report is for Xen 4.3 hvm ) when xl migrate domid localhost works.
>>>> There are
>>>> actually two issues here
>>>>
>>>> * the segfault in libxl-save-helper --restore-domain (as reported
>>>> in the bug
>>>> above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>> presumably
>>>> because the allocated memory eventually runs out
>>>>
>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>> migration
>>>> still fails. The first attached file contains the log from a run
>>>> with xl -v
>>>> migrate --debug domid localhost (with mfn and duplicated lines
>>>> stripped out
>>>> to make the size manageable).
>>>>
>>>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>>>> migrate doesn't work for me at all - see the second attached file
>>>> for the
>>>> output of xl -v migrate domid localhost .
>>>>
>>>> Mchael Young
>>> [...]
>>>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s,
>>>> dirtied 0Mb/s 314 pages
>>>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>>>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied
>>>> 0Mb/s 0 pages
>>>> xc: detail: Start last iteration
>>>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND
>>>> shinfo 00082fbc
>>>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied
>>>> 1033Mb/s 536 pages
>>>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied
>>>> 2195Mb/s 536 pages
>>>> xc: detail: Total pages sent= 262749 (1.00x)
>>>> xc: detail: (of which 0 were fixups)
>>>> xc: detail: All memory is saved
>>>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 =
>>>> Operation not permitted): Internal error
>>> Per your description this is the output of "xl -v migrate domid
>>> localhost", so no "--debug" is involved. (Just to make sure...)
>>>
>>> This error message means a domctl fails, which should be addressed
>>> first?
>>>
>>> FWIW I tried "xl -v migrate domid localhost" for a PV guest it worked
>>> for me. :-(
>>>
>>> Is there anything I need to do to trigger this failure?
>>
>> Is XSM in use? I can't think of any other reason why that hypercall
>> would fail with EPERM.
>
> XSM is built in (I wanted to allow the option of people using it) but
> I didn't think it was active.
>
> Michael Young
I don't believe there is any concept of "available but not active",
which probably means that the default policy is missing an entry for
this hypercall.
Can you check the hypervisor console around this failure and see whether
a flask error concerning domctl 72 is reported?
~Andrew
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 14:43 ` Andrew Cooper
@ 2014-11-24 14:55 ` Ian Campbell
2014-11-24 19:28 ` Daniel De Graaf
2014-11-24 20:12 ` M A Young
1 sibling, 1 reply; 25+ messages in thread
From: Ian Campbell @ 2014-11-24 14:55 UTC (permalink / raw)
To: Andrew Cooper, Daniel De Graaf; +Cc: xen-devel, Wei Liu, M A Young
On Mon, 2014-11-24 at 14:43 +0000, Andrew Cooper wrote:
> On 24/11/14 14:32, M A Young wrote:
> > On Mon, 24 Nov 2014, Andrew Cooper wrote:
> >> Is XSM in use? I can't think of any other reason why that hypercall
> >> would fail with EPERM.
> >
> > XSM is built in (I wanted to allow the option of people using it) but
> > I didn't think it was active.
>
> I don't believe there is any concept of "available but not active",
I think there is: the "dummy" policy, which is loaded when no explicit
policy is given, should behave as if XSM were disabled. AIUI all the
XSM_* and xsm_default_action stuff is supposed to ensure this
semi-automatically at compile time. CC-ing Daniel to confirm/deny.
> which probably means that the default policy is missing an entry for
> this hypercall.
That said, domctl is XSM_OTHER, which I think basically means "special
one-off handling". In practice it turns into XSM_DM_PRIV for a small
handful of subops and XSM_PRIV for the rest. Since this is a migration,
the relevant domain is certainly PRIV, I think.
Ian.
> Can you check the hypervisor console around this failure and see whether
> a flask error concerning domctl 72 is reported?
>
> ~Andrew
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 14:55 ` Ian Campbell
@ 2014-11-24 19:28 ` Daniel De Graaf
0 siblings, 0 replies; 25+ messages in thread
From: Daniel De Graaf @ 2014-11-24 19:28 UTC (permalink / raw)
To: Ian Campbell, Andrew Cooper; +Cc: xen-devel, Wei Liu, M A Young
On 11/24/2014 09:55 AM, Ian Campbell wrote:
> On Mon, 2014-11-24 at 14:43 +0000, Andrew Cooper wrote:
>> On 24/11/14 14:32, M A Young wrote:
>>> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>>>> Is XSM in use? I can't think of any other reason why that hypercall
>>>> would fail with EPERM.
>>>
>>> XSM is built in (I wanted to allow the option of people using it) but
>>> I didn't think it was active.
>>
>> I don't believe there is any concept of "available but not active",
>
> I think there is, the "dummy" policy which is loaded when there is no
> explicit policy given should behave as if xsm were disabled. AIUI all
> the XSM_* and xsm_default_action stuff is supposed to semi automatically
> ensure this is the case at compile time. CC-ing Daniel to confirm/deny.
Yes. The case where XSM is enabled at compile time but using the dummy
module is supposed to produce identical behavior to disabling XSM at
compile time.
The hypervisor parameter flask_enabled controls this run-time switching.
>> which probably means that the default policy is missing an entry for
>> this hypercall.
>
> That said domctl is XSM_OTHER, which basically means "special one off
> handling" I think. But it basically turns into XSM_DM_PRIV for a small
> handful of subops and XSM_PRIV for the rest. Since this is a migration
> the relevant domain is certainly PRIV I think.
>
> Ian.
>
>> Can you check the hypervisor console around this failure and see whether
>> a flask error concerning domctl 72 is reported?
>>
>> ~Andrew
If you get any mention of AVC messages, then FLASK is active and the dummy
policy is not being used. The FLASK security server can be active without
loading a policy: this is intended to allow dom0 to load the XSM policy in
cases where it is not possible to have the bootloader do it (which is the
preferred method).
If FLASK is active, then any domctl not in the list of handled domctls (see
the large switch statement in xsm/flask/hooks.c) will return -EPERM and
will print an error to the hypervisor console, as Andrew pointed out.
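As a rough illustration of that default-deny path, here is a standalone
sketch. It is not the code in xsm/flask/hooks.c: the function name, the
enum names and the op number 1 are hypothetical, and only op 72 and the
-EPERM result come from this thread.

```c
#include <errno.h>
#include <stdio.h>

/* Hypothetical op numbers for this sketch; only 72 comes from the thread. */
enum {
    MODEL_OP_HANDLED = 1,        /* stand-in for any domctl that has a hook */
    MODEL_OP_GET_VCPU_MSRS = 72  /* the op reported as failing */
};

/* Returns 0 when the op has a hook, -EPERM when it is not in the switch. */
static int model_flask_domctl(int op)
{
    switch (op) {
    case MODEL_OP_HANDLED:
        return 0; /* a real hook would run a permission check here */
    default:
        /* Not in the list: refuse and report on the console. */
        printf("flask_domctl: Unknown op %d\n", op);
        return -EPERM;
    }
}
```

Calling model_flask_domctl(72) takes the default branch, so the caller
sees EPERM regardless of what any loaded policy says.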
--
Daniel De Graaf
National Security Agency
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 14:43 ` Andrew Cooper
2014-11-24 14:55 ` Ian Campbell
@ 2014-11-24 20:12 ` M A Young
2014-11-24 22:05 ` Daniel De Graaf
1 sibling, 1 reply; 25+ messages in thread
From: M A Young @ 2014-11-24 20:12 UTC (permalink / raw)
To: Andrew Cooper; +Cc: Daniel De Graaf, Wei Liu, Ian Campbell, xen-devel
On Mon, 24 Nov 2014, Andrew Cooper wrote:
> On 24/11/14 14:32, M A Young wrote:
>> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>>
>>> On 24/11/14 12:41, Wei Liu wrote:
>>>> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>> I discovered the following
>>>>>
>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv
>>>>> (the bug
>>>>> report is for Xen 4.3 hvm ) when xl migrate domid localhost works.
>>>>> There are
>>>>> actually two issues here
>>>>>
>>>>> * the segfault in libxl-save-helper --restore-domain (as reported
>>>>> in the bug
>>>>> above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>> presumably
>>>>> because the allocated memory eventually runs out
>>>>>
>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>> migration
>>>>> still fails. The first attached file contains the log from a run
>>>>> with xl -v
>>>>> migrate --debug domid localhost (with mfn and duplicated lines
>>>>> stripped out
>>>>> to make the size manageable).
>>>>>
>>>>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>>>>> migrate doesn't work for me at all - see the second attached file
>>>>> for the
>>>>> output of xl -v migrate domid localhost .
>>>>>
>>>>> Michael Young
>>>> [...]
>>>>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s,
>>>>> dirtied 0Mb/s 314 pages
>>>>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>>>>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied
>>>>> 0Mb/s 0 pages
>>>>> xc: detail: Start last iteration
>>>>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND
>>>>> shinfo 00082fbc
>>>>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied
>>>>> 1033Mb/s 536 pages
>>>>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied
>>>>> 2195Mb/s 536 pages
>>>>> xc: detail: Total pages sent= 262749 (1.00x)
>>>>> xc: detail: (of which 0 were fixups)
>>>>> xc: detail: All memory is saved
>>>>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 =
>>>>> Operation not permitted): Internal error
>>>> Per your description this is the output of "xl -v migrate domid
>>>> localhost", so no "--debug" is involved. (Just to make sure...)
>>>>
>>>> This error message means a domctl fails, which should be addressed
>>>> first?
>>>>
>>>> FWIW I tried "xl -v migrate domid localhost" for a PV guest it worked
>>>> for me. :-(
>>>>
>>>> Is there anything I need to do to trigger this failure?
>>>
>>> Is XSM in use? I can't think of any other reason why that hypercall
>>> would fail with EPERM.
>>
>> XSM is built in (I wanted to allow the option of people using it) but
>> I didn't think it was active.
>>
>> Michael Young
>
> I don't believe there is any concept of "available but not active",
> which probably means that the default policy is missing an entry for
> this hypercall.
>
> Can you check the hypervisor console around this failure and see whether
> a flask error concerning domctl 72 is reported?
I do. The error is
(XEN) flask_domctl: Unknown op 72
Incidentally, Flask is running in permissive mode.
Michael Young
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 20:12 ` M A Young
@ 2014-11-24 22:05 ` Daniel De Graaf
2014-11-25 10:07 ` George Dunlap
0 siblings, 1 reply; 25+ messages in thread
From: Daniel De Graaf @ 2014-11-24 22:05 UTC (permalink / raw)
To: M A Young, Andrew Cooper; +Cc: Wei Liu, Ian Campbell, xen-devel
On 11/24/2014 03:12 PM, M A Young wrote:
> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>> On 24/11/14 14:32, M A Young wrote:
>>> On Mon, 24 Nov 2014, Andrew Cooper wrote:
>>>> On 24/11/14 12:41, Wei Liu wrote:
>>>>> On Sat, Nov 22, 2014 at 07:24:21PM +0000, M A Young wrote:
>>>>>> While investigating a bug reported on Red Hat Bugzilla
>>>>>> https://bugzilla.redhat.com/show_bug.cgi?id=1166461
>>>>>> I discovered the following
>>>>>>
>>>>>> xl migrate --debug domid localhost does indeed fail for Xen 4.4 pv
>>>>>> (the bug
>>>>>> report is for Xen 4.3 hvm ) when xl migrate domid localhost works.
>>>>>> There are
>>>>>> actually two issues here
>>>>>>
>>>>>> * the segfault in libxl-save-helper --restore-domain (as reported
>>>>>> in the bug
>>>>>> above) occurs if the guest memory is 1024M (on my 4G box) and is
>>>>>> presumably
>>>>>> because the allocated memory eventually runs out
>>>>>>
>>>>>> * the segfault doesn't occur if the guest memory is 128M, but the
>>>>>> migration
>>>>>> still fails. The first attached file contains the log from a run
>>>>>> with xl -v
>>>>>> migrate --debug domid localhost (with mfn and duplicated lines
>>>>>> stripped out
>>>>>> to make the size manageable).
>>>>>>
>>>>>> I then tried xen 4.5-rc1 to see if the bug was fixed and found that xl
>>>>>> migrate doesn't work for me at all - see the second attached file
>>>>>> for the
>>>>>> output of xl -v migrate domid localhost .
>>>>>>
>>>>>> Michael Young
>>>>> [...]
>>>>>> xc: detail: delta 15801ms, dom0 95%, target 0%, sent 543Mb/s,
>>>>>> dirtied 0Mb/s 314 pages
>>>>>> xc: detail: Mapping order 0, 268; first pfn 3fcf4
>>>>>> xc: detail: delta 23ms, dom0 100%, target 0%, sent 447Mb/s, dirtied
>>>>>> 0Mb/s 0 pages
>>>>>> xc: detail: Start last iteration
>>>>>> xc: Reloading memory pages: 262213/262144 100%xc: detail: SUSPEND
>>>>>> shinfo 00082fbc
>>>>>> xc: detail: delta 17ms, dom0 58%, target 58%, sent 0Mb/s, dirtied
>>>>>> 1033Mb/s 536 pages
>>>>>> xc: detail: delta 8ms, dom0 100%, target 0%, sent 2195Mb/s, dirtied
>>>>>> 2195Mb/s 536 pages
>>>>>> xc: detail: Total pages sent= 262749 (1.00x)
>>>>>> xc: detail: (of which 0 were fixups)
>>>>>> xc: detail: All memory is saved
>>>>>> xc: error: Error querying maximum number of MSRs for VCPU0 (1 =
>>>>>> Operation not permitted): Internal error
>>>>> Per your description this is the output of "xl -v migrate domid
>>>>> localhost", so no "--debug" is involved. (Just to make sure...)
>>>>>
>>>>> This error message means a domctl fails, which should be addressed
>>>>> first?
>>>>>
>>>>> FWIW I tried "xl -v migrate domid localhost" for a PV guest it worked
>>>>> for me. :-(
>>>>>
>>>>> Is there anything I need to do to trigger this failure?
>>>>
>>>> Is XSM in use? I can't think of any other reason why that hypercall
>>>> would fail with EPERM.
>>>
>>> XSM is built in (I wanted to allow the option of people using it) but
>>> I didn't think it was active.
>>>
>>> Michael Young
>>
>> I don't believe there is any concept of "available but not active",
>> which probably means that the default policy is missing an entry for
>> this hypercall.
>>
>> Can you check the hypervisor console around this failure and see whether
>> a flask error concerning domctl 72 is reported?
>
> I do. The error is
> (XEN) flask_domctl: Unknown op 72
>
> Incidentally, Flask is running in permissive mode.
>
> Michael Young
>
This means that the new domctl needs to be added to the switch statement
in flask/hooks.c. This error is triggered in permissive mode because it
is a code error rather than a policy error (which is what permissive mode
is intended to debug).
It looks like neither XEN_DOMCTL_get_vcpu_msrs nor XEN_DOMCTL_set_vcpu_msrs
has a FLASK hook. Andrew, did you want to add these since you introduced
the ops?
Unless you can think of a reason to split the access, I think it makes
sense to reuse the permissions that are used for
XEN_DOMCTL_{get,set}_ext_vcpucontext.
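A minimal sketch of what reusing those permissions could look like,
assuming a switch that maps each op to the permission to be checked. All
identifiers below are hypothetical stand-ins chosen for illustration;
they are not the real Xen op numbers or FLASK permission names.

```c
#include <errno.h>

/* Hypothetical op identifiers for this sketch only. */
enum model_op {
    MODEL_OP_SET_EXT_VCPUCONTEXT = 1,
    MODEL_OP_GET_EXT_VCPUCONTEXT,
    MODEL_OP_SET_VCPU_MSRS,
    MODEL_OP_GET_VCPU_MSRS,
    MODEL_OP_UNHOOKED            /* an op with no case in the switch */
};

/* Hypothetical permission identifiers for this sketch only. */
enum model_perm {
    MODEL_PERM_SET_EXT_VCPUCONTEXT = 1,
    MODEL_PERM_GET_EXT_VCPUCONTEXT
};

/* Returns the permission to check (positive), or -EPERM for an unhooked op. */
static int model_domctl_perm(int op)
{
    switch (op) {
    case MODEL_OP_GET_EXT_VCPUCONTEXT:
    case MODEL_OP_GET_VCPU_MSRS:     /* new op shares the "get" permission */
        return MODEL_PERM_GET_EXT_VCPUCONTEXT;
    case MODEL_OP_SET_EXT_VCPUCONTEXT:
    case MODEL_OP_SET_VCPU_MSRS:     /* new op shares the "set" permission */
        return MODEL_PERM_SET_EXT_VCPUCONTEXT;
    default:
        return -EPERM;
    }
}
```

Sharing a case label this way means no new permission has to be added to
the policy; splitting the access later would only need separate cases.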
--
Daniel De Graaf
National Security Agency
* Re: (4.5-rc1) Problems using xl migrate
2014-11-24 22:05 ` Daniel De Graaf
@ 2014-11-25 10:07 ` George Dunlap
2014-11-25 18:03 ` Daniel De Graaf
0 siblings, 1 reply; 25+ messages in thread
From: George Dunlap @ 2014-11-25 10:07 UTC (permalink / raw)
To: Daniel De Graaf
Cc: Andrew Cooper, xen-devel@lists.xen.org, Wei Liu, Ian Campbell,
M A Young
On Mon, Nov 24, 2014 at 10:05 PM, Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:
>> I do. The error is
>> (XEN) flask_domctl: Unknown op 72
>>
>> Incidentally, Flask is running in permissive mode.
>>
>> Michael Young
>>
>
> This means that the new domctl needs to be added to the switch statement
> in flask/hooks.c. This error is triggered in permissive mode because it
> is a code error rather than a policy error (which is what permissive mode
> is intended to debug).
If that's the case, should we make that a BUG_ON()? Or at least an
ASSERT() (which will only bug when compiled with debug=y), followed by
allow if in permissive mode, and deny if in enforcing mode?
Having it default deny, even in permissive mode, breaks the "principle
of least surprise", I think. :-)
-George
* Re: (4.5-rc1) Problems using xl migrate
2014-11-25 10:07 ` George Dunlap
@ 2014-11-25 18:03 ` Daniel De Graaf
2014-11-25 18:17 ` Konrad Rzeszutek Wilk
0 siblings, 1 reply; 25+ messages in thread
From: Daniel De Graaf @ 2014-11-25 18:03 UTC (permalink / raw)
To: George Dunlap
Cc: Andrew Cooper, xen-devel@lists.xen.org, Wei Liu, Ian Campbell,
M A Young
On 11/25/2014 05:07 AM, George Dunlap wrote:
> On Mon, Nov 24, 2014 at 10:05 PM, Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:
>>> I do. The error is
>>> (XEN) flask_domctl: Unknown op 72
>>>
>>> Incidentally, Flask is running in permissive mode.
>>>
>>> Michael Young
>>>
>>
>> This means that the new domctl needs to be added to the switch statement
>> in flask/hooks.c. This error is triggered in permissive mode because it
>> is a code error rather than a policy error (which is what permissive mode
>> is intended to debug).
>
> If that's the case, should we make that a BUG_ON()? Or at least an
> ASSERT() (which will only bug when compiled with debug=y), followed by
> allow if in permissive mode, and deny if in enforcing mode?
>
> Having it default deny, even in permissive mode, breaks the "principle
> of least surprise", I think. :-)
>
> -George
Either one of these will allow a guest to crash the hypervisor by requesting
an undefined domctl, which is not really a good idea. Linux uses a flag in
the security policy which defines if unknown permissions are allowed or
denied; I will send a patch adding this to Xen's security server and using
it instead of -EPERM in the default case of the switch statements.
The patch adding this feature probably shouldn't be applied to 4.5, but I'll
send it anyway. I will also send a separate patch adding the 2 domctls.
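To make the idea concrete, a sketch of a default case driven by a policy
flag might look like the following. This is only an assumption about the
shape of such a change, not the patch itself: the function name, the
allow_unknown parameter and the "(allowed)"/"(denied)" suffix in the
message are all hypothetical.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>

/*
 * Hypothetical default-case handler: the policy's "allow unknown" flag
 * decides the result instead of a hard-coded -EPERM.
 */
static int model_unknown_domctl(int op, bool allow_unknown)
{
    /* Report the gap either way so the missing hook still gets noticed. */
    printf("flask_domctl: Unknown op %d (%s)\n",
           op, allow_unknown ? "allowed" : "denied");
    return allow_unknown ? 0 : -EPERM;
}
```

Unlike a BUG_ON() or ASSERT(), this returns to the caller in both cases,
so an undefined op cannot bring the hypervisor down.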
--
Daniel De Graaf
National Security Agency
* Re: (4.5-rc1) Problems using xl migrate
2014-11-25 18:03 ` Daniel De Graaf
@ 2014-11-25 18:17 ` Konrad Rzeszutek Wilk
0 siblings, 0 replies; 25+ messages in thread
From: Konrad Rzeszutek Wilk @ 2014-11-25 18:17 UTC (permalink / raw)
To: Daniel De Graaf
Cc: Wei Liu, Ian Campbell, Andrew Cooper, George Dunlap,
xen-devel@lists.xen.org, M A Young
On Tue, Nov 25, 2014 at 01:03:34PM -0500, Daniel De Graaf wrote:
> On 11/25/2014 05:07 AM, George Dunlap wrote:
> >On Mon, Nov 24, 2014 at 10:05 PM, Daniel De Graaf <dgdegra@tycho.nsa.gov> wrote:
> >>>I do. The error is
> >>>(XEN) flask_domctl: Unknown op 72
> >>>
> >>>Incidentally, Flask is running in permissive mode.
> >>>
> >>> Michael Young
> >>>
> >>
> >>This means that the new domctl needs to be added to the switch statement
> >>in flask/hooks.c. This error is triggered in permissive mode because it
> >>is a code error rather than a policy error (which is what permissive mode
> >>is intended to debug).
> >
> >If that's the case, should we make that a BUG_ON()? Or at least an
> >ASSERT() (which will only bug when compiled with debug=y), followed by
> >allow if in permissive mode, and deny if in enforcing mode?
> >
> >Having it default deny, even in permissive mode, breaks the "principle
> >of least surprise", I think. :-)
> >
> > -George
> Either one of these will allow a guest to crash the hypervisor by requesting
> an undefined domctl, which is not really a good idea. Linux uses a flag in
> the security policy which defines if unknown permissions are allowed or
> denied; I will send a patch adding this to Xen's security server and using
> it instead of -EPERM in the default case of the switch statements.
Though I think that for the DEBUG case we still want to be told about it
loudly, so we can fix it.
>
> The patch adding this feature probably shouldn't be applied to 4.5, but I'll
> send it anyway. I will also send a separate patch adding the 2 domctls.
>
> --
> Daniel De Graaf
> National Security Agency