* Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Stefan Bader @ 2015-02-05 14:33 UTC
To: xen-devel@lists.xensource.com, linux-acpi@vger.kernel.org

While experimenting with and testing various kernel versions, I discovered that a Haswell-based host always crashes when booting as Xen dom0 (Xen-4.4.1). The same crash has happened since v3.19-rc1 and still happens with v3.19-rc7. A bare-metal boot has no issues, and an Opteron-based host has no issues either (in dom0 or on bare metal).
It could be a table that the other host does not have, and since it only happens in dom0, maybe some CPU capability needs to be passed on?

[ 2.108038] ACPI: Core revision 20141107
[ 2.108153] ACPI Warning: Unsupported module-level executable opcode 0x91 at table offset 0x002B (20141107/psloop-225)
[ 2.108264] ACPI Warning: Unsupported module-level executable opcode 0x91 at table offset 0x0033 (20141107/psloop-225)
[ 2.108375] ACPI Warning: Unsupported module-level executable opcode 0x95 at table offset 0x0038 (20141107/psloop-225)
[ 2.108489] ACPI Warning: Unsupported module-level executable opcode 0x95 at table offset 0x0041 (20141107/psloop-225)
[ 2.108613] ACPI Warning: Unsupported module-level executable opcode 0x7D at table offset 0x040D (20141107/psloop-225)
[ 2.108751] BUG: unable to handle kernel paging request at ffffc90000ee74e0
[ 2.108835] IP: [<ffffffff814573db>] acpi_ps_peek_opcode+0xd/0x1f
[ 2.108902] PGD 1f4be067 PUD 1f4bd067 PMD 1488f067 PTE 0
[ 2.109018] Oops: 0000 [#1] SMP
[ 2.109094] Modules linked in:
[ 2.109153] CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.19.0-031900rc7-generic #201502020035
[ 2.109220] Hardware name: Intel Corporation Shark Bay Client platform/Flathead Creek Crb, BIOS HSWLPTU1.86C.0109.R03.1301282055 01/28/2013
[ 2.109295] task: ffffffff81c1c500 ti: ffffffff81c00000 task.ti: ffffffff81c00000
[ 2.109360] RIP: e030:[<ffffffff814573db>] [<ffffffff814573db>] acpi_ps_peek_opcode+0xd/0x1f
[ 2.109445] RSP: e02b:ffffffff81c03ce8 EFLAGS: 00010283
[ 2.109490] RAX: 000000000000000c RBX: ffff880014887000 RCX: ffffffff81c03d50
[ 2.109539] RDX: ffffc90000ee74e0 RSI: ffff880014887030 RDI: ffff880014887030
[ 2.109587] RBP: ffffffff81c03ce8 R08: ffffea0000522600 R09: ffffffff81432c4f
[ 2.109635] R10: ffff880014899090 R11: 00000000000000ba R12: ffff880014887030
[ 2.109684] R13: ffff880014887000 R14: ffffffff81c03d50 R15: 000000000000000d
[ 2.109735] FS: 0000000000000000(0000) GS:ffff880018c00000(0000) knlGS:0000000000000000
[ 2.109836] CS: e033 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 2.109881] CR2: ffffc90000ee74e0 CR3: 0000000001c15000 CR4: 0000000000042660
[ 2.109930] Stack:
[ 2.109968] ffffffff81c03d38 ffffffff81456537 ffffffff81c03d28 ffffffff81457a40
[ 2.110104] ffff880014887000 ffff880014887000 ffff8800148990c0 ffffc90000ee74e0
[ 2.110238] ffff880014887030 0000000000000000 ffffffff81c03d78 ffffffff81456760
[ 2.110373] Call Trace:
[ 2.110413] [<ffffffff81456537>] acpi_ps_get_next_arg+0x114/0x1f9
[ 2.110461] [<ffffffff81457a40>] ? acpi_ps_pop_scope+0x54/0x72
[ 2.110508] [<ffffffff81456760>] acpi_ps_get_arguments+0x91/0x228
[ 2.110555] [<ffffffff81456ad2>] acpi_ps_parse_loop+0x1db/0x311
[ 2.110602] [<ffffffff81457705>] acpi_ps_parse_aml+0x96/0x275
[ 2.110649] [<ffffffff8145322f>] acpi_ns_one_complete_parse+0xf7/0x114
[ 2.110698] [<ffffffff817d149a>] ? _raw_spin_lock_irqsave+0x1a/0x60
[ 2.110746] [<ffffffff8145326c>] acpi_ns_parse_table+0x20/0x38
[ 2.110792] [<ffffffff81452c20>] acpi_ns_load_table+0x4c/0x90
[ 2.110840] [<ffffffff817c50b5>] acpi_tb_load_namespace+0xa6/0x14a
[ 2.110889] [<ffffffff81d83269>] acpi_load_tables+0xc/0x35
[ 2.110935] [<ffffffff81454bf6>] ? acpi_ns_get_node+0xb7/0xc9
[ 2.110982] [<ffffffff81d825cf>] acpi_early_init+0x73/0x105
[ 2.111029] [<ffffffff81d3b083>] start_kernel+0x348/0x3f0
[ 2.111075] [<ffffffff81d3abcd>] ? set_init_arg+0x56/0x56
[ 2.111121] [<ffffffff81d3a5f8>] x86_64_start_reservations+0x2a/0x2c
[ 2.111169] [<ffffffff81d3e88c>] xen_start_kernel+0x4f5/0x4f7
[ 2.111215] Code: 8a 87 60 05 87 81 5d c3 e8 73 cc 37 00 55 81 ff 00 01 00 00 19 c0 48 89 e5 83 c0 02 5d c3 e8 5d cc 3
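A note for reading the oops above: "Oops: 0000" is the x86 page-fault error code (bit 0 clear: page not present; bit 1 clear: read access; bit 2 clear: kernel mode), and "PTE 0" in the page-table walk line means there was no page-table entry for the faulting address in CR2 (the same value is in RDX). The sketch below classifies the addresses from the register dump, assuming the x86_64 virtual memory layout described in Documentation/x86/x86_64/mm.txt for kernels of this era (4-level paging, no KASLR). The region table and the helpers `classify` and `decode_pf_error` are illustrative names made up for this sketch, not kernel interfaces.

```python
# Inclusive bounds taken from Documentation/x86/x86_64/mm.txt as it read for
# 3.x kernels (4-level paging, no KASLR). Illustrative table, not a kernel API.
X86_64_REGIONS = [
    (0xFFFF880000000000, 0xFFFFC7FFFFFFFFFF, "direct mapping of physical memory"),
    (0xFFFFC90000000000, 0xFFFFE8FFFFFFFFFF, "vmalloc/ioremap space"),
    (0xFFFFEA0000000000, 0xFFFFEAFFFFFFFFFF, "virtual memory map"),
    (0xFFFFFFFF80000000, 0xFFFFFFFF9FFFFFFF, "kernel text mapping"),
    (0xFFFFFFFFA0000000, 0xFFFFFFFFFF5FFFFF, "module mapping space"),
]


def classify(addr):
    """Return the name of the layout region that contains addr."""
    for start, end, name in X86_64_REGIONS:
        if start <= addr <= end:
            return name
    return "not in a listed region"


def decode_pf_error(code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        "present": bool(code & 0x1),  # clear: the page was not present
        "write": bool(code & 0x2),    # clear: the access was a read
        "user": bool(code & 0x4),     # clear: the access came from kernel mode
    }


if __name__ == "__main__":
    # Values copied from the oops above.
    registers = {
        "CR2": 0xFFFFC90000EE74E0,
        "RBX": 0xFFFF880014887000,
        "R08": 0xFFFFEA0000522600,
        "RIP": 0xFFFFFFFF814573DB,
    }
    for reg, value in registers.items():
        print(f"{reg}: {value:#018x} -> {classify(value)}")
    print("Oops: 0000 ->", decode_pf_error(0x0000))
```

Under that layout assumption, CR2 falls in the vmalloc/ioremap region rather than in the direct mapping, which would be consistent with a read through an ioremap-style mapping whose page was not present; the log alone does not show which mapping that was.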
* Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Konrad Rzeszutek Wilk @ 2015-02-05 19:36 UTC
To: Stefan Bader
Cc: xen-devel@lists.xensource.com, linux-acpi@vger.kernel.org

On Thu, Feb 05, 2015 at 03:33:02PM +0100, Stefan Bader wrote:
> While experimenting with and testing various kernel versions, I discovered that
> a Haswell-based host always crashes when booting as Xen dom0 (Xen-4.4.1). The
> same crash has happened since v3.19-rc1 and still happens with v3.19-rc7. A
> bare-metal boot has no issues, and an Opteron-based host has no issues either
> (in dom0 or on bare metal).
> It could be a table that the other host does not have, and since it only happens
> in dom0, maybe some CPU capability needs to be passed on?

Usually it means that the ACPI AML code is trying to do something with
the IOAPIC or something which is not accessible.

But this, on the other hand, looks like it is trying to execute some AML code
that is unknown. Any chance you can disassemble it and perhaps also
run with ACPI debug options on to figure out where it blows up?
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xen.org
> http://lists.xen.org/xen-devel
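Before disassembling the whole table, the opcodes in the "Unsupported module-level executable opcode" warnings can at least be named. A minimal sketch: the names are my reading of the AML encoding tables in the ACPI specification (0x7D OrOp, 0x91 LorOp, 0x95 LLessOp), only the opcodes seen in this log are listed, and `decode_warnings` and its regular expression are illustrative names, not part of any tool. Dumping and disassembling the real tables with ACPICA's acpidump and `iasl -d` remains the authoritative way to see the code around those offsets.

```python
import re

# AML opcode names as read from the ACPI specification's encoding tables.
# Only the opcodes reported in this log are listed; verify against the spec.
AML_OPCODE_NAMES = {
    0x7D: "OrOp",
    0x91: "LorOp",
    0x95: "LLessOp",
}

WARNING_RE = re.compile(
    r"executable opcode 0x([0-9A-Fa-f]+) at\s+table offset 0x([0-9A-Fa-f]+)"
)


def decode_warnings(log):
    """Return (table_offset, opcode, name) for each opcode warning in log."""
    decoded = []
    for m in WARNING_RE.finditer(log):
        opcode = int(m.group(1), 16)
        offset = int(m.group(2), 16)
        decoded.append((offset, opcode, AML_OPCODE_NAMES.get(opcode, "unknown")))
    return decoded


if __name__ == "__main__":
    log = """\
[ 2.108153] ACPI Warning: Unsupported module-level executable opcode 0x91 at table offset 0x002B (20141107/psloop-225)
[ 2.108375] ACPI Warning: Unsupported module-level executable opcode 0x95 at table offset 0x0038 (20141107/psloop-225)
[ 2.108613] ACPI Warning: Unsupported module-level executable opcode 0x7D at table offset 0x040D (20141107/psloop-225)
"""
    for offset, opcode, name in decode_warnings(log):
        print(f"offset {offset:#06x}: opcode {opcode:#04x} ({name})")
```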
* Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Stefan Bader @ 2015-02-09 13:07 UTC
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, linux-acpi@vger.kernel.org, Juergen Gross, David Vrabel

On 05.02.2015 20:36, Konrad Rzeszutek Wilk wrote:
> Usually it means that the ACPI AML code is trying to do something with
> the IOAPIC or something which is not accessible.
>
> But this, on the other hand, looks like it is trying to execute some AML code
> that is unknown. Any chance you can disassemble it and perhaps also
> run with ACPI debug options on to figure out where it blows up?

The weird thing here is that bare metal on the same machine works, and previous kernels worked as well. So I think we can assume the ACPI tables are OK. It could even be a red herring. Likely it is, as booting with acpi=off hangs instead of crashing.

Since I had no clue, I did what we always do when we are dumbfounded: I bisected 3.18..3.19-rc1. Unfortunately the very last kernel I built was somewhere in between good and bad: good in that it did not crash in the usual way, bad in that it did not come up in a usable state. So I would not be sure that the commit git claims to be the offender is right. It could be any one in the range of:

   G  * xen: use common page allocation function in p2m.c
      * xen: Delay remapping memory of pv-domain
   g  * xen: Delay m2p_override initialization
   -> * xen: Delay invalidating extra memory
   B  * x86: Introduce function to get pmd entry pointer

(G) really good, (g) somewhat not bad, (B) bad, (->) claimed first broken.

So it seems one of the delaying changes has a very bad effect on that Shark Bay. A bit odd, since none of them sounds Intel/AMD specific. It could only be a different usage of memory (my AMD box has considerably more memory and, unlike the Haswell, no CPU with GPU functionality).

Jürgen, maybe this description triggers an idea for you...?

-Stefan

---

git bisect start
# good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18
git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d
# bad: [97bf6af1f928216fd6c5a66e8a57bfa95a659672] Linux 3.19-rc1
git bisect bad 97bf6af1f928216fd6c5a66e8a57bfa95a659672
# good: [70e71ca0af244f48a5dcf56dc435243792e3a495] Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next
git bisect good 70e71ca0af244f48a5dcf56dc435243792e3a495
# good: [988adfdffdd43cfd841df734664727993076d7cb] Merge branch 'drm-next' of git://people.freedesktop.org/~airlied/linux
git bisect good 988adfdffdd43cfd841df734664727993076d7cb
# good: [b024793188002b9eed452b5f6a04d45003ed5772] staging: rtl8723au: phy_SsPwrSwitch92CU() was never called with bRegSSPwrLvl != 1
git bisect good b024793188002b9eed452b5f6a04d45003ed5772
# bad: [66dcff86ba40eebb5133cccf450878f2bba102ef] Merge tag 'for-linus' of git://git.kernel.org/pub/scm/virt/kvm/kvm
git bisect bad 66dcff86ba40eebb5133cccf450878f2bba102ef
# bad: [d6666be6f0c43efb9475d1d35fbef9f8be61b7b1] Merge tag 'for-linus-20141215' of git://git.infradead.org/linux-mtd
git bisect bad d6666be6f0c43efb9475d1d35fbef9f8be61b7b1
# bad: [94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7] Merge tag 'fixes-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc
git bisect bad 94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7
# good: [2dbfca5a181973558277b28b1f4c36362291f5e0] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds
git bisect good 2dbfca5a181973558277b28b1f4c36362291f5e0
# bad: [0db2812a5240f2663b92d8d4b761122dd2e0c6c3] Merge git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile
git bisect bad 0db2812a5240f2663b92d8d4b761122dd2e0c6c3
# bad: [f1d04b23b2015b4c3c0a8419677179b133afea08] Merge branch 'devel/for-linus-3.19' into stable/for-linus-3.19
git bisect bad f1d04b23b2015b4c3c0a8419677179b133afea08
# bad: [792230c3a66b3d17d6dcca712866d24f2283d4a6] x86: Introduce function to get pmd entry pointer
git bisect bad 792230c3a66b3d17d6dcca712866d24f2283d4a6
# good: [7108c9ce8f6e59f775b0c8250dba52b569b6cba2] xen: use common page allocation function in p2m.c
# NOTE: This was the last really good
git bisect good 7108c9ce8f6e59f775b0c8250dba52b569b6cba2
# good: [97f4533a60ce5d0cb35ff44a190111f81a987620] xen: Delay m2p_override initialization
# NOTE: This revision did not crash the usual way but was not usable either
# NOTE: Use of wrong bits in page-tables.
git bisect good 97f4533a60ce5d0cb35ff44a190111f81a987620
* Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Stefan Bader @ 2015-02-09 13:33 UTC
To: Konrad Rzeszutek Wilk
Cc: xen-devel@lists.xensource.com, linux-acpi@vger.kernel.org, Juergen Gross, David Vrabel

On 09.02.2015 14:07, Stefan Bader wrote:
>    G  * xen: use common page allocation function in p2m.c
>       * xen: Delay remapping memory of pv-domain
>    g  * xen: Delay m2p_override initialization
>    -> * xen: Delay invalidating extra memory
>    B  * x86: Introduce function to get pmd entry pointer
>
> (G) really good, (g) somewhat not bad, (B) bad, (->) claimed first broken.

Oh, since that all sounds related to E820 in some way:

(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009a400 (usable)
(XEN) 000000000009a400 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
(XEN) 0000000000100000 - 0000000030a48000 (usable)
(XEN) 0000000030a48000 - 0000000030a49000 (reserved)
(XEN) 0000000030a49000 - 00000000a27f4000 (usable)
(XEN) 00000000a27f4000 - 00000000a2ab4000 (reserved)
(XEN) 00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS)
(XEN) 00000000a2fb4000 - 00000000a2feb000 (ACPI data)
(XEN) 00000000a2feb000 - 00000000a3000000 (usable)
(XEN) 00000000a3000000 - 00000000afa00000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed00000 - 00000000fed04000 (reserved)
(XEN) 00000000fed10000 - 00000000fed1a000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fed84000 - 00000000fed85000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000024e600000 (usable)

and how it looks with a 3.18 boot:

[ 0.000000] e820: BIOS-provided physical RAM map:
[ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable
[ 0.000000] Xen: [mem 0x000000000009a400-0x00000000000fffff] reserved
[ 0.000000] Xen: [mem 0x0000000000100000-0x0000000030a47fff] usable
[ 0.000000] Xen: [mem 0x0000000030a48000-0x0000000030a48fff] reserved
[ 0.000000] Xen: [mem 0x0000000030a49000-0x00000000a27f3fff] usable
[ 0.000000] Xen: [mem 0x00000000a27f4000-0x00000000a2ab3fff] reserved
[ 0.000000] Xen: [mem 0x00000000a2ab4000-0x00000000a2fb3fff] ACPI NVS
[ 0.000000] Xen: [mem 0x00000000a2fb4000-0x00000000a2feafff] ACPI data
[ 0.000000] Xen: [mem 0x00000000a2feb000-0x00000000a2ffffff] usable
[ 0.000000] Xen: [mem 0x00000000a3000000-0x00000000af9fffff] reserved
[ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved
[ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved
[ 0.000000] Xen: [mem 0x00000000fed00000-0x00000000fed03fff] reserved
[ 0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved
[ 0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved
[ 0.000000] Xen: [mem 0x00000000fed84000-0x00000000fed84fff] reserved
[ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved
[ 0.000000] Xen: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved
[ 0.000000] Xen: [mem 0x0000000100000000-0x00000001bdc59fff] usable
[ 0.000000] Xen: [mem 0x00000001bdc5a000-0x000000024e5fffff] unusable

Not sure that helps much. I probably have to try comparing later output, but that will need a bit of time.

-Stefan
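When comparing the two maps above, note that Xen prints each range with an exclusive end (0000000000100000 - 0000000030a48000) while Linux prints an inclusive end (0x0000000000100000-0x0000000030a47fff). A minimal sketch, assuming only the two line formats shown in this thread, that normalizes both to half-open ranges and lists the non-usable entries a given range touches (for instance the single reserved page at 0x30a48000). The function names and the example query range are hypothetical, chosen only for illustration.

```python
import re

# Xen prints "(XEN) start - end (type)" with an exclusive end address.
XEN_RE = re.compile(r"\(XEN\)\s+([0-9a-f]+) - ([0-9a-f]+) \(([^)]+)\)")
# Linux prints "[mem 0xstart-0xend] type" with an inclusive end address.
LINUX_RE = re.compile(r"\[mem 0x([0-9a-f]+)-0x([0-9a-f]+)\] (.+)$")


def parse_e820(line):
    """Return (start, end, type) as a half-open range, or None if no match."""
    m = XEN_RE.search(line)
    if m:
        return int(m.group(1), 16), int(m.group(2), 16), m.group(3)
    m = LINUX_RE.search(line)
    if m:
        # Convert the inclusive end to an exclusive one.
        return int(m.group(1), 16), int(m.group(2), 16) + 1, m.group(3).strip()
    return None


def non_usable_hits(entries, start, end):
    """Entries whose type is not 'usable' and that overlap [start, end)."""
    return [
        (s, e, kind)
        for s, e, kind in entries
        if kind != "usable" and s < end and start < e
    ]


if __name__ == "__main__":
    xen_lines = [
        "(XEN) 0000000000100000 - 0000000030a48000 (usable)",
        "(XEN) 0000000030a48000 - 0000000030a49000 (reserved)",
        "(XEN) 0000000030a49000 - 00000000a27f4000 (usable)",
    ]
    entries = [parse_e820(line) for line in xen_lines]
    # Hypothetical query range, for illustration only.
    for s, e, kind in non_usable_hits(entries, 0x30000000, 0x31000000):
        print(f"{s:#x}-{e:#x} {kind}")
```

Applied to the maps above, the two Linux entries above 4 GiB (usable up to 0x1bdc59fff, unusable from 0x1bdc5a000 to 0x24e5fffff) together cover exactly Xen's usable range 0x100000000-0x24e600000.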
* Re: [Xen-devel] Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 2015-02-09 13:33 ` Stefan Bader @ 2015-02-09 13:49 ` Juergen Gross 0 siblings, 0 replies; 10+ messages in thread From: Juergen Gross @ 2015-02-09 13:49 UTC (permalink / raw) To: Stefan Bader, Konrad Rzeszutek Wilk Cc: linux-acpi@vger.kernel.org, xen-devel@lists.xensource.com, David Vrabel On 02/09/2015 02:33 PM, Stefan Bader wrote: > On 09.02.2015 14:07, Stefan Bader wrote: >> On 05.02.2015 20:36, Konrad Rzeszutek Wilk wrote: >>> On Thu, Feb 05, 2015 at 03:33:02PM +0100, Stefan Bader wrote: >>>> While experimenting/testing various kernel versions I discovered that trying to >>>> boot a Haswell based hosts will always crash when booting as Xen dom0 >>>> (Xen-4.4.1). The same crash happens since v3.19-rc1 and still does happen with >>>> v3.19-rc7. A bare metal boot is having no issues and also an Opteron based host >>>> is having no issues (dom0 and bare metal). >>>> Could be a table that the other host does not have and since its only happening >>>> in dom0 maybe some cpu capability that needs to be passed on? >>> >>> Usually it means that the ACPI AML code is trying to do something with >>> the IOAPIC or something wihch is not accessible. >>> >>> But this on the other hand looks to be trying to execute some AML code >>> that is unknown. Any chance you cna disassemble it and perhaps also >>> run with acpi debug options on to figure out where it blows up? >> >> The weird thing here is that bare-metal on the same machine does work. And >> previous kernels did work as well. So I think we can assume the ACPI tables are >> ok. It could even be a red-herring. Well, likely is as booting with acpi=off >> does hang instead of crashing. >> >> Since I got no clue, I did what we always do when we are dumbfound, I went ahead >> and bisected 3.18..3.19-rc1. Unfortunately the very last kernel I build was >> something in between good and bad. 
Good as it did not crash exactly but bad as >> it did not come up in a usable state. So I would not be sure the supposedly >> offending commit is right. Could be one in the range of: >> >> G * xen: use common page allocation function in p2m.c >> * xen: Delay remapping memory of pv-domain >> g * xen: Delay m2p_override initialization >> -> * xen: Delay invalidating extra memory >> B * x86: Introduce function to get pmd entry pointer >> >> (G) really good, (g) somewhat not bad, (B) bad, (->) claimed first broken. > > Oh, since that all sounds related to E820 in some way: > > (XEN) Xen-e820 RAM map: > (XEN) 0000000000000000 - 000000000009a400 (usable) > (XEN) 000000000009a400 - 00000000000a0000 (reserved) > (XEN) 00000000000e0000 - 0000000000100000 (reserved) > (XEN) 0000000000100000 - 0000000030a48000 (usable) > (XEN) 0000000030a48000 - 0000000030a49000 (reserved) Hmm, this memory hole is at a rather low address. Could it be that some vital data (one of kernel, page tables, initrd or p2m map) is located at this address? This would be a problem similar to the one I ran into when trying to test on a machine with 1TB of memory, where the p2m map was too big to fit into contiguous memory. Could you check the addresses where the hypervisor puts this data for Dom0?
Juergen > (XEN) 0000000030a49000 - 00000000a27f4000 (usable) > (XEN) 00000000a27f4000 - 00000000a2ab4000 (reserved) > (XEN) 00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS) > (XEN) 00000000a2fb4000 - 00000000a2feb000 (ACPI data) > (XEN) 00000000a2feb000 - 00000000a3000000 (usable) > (XEN) 00000000a3000000 - 00000000afa00000 (reserved) > (XEN) 00000000e0000000 - 00000000f0000000 (reserved) > (XEN) 00000000fec00000 - 00000000fec01000 (reserved) > (XEN) 00000000fed00000 - 00000000fed04000 (reserved) > (XEN) 00000000fed10000 - 00000000fed1a000 (reserved) > (XEN) 00000000fed1c000 - 00000000fed20000 (reserved) > (XEN) 00000000fed84000 - 00000000fed85000 (reserved) > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > (XEN) 00000000ffc00000 - 0000000100000000 (reserved) > (XEN) 0000000100000000 - 000000024e600000 (usable) > > and how it looks with a 3.18 boot: > > [ 0.000000] e820: BIOS-provided physical RAM map: > [ 0.000000] Xen: [mem 0x0000000000000000-0x0000000000099fff] usable > [ 0.000000] Xen: [mem 0x000000000009a400-0x00000000000fffff] reserved > [ 0.000000] Xen: [mem 0x0000000000100000-0x0000000030a47fff] usable > [ 0.000000] Xen: [mem 0x0000000030a48000-0x0000000030a48fff] reserved > [ 0.000000] Xen: [mem 0x0000000030a49000-0x00000000a27f3fff] usable > [ 0.000000] Xen: [mem 0x00000000a27f4000-0x00000000a2ab3fff] reserved > [ 0.000000] Xen: [mem 0x00000000a2ab4000-0x00000000a2fb3fff] ACPI NVS > [ 0.000000] Xen: [mem 0x00000000a2fb4000-0x00000000a2feafff] ACPI data > [ 0.000000] Xen: [mem 0x00000000a2feb000-0x00000000a2ffffff] usable > [ 0.000000] Xen: [mem 0x00000000a3000000-0x00000000af9fffff] reserved > [ 0.000000] Xen: [mem 0x00000000e0000000-0x00000000efffffff] reserved > [ 0.000000] Xen: [mem 0x00000000fec00000-0x00000000fec00fff] reserved > [ 0.000000] Xen: [mem 0x00000000fed00000-0x00000000fed03fff] reserved > [ 0.000000] Xen: [mem 0x00000000fed10000-0x00000000fed19fff] reserved > [ 0.000000] Xen: [mem 0x00000000fed1c000-0x00000000fed1ffff] reserved > [ 
0.000000] Xen: [mem 0x00000000fed84000-0x00000000fed84fff] reserved > [ 0.000000] Xen: [mem 0x00000000fee00000-0x00000000feefffff] reserved > [ 0.000000] Xen: [mem 0x00000000ffc00000-0x00000000ffffffff] reserved > [ 0.000000] Xen: [mem 0x0000000100000000-0x00000001bdc59fff] usable > [ 0.000000] Xen: [mem 0x00000001bdc5a000-0x000000024e5fffff] unusable > > Not sure that helps much. I probably have to try comparing later output. But > that will need a bit of time. > > -Stefan > >> >> So it seems one of the delaying changes has a very bad effect on that Sharkbay. >> A bit odd since none of those sounds Intel/AMD geared. Could only be a different >> usage of memory (my AMD box has considerably more memory and also no CPU with >> GPU functionality as the Haswell). >> >> Jürgen, maybe some description that might trigger an idea for you...? >> >> -Stefan >> >> --- >> >> git bisect start >> # good: [b2776bf7149bddd1f4161f14f79520f17fc1d71d] Linux 3.18 >> git bisect good b2776bf7149bddd1f4161f14f79520f17fc1d71d >> # bad: [97bf6af1f928216fd6c5a66e8a57bfa95a659672] Linux 3.19-rc1 >> git bisect bad 97bf6af1f928216fd6c5a66e8a57bfa95a659672 >> # good: [70e71ca0af244f48a5dcf56dc435243792e3a495] Merge >> git://git.kernel.org/pub/scm/linux/kernel/git/davem/net-next >> git bisect good 70e71ca0af244f48a5dcf56dc435243792e3a495 >> # good: [988adfdffdd43cfd841df734664727993076d7cb] Merge branch 'drm-next' of >> git://people.freedesktop.org/~airlied/linux >> git bisect good 988adfdffdd43cfd841df734664727993076d7cb >> # good: [b024793188002b9eed452b5f6a04d45003ed5772] staging: rtl8723au: >> phy_SsPwrSwitch92CU() was never called with bRegSSPwrLvl != 1 >> git bisect good b024793188002b9eed452b5f6a04d45003ed5772 >> # bad: [66dcff86ba40eebb5133cccf450878f2bba102ef] Merge tag 'for-linus' of >> git://git.kernel.org/pub/scm/virt/kvm/kvm >> git bisect bad 66dcff86ba40eebb5133cccf450878f2bba102ef >> # bad: [d6666be6f0c43efb9475d1d35fbef9f8be61b7b1] Merge tag 'for-linus-20141215' >> of 
git://git.infradead.org/linux-mtd >> git bisect bad d6666be6f0c43efb9475d1d35fbef9f8be61b7b1 >> # bad: [94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7] Merge tag 'fixes-for-linus' of >> git://git.kernel.org/pub/scm/linux/kernel/git/arm/arm-soc >> git bisect bad 94bbdb63d7ed5ca56b788e43d0ca4a8f9494c9e7 >> # good: [2dbfca5a181973558277b28b1f4c36362291f5e0] Merge branch 'for-next' of >> git://git.kernel.org/pub/scm/linux/kernel/git/cooloney/linux-leds >> git bisect good 2dbfca5a181973558277b28b1f4c36362291f5e0 >> # bad: [0db2812a5240f2663b92d8d4b761122dd2e0c6c3] Merge >> git://git.kernel.org/pub/scm/linux/kernel/git/cmetcalf/linux-tile >> git bisect bad 0db2812a5240f2663b92d8d4b761122dd2e0c6c3 >> # bad: [f1d04b23b2015b4c3c0a8419677179b133afea08] Merge branch >> 'devel/for-linus-3.19' into stable/for-linus-3.19 >> git bisect bad f1d04b23b2015b4c3c0a8419677179b133afea08 >> # bad: [792230c3a66b3d17d6dcca712866d24f2283d4a6] x86: Introduce function to get >> pmd entry pointer >> git bisect bad 792230c3a66b3d17d6dcca712866d24f2283d4a6 >> # good: [7108c9ce8f6e59f775b0c8250dba52b569b6cba2] xen: use common page >> allocation function in p2m.c >> # NOTE: This was the last really good >> git bisect good 7108c9ce8f6e59f775b0c8250dba52b569b6cba2 >> # good: [97f4533a60ce5d0cb35ff44a190111f81a987620] xen: Delay m2p_override >> initialization >> # NOTE: This revision did not crash the usual way but was not useable either >> # NOTE: Use of wrong bits in page-tables. 
>> git bisect good 97f4533a60ce5d0cb35ff44a190111f81a987620
* Re: Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 2015-02-05 14:33 Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 Stefan Bader 2015-02-05 19:36 ` [Xen-devel] " Konrad Rzeszutek Wilk @ 2015-02-27 11:29 ` Stefan Bader 2015-02-27 12:30 ` Juergen Gross 1 sibling, 1 reply; 10+ messages in thread From: Stefan Bader @ 2015-02-27 11:29 UTC To: xen-devel@lists.xensource.com; +Cc: Juergen Gross On 05.02.2015 15:33, Stefan Bader wrote: > While experimenting/testing various kernel versions I discovered that trying to > boot a Haswell-based host will always crash when booting as Xen dom0 > (Xen-4.4.1). The same crash happens since v3.19-rc1 and still does happen with > v3.19-rc7. A bare metal boot is having no issues and also an Opteron based host > is having no issues (dom0 and bare metal). > Could be a table that the other host does not have and since it's only happening > in dom0 maybe some cpu capability that needs to be passed on? I think I may have some more data here. I tried some patches which Juergen sent me, but those did not change much. I found that on that host the problem is related to the use of dom0_mem=, and it shows up as a crash like the one below, a hang, or a "weird state" in general. When not using dom0_mem, I can boot with a 3.19 kernel, otherwise (trying 512M and 1G) there is trouble. What is special about this host is that it has more "holes" than the other machine I usually use.
(XEN) Xen-e820 RAM map:
(XEN) 0000000000000000 - 000000000009a400 (usable)
(XEN) 000000000009a400 - 00000000000a0000 (reserved)
(XEN) 00000000000e0000 - 0000000000100000 (reserved)
      The first hole is common
(XEN) 0000000000100000 - 0000000030a48000 (usable)
(XEN) 0000000030a48000 - 0000000030a49000 (reserved)
(XEN) 0000000030a49000 - 00000000a27f4000 (usable)
      But then normally there is only one usable area up to
      around ACPI_NVS
(XEN) 00000000a27f4000 - 00000000a2ab4000 (reserved)
(XEN) 00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS)
(XEN) 00000000a2fb4000 - 00000000a2feb000 (ACPI data)
(XEN) 00000000a2feb000 - 00000000a3000000 (usable)
(XEN) 00000000a3000000 - 00000000afa00000 (reserved)
(XEN) 00000000e0000000 - 00000000f0000000 (reserved)
(XEN) 00000000fec00000 - 00000000fec01000 (reserved)
(XEN) 00000000fed00000 - 00000000fed04000 (reserved)
(XEN) 00000000fed10000 - 00000000fed1a000 (reserved)
(XEN) 00000000fed1c000 - 00000000fed20000 (reserved)
(XEN) 00000000fed84000 - 00000000fed85000 (reserved)
(XEN) 00000000fee00000 - 00000000fee01000 (reserved)
(XEN) 00000000ffc00000 - 0000000100000000 (reserved)
(XEN) 0000000100000000 - 000000024e600000 (usable)
      Also after ACPI data there is some usable memory, and then another
      hole (area) which is unusual.

So I added some more debug printks. Here is a boot with dom0_mem=512M:max=512M:

[ 0.000000] SMB: remap 154(0x9A)-256(0x100) -> 131072(0x20000)
  ==> 0x09A000-0x100000 -> 0x20000000 (@512M+)
  ==> 0x09A000-0x09A3FF was usable but partial

The first hole is supposed to be remapped as it is below the 512M which are in the initial MFN list. I suppose this works but Juergen, I really would love to understand how and I am not sure I grasp things. To me it looks like the remap info is stored in the memory area to be mapped... which is reserved(?!)

I think the problem comes from these other holes (which are beyond 512M).
When not using dom0_mem those are remapped (like the first one), while with the clamp they supposedly should be identity mapped...

[ 0.000000] SMB: prange id 199240(0x30A48) - 199241(0x30A49)
  ==> 0x30A48000(~778M)
[ 0.000000] SMB: prange id 665588(0xA27F4) - 667627(0xA2FEB)
  ==> 0xA27F4000(~2599M)
[ 0.000000] SMB: prange id 667648(0xA3000) - 1048576(0x100000)
  ==> 0xA3000000(~2608M)-0x100000000(=4G) id mapped
[ 0.000000] Released 0 page(s)
[ 0.000000] Remapped 102 page(s)

So here is xen_set_identity_and_remap_chunk():

	...
	while (i < n) {
		...
		/* Do not remap pages beyond the current allocation */
		if (cur_pfn >= nr_pages) {
			/* Identity map remaining pages */
			set_phys_range_identity(cur_pfn, cur_pfn + size);
			break;
		}
		...

Now, I think the call to set_phys_range_identity() is really doing nothing because nr_pages really is the same (or mostly, apart from a 512 alignment) as xen_p2m_size, so it just returns 0.

	...
	/*
	 * If the PFNs are currently mapped, the VA mapping also needs
	 * to be updated to be 1:1.
	 */
	for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
		(void)HYPERVISOR_update_va_mapping(
			(unsigned long)__va(pfn << PAGE_SHIFT),
			mfn_pte(pfn, PAGE_KERNEL_IO), 0);

I cannot make up my mind about this one. Before this all changed, there was code that resembled this loop but was rather clearing the mapping (except for a range below 1M). Ok, that was done then in a different order which set identity mapping after...

My feeling is that the problem comes from assuming identity mapping for holes after the initial mapping. I might be missing something but I cannot really see where this could be recovered.
-Stefan
* Re: Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 2015-02-27 11:29 ` Stefan Bader @ 2015-02-27 12:30 ` Juergen Gross 2015-02-27 14:12 ` Stefan Bader 0 siblings, 1 reply; 10+ messages in thread From: Juergen Gross @ 2015-02-27 12:30 UTC To: Stefan Bader, xen-devel@lists.xensource.com On 02/27/2015 12:29 PM, Stefan Bader wrote: > On 05.02.2015 15:33, Stefan Bader wrote: >> While experimenting/testing various kernel versions I discovered that trying to >> boot a Haswell-based host will always crash when booting as Xen dom0 >> (Xen-4.4.1). The same crash happens since v3.19-rc1 and still does happen with >> v3.19-rc7. A bare metal boot is having no issues and also an Opteron based host >> is having no issues (dom0 and bare metal). >> Could be a table that the other host does not have and since it's only happening >> in dom0 maybe some cpu capability that needs to be passed on? > > I think I may have some more data here. I tried some patches which Juergen sent > me, but those did not change much. I found that on that host the problem is related > to the use of dom0_mem=, and it shows up as a crash like the one below, a hang, or a > "weird state" in general. > When not using dom0_mem, I can boot with a 3.19 kernel, otherwise (trying 512M > and 1G) there is trouble. What is special about this host is that it has more > "holes" than the other machine I usually use.
> > (XEN) Xen-e820 RAM map: > (XEN) 0000000000000000 - 000000000009a400 (usable) > (XEN) 000000000009a400 - 00000000000a0000 (reserved) > (XEN) 00000000000e0000 - 0000000000100000 (reserved) > The first hole is common > (XEN) 0000000000100000 - 0000000030a48000 (usable) > (XEN) 0000000030a48000 - 0000000030a49000 (reserved) > (XEN) 0000000030a49000 - 00000000a27f4000 (usable) > But then normally there is only one usable area up to > around ACPI_NVS > (XEN) 00000000a27f4000 - 00000000a2ab4000 (reserved) > (XEN) 00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS) > (XEN) 00000000a2fb4000 - 00000000a2feb000 (ACPI data) > (XEN) 00000000a2feb000 - 00000000a3000000 (usable) > (XEN) 00000000a3000000 - 00000000afa00000 (reserved) > (XEN) 00000000e0000000 - 00000000f0000000 (reserved) > (XEN) 00000000fec00000 - 00000000fec01000 (reserved) > (XEN) 00000000fed00000 - 00000000fed04000 (reserved) > (XEN) 00000000fed10000 - 00000000fed1a000 (reserved) > (XEN) 00000000fed1c000 - 00000000fed20000 (reserved) > (XEN) 00000000fed84000 - 00000000fed85000 (reserved) > (XEN) 00000000fee00000 - 00000000fee01000 (reserved) > (XEN) 00000000ffc00000 - 0000000100000000 (reserved) > (XEN) 0000000100000000 - 000000024e600000 (usable) > Also after ACPI data there is some usable memory, and then another > hole (area) which is unusual. > > So I added some more debug printks. Here is a boot with dom0_mem=512M:max=512M: > > [ 0.000000] SMB: remap 154(0x9A)-256(0x100) -> 131072(0x20000) > ==> 0x09A000-0x100000 -> 0x20000000 (@512M+) > ==> 0x09A000-0x09A3FF was usable but partial > > The first hole is supposed to be remapped as it is below the 512M which are in > the initial MFN list. I suppose this works but Juergen, I really would love to > understand how and I am not sure I grasp things. To me it looks like the remap > info is stored in the memory area to be mapped... which is reserved(?!)
:-) We can remap only memory which is currently not in use, otherwise the information in that memory area couldn't be found again. So we are free to store the remap info in this memory, relieving us of the pain of finding some memory in which to store it without having enough of the memory management set up already. > I think the problem comes from these other holes (which are beyond 512M). When > not using dom0_mem those are remapped (like the first one), while with the clamp > they supposedly should be identity mapped... Indeed. > > [ 0.000000] SMB: prange id 199240(0x30A48) - 199241(0x30A49) > ==> 0x30A48000(~778M) > [ 0.000000] SMB: prange id 665588(0xA27F4) - 667627(0xA2FEB) > ==> 0xA27F4000(~2599M) > [ 0.000000] SMB: prange id 667648(0xA3000) - 1048576(0x100000) > ==> 0xA3000000(~2608M)-0x100000000(=4G) id mapped > [ 0.000000] Released 0 page(s) > [ 0.000000] Remapped 102 page(s) > > So here is xen_set_identity_and_remap_chunk(): > > ... > while (i < n) { > ... > /* Do not remap pages beyond the current allocation */ > if (cur_pfn >= nr_pages) { > /* Identity map remaining pages */ > set_phys_range_identity(cur_pfn, cur_pfn + size); > break; > } > ... > > Now, I think the call to set_phys_range_identity() is really doing nothing > because nr_pages really is the same (or mostly, apart from a 512 alignment) as > xen_p2m_size, so it just returns 0. Sure, the p2m map is too small at this moment. We have no place to store the information. > ... > /* > * If the PFNs are currently mapped, the VA mapping also needs > * to be updated to be 1:1. > */ > for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++) > (void)HYPERVISOR_update_va_mapping( > (unsigned long)__va(pfn << PAGE_SHIFT), > mfn_pte(pfn, PAGE_KERNEL_IO), 0); > > I cannot make up my mind about this one. Before this all changed, there was code > that resembled this loop but was rather clearing the mapping (except for a range > below 1M).
Ok, that was done then in a different order which set identity > mapping after... > > My feeling is that the problem comes from assuming identity mapping for holes > after the initial mapping. I might be missing something but I cannot really see where > this could be recovered. Your hints were really helpful. I think I've found an error. What you've been missing is the fact that the new p2m list is initialized with identity frames after the area which was covered by the hypervisor-supplied one. Could you please test the attached patch? Juergen [-- Attachment #2: p2m.patch --]
diff --git a/arch/x86/xen/p2m.c b/arch/x86/xen/p2m.c
index 740ae30..9f93af5 100644
--- a/arch/x86/xen/p2m.c
+++ b/arch/x86/xen/p2m.c
@@ -563,7 +563,7 @@ static bool alloc_p2m(unsigned long pfn)
 		if (p2m_pfn == PFN_DOWN(__pa(p2m_missing)))
 			p2m_init(p2m);
 		else
-			p2m_init_identity(p2m, pfn);
+			p2m_init_identity(p2m, pfn & ~(P2M_PER_PAGE - 1));
 
 		spin_lock_irqsave(&p2m_update_lock, flags);
 
* Re: Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 2015-02-27 12:30 ` Juergen Gross @ 2015-02-27 14:12 ` Stefan Bader 2015-02-27 14:27 ` Juergen Gross 0 siblings, 1 reply; 10+ messages in thread From: Stefan Bader @ 2015-02-27 14:12 UTC To: Juergen Gross, xen-devel@lists.xensource.com On 27.02.2015 13:30, Juergen Gross wrote: > On 02/27/2015 12:29 PM, Stefan Bader wrote: >> On 05.02.2015 15:33, Stefan Bader wrote: >>> While experimenting/testing various kernel versions I discovered that trying to >>> boot a Haswell-based host will always crash when booting as Xen dom0 >>> (Xen-4.4.1). The same crash happens since v3.19-rc1 and still does happen with >>> v3.19-rc7. A bare metal boot is having no issues and also an Opteron based host >>> is having no issues (dom0 and bare metal). >>> Could be a table that the other host does not have and since it's only happening >>> in dom0 maybe some cpu capability that needs to be passed on? >> >> I think I may have some more data here. I tried some patches which Juergen sent >> me, but those did not change much. I found that on that host the problem is related >> to the use of dom0_mem=, and it shows up as a crash like the one below, a hang, or a >> "weird state" in general. >> When not using dom0_mem, I can boot with a 3.19 kernel, otherwise (trying 512M >> and 1G) there is trouble. What is special about this host is that it has more >> "holes" than the other machine I usually use.
>> >> (XEN) Xen-e820 RAM map: >> (XEN) 0000000000000000 - 000000000009a400 (usable) >> (XEN) 000000000009a400 - 00000000000a0000 (reserved) >> (XEN) 00000000000e0000 - 0000000000100000 (reserved) >> The first hole is common >> (XEN) 0000000000100000 - 0000000030a48000 (usable) >> (XEN) 0000000030a48000 - 0000000030a49000 (reserved) >> (XEN) 0000000030a49000 - 00000000a27f4000 (usable) >> But then normally there is only one usable area up to >> around ACPI_NVS >> (XEN) 00000000a27f4000 - 00000000a2ab4000 (reserved) >> (XEN) 00000000a2ab4000 - 00000000a2fb4000 (ACPI NVS) >> (XEN) 00000000a2fb4000 - 00000000a2feb000 (ACPI data) >> (XEN) 00000000a2feb000 - 00000000a3000000 (usable) >> (XEN) 00000000a3000000 - 00000000afa00000 (reserved) >> (XEN) 00000000e0000000 - 00000000f0000000 (reserved) >> (XEN) 00000000fec00000 - 00000000fec01000 (reserved) >> (XEN) 00000000fed00000 - 00000000fed04000 (reserved) >> (XEN) 00000000fed10000 - 00000000fed1a000 (reserved) >> (XEN) 00000000fed1c000 - 00000000fed20000 (reserved) >> (XEN) 00000000fed84000 - 00000000fed85000 (reserved) >> (XEN) 00000000fee00000 - 00000000fee01000 (reserved) >> (XEN) 00000000ffc00000 - 0000000100000000 (reserved) >> (XEN) 0000000100000000 - 000000024e600000 (usable) >> Also after ACPI data there is some usable memory, and then another >> hole (area) which is unusual. >> >> So I added some more debug printks. Here is a boot with dom0_mem=512M:max=512M: >> >> [ 0.000000] SMB: remap 154(0x9A)-256(0x100) -> 131072(0x20000) >> ==> 0x09A000-0x100000 -> 0x20000000 (@512M+) >> ==> 0x09A000-0x09A3FF was usable but partial >> >> The first hole is supposed to be remapped as it is below the 512M which are in >> the initial MFN list. I suppose this works but Juergen, I really would love to >> understand how and I am not sure I grasp things. To me it looks like the remap >> info is stored in the memory area to be mapped... which is reserved(?!)
> > :-) > > We can remap only memory which is currently not in use, otherwise > the information in that memory area couldn't be found again. So we > are free to store the remap info in this memory, relieving us of > the pain of finding some memory in which to store it without having enough > of the memory management set up already. Argh, no, I just realized the fatal mistake in my whole imaginary model. For some stupid reason I assumed the initial MFN table there is a 1:1 mapping of the real memory, which is complete nonsense and does not really help in understanding what is really going on. Of course the whole purpose is to convert this into something that *does* look like the provided E820 setup. Bah, I could not see the forest for the trees... > >> I think the problem comes from these other holes (which are beyond 512M). When >> not using dom0_mem those are remapped (like the first one), while with the clamp >> they supposedly should be identity mapped... > > Indeed. > >> >> [ 0.000000] SMB: prange id 199240(0x30A48) - 199241(0x30A49) >> ==> 0x30A48000(~778M) >> [ 0.000000] SMB: prange id 665588(0xA27F4) - 667627(0xA2FEB) >> ==> 0xA27F4000(~2599M) >> [ 0.000000] SMB: prange id 667648(0xA3000) - 1048576(0x100000) >> ==> 0xA3000000(~2608M)-0x100000000(=4G) id mapped >> [ 0.000000] Released 0 page(s) >> [ 0.000000] Remapped 102 page(s) >> >> So here is xen_set_identity_and_remap_chunk(): >> >> ... >> while (i < n) { >> ... >> /* Do not remap pages beyond the current allocation */ >> if (cur_pfn >= nr_pages) { >> /* Identity map remaining pages */ >> set_phys_range_identity(cur_pfn, cur_pfn + size); >> break; >> } >> ... >> >> Now, I think the call to set_phys_range_identity() is really doing nothing >> because nr_pages really is the same (or mostly, apart from a 512 alignment) as >> xen_p2m_size, so it just returns 0. > > Sure, the p2m map is too small at this moment. We have no place to > store the information. > >> ...
>>     /*
>>      * If the PFNs are currently mapped, the VA mapping also needs
>>      * to be updated to be 1:1.
>>      */
>>     for (pfn = start_pfn; pfn <= max_pfn_mapped && pfn < end_pfn; pfn++)
>>         (void)HYPERVISOR_update_va_mapping(
>>             (unsigned long)__va(pfn << PAGE_SHIFT),
>>             mfn_pte(pfn, PAGE_KERNEL_IO), 0);
>>
>> I cannot make up my mind about this one. Before this all changed, there was code
>> that resembled this loop but instead cleared the mapping (except for a range
>> below 1M). OK, back then that was done in a different order, which set up the
>> identity mapping afterwards...
>>
>> My feeling is that the problem comes from assuming identity mapping for holes
>> after the initial mapping. I might be missing something, but I cannot really see
>> where this could be recovered.
>
> Your hints really helped. I think I've found an error.
>
> What you've been missing is the fact that the new p2m list is
> initialized with identity frames after the area which was covered by
> the hypervisor-supplied one.

Ah, OK. Right, I missed that.

>
> Could you please test the attached patch?

\o/ Yeah, that does seem to do the trick. The machine came up and did not lose
its mind or lapic base address!

-Stefan

>
>
> Juergen

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xen.org
http://lists.xen.org/xen-devel
* Re: Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Juergen Gross @ 2015-02-27 14:27 UTC
To: Stefan Bader, xen-devel@lists.xensource.com

On 02/27/2015 03:12 PM, Stefan Bader wrote:
> On 27.02.2015 13:30, Juergen Gross wrote:
>> Could you please test the attached patch?
>
> \o/ Yeah, that does seem to do the trick. The machine came up and did not lose
> its mind or lapic base address!

Stefan, can I add your "Reported-by" and "Tested-by" tags?

Juergen
* Re: Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0
From: Stefan Bader @ 2015-02-27 14:31 UTC
To: Juergen Gross, xen-devel@lists.xensource.com

On 27.02.2015 15:27, Juergen Gross wrote:
> On 02/27/2015 03:12 PM, Stefan Bader wrote:
>> On 27.02.2015 13:30, Juergen Gross wrote:
>>> Could you please test the attached patch?
>>
>> \o/ Yeah, that does seem to do the trick. The machine came up and did not lose
>> its mind or lapic base address!
>
> Stefan, can I add your "Reported-by" and "Tested-by" tags?

Of course, yes. And a Cc: stable tag for 3.19, if there isn't one already.

-Stefan

>
> Juergen
Thread overview: 10+ messages
2015-02-05 14:33 Crash in acpi_ps_peek_opcode when booting kernel 3.19 as Xen dom0 Stefan Bader
2015-02-05 19:36 ` [Xen-devel] " Konrad Rzeszutek Wilk
2015-02-09 13:07 ` Stefan Bader
2015-02-09 13:33 ` Stefan Bader
2015-02-09 13:49 ` Juergen Gross
2015-02-27 11:29 ` Stefan Bader
2015-02-27 12:30 ` Juergen Gross
2015-02-27 14:12 ` Stefan Bader
2015-02-27 14:27 ` Juergen Gross
2015-02-27 14:31 ` Stefan Bader