* Issue with shared information page on Xen/ARM 4.17 @ 2023-09-29 2:49 Elliott Mitchell 2023-10-03 8:26 ` Roger Pau Monné 0 siblings, 1 reply; 24+ messages in thread From: Elliott Mitchell @ 2023-09-29 2:49 UTC (permalink / raw) To: xen-devel Cc: Stefano Stabellini, Julien Grall, Bertrand Marquis, Volodymyr Babchuk I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is the changes with the handling of the shared information page appear to have broken things for me. With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build of the 4.17 release, mapping the shared information page doesn't work. I'm using Tianocore as the first stage loader. This continues to work fine. The build is using tag "edk2-stable202211", commit fff6d81270. While Tianocore does map the shared information page, my reading of their source is that it properly unmaps the page and therefore shouldn't cause trouble. Notes on the actual call is gpfn was 0x0000000000040072. This is outside the recommended address range, but my understanding is this is supposed to be okay. The return code is -16, which is EBUSY. Ideas? The obvious spot for EBUSY is from already having the shared information page mapped. Issue is I'm pretty sure it isn't mapped. Did the policy change and now the shared information page *must* be mapped within the recommended address ranges? Is it a problem to map the shared information page at one address. Unmap it, then map it at another address? -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-09-29 2:49 Issue with shared information page on Xen/ARM 4.17 Elliott Mitchell @ 2023-10-03 8:26 ` Roger Pau Monné 2023-10-03 19:18 ` Elliott Mitchell 0 siblings, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-03 8:26 UTC (permalink / raw) To: Elliott Mitchell Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > the changes with the handling of the shared information page appear to > have broken things for me. > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > of the 4.17 release, mapping the shared information page doesn't work. This is due to 71320946d5edf AFAICT. > I'm using Tianocore as the first stage loader. This continues to work > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > While Tianocore does map the shared information page, my reading of their > source is that it properly unmaps the page and therefore shouldn't cause > trouble. > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > the recommended address range, but my understanding is this is supposed > to be okay. > > The return code is -16, which is EBUSY. > > Ideas? I think the issue is that you are mapping the shared info page over a guest RAM page, and in order to do that you would fist need to create a hole and then map the shared info page. IOW: the issue is not with edk2 not having unmapped the page, but with FreeBSD trying to map the shared_info over a RAM page instead of a hole in the p2m. x86 behavior is different here, and does allow mapping the shared_info page over a RAM gfn (by first removing the backing RAM page on the gfn). Ideally we would like to use an unpopulated gfn, but doing so is not trivial right now, as the point where the shared_info page is mapped we don't yet have an easy way to account for unpopulated regions. My suggestion would be to do something like: diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c index 4122daeaf600..7251bc69ae15 100644 --- a/sys/x86/xen/hvm.c +++ b/sys/x86/xen/hvm.c @@ -194,18 +194,20 @@ xen_hvm_init_shared_info_page(void) { struct xen_add_to_physmap xatp; - if (xen_pv_domain()) { - /* - * Already setup in the PV case, shared_info is passed inside - * of the start_info struct at start of day. - */ - return; - } - if (HYPERVISOR_shared_info == NULL) { + struct xen_remove_from_physmap rm = { + .domid = DOMID_SELF, + }; + int rc; + HYPERVISOR_shared_info = malloc(PAGE_SIZE, M_XENHVM, M_NOWAIT); if (HYPERVISOR_shared_info == NULL) panic("Unable to allocate Xen shared info page"); + + rm.gpfn = vtophys(HYPERVISOR_shared_info) >> PAGE_SHIFT; + rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &rm); + if (rc != 0) + printf("Failed to remove shared_info GFN: %d\n", rc); } xatp.domid = DOMID_SELF; But in the long term we should see about initializing the shared_info page as part of xenpv_attach(). Regards, Roger. ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-03 8:26 ` Roger Pau Monné @ 2023-10-03 19:18 ` Elliott Mitchell 2023-10-04 8:13 ` Roger Pau Monné 0 siblings, 1 reply; 24+ messages in thread From: Elliott Mitchell @ 2023-10-03 19:18 UTC (permalink / raw) To: Roger Pau Monné Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > > the changes with the handling of the shared information page appear to > > have broken things for me. > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > > of the 4.17 release, mapping the shared information page doesn't work. > > This is due to 71320946d5edf AFAICT. Yes. While the -EBUSY line may be the one triggering, I'm unsure why. This seems a fairly reasonable change, so I had no intention of asking for a revert (which likely would have been rejected). There is also a real possibility the -EBUSY comes from elsewhere. Could also be 71320946d5edf caused a bug elsewhere to be exposed. > > I'm using Tianocore as the first stage loader. This continues to work > > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > > While Tianocore does map the shared information page, my reading of their > > source is that it properly unmaps the page and therefore shouldn't cause > > trouble. > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > > the recommended address range, but my understanding is this is supposed > > to be okay. > > > > The return code is -16, which is EBUSY. > > > > Ideas? > > I think the issue is that you are mapping the shared info page over a > guest RAM page, and in order to do that you would fist need to create > a hole and then map the shared info page. IOW: the issue is not with > edk2 not having unmapped the page, but with FreeBSD trying to map the > shared_info over a RAM page instead of a hole in the p2m. x86 > behavior is different here, and does allow mapping the shared_info > page over a RAM gfn (by first removing the backing RAM page on the > gfn). An interesting thought. I thought I'd tried this, but since I didn't see such in my experiments list. What I had tried was removing all the pages in the suggested mapping range. Yet this failed. Since this seemed reasonable, I've now tried and found it fails. The XENMEM_remove_from_physmap call returns 0. > Ideally we would like to use an unpopulated gfn, but doing so is not > trivial right now, as the point where the shared_info page is mapped > we don't yet have an easy way to account for unpopulated regions. > > My suggestion would be to do something like: > > diff --git a/sys/x86/xen/hvm.c b/sys/x86/xen/hvm.c > index 4122daeaf600..7251bc69ae15 100644 > --- a/sys/x86/xen/hvm.c > +++ b/sys/x86/xen/hvm.c > @@ -194,18 +194,20 @@ xen_hvm_init_shared_info_page(void) > { > struct xen_add_to_physmap xatp; > > - if (xen_pv_domain()) { > - /* > - * Already setup in the PV case, shared_info is passed inside > - * of the start_info struct at start of day. > - */ > - return; > - } > - > if (HYPERVISOR_shared_info == NULL) { > + struct xen_remove_from_physmap rm = { > + .domid = DOMID_SELF, > + }; > + int rc; > + > HYPERVISOR_shared_info = malloc(PAGE_SIZE, M_XENHVM, M_NOWAIT); > if (HYPERVISOR_shared_info == NULL) > panic("Unable to allocate Xen shared info page"); > + > + rm.gpfn = vtophys(HYPERVISOR_shared_info) >> PAGE_SHIFT; > + rc = HYPERVISOR_memory_op(XENMEM_remove_from_physmap, &rm); > + if (rc != 0) > + printf("Failed to remove shared_info GFN: %d\n", rc); > } > > xatp.domid = DOMID_SELF; > > But in the long term we should see about initializing the shared_info > page as part of xenpv_attach(). Didn't even look at the example since I already had ones handy. Unfortunately this has also failed. The XENMEM_remove_from_physmap call returns 0. If there is a need I can hand off a build of Xen and Tianocore, then let someone else with better Xen/ARM diagnostics play with it. Runs about 75MB, someone could place it in a better sharing place if desired. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-03 19:18 ` Elliott Mitchell @ 2023-10-04 8:13 ` Roger Pau Monné 2023-10-04 10:55 ` Julien Grall 0 siblings, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-04 8:13 UTC (permalink / raw) To: Elliott Mitchell Cc: xen-devel, Stefano Stabellini, Julien Grall, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > > > the changes with the handling of the shared information page appear to > > > have broken things for me. > > > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > > > of the 4.17 release, mapping the shared information page doesn't work. > > > > This is due to 71320946d5edf AFAICT. > > Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > This seems a fairly reasonable change, so I had no intention of asking > for a revert (which likely would have been rejected). There is also a > real possibility the -EBUSY comes from elsewhere. Could also be > 71320946d5edf caused a bug elsewhere to be exposed. A good way to know would be to attempt to revert 71320946d5edf and see if that fixes your issue. Alternatively you can try (or similar): diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c index 6ccffeaea57d..105ef3faecfd 100644 --- a/xen/arch/arm/mm.c +++ b/xen/arch/arm/mm.c @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( page_set_xenheap_gfn(mfn_to_page(mfn), gfn); } else + { + printk("%u already mapped\n", space); /* * Mandate the caller to first unmap the page before mapping it * again. This is to prevent Xen creating an unwanted hole in @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( * to unmap it afterwards. */ rc = -EBUSY; + } p2m_write_unlock(p2m); } > > > I'm using Tianocore as the first stage loader. This continues to work > > > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > > > While Tianocore does map the shared information page, my reading of their > > > source is that it properly unmaps the page and therefore shouldn't cause > > > trouble. > > > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > > > the recommended address range, but my understanding is this is supposed > > > to be okay. > > > > > > The return code is -16, which is EBUSY. > > > > > > Ideas? > > > > I think the issue is that you are mapping the shared info page over a > > guest RAM page, and in order to do that you would fist need to create > > a hole and then map the shared info page. IOW: the issue is not with > > edk2 not having unmapped the page, but with FreeBSD trying to map the > > shared_info over a RAM page instead of a hole in the p2m. x86 > > behavior is different here, and does allow mapping the shared_info > > page over a RAM gfn (by first removing the backing RAM page on the > > gfn). > > An interesting thought. I thought I'd tried this, but since I didn't see > such in my experiments list. What I had tried was removing all the pages > in the suggested mapping range. Yet this failed. Yeah, I went too fast and didn't read the code correctly, it is not checking that the provided gfn is already populated, but whether the mfn intended to be mapped is already mapped at a different location. > Since this seemed reasonable, I've now tried and found it fails. The > XENMEM_remove_from_physmap call returns 0. XENMEM_remove_from_physmap returning 0 is fine, but it seems to me like edk2 hasn't unmapped the shared_info page. The OS has no idea at which position the shared_info page is currently mapped, and hence can't do anything to attempt to unmap it in order to cover up for buggy firmware. edk2 should be the entity to issue the XENMEM_remove_from_physmap against the gfn where it has the shared_info page mapped. Likely needs to be done as part of ExitBootServices() method. FWIW, 71320946d5edf is an ABI change, and as desirable as such behavior might be, a new hypercall should have introduced that had the behavior that the change intended to retrofit into XENMEM_add_to_physmap. Thanks, Roger. ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 8:13 ` Roger Pau Monné @ 2023-10-04 10:55 ` Julien Grall 2023-10-04 11:42 ` Oleksandr Tyshchenko 2023-10-04 12:53 ` Roger Pau Monné 0 siblings, 2 replies; 24+ messages in thread From: Julien Grall @ 2023-10-04 10:55 UTC (permalink / raw) To: Roger Pau Monné, Elliott Mitchell Cc: xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko Hi Roger, On 04/10/2023 09:13, Roger Pau Monné wrote: > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is >>>> the changes with the handling of the shared information page appear to >>>> have broken things for me. >>>> >>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>> of the 4.17 release, mapping the shared information page doesn't work. >>> >>> This is due to 71320946d5edf AFAICT. >> >> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >> This seems a fairly reasonable change, so I had no intention of asking >> for a revert (which likely would have been rejected). There is also a >> real possibility the -EBUSY comes from elsewhere. Could also be >> 71320946d5edf caused a bug elsewhere to be exposed. > > A good way to know would be to attempt to revert 71320946d5edf and see > if that fixes your issue. > > Alternatively you can try (or similar): > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > index 6ccffeaea57d..105ef3faecfd 100644 > --- a/xen/arch/arm/mm.c > +++ b/xen/arch/arm/mm.c > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > } > else > + { > + printk("%u already mapped\n", space); > /* > * Mandate the caller to first unmap the page before mapping it > * again. This is to prevent Xen creating an unwanted hole in > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > * to unmap it afterwards. > */ > rc = -EBUSY; > + } > p2m_write_unlock(p2m); > } > > >>>> I'm using Tianocore as the first stage loader. This continues to work >>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>> While Tianocore does map the shared information page, my reading of their >>>> source is that it properly unmaps the page and therefore shouldn't cause >>>> trouble. >>>> >>>> Notes on the actual call is gpfn was 0x0000000000040072. This is outside >>>> the recommended address range, but my understanding is this is supposed >>>> to be okay. >>>> >>>> The return code is -16, which is EBUSY. >>>> >>>> Ideas? >>> >>> I think the issue is that you are mapping the shared info page over a >>> guest RAM page, and in order to do that you would fist need to create >>> a hole and then map the shared info page. IOW: the issue is not with >>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>> shared_info over a RAM page instead of a hole in the p2m. x86 >>> behavior is different here, and does allow mapping the shared_info >>> page over a RAM gfn (by first removing the backing RAM page on the >>> gfn). >> >> An interesting thought. I thought I'd tried this, but since I didn't see >> such in my experiments list. What I had tried was removing all the pages >> in the suggested mapping range. Yet this failed. > > Yeah, I went too fast and didn't read the code correctly, it is not > checking that the provided gfn is already populated, but whether the > mfn intended to be mapped is already mapped at a different location. > >> Since this seemed reasonable, I've now tried and found it fails. The >> XENMEM_remove_from_physmap call returns 0. > > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > like edk2 hasn't unmapped the shared_info page. The OS has no idea > at which position the shared_info page is currently mapped, and hence > can't do anything to attempt to unmap it in order to cover up for > buggy firmware. > > edk2 should be the entity to issue the XENMEM_remove_from_physmap > against the gfn where it has the shared_info page mapped. Likely > needs to be done as part of ExitBootServices() method. > > FWIW, 71320946d5edf is an ABI change, and as desirable as such > behavior might be, a new hypercall should have introduced that had the > behavior that the change intended to retrofit into > XENMEM_add_to_physmap. I can see how you think this is an ABI change but the previous behavior was incorrect. Before this patch, on Arm, we would allow the shared page to be mapped twice. As we don't know where the firmware had mapped it this could result to random corruption. Now, we could surely decide to remove the page as x86 did. But this could leave a hole in the RAM. As the OS would not know where the hole is, this could lead to page fault randomly during runtime. Neither of the two behaviors help the users. In fact, I think they only make the experience worse because you don't know when the issue will happen. AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in all the cases, I can't think of a way for the OS to workaround properly buggy firmware. Therefore, returning -EBUSY is the safest we can do for our users and I don't view it as a ABI change (someone rely on the previous behavior is bound to failure). For more details, you can look at the original thread ([1], [2]). Cheers, [1] ef5bf7df-ad8a-e420-0fc4-d8f0a0e0f2fc@xen.org [2] dfd7657e-bbb9-3d96-2650-063561a00b9b@xen.org -- Julien Grall ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 10:55 ` Julien Grall @ 2023-10-04 11:42 ` Oleksandr Tyshchenko 2023-10-04 12:59 ` Roger Pau Monné 2023-10-04 12:53 ` Roger Pau Monné 1 sibling, 1 reply; 24+ messages in thread From: Oleksandr Tyshchenko @ 2023-10-04 11:42 UTC (permalink / raw) To: Julien Grall, Roger Pau Monné, Elliott Mitchell Cc: xen-devel@lists.xenproject.org, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk On 04.10.23 13:55, Julien Grall wrote: Hello all. > Hi Roger, > > On 04/10/2023 09:13, Roger Pau Monné wrote: >> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current >>>>> issue is >>>>> the changes with the handling of the shared information page appear to >>>>> have broken things for me. >>>>> >>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>>> of the 4.17 release, mapping the shared information page doesn't work. >>>> >>>> This is due to 71320946d5edf AFAICT. >>> >>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >>> This seems a fairly reasonable change, so I had no intention of asking >>> for a revert (which likely would have been rejected). There is also a >>> real possibility the -EBUSY comes from elsewhere. Could also be >>> 71320946d5edf caused a bug elsewhere to be exposed. >> >> A good way to know would be to attempt to revert 71320946d5edf and see >> if that fixes your issue. >> >> Alternatively you can try (or similar): >> >> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >> index 6ccffeaea57d..105ef3faecfd 100644 >> --- a/xen/arch/arm/mm.c >> +++ b/xen/arch/arm/mm.c >> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( >> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); >> } >> else >> + { >> + printk("%u already mapped\n", space); >> /* >> * Mandate the caller to first unmap the page before >> mapping it >> * again. This is to prevent Xen creating an unwanted >> hole in >> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( >> * to unmap it afterwards. >> */ >> rc = -EBUSY; >> + } >> p2m_write_unlock(p2m); >> } >> >>>>> I'm using Tianocore as the first stage loader. This continues to work >>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>>> While Tianocore does map the shared information page, my reading of >>>>> their >>>>> source is that it properly unmaps the page and therefore shouldn't >>>>> cause >>>>> trouble. >>>>> >>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is >>>>> outside >>>>> the recommended address range, but my understanding is this is >>>>> supposed >>>>> to be okay. >>>>> >>>>> The return code is -16, which is EBUSY. >>>>> >>>>> Ideas? >>>> >>>> I think the issue is that you are mapping the shared info page over a >>>> guest RAM page, and in order to do that you would fist need to create >>>> a hole and then map the shared info page. IOW: the issue is not with >>>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>>> shared_info over a RAM page instead of a hole in the p2m. x86 >>>> behavior is different here, and does allow mapping the shared_info >>>> page over a RAM gfn (by first removing the backing RAM page on the >>>> gfn). >>> >>> An interesting thought. I thought I'd tried this, but since I didn't >>> see >>> such in my experiments list. What I had tried was removing all the >>> pages >>> in the suggested mapping range. Yet this failed. >> >> Yeah, I went too fast and didn't read the code correctly, it is not >> checking that the provided gfn is already populated, but whether the >> mfn intended to be mapped is already mapped at a different location. >> >>> Since this seemed reasonable, I've now tried and found it fails. The >>> XENMEM_remove_from_physmap call returns 0. >> >> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me >> like edk2 hasn't unmapped the shared_info page. The OS has no idea >> at which position the shared_info page is currently mapped, and hence >> can't do anything to attempt to unmap it in order to cover up for >> buggy firmware. >> >> edk2 should be the entity to issue the XENMEM_remove_from_physmap >> against the gfn where it has the shared_info page mapped. Likely >> needs to be done as part of ExitBootServices() method. >> >> FWIW, 71320946d5edf is an ABI change, and as desirable as such >> behavior might be, a new hypercall should have introduced that had the >> behavior that the change intended to retrofit into >> XENMEM_add_to_physmap. > I can see how you think this is an ABI change but the previous behavior > was incorrect. Before this patch, on Arm, we would allow the shared page > to be mapped twice. As we don't know where the firmware had mapped it > this could result to random corruption. > > Now, we could surely decide to remove the page as x86 did. But this > could leave a hole in the RAM. As the OS would not know where the hole > is, this could lead to page fault randomly during runtime. +1. In addition to what Julien has already said, I would like to say the same issue was faced due to U-Boot (running as a part of Xen guest before OS) didn't perform a cleanup before jumping to OS. This is already fixed to follow the current behavior. I didn't find corresponding U-Boot mail thread, but can point to already upstreamed commit in the main repo. https://github.com/u-boot/u-boot/commit/0001a964b840a62c66da42a89a10a2656831aa4b [snip] ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 11:42 ` Oleksandr Tyshchenko @ 2023-10-04 12:59 ` Roger Pau Monné 2023-10-04 17:18 ` Oleksandr Tyshchenko 0 siblings, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-04 12:59 UTC (permalink / raw) To: Oleksandr Tyshchenko Cc: Julien Grall, Elliott Mitchell, xen-devel@lists.xenproject.org, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk On Wed, Oct 04, 2023 at 11:42:32AM +0000, Oleksandr Tyshchenko wrote: > > > On 04.10.23 13:55, Julien Grall wrote: > > Hello all. > > > Hi Roger, > > > > On 04/10/2023 09:13, Roger Pau Monné wrote: > >> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > >>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > >>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > >>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current > >>>>> issue is > >>>>> the changes with the handling of the shared information page appear to > >>>>> have broken things for me. > >>>>> > >>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > >>>>> of the 4.17 release, mapping the shared information page doesn't work. > >>>> > >>>> This is due to 71320946d5edf AFAICT. > >>> > >>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > >>> This seems a fairly reasonable change, so I had no intention of asking > >>> for a revert (which likely would have been rejected). There is also a > >>> real possibility the -EBUSY comes from elsewhere. Could also be > >>> 71320946d5edf caused a bug elsewhere to be exposed. > >> > >> A good way to know would be to attempt to revert 71320946d5edf and see > >> if that fixes your issue. > >> > >> Alternatively you can try (or similar): > >> > >> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > >> index 6ccffeaea57d..105ef3faecfd 100644 > >> --- a/xen/arch/arm/mm.c > >> +++ b/xen/arch/arm/mm.c > >> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > >> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > >> } > >> else > >> + { > >> + printk("%u already mapped\n", space); > >> /* > >> * Mandate the caller to first unmap the page before > >> mapping it > >> * again. This is to prevent Xen creating an unwanted > >> hole in > >> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > >> * to unmap it afterwards. > >> */ > >> rc = -EBUSY; > >> + } > >> p2m_write_unlock(p2m); > >> } > >> > >>>>> I'm using Tianocore as the first stage loader. This continues to work > >>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. > >>>>> While Tianocore does map the shared information page, my reading of > >>>>> their > >>>>> source is that it properly unmaps the page and therefore shouldn't > >>>>> cause > >>>>> trouble. > >>>>> > >>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is > >>>>> outside > >>>>> the recommended address range, but my understanding is this is > >>>>> supposed > >>>>> to be okay. > >>>>> > >>>>> The return code is -16, which is EBUSY. > >>>>> > >>>>> Ideas? > >>>> > >>>> I think the issue is that you are mapping the shared info page over a > >>>> guest RAM page, and in order to do that you would fist need to create > >>>> a hole and then map the shared info page. IOW: the issue is not with > >>>> edk2 not having unmapped the page, but with FreeBSD trying to map the > >>>> shared_info over a RAM page instead of a hole in the p2m. x86 > >>>> behavior is different here, and does allow mapping the shared_info > >>>> page over a RAM gfn (by first removing the backing RAM page on the > >>>> gfn). > >>> > >>> An interesting thought. I thought I'd tried this, but since I didn't > >>> see > >>> such in my experiments list. What I had tried was removing all the > >>> pages > >>> in the suggested mapping range. Yet this failed. > >> > >> Yeah, I went too fast and didn't read the code correctly, it is not > >> checking that the provided gfn is already populated, but whether the > >> mfn intended to be mapped is already mapped at a different location. > >> > >>> Since this seemed reasonable, I've now tried and found it fails. The > >>> XENMEM_remove_from_physmap call returns 0. > >> > >> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > >> like edk2 hasn't unmapped the shared_info page. The OS has no idea > >> at which position the shared_info page is currently mapped, and hence > >> can't do anything to attempt to unmap it in order to cover up for > >> buggy firmware. > >> > >> edk2 should be the entity to issue the XENMEM_remove_from_physmap > >> against the gfn where it has the shared_info page mapped. Likely > >> needs to be done as part of ExitBootServices() method. > >> > >> FWIW, 71320946d5edf is an ABI change, and as desirable as such > >> behavior might be, a new hypercall should have introduced that had the > >> behavior that the change intended to retrofit into > >> XENMEM_add_to_physmap. > > I can see how you think this is an ABI change but the previous behavior > > was incorrect. Before this patch, on Arm, we would allow the shared page > > to be mapped twice. As we don't know where the firmware had mapped it > > this could result to random corruption. > > > > Now, we could surely decide to remove the page as x86 did. But this > > could leave a hole in the RAM. As the OS would not know where the hole > > is, this could lead to page fault randomly during runtime. > > > +1. > > In addition to what Julien has already said, I would like to say the > same issue was faced due to U-Boot (running as a part of Xen guest > before OS) didn't perform a cleanup before jumping to OS. This is > already fixed to follow the current behavior. I didn't find > corresponding U-Boot mail thread, but can point to already upstreamed > commit in the main repo. What about other bootloaders? There might be other loaders that didn't unmap shared_info either, but that removed the page where the shared_info is mapped from the reported RAM ranges to the guest. In such case not doing the unmapping was benign, as the guest would never use that gfn as RAM anyway. Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 12:59 ` Roger Pau Monné @ 2023-10-04 17:18 ` Oleksandr Tyshchenko 0 siblings, 0 replies; 24+ messages in thread From: Oleksandr Tyshchenko @ 2023-10-04 17:18 UTC (permalink / raw) To: Roger Pau Monné Cc: Julien Grall, Elliott Mitchell, xen-devel@lists.xenproject.org, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk On 04.10.23 15:59, Roger Pau Monné wrote: Hello Roger > On Wed, Oct 04, 2023 at 11:42:32AM +0000, Oleksandr Tyshchenko wrote: >> >> >> On 04.10.23 13:55, Julien Grall wrote: >> >> Hello all. >> >>> Hi Roger, >>> >>> On 04/10/2023 09:13, Roger Pau Monné wrote: >>>> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >>>>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>>>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current >>>>>>> issue is >>>>>>> the changes with the handling of the shared information page appear to >>>>>>> have broken things for me. >>>>>>> >>>>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>>>>> of the 4.17 release, mapping the shared information page doesn't work. >>>>>> >>>>>> This is due to 71320946d5edf AFAICT. >>>>> >>>>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >>>>> This seems a fairly reasonable change, so I had no intention of asking >>>>> for a revert (which likely would have been rejected). There is also a >>>>> real possibility the -EBUSY comes from elsewhere. Could also be >>>>> 71320946d5edf caused a bug elsewhere to be exposed. >>>> >>>> A good way to know would be to attempt to revert 71320946d5edf and see >>>> if that fixes your issue. >>>> >>>> Alternatively you can try (or similar): >>>> >>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>> index 6ccffeaea57d..105ef3faecfd 100644 >>>> --- a/xen/arch/arm/mm.c >>>> +++ b/xen/arch/arm/mm.c >>>> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( >>>> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); >>>> } >>>> else >>>> + { >>>> + printk("%u already mapped\n", space); >>>> /* >>>> * Mandate the caller to first unmap the page before >>>> mapping it >>>> * again. This is to prevent Xen creating an unwanted >>>> hole in >>>> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( >>>> * to unmap it afterwards. >>>> */ >>>> rc = -EBUSY; >>>> + } >>>> p2m_write_unlock(p2m); >>>> } >>>> >>>>>>> I'm using Tianocore as the first stage loader. This continues to work >>>>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>>>>> While Tianocore does map the shared information page, my reading of >>>>>>> their >>>>>>> source is that it properly unmaps the page and therefore shouldn't >>>>>>> cause >>>>>>> trouble. >>>>>>> >>>>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is >>>>>>> outside >>>>>>> the recommended address range, but my understanding is this is >>>>>>> supposed >>>>>>> to be okay. >>>>>>> >>>>>>> The return code is -16, which is EBUSY. >>>>>>> >>>>>>> Ideas? >>>>>> >>>>>> I think the issue is that you are mapping the shared info page over a >>>>>> guest RAM page, and in order to do that you would fist need to create >>>>>> a hole and then map the shared info page. IOW: the issue is not with >>>>>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>>>>> shared_info over a RAM page instead of a hole in the p2m. x86 >>>>>> behavior is different here, and does allow mapping the shared_info >>>>>> page over a RAM gfn (by first removing the backing RAM page on the >>>>>> gfn). >>>>> >>>>> An interesting thought. I thought I'd tried this, but since I didn't >>>>> see >>>>> such in my experiments list. What I had tried was removing all the >>>>> pages >>>>> in the suggested mapping range. Yet this failed. >>>> >>>> Yeah, I went too fast and didn't read the code correctly, it is not >>>> checking that the provided gfn is already populated, but whether the >>>> mfn intended to be mapped is already mapped at a different location. >>>> >>>>> Since this seemed reasonable, I've now tried and found it fails. The >>>>> XENMEM_remove_from_physmap call returns 0. >>>> >>>> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me >>>> like edk2 hasn't unmapped the shared_info page. The OS has no idea >>>> at which position the shared_info page is currently mapped, and hence >>>> can't do anything to attempt to unmap it in order to cover up for >>>> buggy firmware. >>>> >>>> edk2 should be the entity to issue the XENMEM_remove_from_physmap >>>> against the gfn where it has the shared_info page mapped. Likely >>>> needs to be done as part of ExitBootServices() method. >>>> >>>> FWIW, 71320946d5edf is an ABI change, and as desirable as such >>>> behavior might be, a new hypercall should have introduced that had the >>>> behavior that the change intended to retrofit into >>>> XENMEM_add_to_physmap. >>> I can see how you think this is an ABI change but the previous behavior >>> was incorrect. Before this patch, on Arm, we would allow the shared page >>> to be mapped twice. As we don't know where the firmware had mapped it >>> this could result to random corruption. >>> >>> Now, we could surely decide to remove the page as x86 did. But this >>> could leave a hole in the RAM. As the OS would not know where the hole >>> is, this could lead to page fault randomly during runtime. >> >> >> +1. >> >> In addition to what Julien has already said, I would like to say the >> same issue was faced due to U-Boot (running as a part of Xen guest >> before OS) didn't perform a cleanup before jumping to OS. This is >> already fixed to follow the current behavior. I didn't find >> corresponding U-Boot mail thread, but can point to already upstreamed >> commit in the main repo. > > What about other bootloaders? > > There might be other loaders that didn't unmap shared_info either, but > that removed the page where the shared_info is mapped from the > reported RAM ranges to the guest. In such case not doing the > unmapping was benign, as the guest would never use that gfn as RAM > anyway. I got your point, this way, I agree, not doing the unmapping wouldn't be a big problem. I am wondering whether such bootloaders that instead of unmapping indeed hiding/reserving the gfn (where shared_info is mapped to) are really present in the world (Arm)? Patch was merged more than 1 year ago, and the issue hasn't been noticed until now with EDK2. Let's wait for the addition input on how the EDK2 is really dealing with shared_info. > > Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 10:55 ` Julien Grall 2023-10-04 11:42 ` Oleksandr Tyshchenko @ 2023-10-04 12:53 ` Roger Pau Monné 2023-10-04 13:03 ` Julien Grall 1 sibling, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-04 12:53 UTC (permalink / raw) To: Julien Grall Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote: > Hi Roger, > > On 04/10/2023 09:13, Roger Pau Monné wrote: > > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > > > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > > > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > > > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > > > > > the changes with the handling of the shared information page appear to > > > > > have broken things for me. > > > > > > > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > > > > > of the 4.17 release, mapping the shared information page doesn't work. > > > > > > > > This is due to 71320946d5edf AFAICT. > > > > > > Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > > > This seems a fairly reasonable change, so I had no intention of asking > > > for a revert (which likely would have been rejected). There is also a > > > real possibility the -EBUSY comes from elsewhere. Could also be > > > 71320946d5edf caused a bug elsewhere to be exposed. > > > > A good way to know would be to attempt to revert 71320946d5edf and see > > if that fixes your issue. > > > > Alternatively you can try (or similar): > > > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > > index 6ccffeaea57d..105ef3faecfd 100644 > > --- a/xen/arch/arm/mm.c > > +++ b/xen/arch/arm/mm.c > > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > > page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > > } > > else > > + { > > + printk("%u already mapped\n", space); > > /* > > * Mandate the caller to first unmap the page before mapping it > > * again. This is to prevent Xen creating an unwanted hole in > > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > > * to unmap it afterwards. > > */ > > rc = -EBUSY; > > + } > > p2m_write_unlock(p2m); > > } > > > > > > > I'm using Tianocore as the first stage loader. This continues to work > > > > > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > > > > > While Tianocore does map the shared information page, my reading of their > > > > > source is that it properly unmaps the page and therefore shouldn't cause > > > > > trouble. > > > > > > > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > > > > > the recommended address range, but my understanding is this is supposed > > > > > to be okay. > > > > > > > > > > The return code is -16, which is EBUSY. > > > > > > > > > > Ideas? > > > > > > > > I think the issue is that you are mapping the shared info page over a > > > > guest RAM page, and in order to do that you would fist need to create > > > > a hole and then map the shared info page. IOW: the issue is not with > > > > edk2 not having unmapped the page, but with FreeBSD trying to map the > > > > shared_info over a RAM page instead of a hole in the p2m. x86 > > > > behavior is different here, and does allow mapping the shared_info > > > > page over a RAM gfn (by first removing the backing RAM page on the > > > > gfn). > > > > > > An interesting thought. I thought I'd tried this, but since I didn't see > > > such in my experiments list. What I had tried was removing all the pages > > > in the suggested mapping range. Yet this failed. > > > > Yeah, I went too fast and didn't read the code correctly, it is not > > checking that the provided gfn is already populated, but whether the > > mfn intended to be mapped is already mapped at a different location. > > > > > Since this seemed reasonable, I've now tried and found it fails. The > > > XENMEM_remove_from_physmap call returns 0. > > > > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > > like edk2 hasn't unmapped the shared_info page. The OS has no idea > > at which position the shared_info page is currently mapped, and hence > > can't do anything to attempt to unmap it in order to cover up for > > buggy firmware. > > > > edk2 should be the entity to issue the XENMEM_remove_from_physmap > > against the gfn where it has the shared_info page mapped. Likely > > needs to be done as part of ExitBootServices() method. > > > > FWIW, 71320946d5edf is an ABI change, and as desirable as such > > behavior might be, a new hypercall should have introduced that had the > > behavior that the change intended to retrofit into > > XENMEM_add_to_physmap. > I can see how you think this is an ABI change but the previous behavior was > incorrect. Before this patch, on Arm, we would allow the shared page to be > mapped twice. As we don't know where the firmware had mapped it this could > result to random corruption. > > Now, we could surely decide to remove the page as x86 did. But this could > leave a hole in the RAM. As the OS would not know where the hole is, this > could lead to page fault randomly during runtime. I would say it's the job of the firmware to notify the OS where the hole is, by modifying the memory map handled to the OS. Or else mapping the shared_info page in an unpopulated p2m hole. When using UEFI there's RAM that will always be in-use by the firmware, as runtime services cannot be shut down, and hence the firmware must already have a way to remove/reserve such region(s) on the memory map. > Neither of the two behaviors help the users. In fact, I think they only make > the experience worse because you don't know when the issue will happen. > > AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in > all the cases, I can't think of a way for the OS to workaround properly > buggy firmware. Therefore, returning -EBUSY is the safest we can do for our > users and I don't view it as a ABI change (someone rely on the previous > behavior is bound to failure). I fully agree the current behavior might not be the best one, but I do consider this part of the ABI, specially as booting guests using edk2 has now stopped working after this change. Introducing a different hypercall, or even using XENMAPSPACE_shared_info with idx = 1 to signal the usage of the new behavior should be used instead. This would also allow unifying the behavior between x86 and Arm. Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 12:53 ` Roger Pau Monné @ 2023-10-04 13:03 ` Julien Grall 2023-10-04 13:39 ` Roger Pau Monné 0 siblings, 1 reply; 24+ messages in thread From: Julien Grall @ 2023-10-04 13:03 UTC (permalink / raw) To: Roger Pau Monné Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko Hi Roger, On 04/10/2023 13:53, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote: >> Hi Roger, >> >> On 04/10/2023 09:13, Roger Pau Monné wrote: >>> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >>>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is >>>>>> the changes with the handling of the shared information page appear to >>>>>> have broken things for me. >>>>>> >>>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>>>> of the 4.17 release, mapping the shared information page doesn't work. >>>>> >>>>> This is due to 71320946d5edf AFAICT. >>>> >>>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >>>> This seems a fairly reasonable change, so I had no intention of asking >>>> for a revert (which likely would have been rejected). There is also a >>>> real possibility the -EBUSY comes from elsewhere. Could also be >>>> 71320946d5edf caused a bug elsewhere to be exposed. >>> >>> A good way to know would be to attempt to revert 71320946d5edf and see >>> if that fixes your issue. >>> >>> Alternatively you can try (or similar): >>> >>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>> index 6ccffeaea57d..105ef3faecfd 100644 >>> --- a/xen/arch/arm/mm.c >>> +++ b/xen/arch/arm/mm.c >>> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( >>> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); >>> } >>> else >>> + { >>> + printk("%u already mapped\n", space); >>> /* >>> * Mandate the caller to first unmap the page before mapping it >>> * again. This is to prevent Xen creating an unwanted hole in >>> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( >>> * to unmap it afterwards. >>> */ >>> rc = -EBUSY; >>> + } >>> p2m_write_unlock(p2m); >>> } >>> >>>>>> I'm using Tianocore as the first stage loader. This continues to work >>>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>>>> While Tianocore does map the shared information page, my reading of their >>>>>> source is that it properly unmaps the page and therefore shouldn't cause >>>>>> trouble. >>>>>> >>>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is outside >>>>>> the recommended address range, but my understanding is this is supposed >>>>>> to be okay. >>>>>> >>>>>> The return code is -16, which is EBUSY. >>>>>> >>>>>> Ideas? >>>>> >>>>> I think the issue is that you are mapping the shared info page over a >>>>> guest RAM page, and in order to do that you would fist need to create >>>>> a hole and then map the shared info page. IOW: the issue is not with >>>>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>>>> shared_info over a RAM page instead of a hole in the p2m. x86 >>>>> behavior is different here, and does allow mapping the shared_info >>>>> page over a RAM gfn (by first removing the backing RAM page on the >>>>> gfn). >>>> >>>> An interesting thought. I thought I'd tried this, but since I didn't see >>>> such in my experiments list. What I had tried was removing all the pages >>>> in the suggested mapping range. Yet this failed. >>> >>> Yeah, I went too fast and didn't read the code correctly, it is not >>> checking that the provided gfn is already populated, but whether the >>> mfn intended to be mapped is already mapped at a different location. >>> >>>> Since this seemed reasonable, I've now tried and found it fails. The >>>> XENMEM_remove_from_physmap call returns 0. >>> >>> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me >>> like edk2 hasn't unmapped the shared_info page. The OS has no idea >>> at which position the shared_info page is currently mapped, and hence >>> can't do anything to attempt to unmap it in order to cover up for >>> buggy firmware. >>> >>> edk2 should be the entity to issue the XENMEM_remove_from_physmap >>> against the gfn where it has the shared_info page mapped. Likely >>> needs to be done as part of ExitBootServices() method. >>> >>> FWIW, 71320946d5edf is an ABI change, and as desirable as such >>> behavior might be, a new hypercall should have introduced that had the >>> behavior that the change intended to retrofit into >>> XENMEM_add_to_physmap. >> I can see how you think this is an ABI change but the previous behavior was >> incorrect. Before this patch, on Arm, we would allow the shared page to be >> mapped twice. As we don't know where the firmware had mapped it this could >> result to random corruption. >> >> Now, we could surely decide to remove the page as x86 did. But this could >> leave a hole in the RAM. As the OS would not know where the hole is, this >> could lead to page fault randomly during runtime. > > I would say it's the job of the firmware to notify the OS where the > hole is, by modifying the memory map handled to the OS. Or else > mapping the shared_info page in an unpopulated p2m hole. I agree but I am not convinced that they are all doing it. At least U-boot didn't do it before we fixed it. > > When using UEFI there's RAM that will always be in-use by the > firmware, as runtime services cannot be shut down, and hence the > firmware must already have a way to remove/reserve such region(s) on > the memory map. Can either you or Elliott confirm if EDK2 reserve the region? > >> Neither of the two behaviors help the users. In fact, I think they only make >> the experience worse because you don't know when the issue will happen. >> >> AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in >> all the cases, I can't think of a way for the OS to workaround properly >> buggy firmware. Therefore, returning -EBUSY is the safest we can do for our >> users and I don't view it as a ABI change (someone rely on the previous >> behavior is bound to failure). > > I fully agree the current behavior might not be the best one, but I do > consider this part of the ABI, specially as booting guests using edk2 > has now stopped working after this change. Right. If we remove the check, you may be able to boot a guest. But are we sure that such guest would run safely? Also, it is not really argument, but this is not the only broken part in EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how some Device-Tree nodes and this will break on newer Xen. So overall, it feels to me that EDK2 is not entirely ready to be used in production for Xen on Arm guests. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 13:03 ` Julien Grall @ 2023-10-04 13:39 ` Roger Pau Monné 2023-10-04 14:06 ` Julien Grall 0 siblings, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-04 13:39 UTC (permalink / raw) To: Julien Grall Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > Hi Roger, > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote: > > > Hi Roger, > > > > > > On 04/10/2023 09:13, Roger Pau Monné wrote: > > > > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > > > > > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > > > > > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > > > > > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > > > > > > > the changes with the handling of the shared information page appear to > > > > > > > have broken things for me. > > > > > > > > > > > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > > > > > > > of the 4.17 release, mapping the shared information page doesn't work. > > > > > > > > > > > > This is due to 71320946d5edf AFAICT. > > > > > > > > > > Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > > > > > This seems a fairly reasonable change, so I had no intention of asking > > > > > for a revert (which likely would have been rejected). There is also a > > > > > real possibility the -EBUSY comes from elsewhere. Could also be > > > > > 71320946d5edf caused a bug elsewhere to be exposed. > > > > > > > > A good way to know would be to attempt to revert 71320946d5edf and see > > > > if that fixes your issue. > > > > > > > > Alternatively you can try (or similar): > > > > > > > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > > > > index 6ccffeaea57d..105ef3faecfd 100644 > > > > --- a/xen/arch/arm/mm.c > > > > +++ b/xen/arch/arm/mm.c > > > > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > > > > page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > > > > } > > > > else > > > > + { > > > > + printk("%u already mapped\n", space); > > > > /* > > > > * Mandate the caller to first unmap the page before mapping it > > > > * again. This is to prevent Xen creating an unwanted hole in > > > > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > > > > * to unmap it afterwards. > > > > */ > > > > rc = -EBUSY; > > > > + } > > > > p2m_write_unlock(p2m); > > > > } > > > > > > > > > > > I'm using Tianocore as the first stage loader. This continues to work > > > > > > > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > > > > > > > While Tianocore does map the shared information page, my reading of their > > > > > > > source is that it properly unmaps the page and therefore shouldn't cause > > > > > > > trouble. > > > > > > > > > > > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > > > > > > > the recommended address range, but my understanding is this is supposed > > > > > > > to be okay. > > > > > > > > > > > > > > The return code is -16, which is EBUSY. > > > > > > > > > > > > > > Ideas? > > > > > > > > > > > > I think the issue is that you are mapping the shared info page over a > > > > > > guest RAM page, and in order to do that you would fist need to create > > > > > > a hole and then map the shared info page. IOW: the issue is not with > > > > > > edk2 not having unmapped the page, but with FreeBSD trying to map the > > > > > > shared_info over a RAM page instead of a hole in the p2m. x86 > > > > > > behavior is different here, and does allow mapping the shared_info > > > > > > page over a RAM gfn (by first removing the backing RAM page on the > > > > > > gfn). > > > > > > > > > > An interesting thought. I thought I'd tried this, but since I didn't see > > > > > such in my experiments list. What I had tried was removing all the pages > > > > > in the suggested mapping range. Yet this failed. > > > > > > > > Yeah, I went too fast and didn't read the code correctly, it is not > > > > checking that the provided gfn is already populated, but whether the > > > > mfn intended to be mapped is already mapped at a different location. > > > > > > > > > Since this seemed reasonable, I've now tried and found it fails. The > > > > > XENMEM_remove_from_physmap call returns 0. > > > > > > > > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > > > > like edk2 hasn't unmapped the shared_info page. The OS has no idea > > > > at which position the shared_info page is currently mapped, and hence > > > > can't do anything to attempt to unmap it in order to cover up for > > > > buggy firmware. > > > > > > > > edk2 should be the entity to issue the XENMEM_remove_from_physmap > > > > against the gfn where it has the shared_info page mapped. Likely > > > > needs to be done as part of ExitBootServices() method. > > > > > > > > FWIW, 71320946d5edf is an ABI change, and as desirable as such > > > > behavior might be, a new hypercall should have introduced that had the > > > > behavior that the change intended to retrofit into > > > > XENMEM_add_to_physmap. > > > I can see how you think this is an ABI change but the previous behavior was > > > incorrect. Before this patch, on Arm, we would allow the shared page to be > > > mapped twice. As we don't know where the firmware had mapped it this could > > > result to random corruption. > > > > > > Now, we could surely decide to remove the page as x86 did. But this could > > > leave a hole in the RAM. As the OS would not know where the hole is, this > > > could lead to page fault randomly during runtime. > > > > I would say it's the job of the firmware to notify the OS where the > > hole is, by modifying the memory map handled to the OS. Or else > > mapping the shared_info page in an unpopulated p2m hole. > > I agree but I am not convinced that they are all doing it. At least U-boot > didn't do it before we fixed it. > > > > > When using UEFI there's RAM that will always be in-use by the > > firmware, as runtime services cannot be shut down, and hence the > > firmware must already have a way to remove/reserve such region(s) on > > the memory map. > > Can either you or Elliott confirm if EDK2 reserve the region? I will defer to Elliott to check for arm. I would be quite surprised if it doesn't on x86, or else we would get a myriad of bug reports about guests randomly crashing when using edk2. > > > > > Neither of the two behaviors help the users. In fact, I think they only make > > > the experience worse because you don't know when the issue will happen. > > > > > > AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in > > > all the cases, I can't think of a way for the OS to workaround properly > > > buggy firmware. Therefore, returning -EBUSY is the safest we can do for our > > > users and I don't view it as a ABI change (someone rely on the previous > > > behavior is bound to failure). > > > > I fully agree the current behavior might not be the best one, but I do > > consider this part of the ABI, specially as booting guests using edk2 > > has now stopped working after this change. > > Right. If we remove the check, you may be able to boot a guest. But are we > sure that such guest would run safely? If the guest wants the hypervisor to enforce such behavior, let it use the new hypercall to explicitly request the shared_info page to not be mapped anywhere else. But if you don't trust the bootloader, how do you know it hasn't poked holes elsewhere in the RAM regions? Even if the shared_info page has been unmapped, can you be sure the bootloader has put a RAM page back in that gfn? I understand this ABI change is done to avoid bugs, but at the cost of diverging from x86, and breaking existing firmwares which might not be buggy. > Also, it is not really argument, but this is not the only broken part in > EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how > some Device-Tree nodes and this will break on newer Xen. > > So overall, it feels to me that EDK2 is not entirely ready to be used in > production for Xen on Arm guests. I really have no insight on this. What are the supported way of booting guests on Arm? (SUPPORT.md doesn't seem to list any firmware for Arm guests AFAICT). Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 13:39 ` Roger Pau Monné @ 2023-10-04 14:06 ` Julien Grall 2023-10-04 14:53 ` Roger Pau Monné 0 siblings, 1 reply; 24+ messages in thread From: Julien Grall @ 2023-10-04 14:06 UTC (permalink / raw) To: Roger Pau Monné Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko Hi Roger, On 04/10/2023 14:39, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: >> Hi Roger, >> >> On 04/10/2023 13:53, Roger Pau Monné wrote: >>> On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote: >>>> Hi Roger, >>>> >>>> On 04/10/2023 09:13, Roger Pau Monné wrote: >>>>> On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: >>>>>> On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: >>>>>>> On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: >>>>>>>> I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is >>>>>>>> the changes with the handling of the shared information page appear to >>>>>>>> have broken things for me. >>>>>>>> >>>>>>>> With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build >>>>>>>> of the 4.17 release, mapping the shared information page doesn't work. >>>>>>> >>>>>>> This is due to 71320946d5edf AFAICT. >>>>>> >>>>>> Yes. While the -EBUSY line may be the one triggering, I'm unsure why. >>>>>> This seems a fairly reasonable change, so I had no intention of asking >>>>>> for a revert (which likely would have been rejected). There is also a >>>>>> real possibility the -EBUSY comes from elsewhere. Could also be >>>>>> 71320946d5edf caused a bug elsewhere to be exposed. >>>>> >>>>> A good way to know would be to attempt to revert 71320946d5edf and see >>>>> if that fixes your issue. >>>>> >>>>> Alternatively you can try (or similar): >>>>> >>>>> diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c >>>>> index 6ccffeaea57d..105ef3faecfd 100644 >>>>> --- a/xen/arch/arm/mm.c >>>>> +++ b/xen/arch/arm/mm.c >>>>> @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( >>>>> page_set_xenheap_gfn(mfn_to_page(mfn), gfn); >>>>> } >>>>> else >>>>> + { >>>>> + printk("%u already mapped\n", space); >>>>> /* >>>>> * Mandate the caller to first unmap the page before mapping it >>>>> * again. This is to prevent Xen creating an unwanted hole in >>>>> @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( >>>>> * to unmap it afterwards. >>>>> */ >>>>> rc = -EBUSY; >>>>> + } >>>>> p2m_write_unlock(p2m); >>>>> } >>>>> >>>>>>>> I'm using Tianocore as the first stage loader. This continues to work >>>>>>>> fine. The build is using tag "edk2-stable202211", commit fff6d81270. >>>>>>>> While Tianocore does map the shared information page, my reading of their >>>>>>>> source is that it properly unmaps the page and therefore shouldn't cause >>>>>>>> trouble. >>>>>>>> >>>>>>>> Notes on the actual call is gpfn was 0x0000000000040072. This is outside >>>>>>>> the recommended address range, but my understanding is this is supposed >>>>>>>> to be okay. >>>>>>>> >>>>>>>> The return code is -16, which is EBUSY. >>>>>>>> >>>>>>>> Ideas? >>>>>>> >>>>>>> I think the issue is that you are mapping the shared info page over a >>>>>>> guest RAM page, and in order to do that you would fist need to create >>>>>>> a hole and then map the shared info page. IOW: the issue is not with >>>>>>> edk2 not having unmapped the page, but with FreeBSD trying to map the >>>>>>> shared_info over a RAM page instead of a hole in the p2m. x86 >>>>>>> behavior is different here, and does allow mapping the shared_info >>>>>>> page over a RAM gfn (by first removing the backing RAM page on the >>>>>>> gfn). >>>>>> >>>>>> An interesting thought. I thought I'd tried this, but since I didn't see >>>>>> such in my experiments list. What I had tried was removing all the pages >>>>>> in the suggested mapping range. Yet this failed. >>>>> >>>>> Yeah, I went too fast and didn't read the code correctly, it is not >>>>> checking that the provided gfn is already populated, but whether the >>>>> mfn intended to be mapped is already mapped at a different location. >>>>> >>>>>> Since this seemed reasonable, I've now tried and found it fails. The >>>>>> XENMEM_remove_from_physmap call returns 0. >>>>> >>>>> XENMEM_remove_from_physmap returning 0 is fine, but it seems to me >>>>> like edk2 hasn't unmapped the shared_info page. The OS has no idea >>>>> at which position the shared_info page is currently mapped, and hence >>>>> can't do anything to attempt to unmap it in order to cover up for >>>>> buggy firmware. >>>>> >>>>> edk2 should be the entity to issue the XENMEM_remove_from_physmap >>>>> against the gfn where it has the shared_info page mapped. Likely >>>>> needs to be done as part of ExitBootServices() method. >>>>> >>>>> FWIW, 71320946d5edf is an ABI change, and as desirable as such >>>>> behavior might be, a new hypercall should have introduced that had the >>>>> behavior that the change intended to retrofit into >>>>> XENMEM_add_to_physmap. >>>> I can see how you think this is an ABI change but the previous behavior was >>>> incorrect. Before this patch, on Arm, we would allow the shared page to be >>>> mapped twice. As we don't know where the firmware had mapped it this could >>>> result to random corruption. >>>> >>>> Now, we could surely decide to remove the page as x86 did. But this could >>>> leave a hole in the RAM. As the OS would not know where the hole is, this >>>> could lead to page fault randomly during runtime. >>> >>> I would say it's the job of the firmware to notify the OS where the >>> hole is, by modifying the memory map handled to the OS. Or else >>> mapping the shared_info page in an unpopulated p2m hole. >> >> I agree but I am not convinced that they are all doing it. At least U-boot >> didn't do it before we fixed it. >> >>> >>> When using UEFI there's RAM that will always be in-use by the >>> firmware, as runtime services cannot be shut down, and hence the >>> firmware must already have a way to remove/reserve such region(s) on >>> the memory map. >> >> Can either you or Elliott confirm if EDK2 reserve the region? > > I will defer to Elliott to check for arm. I would be quite surprised > if it doesn't on x86, or else we would get a myriad of bug reports > about guests randomly crashing when using edk2. > >>> >>>> Neither of the two behaviors help the users. In fact, I think they only make >>>> the experience worse because you don't know when the issue will happen. >>>> >>>> AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in >>>> all the cases, I can't think of a way for the OS to workaround properly >>>> buggy firmware. Therefore, returning -EBUSY is the safest we can do for our >>>> users and I don't view it as a ABI change (someone rely on the previous >>>> behavior is bound to failure). >>> >>> I fully agree the current behavior might not be the best one, but I do >>> consider this part of the ABI, specially as booting guests using edk2 >>> has now stopped working after this change. >> >> Right. If we remove the check, you may be able to boot a guest. But are we >> sure that such guest would run safely? > > If the guest wants the hypervisor to enforce such behavior, let it > use the new hypercall to explicitly request the shared_info page to > not be mapped anywhere else. TBH, I am not convinced the new hypercall is going to help. Let say we decide to modify FreeBSD/Linux to use it, The old EDK2 firmware would still be buggy and this would prevent boot. So we are back the same problem... We could also say we don't support older firmware. But that's not very different from leaving the hypercall as-is and fix EDK2 > > But if you don't trust the bootloader, how do you know it hasn't poked > holes elsewhere in the RAM regions? We don't know. But how do you know the bootloader will not want to continue using the vCPU shared page? For instance, the public headers doesn't seem to mention that the page can only mapped once and it would unmap the previous area. In fact, for Arm, until that commit shared page could be mapped N times... So technically even if we remove the page, the commit already made an ABI change. Yes it is now more inline with x86 but this doesn't this is still an ABI change. I would be surprised if you say we should not have done that because (in particular if you have XSA-379 in mind). > > Even if the shared_info page has been unmapped, can you be sure the > bootloader has put a RAM page back in that gfn? We can't. But the same goes with the bootloader reserving the region... > > I understand this ABI change is done to avoid bugs, but at the cost of > diverging from x86, and breaking existing firmwares which might not be > buggy. As I pointed out above, the exact behavior of the hypercall is not fully documented and the behavior has changed with some XSAs. So this is no surprise if Arm and x86 behaved differently (even before that commit). There are plenty of behavior I considered wrong in the way x86 update the P2M and I would be concerned if we don't give any leeway for the architectures to tighten the update. BTW some checks have evolved over the time during security event (XSA-378 for example). This is not very different here. For Arm we decided to not follow a behavior that I consider incorrect and potentially more harmful than trying to support bootloader not removing the shared page. If we want to handle such firmware, I think it would be better if we provide an hypercall that would return the GFN where it is currently mapped. > >> Also, it is not really argument, but this is not the only broken part in >> EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how >> some Device-Tree nodes and this will break on newer Xen. >> >> So overall, it feels to me that EDK2 is not entirely ready to be used in >> production for Xen on Arm guests. > > I really have no insight on this. What are the supported way of booting > guests on Arm? (SUPPORT.md doesn't seem to list any firmware for Arm > guests AFAICT). Some bootloaders (e.g. U-boot/EDK2) have support to be used as a fimware for Xen on Arm guests. But they are not supported officially. Most of the setup seems to specify the kernel directly in the XL configuration. We probably ought to add support for EDK2/U-boot. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 14:06 ` Julien Grall @ 2023-10-04 14:53 ` Roger Pau Monné 2023-10-04 17:47 ` Julien Grall 0 siblings, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-04 14:53 UTC (permalink / raw) To: Julien Grall Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 03:06:14PM +0100, Julien Grall wrote: > Hi Roger, > > On 04/10/2023 14:39, Roger Pau Monné wrote: > > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > > > Hi Roger, > > > > > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > > > On Wed, Oct 04, 2023 at 11:55:05AM +0100, Julien Grall wrote: > > > > > Hi Roger, > > > > > > > > > > On 04/10/2023 09:13, Roger Pau Monné wrote: > > > > > > On Tue, Oct 03, 2023 at 12:18:35PM -0700, Elliott Mitchell wrote: > > > > > > > On Tue, Oct 03, 2023 at 10:26:28AM +0200, Roger Pau Monné wrote: > > > > > > > > On Thu, Sep 28, 2023 at 07:49:18PM -0700, Elliott Mitchell wrote: > > > > > > > > > I'm trying to get FreeBSD/ARM operational on Xen/ARM. Current issue is > > > > > > > > > the changes with the handling of the shared information page appear to > > > > > > > > > have broken things for me. > > > > > > > > > > > > > > > > > > With a pre-4.17 build of Xen/ARM things worked fine. Yet with a build > > > > > > > > > of the 4.17 release, mapping the shared information page doesn't work. > > > > > > > > > > > > > > > > This is due to 71320946d5edf AFAICT. > > > > > > > > > > > > > > Yes. While the -EBUSY line may be the one triggering, I'm unsure why. > > > > > > > This seems a fairly reasonable change, so I had no intention of asking > > > > > > > for a revert (which likely would have been rejected). There is also a > > > > > > > real possibility the -EBUSY comes from elsewhere. Could also be > > > > > > > 71320946d5edf caused a bug elsewhere to be exposed. > > > > > > > > > > > > A good way to know would be to attempt to revert 71320946d5edf and see > > > > > > if that fixes your issue. > > > > > > > > > > > > Alternatively you can try (or similar): > > > > > > > > > > > > diff --git a/xen/arch/arm/mm.c b/xen/arch/arm/mm.c > > > > > > index 6ccffeaea57d..105ef3faecfd 100644 > > > > > > --- a/xen/arch/arm/mm.c > > > > > > +++ b/xen/arch/arm/mm.c > > > > > > @@ -1424,6 +1424,8 @@ int xenmem_add_to_physmap_one( > > > > > > page_set_xenheap_gfn(mfn_to_page(mfn), gfn); > > > > > > } > > > > > > else > > > > > > + { > > > > > > + printk("%u already mapped\n", space); > > > > > > /* > > > > > > * Mandate the caller to first unmap the page before mapping it > > > > > > * again. This is to prevent Xen creating an unwanted hole in > > > > > > @@ -1432,6 +1434,7 @@ int xenmem_add_to_physmap_one( > > > > > > * to unmap it afterwards. > > > > > > */ > > > > > > rc = -EBUSY; > > > > > > + } > > > > > > p2m_write_unlock(p2m); > > > > > > } > > > > > > > > > > > > > > > I'm using Tianocore as the first stage loader. This continues to work > > > > > > > > > fine. The build is using tag "edk2-stable202211", commit fff6d81270. > > > > > > > > > While Tianocore does map the shared information page, my reading of their > > > > > > > > > source is that it properly unmaps the page and therefore shouldn't cause > > > > > > > > > trouble. > > > > > > > > > > > > > > > > > > Notes on the actual call is gpfn was 0x0000000000040072. This is outside > > > > > > > > > the recommended address range, but my understanding is this is supposed > > > > > > > > > to be okay. > > > > > > > > > > > > > > > > > > The return code is -16, which is EBUSY. > > > > > > > > > > > > > > > > > > Ideas? > > > > > > > > > > > > > > > > I think the issue is that you are mapping the shared info page over a > > > > > > > > guest RAM page, and in order to do that you would fist need to create > > > > > > > > a hole and then map the shared info page. IOW: the issue is not with > > > > > > > > edk2 not having unmapped the page, but with FreeBSD trying to map the > > > > > > > > shared_info over a RAM page instead of a hole in the p2m. x86 > > > > > > > > behavior is different here, and does allow mapping the shared_info > > > > > > > > page over a RAM gfn (by first removing the backing RAM page on the > > > > > > > > gfn). > > > > > > > > > > > > > > An interesting thought. I thought I'd tried this, but since I didn't see > > > > > > > such in my experiments list. What I had tried was removing all the pages > > > > > > > in the suggested mapping range. Yet this failed. > > > > > > > > > > > > Yeah, I went too fast and didn't read the code correctly, it is not > > > > > > checking that the provided gfn is already populated, but whether the > > > > > > mfn intended to be mapped is already mapped at a different location. > > > > > > > > > > > > > Since this seemed reasonable, I've now tried and found it fails. The > > > > > > > XENMEM_remove_from_physmap call returns 0. > > > > > > > > > > > > XENMEM_remove_from_physmap returning 0 is fine, but it seems to me > > > > > > like edk2 hasn't unmapped the shared_info page. The OS has no idea > > > > > > at which position the shared_info page is currently mapped, and hence > > > > > > can't do anything to attempt to unmap it in order to cover up for > > > > > > buggy firmware. > > > > > > > > > > > > edk2 should be the entity to issue the XENMEM_remove_from_physmap > > > > > > against the gfn where it has the shared_info page mapped. Likely > > > > > > needs to be done as part of ExitBootServices() method. > > > > > > > > > > > > FWIW, 71320946d5edf is an ABI change, and as desirable as such > > > > > > behavior might be, a new hypercall should have introduced that had the > > > > > > behavior that the change intended to retrofit into > > > > > > XENMEM_add_to_physmap. > > > > > I can see how you think this is an ABI change but the previous behavior was > > > > > incorrect. Before this patch, on Arm, we would allow the shared page to be > > > > > mapped twice. As we don't know where the firmware had mapped it this could > > > > > result to random corruption. > > > > > > > > > > Now, we could surely decide to remove the page as x86 did. But this could > > > > > leave a hole in the RAM. As the OS would not know where the hole is, this > > > > > could lead to page fault randomly during runtime. > > > > > > > > I would say it's the job of the firmware to notify the OS where the > > > > hole is, by modifying the memory map handled to the OS. Or else > > > > mapping the shared_info page in an unpopulated p2m hole. > > > > > > I agree but I am not convinced that they are all doing it. At least U-boot > > > didn't do it before we fixed it. > > > > > > > > > > > When using UEFI there's RAM that will always be in-use by the > > > > firmware, as runtime services cannot be shut down, and hence the > > > > firmware must already have a way to remove/reserve such region(s) on > > > > the memory map. > > > > > > Can either you or Elliott confirm if EDK2 reserve the region? > > > > I will defer to Elliott to check for arm. I would be quite surprised > > if it doesn't on x86, or else we would get a myriad of bug reports > > about guests randomly crashing when using edk2. > > > > > > > > > > > Neither of the two behaviors help the users. In fact, I think they only make > > > > > the experience worse because you don't know when the issue will happen. > > > > > > > > > > AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in > > > > > all the cases, I can't think of a way for the OS to workaround properly > > > > > buggy firmware. Therefore, returning -EBUSY is the safest we can do for our > > > > > users and I don't view it as a ABI change (someone rely on the previous > > > > > behavior is bound to failure). > > > > > > > > I fully agree the current behavior might not be the best one, but I do > > > > consider this part of the ABI, specially as booting guests using edk2 > > > > has now stopped working after this change. > > > > > > Right. If we remove the check, you may be able to boot a guest. But are we > > > sure that such guest would run safely? > > > > If the guest wants the hypervisor to enforce such behavior, let it > > use the new hypercall to explicitly request the shared_info page to > > not be mapped anywhere else. > > TBH, I am not convinced the new hypercall is going to help. Let say we > decide to modify FreeBSD/Linux to use it, The old EDK2 firmware would still > be buggy and this would prevent boot. So we are back the same problem... > > We could also say we don't support older firmware. But that's not very > different from leaving the hypercall as-is and fix EDK2 We could at least print a warning message that the firmware still had the shared_info page mapped, and that the system might not work as expected. > > > > But if you don't trust the bootloader, how do you know it hasn't poked > > holes elsewhere in the RAM regions? > > We don't know. But how do you know the bootloader will not want to continue > using the vCPU shared page? I don't think it's feasible for two entities to concurrently use the shared_info page. > For instance, the public headers doesn't seem to mention that the page can > only mapped once and it would unmap the previous area. In fact, for Arm, > until that commit shared page could be mapped N times... So technically even > if we remove the page, the commit already made an ABI change. Yes it is now > more inline with x86 but this doesn't this is still an ABI change. I would > be surprised if you say we should not have done that because (in particular > if you have XSA-379 in mind). So we agree at least that there's an ABI change :). It's different from 379 because the shared_info page is never freed for the lifetime of the domain, hence there's no risk of leak in this specific case. I can see how preventing multiple mappings can be a safeguard for possible issues similar to 379. Isn't it possible to map a grant at multiple gfns however? > > > > Even if the shared_info page has been unmapped, can you be sure the > > bootloader has put a RAM page back in that gfn? > > We can't. But the same goes with the bootloader reserving the region... > > > > > I understand this ABI change is done to avoid bugs, but at the cost of > > diverging from x86, and breaking existing firmwares which might not be > > buggy. > As I pointed out above, the exact behavior of the hypercall is not fully > documented and the behavior has changed with some XSAs. So this is no > surprise if Arm and x86 behaved differently (even before that commit). > > There are plenty of behavior I considered wrong in the way x86 update the > P2M and I would be concerned if we don't give any leeway for the > architectures to tighten the update. BTW some checks have evolved over the > time during security event (XSA-378 for example). ABI changes due to security issues are unavoidable. > This is not very different here. For Arm we decided to not follow a behavior > that I consider incorrect and potentially more harmful than trying to > support bootloader not removing the shared page. I think this is not very friendly to users, specially if edk2 wasn't checked. I understand the situation is different on Arm vs x86, so if edk2 is not supported on arm I guess it doesn't matter much whether it's broken. It would be a much worse issue on x86 where edk2 is supported. > If we want to handle such firmware, I think it would be better if we provide > an hypercall that would return the GFN where it is currently mapped. Sure, but such hypercall would be racy, as by the time the gfn is returned the value could be stale. Adding a replacing and non replacing XENMEM_add_to_physmap variations would IMO be better. Anyway, I don't maintain this, so it's up to you. > > > > > Also, it is not really argument, but this is not the only broken part in > > > EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how > > > some Device-Tree nodes and this will break on newer Xen. > > > > > > So overall, it feels to me that EDK2 is not entirely ready to be used in > > > production for Xen on Arm guests. > > > > I really have no insight on this. What are the supported way of booting > > guests on Arm? (SUPPORT.md doesn't seem to list any firmware for Arm > > guests AFAICT). > > Some bootloaders (e.g. U-boot/EDK2) have support to be used as a fimware for > Xen on Arm guests. But they are not supported officially. > > Most of the setup seems to specify the kernel directly in the XL > configuration. We probably ought to add support for EDK2/U-boot. I had no idea about that, I do think some kind of firmware is required or else OSes different than Linux can't be supported unless they implement the Linux entry point. Is the entry point / CPU state for arm guests documented somewhere? I wonder whether Elliot could use that for FreeBSD until the situation with edk2 is stable. Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 14:53 ` Roger Pau Monné @ 2023-10-04 17:47 ` Julien Grall 2023-10-04 21:13 ` Elliott Mitchell 2023-10-04 22:52 ` Stefano Stabellini 0 siblings, 2 replies; 24+ messages in thread From: Julien Grall @ 2023-10-04 17:47 UTC (permalink / raw) To: Roger Pau Monné Cc: Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko Hi, On 04/10/2023 15:53, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 03:06:14PM +0100, Julien Grall wrote: >>>>>> Neither of the two behaviors help the users. In fact, I think they only make >>>>>> the experience worse because you don't know when the issue will happen. >>>>>> >>>>>> AFAICT, there is no way for an HVM guestto know which GFN was inuse. So in >>>>>> all the cases, I can't think of a way for the OS to workaround properly >>>>>> buggy firmware. Therefore, returning -EBUSY is the safest we can do for our >>>>>> users and I don't view it as a ABI change (someone rely on the previous >>>>>> behavior is bound to failure). >>>>> >>>>> I fully agree the current behavior might not be the best one, but I do >>>>> consider this part of the ABI, specially as booting guests using edk2 >>>>> has now stopped working after this change. >>>> >>>> Right. If we remove the check, you may be able to boot a guest. But are we >>>> sure that such guest would run safely? >>> >>> If the guest wants the hypervisor to enforce such behavior, let it >>> use the new hypercall to explicitly request the shared_info page to >>> not be mapped anywhere else. >> >> TBH, I am not convinced the new hypercall is going to help. Let say we >> decide to modify FreeBSD/Linux to use it, The old EDK2 firmware would still >> be buggy and this would prevent boot. So we are back the same problem... >> >> We could also say we don't support older firmware. But that's not very >> different from leaving the hypercall as-is and fix EDK2 > > We could at least print a warning message that the firmware still had > the shared_info page mapped, and that the system might not work as > expected. Are you suggesting to also fallback on the older hypercall? > >>> >>> But if you don't trust the bootloader, how do you know it hasn't poked >>> holes elsewhere in the RAM regions? >> >> We don't know. But how do you know the bootloader will not want to continue >> using the vCPU shared page? > > I don't think it's feasible for two entities to concurrently use the > shared_info page. > >> For instance, the public headers doesn't seem to mention that the page can >> only mapped once and it would unmap the previous area. In fact, for Arm, >> until that commit shared page could be mapped N times... So technically even >> if we remove the page, the commit already made an ABI change. Yes it is now >> more inline with x86 but this doesn't this is still an ABI change. I would >> be surprised if you say we should not have done that because (in particular >> if you have XSA-379 in mind). > > So we agree at least that there's an ABI change :). > > It's different from 379 because the shared_info page is never freed > for the lifetime of the domain, hence there's no risk of leak in this > specific case. I can see how preventing multiple mappings can be a > safeguard for possible issues similar to 379. > > Isn't it possible to map a grant at multiple gfns however? There can be multiple mapping for the grant frame. But there can be *only* one mapping for the grant *table* frame. > >>> >>> Even if the shared_info page has been unmapped, can you be sure the >>> bootloader has put a RAM page back in that gfn? >> >> We can't. But the same goes with the bootloader reserving the region... >> >>> >>> I understand this ABI change is done to avoid bugs, but at the cost of >>> diverging from x86, and breaking existing firmwares which might not be >>> buggy. >> As I pointed out above, the exact behavior of the hypercall is not fully >> documented and the behavior has changed with some XSAs. So this is no >> surprise if Arm and x86 behaved differently (even before that commit). >> >> There are plenty of behavior I considered wrong in the way x86 update the >> P2M and I would be concerned if we don't give any leeway for the >> architectures to tighten the update. BTW some checks have evolved over the >> time during security event (XSA-378 for example). > > ABI changes due to security issues are unavoidable. > >> This is not very different here. For Arm we decided to not follow a behavior >> that I consider incorrect and potentially more harmful than trying to >> support bootloader not removing the shared page. > > I think this is not very friendly to users, specially if edk2 wasn't > checked. This was forgotten because it is not yet common to use EDK2 on Xen on Arm (the proof is it took one year to find the obvious bug). I agree this is not user friendly, but it is impossible to check all the single projects. I will usually only look at the one that I know are used on Arm and/or someone remind me on the ML. > I understand the situation is different on Arm vs x86, so if > edk2 is not supported on arm I guess it doesn't matter much whether > it's broken. It would be a much worse issue on x86 where edk2 is > supported. AFAIK, we have CI for x86 on EDK2 but we don't on Arm. > >> If we want to handle such firmware, I think it would be better if we provide >> an hypercall that would return the GFN where it is currently mapped. > > Sure, but such hypercall would be racy, as by the time the gfn is > returned the value could be stale. Adding a replacing and non > replacing XENMEM_add_to_physmap variations would IMO be better. > > Anyway, I don't maintain this, so it's up to you. Bertrand/Stefano, any opinions? > >>> >>>> Also, it is not really argument, but this is not the only broken part in >>>> EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how >>>> some Device-Tree nodes and this will break on newer Xen. >>>> >>>> So overall, it feels to me that EDK2 is not entirely ready to be used in >>>> production for Xen on Arm guests. >>> >>> I really have no insight on this. What are the supported way of booting >>> guests on Arm? (SUPPORT.md doesn't seem to list any firmware for Arm >>> guests AFAICT). >> >> Some bootloaders (e.g. U-boot/EDK2) have support to be used as a fimware for >> Xen on Arm guests. But they are not supported officially. >> >> Most of the setup seems to specify the kernel directly in the XL >> configuration. We probably ought to add support for EDK2/U-boot. > > I had no idea about that, I do think some kind of firmware is required > or else OSes different than Linux can't be supported unless they > implement the Linux entry point. Indeed. But -ENOTIME and so far no-one else shown any interest :). > > Is the entry point / CPU state for arm guests documented somewhere? Yes. The protocol is documented in https://docs.kernel.org/arch/arm64/booting.html. > > I wonder whether Elliot could use that for FreeBSD until the situation > with edk2 is stable. Unfortunately, the situation is unlikely to change if no one puts any effort to fix it and add testing. While I am on the reviewer list for EDK2, I don't have the bandwith to rewiew, let alone working on it. In fact, I have been looking for someone to replace me as the reviewer for EDK2 in the past few months. The role is still open if someone has any interest and more time to allocate. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 17:47 ` Julien Grall @ 2023-10-04 21:13 ` Elliott Mitchell 2023-10-04 22:21 ` Stefano Stabellini 2023-10-05 8:54 ` Julien Grall 2023-10-04 22:52 ` Stefano Stabellini 1 sibling, 2 replies; 24+ messages in thread From: Elliott Mitchell @ 2023-10-04 21:13 UTC (permalink / raw) To: Julien Grall Cc: Roger Pau Monné, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko Heavily trimming earlier messages. Also doing one response to cover several items. Hopefully I'm not missing something which needs a response. On Wed, Oct 04, 2023 at 03:39:16PM +0200, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > > > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > > > > > When using UEFI there's RAM that will always be in-use by the > > > firmware, as runtime services cannot be shut down, and hence the > > > firmware must already have a way to remove/reserve such region(s) on > > > the memory map. > > > > Can either you or Elliott confirm if EDK2 reserve the region? > > I will defer to Elliott to check for arm. I would be quite surprised > if it doesn't on x86, or else we would get a myriad of bug reports > about guests randomly crashing when using edk2. When I had originally looked I thought there was no problem as `OvmfPkg/XenPlatformPei/Xen.c`: CalibrateLapicTimer() MapSharedInfoPage(SharedInfo) ... UnmapXenPage(SharedInfo) Later using `find * -type f -print0 | xargs -0 grep -eXENMAPSPACE_shared_info` `OvmfPkg/XenBusDxe/XenBusDxe.c`: XenGetSharedInfoPage() // using reserved page because the page is not released when Linux is // starting because of the add_to_physmap. QEMU might try to access the // page, and fail because it have no right to do so (segv). Looks like this second case leaks the shared information page. Originally I thought there was no problem as I'd only found the first instance. Appears this second instance is the problem. > > Also, it is not really argument, but this is not the only broken part in > > EDK2 for Xen Arm guests. The other one I know is EDS makes assumption how > > some Device-Tree nodes and this will break on newer Xen. > > > > So overall, it feels to me that EDK2 is not entirely ready to be used in > > production for Xen on Arm guests. > > I really have no insight on this. What are the supported way of booting > guests on Arm? (SUPPORT.md doesn't seem to list any firmware for Arm > guests AFAICT). I don't know about whether their support status, but I'm aware of two viable ways to boot domains on ARM. First, PyGRUB does work. This is a rather poor way to boot, but I do admit it is functional for Linux. Second, Tianocore/EDK2. This is very functional, but seems the above broke recently. I hope PvGRUB for ARM becomes available soon, but right now it isn't available. On Wed, Oct 04, 2023 at 06:47:41PM +0100, Julien Grall wrote: > > On 04/10/2023 15:53, Roger Pau Monné wrote: > > On Wed, Oct 04, 2023 at 03:06:14PM +0100, Julien Grall wrote: > >> > >> This is not very different here. For Arm we decided to not follow a behavior > >> that I consider incorrect and potentially more harmful than trying to > >> support bootloader not removing the shared page. > > > > I think this is not very friendly to users, specially if edk2 wasn't > > checked. > > This was forgotten because it is not yet common to use EDK2 on Xen on > Arm (the proof is it took one year to find the obvious bug). I agree > this is not user friendly, but it is impossible to check all the single > projects. I will usually only look at the one that I know are used on > Arm and/or someone remind me on the ML. By traditional standards, 1 year is quite fast. Figure 3-6 months for distributions/vendors to pick up the latest, then another 3-6 months while people are experimenting. This may point to Xen/ARM not being heavily deployed. Could also be most uses are doing direct kernel boot, or using PyGRUB. Since it is the only bootloader for non-Linux and only bootloader which doesn't requite execution in domain 0, Tianocore/EDK2 seems worth monitoring. > > I understand the situation is different on Arm vs x86, so if > > edk2 is not supported on arm I guess it doesn't matter much whether > > it's broken. It would be a much worse issue on x86 where edk2 is > > supported. > > AFAIK, we have CI for x86 on EDK2 but we don't on Arm. What is the current status of this? I'm unsure whether it was an extra patch done by Debian, but "edk2-stable202211"/fff6d81270 doesn't work with Xen/Qemu. I've also never observed 'kernel = "OVMF.fd"' generating any activity with a PVH domain on x86. Good news is Xen/x86 now accepts "OVMF.fd" as a valid kernel, so progress has been made. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 21:13 ` Elliott Mitchell @ 2023-10-04 22:21 ` Stefano Stabellini 2023-10-04 22:37 ` Elliott Mitchell 2023-10-05 8:54 ` Julien Grall 1 sibling, 1 reply; 24+ messages in thread From: Stefano Stabellini @ 2023-10-04 22:21 UTC (permalink / raw) To: Elliott Mitchell Cc: Julien Grall, Roger Pau Monné, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko [-- Attachment #1: Type: text/plain, Size: 1678 bytes --] On Wed, 4 Oct 2023, Elliott Mitchell wrote: > On Wed, Oct 04, 2023 at 03:39:16PM +0200, Roger Pau Monné wrote: > > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > > > > > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > > > > > > > When using UEFI there's RAM that will always be in-use by the > > > > firmware, as runtime services cannot be shut down, and hence the > > > > firmware must already have a way to remove/reserve such region(s) on > > > > the memory map. > > > > > > Can either you or Elliott confirm if EDK2 reserve the region? > > > > I will defer to Elliott to check for arm. I would be quite surprised > > if it doesn't on x86, or else we would get a myriad of bug reports > > about guests randomly crashing when using edk2. > > When I had originally looked I thought there was no problem as > `OvmfPkg/XenPlatformPei/Xen.c`: > CalibrateLapicTimer() > MapSharedInfoPage(SharedInfo) > ... > UnmapXenPage(SharedInfo) > > Later using `find * -type f -print0 | xargs -0 grep -eXENMAPSPACE_shared_info` > `OvmfPkg/XenBusDxe/XenBusDxe.c`: > XenGetSharedInfoPage() > // using reserved page because the page is not released when Linux is > // starting because of the add_to_physmap. QEMU might try to access the > // page, and fail because it have no right to do so (segv). > > Looks like this second case leaks the shared information page. > Originally I thought there was no problem as I'd only found the first > instance. Appears this second instance is the problem. I understand this second case is *not* unmapping the SharedInfo page, but is it reserving it somehow? For instance marking it as reserved in the EFI memory map? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 22:21 ` Stefano Stabellini @ 2023-10-04 22:37 ` Elliott Mitchell 2023-10-04 22:42 ` Stefano Stabellini 0 siblings, 1 reply; 24+ messages in thread From: Elliott Mitchell @ 2023-10-04 22:37 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, Roger Pau Monné, xen-devel, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 03:21:04PM -0700, Stefano Stabellini wrote: > On Wed, 4 Oct 2023, Elliott Mitchell wrote: > > On Wed, Oct 04, 2023 at 03:39:16PM +0200, Roger Pau Monné wrote: > > > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > > > > > > > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > > > > > > > > > When using UEFI there's RAM that will always be in-use by the > > > > > firmware, as runtime services cannot be shut down, and hence the > > > > > firmware must already have a way to remove/reserve such region(s) on > > > > > the memory map. > > > > > > > > Can either you or Elliott confirm if EDK2 reserve the region? > > > > > > I will defer to Elliott to check for arm. I would be quite surprised > > > if it doesn't on x86, or else we would get a myriad of bug reports > > > about guests randomly crashing when using edk2. > > > > When I had originally looked I thought there was no problem as > > `OvmfPkg/XenPlatformPei/Xen.c`: > > CalibrateLapicTimer() > > MapSharedInfoPage(SharedInfo) > > ... > > UnmapXenPage(SharedInfo) > > > > Later using `find * -type f -print0 | xargs -0 grep -eXENMAPSPACE_shared_info` > > `OvmfPkg/XenBusDxe/XenBusDxe.c`: > > XenGetSharedInfoPage() > > // using reserved page because the page is not released when Linux is > > // starting because of the add_to_physmap. QEMU might try to access the > > // page, and fail because it have no right to do so (segv). > > > > Looks like this second case leaks the shared information page. > > Originally I thought there was no problem as I'd only found the first > > instance. Appears this second instance is the problem. > > I understand this second case is *not* unmapping the SharedInfo page, > but is it reserving it somehow? For instance marking it as reserved in > the EFI memory map? Notice the "//" comment which I carefully grabbed? // using reserved page because the page is not released when Linux is // starting because of the add_to_physmap. QEMU might try to access the // page, and fail because it have no right to do so (segv). So the page shouldn't be touched by anyone, but it does end up wasted. Likely ExitBootServices() should clear the mapping. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 22:37 ` Elliott Mitchell @ 2023-10-04 22:42 ` Stefano Stabellini 0 siblings, 0 replies; 24+ messages in thread From: Stefano Stabellini @ 2023-10-04 22:42 UTC (permalink / raw) To: Elliott Mitchell Cc: Stefano Stabellini, Julien Grall, Roger Pau Monné, xen-devel, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko [-- Attachment #1: Type: text/plain, Size: 2530 bytes --] On Wed, 4 Oct 2023, Elliott Mitchell wrote: > On Wed, Oct 04, 2023 at 03:21:04PM -0700, Stefano Stabellini wrote: > > On Wed, 4 Oct 2023, Elliott Mitchell wrote: > > > On Wed, Oct 04, 2023 at 03:39:16PM +0200, Roger Pau Monné wrote: > > > > On Wed, Oct 04, 2023 at 02:03:43PM +0100, Julien Grall wrote: > > > > > > > > > > On 04/10/2023 13:53, Roger Pau Monné wrote: > > > > > > > > > > > > When using UEFI there's RAM that will always be in-use by the > > > > > > firmware, as runtime services cannot be shut down, and hence the > > > > > > firmware must already have a way to remove/reserve such region(s) on > > > > > > the memory map. > > > > > > > > > > Can either you or Elliott confirm if EDK2 reserve the region? > > > > > > > > I will defer to Elliott to check for arm. I would be quite surprised > > > > if it doesn't on x86, or else we would get a myriad of bug reports > > > > about guests randomly crashing when using edk2. > > > > > > When I had originally looked I thought there was no problem as > > > `OvmfPkg/XenPlatformPei/Xen.c`: > > > CalibrateLapicTimer() > > > MapSharedInfoPage(SharedInfo) > > > ... > > > UnmapXenPage(SharedInfo) > > > > > > Later using `find * -type f -print0 | xargs -0 grep -eXENMAPSPACE_shared_info` > > > `OvmfPkg/XenBusDxe/XenBusDxe.c`: > > > XenGetSharedInfoPage() > > > // using reserved page because the page is not released when Linux is > > > // starting because of the add_to_physmap. QEMU might try to access the > > > // page, and fail because it have no right to do so (segv). > > > > > > Looks like this second case leaks the shared information page. > > > Originally I thought there was no problem as I'd only found the first > > > instance. Appears this second instance is the problem. > > > > I understand this second case is *not* unmapping the SharedInfo page, > > but is it reserving it somehow? For instance marking it as reserved in > > the EFI memory map? > > Notice the "//" comment which I carefully grabbed? > > // using reserved page because the page is not released when Linux is > // starting because of the add_to_physmap. QEMU might try to access the > // page, and fail because it have no right to do so (segv). > > So the page shouldn't be touched by anyone, but it does end up wasted. > Likely ExitBootServices() should clear the mapping. Sorry to be pedantic but I am really not familiar with EDK2. Does "reserved page" in this context mean a memory page from a reserved region marked as reserved in the EFI memory map? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 21:13 ` Elliott Mitchell 2023-10-04 22:21 ` Stefano Stabellini @ 2023-10-05 8:54 ` Julien Grall 2023-10-05 15:56 ` Elliott Mitchell 1 sibling, 1 reply; 24+ messages in thread From: Julien Grall @ 2023-10-05 8:54 UTC (permalink / raw) To: Elliott Mitchell Cc: Roger Pau Monné, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On 04/10/2023 22:13, Elliott Mitchell wrote: >>> I understand the situation is different on Arm vs x86, so if >>> edk2 is not supported on arm I guess it doesn't matter much whether >>> it's broken. It would be a much worse issue on x86 where edk2 is >>> supported. >> >> AFAIK, we have CI for x86 on EDK2 but we don't on Arm. > > What is the current status of this? I'm unsure whether it was an extra > patch done by Debian, but "edk2-stable202211"/fff6d81270 doesn't work > with Xen/Qemu. I don't know what's the status for x86. But for Arm, there are nothing. And as I pointed out in my previous answer this is unlikely to change until someone invest time in EDK2 on Xen on Arm. If there are patches sent on the ML, then I am happy to attempt to review them. But I am afraid, I am not going to have time to try to find and fix all the issues when using EDK2 in Arm guests. Cheers, -- Julien Grall ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-05 8:54 ` Julien Grall @ 2023-10-05 15:56 ` Elliott Mitchell 0 siblings, 0 replies; 24+ messages in thread From: Elliott Mitchell @ 2023-10-05 15:56 UTC (permalink / raw) To: Julien Grall Cc: Roger Pau Monné, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko [-- Attachment #1: Type: text/plain, Size: 2318 bytes --] On Thu, Oct 05, 2023 at 09:54:26AM +0100, Julien Grall wrote: > On 04/10/2023 22:13, Elliott Mitchell wrote: > >>> I understand the situation is different on Arm vs x86, so if > >>> edk2 is not supported on arm I guess it doesn't matter much whether > >>> it's broken. It would be a much worse issue on x86 where edk2 is > >>> supported. > >> > >> AFAIK, we have CI for x86 on EDK2 but we don't on Arm. > > > > What is the current status of this? I'm unsure whether it was an extra > > patch done by Debian, but "edk2-stable202211"/fff6d81270 doesn't work > > with Xen/Qemu. > > I don't know what's the status for x86. But for Arm, there are nothing. > And as I pointed out in my previous answer this is unlikely to change > until someone invest time in EDK2 on Xen on Arm. Indications are the measures for x86 aren't very good either. Presently the Debian distribution is using builds based on tag edk2-stable202211, commit fff6d81270. Could be the Debian package process got broken. Could also be that tag, builds are completing yet the output fails to function. I've also been trying to get Tianocore/EDK2 to function as a bootloader for PVH. Current OVMF builds are accepted for the 'kernel = "OVMF.fd"' line, but simply hang. I'm wondering whether it is assuming the presence of a framebuffer and doesn't use the x86 Xen console. > If there are patches sent on the ML, then I am happy to attempt to > review them. But I am afraid, I am not going to have time to try to find > and fix all the issues when using EDK2 in Arm guests. The attached patch got booting via Tianocore/EDK2 working. This likely needs adjustment to match their style; problem is their style is so awful I was looking for an airsickness bag. I really don't want to do further polishing. I'm unsure whether XENMEM_remove_from_physmap uncovers a memory page which had been previously mapped, versus turning it into a hole. In the former case, the allocation should be moved to a normal heap page since the OS can reuse it. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 [-- Attachment #2: 0001-OvmfPkg-XenBusDxe-Unmap-shared-information-page-on-e.patch --] [-- Type: text/x-diff, Size: 1746 bytes --] From 19b0e88b5e0d6954a924f444b02bcb576a115952 Mon Sep 17 00:00:00 2001 Message-Id: <19b0e88b5e0d6954a924f444b02bcb576a115952.1696517822.git.ehem+xen@m5p.com> From: Elliott Mitchell <ehem+xen@m5p.com> Date: Wed, 4 Oct 2023 21:47:17 -0700 Subject: [PATCH] OvmfPkg/XenBusDxe: Unmap shared information page on exit Xen/ARM now requires the shared information page to only be mapped once. This behavior is expected to be copied to Xen/x86 at some future point. Signed-off-by: Elliott Mitchell <ehem+xen@m5p.com> --- OvmfPkg/XenBusDxe/XenBusDxe.c | 29 +++++++++++++++++++++++++++++ 1 file changed, 29 insertions(+) diff --git a/OvmfPkg/XenBusDxe/XenBusDxe.c b/OvmfPkg/XenBusDxe/XenBusDxe.c index 132f43a72b..117a299d10 100644 --- a/OvmfPkg/XenBusDxe/XenBusDxe.c +++ b/OvmfPkg/XenBusDxe/XenBusDxe.c @@ -84,6 +84,33 @@ XenGetSharedInfoPage ( return EFI_SUCCESS; } +/** + Unmap the shared_info_t page from memory. + + @param Dev A XENBUS_DEVICE instance. +**/ +VOID +XenClearSharedInfoPage ( + IN OUT XENBUS_DEVICE *Dev + ) +{ + xen_remove_from_physmap_t Parameter; + + // + // Either never mapped, or else already cleared. No further cleanup + // action required. + // + if (!Dev->SharedInfo) return; + + Parameter.domid = DOMID_SELF; + Parameter.gpfn = (UINTN)Dev->SharedInfo >> EFI_PAGE_SHIFT; + if (XenHypercallMemoryOp (XENMEM_remove_from_physmap, &Parameter) != 0) + return; + + FreePages (Dev->SharedInfo, 1); + Dev->SharedInfo = NULL; +} + /** Unloads an image. @@ -501,6 +528,8 @@ XenBusDxeDriverBindingStop ( XenStoreDeinit (Dev); XenGrantTableDeinit (Dev); + XenClearSharedInfoPage(Dev); + gBS->CloseProtocol ( ControllerHandle, &gEfiDevicePathProtocolGuid, -- 2.39.2 ^ permalink raw reply related [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 17:47 ` Julien Grall 2023-10-04 21:13 ` Elliott Mitchell @ 2023-10-04 22:52 ` Stefano Stabellini 2023-10-05 0:20 ` Elliott Mitchell 2023-10-05 9:24 ` Roger Pau Monné 1 sibling, 2 replies; 24+ messages in thread From: Stefano Stabellini @ 2023-10-04 22:52 UTC (permalink / raw) To: Julien Grall Cc: Roger Pau Monné, Elliott Mitchell, xen-devel, Stefano Stabellini, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, 4 Oct 2023, Julien Grall wrote: > > > This is not very different here. For Arm we decided to not follow a > > > behavior > > > that I consider incorrect and potentially more harmful than trying to > > > support bootloader not removing the shared page. > > > > I think this is not very friendly to users, specially if edk2 wasn't > > checked. > > This was forgotten because it is not yet common to use EDK2 on Xen on Arm (the > proof is it took one year to find the obvious bug). I agree this is not user > friendly, but it is impossible to check all the single projects. I will > usually only look at the one that I know are used on Arm and/or someone remind > me on the ML. At AMD/Xilinx we don't have EDK2 on any ARM board. Unless I go out of my way to test it on purpose I wouldn't see it. > > I understand the situation is different on Arm vs x86, so if > > edk2 is not supported on arm I guess it doesn't matter much whether > > it's broken. It would be a much worse issue on x86 where edk2 is > > supported. > > AFAIK, we have CI for x86 on EDK2 but we don't on Arm. I think we should have a gitlab-ci job testing EDK2 on ARM using QEMU and I would certainly welcome it if someone contributed it. > > > If we want to handle such firmware, I think it would be better if we > > > provide > > > an hypercall that would return the GFN where it is currently mapped. > > > > Sure, but such hypercall would be racy, as by the time the gfn is > > returned the value could be stale. Adding a replacing and non > > replacing XENMEM_add_to_physmap variations would IMO be better. > > > > Anyway, I don't maintain this, so it's up to you. > > Bertrand/Stefano, any opinions? I think we should fix EDK2 to unmap the shared info in all cases as that's simpler and the best implementation. What's the value of keeping the mapping around when the OS can't find it? Unless you have an idea on how the OS could find the location of the existing EDK2 shared info mapping. It is important not to have 2 different behaviors for the same hypercall on ARM and x86, especially when the hypercall looks arch-neutral and an operating system would reasonably expect to use it in common code. Having different behaviors on ARM/x86 is more error prone than having a less-than-ideal hypercall implementation. I agree with Julien that the ARM behavior is the right behavior. Can we change the x86 implementation to match ARM somehow? If we do, I guess we would end up breaking legacy EDK2? Is really the only choice to change the ARM implementation to match the x86 implementation? ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 22:52 ` Stefano Stabellini @ 2023-10-05 0:20 ` Elliott Mitchell 2023-10-05 9:24 ` Roger Pau Monné 1 sibling, 0 replies; 24+ messages in thread From: Elliott Mitchell @ 2023-10-05 0:20 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, Roger Pau Monné, xen-devel, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 03:52:54PM -0700, Stefano Stabellini wrote: > On Wed, 4 Oct 2023, Julien Grall wrote: > > > > If we want to handle such firmware, I think it would be better if we > > > > provide > > > > an hypercall that would return the GFN where it is currently mapped. > > > > > > Sure, but such hypercall would be racy, as by the time the gfn is > > > returned the value could be stale. Adding a replacing and non > > > replacing XENMEM_add_to_physmap variations would IMO be better. > > > > > > Anyway, I don't maintain this, so it's up to you. > > > > Bertrand/Stefano, any opinions? > > I think we should fix EDK2 to unmap the shared info in all cases as > that's simpler and the best implementation. What's the value of keeping > the mapping around when the OS can't find it? Unless you have an idea on > how the OS could find the location of the existing EDK2 shared info > mapping. I tend to agree. > It is important not to have 2 different behaviors for the same hypercall > on ARM and x86, especially when the hypercall looks arch-neutral and an > operating system would reasonably expect to use it in common code. > Having different behaviors on ARM/x86 is more error prone than having a > less-than-ideal hypercall implementation. I attempted to head this direction with FreeBSD, but there were enough surrounding differences to make this troublesome to implement. May be easier on other OSes with less history though. I do agree on general principle fewer/smaller differences are better. Yet I note my earlier patch to have `typedef struct trap_info trap_info_t` consistently visible didn't go through... > I agree with Julien that the ARM behavior is the right behavior. Can we > change the x86 implementation to match ARM somehow? > > If we do, I guess we would end up breaking legacy EDK2? I agree this ARM behavior does seem appropriate. Due to longer history though it will need far more transition time. If this 4.18 tarballs aren't out yet, I would try to see whether a warning message could be implemented *now*. I estimate the old behavior will need support for 5-10 years. On Wed, Oct 04, 2023 at 03:42:00PM -0700, Stefano Stabellini wrote: > Sorry to be pedantic but I am really not familiar with EDK2. Does > "reserved page" in this context mean a memory page from a reserved > region marked as reserved in the EFI memory map? I'm not too familiar either, so a fair amount of what I write is speculation or guesses. I'm assuming this refers to "ACPI reserved page", meaning marked as not for use by the OS. -- (\___(\___(\______ --=> 8-) EHM <=-- ______/)___/)___/) \BS ( | ehem+sigmsg@m5p.com PGP 87145445 | ) / \_CS\ | _____ -O #include <stddisclaimer.h> O- _____ | / _/ 8A19\___\_|_/58D2 7E3D DDF4 7BA6 <-PGP-> 41D1 B375 37D0 8714\_|_/___/5445 ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-04 22:52 ` Stefano Stabellini 2023-10-05 0:20 ` Elliott Mitchell @ 2023-10-05 9:24 ` Roger Pau Monné 2023-10-05 20:52 ` Stefano Stabellini 1 sibling, 1 reply; 24+ messages in thread From: Roger Pau Monné @ 2023-10-05 9:24 UTC (permalink / raw) To: Stefano Stabellini Cc: Julien Grall, Elliott Mitchell, xen-devel, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko On Wed, Oct 04, 2023 at 03:52:54PM -0700, Stefano Stabellini wrote: > On Wed, 4 Oct 2023, Julien Grall wrote: > > > > If we want to handle such firmware, I think it would be better if we > > > > provide > > > > an hypercall that would return the GFN where it is currently mapped. > > > > > > Sure, but such hypercall would be racy, as by the time the gfn is > > > returned the value could be stale. Adding a replacing and non > > > replacing XENMEM_add_to_physmap variations would IMO be better. > > > > > > Anyway, I don't maintain this, so it's up to you. > > > > Bertrand/Stefano, any opinions? > > I think we should fix EDK2 to unmap the shared info in all cases as > that's simpler and the best implementation. What's the value of keeping > the mapping around when the OS can't find it? Unless you have an idea on > how the OS could find the location of the existing EDK2 shared info > mapping. Indeed, edk2 should unmap the page, and we should fix that. > > It is important not to have 2 different behaviors for the same hypercall > on ARM and x86, especially when the hypercall looks arch-neutral and an > operating system would reasonably expect to use it in common code. > Having different behaviors on ARM/x86 is more error prone than having a > less-than-ideal hypercall implementation. > > I agree with Julien that the ARM behavior is the right behavior. Can we > change the x86 implementation to match ARM somehow? I'm afraid I don't see how. edk2 is supported on x86, and hence we cannot simply make a change to the hypervisor that would render all current versions of edk2 unusable. > If we do, I guess we would end up breaking legacy EDK2? Breaking plain edk2, as there's no version of edk2 that currently does the unmapping? > Is really the only choice to change the ARM implementation to match the > x86 implementation? Unless we want x86 and Arm to diverge in behavior. I do think the arm behavior is more sane, but I don't think we can make that change on x86 and simply render all existing versions of edk2 unusable. Thanks, Roger. ^ permalink raw reply [flat|nested] 24+ messages in thread
* Re: Issue with shared information page on Xen/ARM 4.17 2023-10-05 9:24 ` Roger Pau Monné @ 2023-10-05 20:52 ` Stefano Stabellini 0 siblings, 0 replies; 24+ messages in thread From: Stefano Stabellini @ 2023-10-05 20:52 UTC (permalink / raw) To: Roger Pau Monné Cc: Stefano Stabellini, Julien Grall, Elliott Mitchell, xen-devel, Bertrand Marquis, Volodymyr Babchuk, Oleksandr Tyshchenko [-- Attachment #1: Type: text/plain, Size: 2721 bytes --] On Thu, 5 Oct 2023, Roger Pau Monné wrote: > On Wed, Oct 04, 2023 at 03:52:54PM -0700, Stefano Stabellini wrote: > > On Wed, 4 Oct 2023, Julien Grall wrote: > > > > > If we want to handle such firmware, I think it would be better if we > > > > > provide > > > > > an hypercall that would return the GFN where it is currently mapped. > > > > > > > > Sure, but such hypercall would be racy, as by the time the gfn is > > > > returned the value could be stale. Adding a replacing and non > > > > replacing XENMEM_add_to_physmap variations would IMO be better. > > > > > > > > Anyway, I don't maintain this, so it's up to you. > > > > > > Bertrand/Stefano, any opinions? > > > > I think we should fix EDK2 to unmap the shared info in all cases as > > that's simpler and the best implementation. What's the value of keeping > > the mapping around when the OS can't find it? Unless you have an idea on > > how the OS could find the location of the existing EDK2 shared info > > mapping. > > Indeed, edk2 should unmap the page, and we should fix that. > > > > > It is important not to have 2 different behaviors for the same hypercall > > on ARM and x86, especially when the hypercall looks arch-neutral and an > > operating system would reasonably expect to use it in common code. > > Having different behaviors on ARM/x86 is more error prone than having a > > less-than-ideal hypercall implementation. > > > > I agree with Julien that the ARM behavior is the right behavior. Can we > > change the x86 implementation to match ARM somehow? > > I'm afraid I don't see how. edk2 is supported on x86, and hence we > cannot simply make a change to the hypervisor that would render all > current versions of edk2 unusable. > > > If we do, I guess we would end up breaking legacy EDK2? > > Breaking plain edk2, as there's no version of edk2 that currently does > the unmapping? > > > Is really the only choice to change the ARM implementation to match the > > x86 implementation? > > Unless we want x86 and Arm to diverge in behavior. > > I do think the arm behavior is more sane, but I don't think we can > make that change on x86 and simply render all existing versions of > edk2 unusable. Right, but maybe we can come up with a deprecation period? Especially considering that this is for domU firmware (not host firmware), and domU firmware is often (not always, but often) provided by Xen (Xen in the sense of xen.git and Xen packages). First we fix EDK2 upstream and we update the EDK2 build in xen.git. Then we give it a couple of releases, then we change the x86 hypercall behavior? In the meantime we could print a warning in Xen DEBUG builds when the hypercall is called and there is one existing mapping? ^ permalink raw reply [flat|nested] 24+ messages in thread
end of thread, other threads:[~2023-10-05 20:52 UTC | newest] Thread overview: 24+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-09-29 2:49 Issue with shared information page on Xen/ARM 4.17 Elliott Mitchell 2023-10-03 8:26 ` Roger Pau Monné 2023-10-03 19:18 ` Elliott Mitchell 2023-10-04 8:13 ` Roger Pau Monné 2023-10-04 10:55 ` Julien Grall 2023-10-04 11:42 ` Oleksandr Tyshchenko 2023-10-04 12:59 ` Roger Pau Monné 2023-10-04 17:18 ` Oleksandr Tyshchenko 2023-10-04 12:53 ` Roger Pau Monné 2023-10-04 13:03 ` Julien Grall 2023-10-04 13:39 ` Roger Pau Monné 2023-10-04 14:06 ` Julien Grall 2023-10-04 14:53 ` Roger Pau Monné 2023-10-04 17:47 ` Julien Grall 2023-10-04 21:13 ` Elliott Mitchell 2023-10-04 22:21 ` Stefano Stabellini 2023-10-04 22:37 ` Elliott Mitchell 2023-10-04 22:42 ` Stefano Stabellini 2023-10-05 8:54 ` Julien Grall 2023-10-05 15:56 ` Elliott Mitchell 2023-10-04 22:52 ` Stefano Stabellini 2023-10-05 0:20 ` Elliott Mitchell 2023-10-05 9:24 ` Roger Pau Monné 2023-10-05 20:52 ` Stefano Stabellini
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.