* Live migration leaves page tables read-only? @ 2006-11-29 0:13 John Byrne 2006-11-29 0:22 ` John Byrne 0 siblings, 1 reply; 20+ messages in thread From: John Byrne @ 2006-11-29 0:13 UTC (permalink / raw) To: xen-devel I have been trying to debug a problem live-migrating SAP on Xen-3.0.3 x86-64 (I also tested Xen-unstable changeset 12548) without success. SAP seems to run fine on a given host; live-migrating it to another host causes the guest to almost immediately panic in the mprotect() call in the change_pte_range() routine in the set_pte_at() macro because the page table page it is trying to update is write-protected. My attempts at understanding where this is coming from have come to naught. Any help in running this down would be appreciated. I am perfectly willing/able to write some debugging code if I am given a few clues what to look for. Thanks, John Byrne ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-11-29 0:13 Live migration leaves page tables read-only? John Byrne @ 2006-11-29 0:22 ` John Byrne 2006-11-29 1:36 ` Ian Pratt 0 siblings, 1 reply; 20+ messages in thread From: John Byrne @ 2006-11-29 0:22 UTC (permalink / raw) To: xen-devel I forgot to mention that a very simple test case I wrote using shared memory and the mprotect call didn't fail. So, the only test case I have at the moment is to run SAP. John Byrne John Byrne wrote: > > I have been trying to debug a problem live-migrating SAP on Xen-3.0.3 > x86-64 (I also tested Xen-unstable changeset 12548) without success. > > SAP seems to run fine on a given host; live-migrating it to another host > causes the guest to almost immediately panic in the mprotect() call in > the change_pte_range() routine in the set_pte_at() macro because the > page table page it is trying to update is write-protected. > > My attempts at understanding where this is coming from have come to > naught. Any help in running this down would be appreciated. I am > perfectly willing/able to write some debugging code if I am given a few > clues what to look for. > > Thanks, > > John Byrne > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel > ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only? 2006-11-29 0:22 ` John Byrne @ 2006-11-29 1:36 ` Ian Pratt 2006-11-29 2:52 ` John Byrne 0 siblings, 1 reply; 20+ messages in thread From: Ian Pratt @ 2006-11-29 1:36 UTC (permalink / raw) To: John Byrne, xen-devel > I forgot to mention that a very simple test case I wrote using shared > memory and the mprotect call didn't fail. So, the only test case I have > at the moment is to run SAP. What happens if you use non-live relo? Also, can you repro on 32b? Thanks, Ian > John Byrne > > John Byrne wrote: > > > > I have been trying to debug a problem live-migrating SAP on Xen-3.0.3 > > x86-64 (I also tested Xen-unstable changeset 12548) without success. > > > > SAP seems to run fine on a given host; live-migrating it to another host > > causes the guest to almost immediately panic in the mprotect() call in > > the change_pte_range() routine in the set_pte_at() macro because the > > page table page it is trying to update is write-protected. > > > > My attempts at understanding where this is coming from have come to > > naught. Any help in running this down would be appreciated. I am > > perfectly willing/able to write some debugging code if I am given a few > > clues what to look for. > > > > Thanks, > > > > John Byrne > > > > > > > > > > _______________________________________________ > > Xen-devel mailing list > > Xen-devel@lists.xensource.com > > http://lists.xensource.com/xen-devel > > > > > _______________________________________________ > Xen-devel mailing list > Xen-devel@lists.xensource.com > http://lists.xensource.com/xen-devel ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-11-29 1:36 ` Ian Pratt @ 2006-11-29 2:52 ` John Byrne 2006-11-29 7:42 ` Keir Fraser 2006-11-30 23:36 ` John Byrne 0 siblings, 2 replies; 20+ messages in thread From: John Byrne @ 2006-11-29 2:52 UTC (permalink / raw) To: Ian Pratt; +Cc: xen-devel Ian Pratt wrote: >> I forgot to mention that a very simple test case I wrote using shared >> memory and the mprotect call didn't fail. So, the only test case I > have >> at the moment is to run SAP. > > What happens if you use non-live relo? I thought I had tested that way back at the beginning without seeing the problem, but I must not have, because I just retested it to be sure and it died the same way. (Now I am truly confused and I need to go back and re-examine some of my earlier experiments.) In the meantime, any ideas where to look? > > Also, can you repro on 32b? I am doing this on behalf of someone else, so I'd have to ask them to do the setup if they have the time. I am reluctant to do so at this point. Thanks, John > > Thanks, > Ian > > >> John Byrne >> >> John Byrne wrote: >>> I have been trying to debug a problem live-migrating SAP on > Xen-3.0.3 >>> x86-64 (I also tested Xen-unstable changeset 12548) without success. >>> >>> SAP seems to run fine on a given host; live-migrating it to another > host >>> causes the guest to almost immediately panic in the mprotect() call > in >>> the change_pte_range() routine in the set_pte_at() macro because the >>> page table page it is trying to update is write-protected. >>> >>> My attempts at understanding where this is coming from have come to >>> naught. Any help in running this down would be appreciated. I am >>> perfectly willing/able to write some debugging code if I am given a > few >>> clues what to look for. 
>>> >>> Thanks, >>> >>> John Byrne >>> >>> >>> >>> >>> _______________________________________________ >>> Xen-devel mailing list >>> Xen-devel@lists.xensource.com >>> http://lists.xensource.com/xen-devel >>> >> >> _______________________________________________ >> Xen-devel mailing list >> Xen-devel@lists.xensource.com >> http://lists.xensource.com/xen-devel > ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-11-29 2:52 ` John Byrne @ 2006-11-29 7:42 ` Keir Fraser 2006-11-29 16:49 ` John Byrne 2006-11-30 23:36 ` John Byrne 1 sibling, 1 reply; 20+ messages in thread From: Keir Fraser @ 2006-11-29 7:42 UTC (permalink / raw) To: John Byrne, Ian Pratt; +Cc: xen-devel On 29/11/06 2:52 am, "John Byrne" <john.l.byrne@hp.com> wrote: >> What happens if you use non-live relo? > > I thought I had tested that way back at the beginning without seeing the > problem, but I must not have, because I just retested it to be sure and > it died the same way. (Now I am truly confused and I need to go back and > re-examine some of my earlier experiments.) > > In the meantime, any ideas where to look? This will be very dependent on the guest that is being migrated. What Linux kernel is the domU running? -- Keir ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-11-29 7:42 ` Keir Fraser @ 2006-11-29 16:49 ` John Byrne 0 siblings, 0 replies; 20+ messages in thread From: John Byrne @ 2006-11-29 16:49 UTC (permalink / raw) To: Keir Fraser; +Cc: Ian Pratt, xen-devel Keir Fraser wrote: > > > On 29/11/06 2:52 am, "John Byrne" <john.l.byrne@hp.com> wrote: > >>> What happens if you use non-live relo? >> I thought I had tested that way back at the beginning without seeing the >> problem, but I must not have, because I just retested it to be sure and >> it died the same way. (Now I am truly confused and I need to go back and >> re-examine some of my earlier experiments.) >> >> In the meantime, any ideas where to look? > > This will be very dependent on the guest that is being migrated. What Linux > kernel is the domU running? > > -- Keir > > > Linux 2.6.16.29 (+ the SLES 10 iscsi patches) with approximately the config used by the SLES 10 Xen kernel. I can send the config, if you need it. John Byrne ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-11-29 2:52 ` John Byrne 2006-11-29 7:42 ` Keir Fraser @ 2006-11-30 23:36 ` John Byrne 2006-12-01 1:13 ` Ian Pratt 1 sibling, 1 reply; 20+ messages in thread From: John Byrne @ 2006-11-30 23:36 UTC (permalink / raw) To: ian.pratt; +Cc: Ian Pratt, xen-devel

John Byrne wrote:
> Ian Pratt wrote:
>>> I forgot to mention that a very simple test case I wrote using shared
>>> memory and the mprotect call didn't fail. So, the only test case I have
>>> at the moment is to run SAP.
>>
>> What happens if you use non-live relo?
>
> I thought I had tested that way back at the beginning without seeing the
> problem, but I must not have, because I just retested it to be sure and
> it died the same way. (Now I am truly confused and I need to go back and
> re-examine some of my earlier experiments.)
>

After redoing some of my tests and understanding more about how Xen
handles page tables, I started looking at ptwr_do_page_fault() and put
debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in
x86_emulate_memop(). Building a debug version of Xen provided some
additional information (the final line is from my debugging, after the
":" is domid, addr, pte, pte flags, type_info, page owner, domain):

(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3167 for dom1
(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3067 for dom1
(XEN) DOM1: (file=mm.c, line=3120) ptwr_emulate: could not get_page_from_l1e()
(XEN) ptwr_do_page_fault,3253:1 ffff880011065bc0 80100000ca20f065 801065 28000001 ffff830000fe7080 ffff830000fe7080

I'll keep following this down, but any help would be appreciated.

Thanks,

John Byrne

^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only? 2006-11-30 23:36 ` John Byrne @ 2006-12-01 1:13 ` Ian Pratt 2006-12-09 5:40 ` John Byrne 0 siblings, 1 reply; 20+ messages in thread From: Ian Pratt @ 2006-12-01 1:13 UTC (permalink / raw) To: John Byrne, ian.pratt; +Cc: xen-devel

> >> What happens if you use non-live relo?
> >
> > I thought I had tested that way back at the beginning without seeing the
> > problem, but I must not have, because I just retested it to be sure and
> > it died the same way. (Now I am truly confused and I need to go back and
> > re-examine some of my earlier experiments.)
> >
>
> After redoing some of my tests and understanding more about how Xen
> handles page tables, I started looking at ptwr_do_page_fault() and put
> debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in
> x86_emulate_memop(). Building a debug version of Xen provided some
> additional information (the final line is from my debugging, after the
> ":" is domid, addr, pte, pte flags, type_info, page owner, domain):

You say you can repro the problem using non-live relo. In that case, you
should also be able to repro it using save/restore, which has almost
identical code paths.

Please try to isolate whether the crash happens on save or restore, and
further whether a given saved image crashes every time in the same way
when you try to restore it (mfns will be different, but pfns may be the
same).

Ian

^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-01 1:13 ` Ian Pratt @ 2006-12-09 5:40 ` John Byrne 2006-12-09 5:44 ` John Byrne 2006-12-09 8:33 ` Ian Pratt 0 siblings, 2 replies; 20+ messages in thread From: John Byrne @ 2006-12-09 5:40 UTC (permalink / raw) To: Ian Pratt; +Cc: xen-devel [-- Attachment #1: Type: text/plain, Size: 1583 bytes --] Ian Pratt wrote: >>>> What happens if you use non-live relo? >>> I thought I had tested that way back at the beginning without seeing > the >>> problem, but I must not have, because I just retested it to be sure > and >>> it died the same way. (Now I am truly confused and I need to go back > and >>> re-examine some of my earlier experiments.) >>> >> After redoing some of my tests and understanding more about how Xen >> handles page tables, I started looking at ptwr_do_page_fault() and put >> debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing > in >> x86_emulate_memop(). Building a debug version of Xen provided some >> additional information (the final line is from my debugging, after the >> ":" is domid, addr, pte, pte flags, type_info, page owner, domain): > > You say you can repro the problem using non-live relo. In that case, you > should also be able to repro it using save/restore, which has almost > identical code paths. > > Please try and isolate whether the crash happens on save or restore, and > further whether a given saved images crashes every time in the same way > when you try and restore it (mfns will be different, but pfns may be the > same). > > > Ian > > I finally ran down the problem. SAP is protecting the pages PROT_NONE, so the page-present bit in the pte is not set and canonicalize/uncanonicalize code in save/restore ignore the pte. I've attached a patch. It is possible that this change should be made to the l1e tests in xc_ptrace.c; I'm not sure. 
John Byrne

Signed-off-by: John Byrne <john.l.byrne@hp.com>

[-- Attachment #2: migprotnone.patch --]
[-- Type: text/x-patch, Size: 1444 bytes --]

diff -r 1ad7dff99968 tools/libxc/xc_linux_restore.c
--- a/tools/libxc/xc_linux_restore.c Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xc_linux_restore.c Fri Dec 08 21:37:27 2006 -0600
@@ -73,7 +73,7 @@ static int uncanonicalize_pagetable(unsi
         else
             pte = ((uint64_t *)page)[i];
 
-        if(pte & _PAGE_PRESENT) {
+        if(pte_present(pte)) {
 
             pfn = (pte >> PAGE_SHIFT) & 0xffffffff;
 
diff -r 1ad7dff99968 tools/libxc/xc_linux_save.c
--- a/tools/libxc/xc_linux_save.c Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xc_linux_save.c Fri Dec 08 21:36:59 2006 -0600
@@ -471,7 +471,7 @@ static int canonicalize_pagetable(unsign
         if (i >= xen_start && i < xen_end)
             pte = 0;
 
-        if (pte & _PAGE_PRESENT) {
+        if (pte_present(pte)) {
 
             mfn = (pte >> PAGE_SHIFT) & 0xfffffff;
             if (!MFN_IS_IN_PSEUDOPHYS_MAP(mfn)) {
diff -r 1ad7dff99968 tools/libxc/xg_private.h
--- a/tools/libxc/xg_private.h Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xg_private.h Fri Dec 08 17:48:49 2006 -0600
@@ -46,6 +46,10 @@ unsigned long csum_page (void * page);
 #define _PAGE_PSE 0x080
 #define _PAGE_GLOBAL 0x100
 
+#define _PAGE_PROTNONE 0x080 /* If not present */
+
+#define pte_present(_pteval) ((_pteval) & (_PAGE_PRESENT|_PAGE_PROTNONE))
+
 #define L1_PAGETABLE_SHIFT_PAE 12
 #define L2_PAGETABLE_SHIFT_PAE 21
 #define L3_PAGETABLE_SHIFT_PAE 30

[-- Attachment #3: Type: text/plain, Size: 138 bytes --]

_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel

^ permalink raw reply [flat|nested] 20+ messages in thread
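Editorial sketch: the difference between the old and the patched test can be shown on a single PTE value. This is a standalone illustration, not libxc code. `_PAGE_PROTNONE`, `pte_present()` and `PAGE_SHIFT` follow the patch; `_PAGE_PRESENT` is assumed to be the usual x86 present bit (0x001); the helper names `old_test_translates`, `new_test_translates` and `make_protnone_pte` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values: _PAGE_PROTNONE and pte_present() are taken from the patch;
 * _PAGE_PRESENT = 0x001 is assumed (standard x86 present bit). */
#define _PAGE_PRESENT  0x001
#define _PAGE_PROTNONE 0x080 /* only meaningful when _PAGE_PRESENT is clear */
#define PAGE_SHIFT     12

#define pte_present(_pteval) ((_pteval) & (_PAGE_PRESENT | _PAGE_PROTNONE))

/* Old save/restore test: only the hardware present bit is considered. */
static int old_test_translates(uint64_t pte)
{
    return (pte & _PAGE_PRESENT) != 0;
}

/* Patched test: a PROT_NONE entry also counts as holding a frame number. */
static int new_test_translates(uint64_t pte)
{
    return pte_present(pte) != 0;
}

/* Hypothetical helper: build a Linux-style PROT_NONE L1 entry for a frame. */
static uint64_t make_protnone_pte(uint64_t frame)
{
    assert(frame < (1ULL << 40)); /* keep the frame number within the PTE */
    return (frame << PAGE_SHIFT) | _PAGE_PROTNONE;
}
```

With the old test, an entry built by `make_protnone_pte()` is skipped, so the frame number in it would never be converted between MFN and PFN during save/restore; with the patched test it is converted.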
* Re: Live migration leaves page tables read-only? 2006-12-09 5:40 ` John Byrne @ 2006-12-09 5:44 ` John Byrne 2006-12-09 8:33 ` Ian Pratt 1 sibling, 0 replies; 20+ messages in thread From: John Byrne @ 2006-12-09 5:44 UTC (permalink / raw) To: Ian Pratt; +Cc: xen-devel John Byrne wrote: > ...snipped.. > I finally ran down the problem. SAP is protecting the pages PROT_NONE, > so the page-present bit in the pte is not set and > canonicalize/uncanonicalize code in save/restore ignore the pte. I've > attached a patch. It is possible that this change should be made to the > l1e tests in xc_ptrace.c; I'm not sure. > > John Byrne > The patch is against xen-unstable changeset 12815:1ad7dff99968. John Byrne ^ permalink raw reply [flat|nested] 20+ messages in thread
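Editorial sketch: the access pattern identified above (pages protected PROT_NONE and later made accessible again with mprotect()) can be exercised from user space roughly as follows. This is a hypothetical test, not the test case mentioned earlier in the thread, and whether such a small program reproduces the crash was not established here; the function name `protnone_roundtrip` is made up for illustration. The save/restore or migration would have to be triggered externally at the marked point.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS under strict -std= modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map npages, touch them, protect them PROT_NONE, then restore access and
 * verify the contents. Returns 0 on success, -1 on any failure. */
static int protnone_roundtrip(size_t npages)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len, i;
    unsigned char *p;
    int ok = 1;

    assert(npages > 0 && page_size > 0);
    len = npages * (size_t)page_size;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so that each one has a populated PTE. */
    for (i = 0; i < npages; i++)
        p[i * (size_t)page_size] = (unsigned char)(i + 1);

    /* The PTEs now become not-present but still refer to their frames. */
    if (mprotect(p, len, PROT_NONE) != 0)
        ok = 0;

    /* In the failing scenario, the guest is saved/restored or migrated
     * at this point, before access is restored. */

    if (ok && mprotect(p, len, PROT_READ | PROT_WRITE) != 0)
        ok = 0;

    /* The data written before the PROT_NONE window must still be there. */
    for (i = 0; ok && i < npages; i++)
        if (p[i * (size_t)page_size] != (unsigned char)(i + 1))
            ok = 0;

    munmap(p, len);
    return ok ? 0 : -1;
}
```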
* RE: Live migration leaves page tables read-only? 2006-12-09 5:40 ` John Byrne 2006-12-09 5:44 ` John Byrne @ 2006-12-09 8:33 ` Ian Pratt 2006-12-09 9:22 ` Keir Fraser ` (2 more replies) 1 sibling, 3 replies; 20+ messages in thread From: Ian Pratt @ 2006-12-09 8:33 UTC (permalink / raw) To: John Byrne, Ian Pratt; +Cc: xen-devel, Joe Bonasera, Christian Limpach > I finally ran down the problem. SAP is protecting the pages PROT_NONE, > so the page-present bit in the pte is not set and > canonicalize/uncanonicalize code in save/restore ignore the pte. I've > attached a patch. It is possible that this change should be made to the > l1e tests in xc_ptrace.c; I'm not sure. That's a good catch, thanks. Interesting that we hadn't seen this before. Although your patch works today, it will break when we add PSE (super page) support for PV guests as it will confuse PROT_NONE with PSE. Assuming PROT_NONE only makes sense for L1 entries, we can probably gate the tests on whether the page table page is an L1 or not to fix this. However, it does point out an issue for other OSes: Taking this patch effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) part of the Xen API. We need to find out whether this is compatible with *BSD and Solaris' use of flags for not present ptes. Ian ^ permalink raw reply [flat|nested] 20+ messages in thread
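Editorial sketch: the gating Ian describes, honouring the 0x80 flag only in L1 entries so that it cannot be mistaken for PSE at higher levels, might look like this. It is an illustration under assumptions, not the eventual fix: the function name `pte_holds_mfn` and the convention that `level` is 1 for an L1 page-table page are hypothetical, and `_PAGE_PRESENT` is assumed to be 0x001.

```c
#include <assert.h>
#include <stdint.h>

#define _PAGE_PRESENT  0x001 /* assumed: standard x86 present bit */
#define _PAGE_PROTNONE 0x080 /* Linux PROT_NONE in a not-present L1 entry;
                                the same bit is PSE in higher-level entries */

/* Decide whether an entry carries a frame number that save/restore must
 * translate. 'level' is 1 for an L1 page-table page, 2..4 for higher levels
 * (a convention assumed for this sketch). */
static int pte_holds_mfn(uint64_t pte, int level)
{
    assert(level >= 1 && level <= 4);

    if (pte & _PAGE_PRESENT)
        return 1;

    /* Not present: only an L1 entry may be a PROT_NONE mapping. */
    return level == 1 && (pte & _PAGE_PROTNONE) != 0;
}
```

This still bakes the Linux encoding into the tools, which is the API concern raised in the second half of the message.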
* Re: Live migration leaves page tables read-only? 2006-12-09 8:33 ` Ian Pratt @ 2006-12-09 9:22 ` Keir Fraser 2006-12-09 9:34 ` Keir Fraser 2006-12-11 17:00 ` Joe Bonasera 2007-01-14 4:11 ` John Byrne 2 siblings, 1 reply; 20+ messages in thread From: Keir Fraser @ 2006-12-09 9:22 UTC (permalink / raw) To: Ian Pratt, John Byrne; +Cc: xen-devel, Christian Limpach, Joe Bonasera On 9/12/06 8:33 am, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote: > Although your patch works today, it will break when we add PSE (super > page) support for PV guests as it will confuse PROT_NONE with PSE. > Assuming PROT_NONE only makes sense for L1 entries, we can probably gate > the tests on whether the page table page is an L1 or not to fix this. > > However, it does point out an issue for other OSes: Taking this patch > effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) > part of the Xen API. We need to find out whether this is compatible with > *BSD and Solaris' use of flags for not present ptes. If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available for things like swapcache info. Making assumptions about not-present PTEs is not really tenable. -- Keir ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-09 9:22 ` Keir Fraser @ 2006-12-09 9:34 ` Keir Fraser 2006-12-09 9:48 ` Keir Fraser 0 siblings, 1 reply; 20+ messages in thread From: Keir Fraser @ 2006-12-09 9:34 UTC (permalink / raw) To: Keir Fraser, Ian Pratt, John Byrne Cc: xen-devel, Christian Limpach, Joe Bonasera On 9/12/06 9:22 am, "Keir Fraser" <keir@xensource.com> wrote: >> Although your patch works today, it will break when we add PSE (super >> page) support for PV guests as it will confuse PROT_NONE with PSE. >> Assuming PROT_NONE only makes sense for L1 entries, we can probably gate >> the tests on whether the page table page is an L1 or not to fix this. >> >> However, it does point out an issue for other OSes: Taking this patch >> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) >> part of the Xen API. We need to find out whether this is compatible with >> *BSD and Solaris' use of flags for not present ptes. > > If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available > for things like swapcache info. Making assumptions about not-present PTEs is > not really tenable. Speaking more constructively we could have a pte_active_mask communicated via elfnote or xenbus (or some other way) which the tools would apply to PTEs to determine if they contain an MFN. Default would be 0x1. -- Keir ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-09 9:34 ` Keir Fraser @ 2006-12-09 9:48 ` Keir Fraser 0 siblings, 0 replies; 20+ messages in thread From: Keir Fraser @ 2006-12-09 9:48 UTC (permalink / raw) To: Ian Pratt, John Byrne; +Cc: xen-devel, Christian Limpach, Joe Bonasera On 9/12/06 9:34 am, "Keir Fraser" <keir@xensource.com> wrote: >> If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available >> for things like swapcache info. Making assumptions about not-present PTEs is >> not really tenable. > > Speaking more constructively we could have a pte_active_mask communicated > via elfnote or xenbus (or some other way) which the tools would apply to > PTEs to determine if they contain an MFN. Default would be 0x1. Or we could apply the special case only for images with the OS elfnote set to 'linux', if all Linux kernels have the same PROT_NONE definition. With any of these solutions, the problem is how to communicate the flag or mask to xc_linux_save/xc_linux_restore, and how to propagate it across save/restore (i.e., how is it represented in a saved image?). -- Keir ^ permalink raw reply [flat|nested] 20+ messages in thread
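Editorial sketch: both of Keir's suggestions (a per-guest pte_active_mask defaulting to 0x1, or a special case keyed on the OS elfnote being 'linux') reduce to choosing a mask and applying it to each PTE. The names below (`pte_active_mask_for_os`, `pte_contains_mfn`, and the two mask constants) are hypothetical, and how the mask reaches xc_linux_save/xc_linux_restore is exactly the open question in the message above, so it is simply passed as a parameter here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define PTE_ACTIVE_MASK_DEFAULT 0x1ULL  /* present bit only */
#define PTE_ACTIVE_MASK_LINUX   0x81ULL /* present bit | Linux PROT_NONE (0x80) */

/* Pick a mask from the guest OS name (e.g. as read from an OS elfnote).
 * Unknown or missing names fall back to the default of 0x1. */
static uint64_t pte_active_mask_for_os(const char *guest_os)
{
    if (guest_os != NULL && strcmp(guest_os, "linux") == 0)
        return PTE_ACTIVE_MASK_LINUX;
    return PTE_ACTIVE_MASK_DEFAULT;
}

/* A PTE is treated as containing an MFN if any bit of the mask is set. */
static int pte_contains_mfn(uint64_t pte, uint64_t active_mask)
{
    assert(active_mask & 0x1); /* present entries must always be covered */
    return (pte & active_mask) != 0;
}
```

A guest that zeroes its PROT_NONE entries would keep the default mask and see no change in behaviour.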
* Re: Live migration leaves page tables read-only? 2006-12-09 8:33 ` Ian Pratt 2006-12-09 9:22 ` Keir Fraser @ 2006-12-11 17:00 ` Joe Bonasera 2006-12-11 18:29 ` Ian Pratt 2007-01-14 4:11 ` John Byrne 2 siblings, 1 reply; 20+ messages in thread From: Joe Bonasera @ 2006-12-11 17:00 UTC (permalink / raw) To: Ian Pratt; +Cc: Christian Limpach, xen-devel, John Byrne Ian Pratt wrote: > >> I finally ran down the problem. SAP is protecting the pages PROT_NONE, >> so the page-present bit in the pte is not set and >> canonicalize/uncanonicalize code in save/restore ignore the pte. I've >> attached a patch. It is possible that this change should be made to > the >> l1e tests in xc_ptrace.c; I'm not sure. > > That's a good catch, thanks. Interesting that we hadn't seen this > before. > > Although your patch works today, it will break when we add PSE (super > page) support for PV guests as it will confuse PROT_NONE with PSE. > Assuming PROT_NONE only makes sense for L1 entries, we can probably gate > the tests on whether the page table page is an L1 or not to fix this. > > However, it does point out an issue for other OSes: Taking this patch > effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) > part of the Xen API. We need to find out whether this is compatible with > *BSD and Solaris' use of flags for not present ptes. > > Ian Solaris implements PROT_NONE by entirely invalidating the PTE (ie. it becomes zero). Hence our PTEs always had either zero or have the PRESENT bit set. The only exception to this was adding some fixage to allow for the old Xen writable page table approach which temporarily made the upper table non-PRESENT. So you can make not-present, but non-zero entries mean anything you want. As long as it's the guest OS that creates the entries, we'll just not do it. Joe ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only? 2006-12-11 17:00 ` Joe Bonasera @ 2006-12-11 18:29 ` Ian Pratt 2006-12-11 19:55 ` John Byrne 2006-12-11 21:30 ` Joe Bonasera 0 siblings, 2 replies; 20+ messages in thread From: Ian Pratt @ 2006-12-11 18:29 UTC (permalink / raw) To: Joe Bonasera; +Cc: Christian Limpach, xen-devel, John Byrne > Solaris implements PROT_NONE by entirely invalidating the PTE (ie. it > becomes zero). Hence our PTEs always had either zero or have the PRESENT > bit set. The only exception to this was adding some fixage to allow > for the old Xen writable page table approach which temporarily made > the upper table non-PRESENT. > > So you can make not-present, but non-zero entries mean anything you want. > As long as it's the guest OS that creates the entries, we'll just not do > it. Just to confirm: in Solaris there are no not-present PTEs that contain machine addresses. This means we need to implement the scheme that Keir suggested to enable the guest OS to tell xen/xc_save/restore about flags in not-present PTEs that should trigger an m2p conversion. Ian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-11 18:29 ` Ian Pratt @ 2006-12-11 19:55 ` John Byrne 2006-12-11 21:30 ` Joe Bonasera 1 sibling, 0 replies; 20+ messages in thread From: John Byrne @ 2006-12-11 19:55 UTC (permalink / raw) To: Ian Pratt; +Cc: xen-devel, Joe Bonasera, Christian Limpach Ian Pratt wrote: > >> Solaris implements PROT_NONE by entirely invalidating the PTE (ie. it >> becomes zero). Hence our PTEs always had either zero or have the > PRESENT >> bit set. The only exception to this was adding some fixage to allow >> for the old Xen writable page table approach which temporarily made >> the upper table non-PRESENT. >> >> So you can make not-present, but non-zero entries mean anything you > want. >> As long as it's the guest OS that creates the entries, we'll just not > do >> it. > > Just to be confirm: in Solaris there are no not-present PTE's that > contain machine addresses. > > This means we need to implement the scheme that Keir suggested to enable > the guest OS to tell xen/xc_save/restore about flags in not-present PTEs > that should trigger a m2p conversion. > > Ian > Ian, Silly me. I thought "xc_linux_save" meant what it said. I haven't paid much attention to BSD or Solaris on Xen and didn't realize that went through the same path. I'd really like to see this fixed for 3.0.4, at least for Linux, but I don't think I'm the person to implement a new "scheme" quickly to do it, but I'll try if someone wants to give me some advice on how to start. On the subject of schemes, what about support for other architectures? Is there anything we should be thinking about for supporting guests with different page sizes, for instance? John Byrne ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-11 18:29 ` Ian Pratt 2006-12-11 19:55 ` John Byrne @ 2006-12-11 21:30 ` Joe Bonasera 1 sibling, 0 replies; 20+ messages in thread From: Joe Bonasera @ 2006-12-11 21:30 UTC (permalink / raw) To: Ian Pratt; +Cc: Christian Limpach, xen-devel, John Byrne Ian Pratt wrote: > >> Solaris implements PROT_NONE by entirely invalidating the PTE (ie. it >> becomes zero). Hence our PTEs always had either zero or have the > PRESENT >> bit set. The only exception to this was adding some fixage to allow >> for the old Xen writable page table approach which temporarily made >> the upper table non-PRESENT. >> >> So you can make not-present, but non-zero entries mean anything you > want. >> As long as it's the guest OS that creates the entries, we'll just not > do >> it. > > Just to be confirm: in Solaris there are no not-present PTE's that > contain machine addresses. yes > This means we need to implement the scheme that Keir suggested to enable > the guest OS to tell xen/xc_save/restore about flags in not-present PTEs > that should trigger a m2p conversion. > > Ian ^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only? 2006-12-09 8:33 ` Ian Pratt 2006-12-09 9:22 ` Keir Fraser 2006-12-11 17:00 ` Joe Bonasera @ 2007-01-14 4:11 ` John Byrne 2007-01-14 8:21 ` Ian Pratt 2 siblings, 1 reply; 20+ messages in thread From: John Byrne @ 2007-01-14 4:11 UTC (permalink / raw) To: Ian Pratt; +Cc: xen-devel Ian, I haven't noticed a fix. Is someone working on this bug or should I open a bugzilla for it, so it isn't forgotten? John Byrne Ian Pratt wrote: > >> I finally ran down the problem. SAP is protecting the pages PROT_NONE, >> so the page-present bit in the pte is not set and >> canonicalize/uncanonicalize code in save/restore ignore the pte. I've >> attached a patch. It is possible that this change should be made to > the >> l1e tests in xc_ptrace.c; I'm not sure. > > That's a good catch, thanks. Interesting that we hadn't seen this > before. > > Although your patch works today, it will break when we add PSE (super > page) support for PV guests as it will confuse PROT_NONE with PSE. > Assuming PROT_NONE only makes sense for L1 entries, we can probably gate > the tests on whether the page table page is an L1 or not to fix this. > > However, it does point out an issue for other OSes: Taking this patch > effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE) > part of the Xen API. We need to find out whether this is compatible with > *BSD and Solaris' use of flags for not present ptes. > > Ian > ^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only? 2007-01-14 4:11 ` John Byrne @ 2007-01-14 8:21 ` Ian Pratt 0 siblings, 0 replies; 20+ messages in thread From: Ian Pratt @ 2007-01-14 8:21 UTC (permalink / raw) To: John Byrne; +Cc: xen-devel

> I haven't noticed a fix. Is someone working on this bug or should I open
> a bugzilla for it, so it isn't forgotten?

It's not forgotten, but I'm not aware of anyone actively working on it.
It's a bit fiddly to fix properly.

We need to add an elf note that describes how to identify not-present
PTEs that contain MFNs. For linux this is easy as testing for the
presence of a single bit being set works. In principle, you might need a
more complex scheme, but I'm not aware of any OSes that actually require
this. Allowing a mask and value to be specified would be good, something
that could be extended into a list of mask:value,mask:value in future if
need be e.g.:

  np_pte_contains_mfn_flags=c0:80,c0:40

The elf note would need to be pulled out of kernel by the domain
builder, and then we need to figure out how to make the info available
to the save/restore code.

Would be good if someone could pick this up.

Thanks,
Ian

> Ian Pratt wrote:
> >
> >> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
> >> so the page-present bit in the pte is not set and
> >> canonicalize/uncanonicalize code in save/restore ignore the pte. I've
> >> attached a patch. It is possible that this change should be made to the
> >> l1e tests in xc_ptrace.c; I'm not sure.
> >
> > That's a good catch, thanks. Interesting that we hadn't seen this
> > before.
> >
> > Although your patch works today, it will break when we add PSE (super
> > page) support for PV guests as it will confuse PROT_NONE with PSE.
> > Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
> > the tests on whether the page table page is an L1 or not to fix this.
> >
> > However, it does point out an issue for other OSes: Taking this patch
> > effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> > part of the Xen API. We need to find out whether this is compatible with
> > *BSD and Solaris' use of flags for not present ptes.
> >
> > Ian
> >

^ permalink raw reply [flat|nested] 20+ messages in thread
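Editorial sketch: the mask:value list proposed above could be parsed and applied roughly as follows. This is a minimal illustration of the idea, not an existing Xen interface: the struct and function names (`np_rule`, `parse_np_rules`, `pte_contains_mfn`) and the limit `MAX_NP_RULES` are hypothetical, and the values are assumed to be hexadecimal as in the `c0:80,c0:40` example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define MAX_NP_RULES 8 /* arbitrary limit for this sketch */

/* One rule: a not-present PTE contains an MFN if (pte & mask) == value. */
struct np_rule {
    uint64_t mask;
    uint64_t value;
};

/* Parse a "mask:value,mask:value" list of hex numbers.
 * Returns the number of rules parsed, or -1 on a malformed string. */
static int parse_np_rules(const char *spec, struct np_rule *rules, int max_rules)
{
    int n = 0;

    assert(spec != NULL && rules != NULL);

    while (*spec != '\0') {
        char *end;

        if (n == max_rules)
            return -1;

        rules[n].mask = strtoull(spec, &end, 16);
        if (end == spec || *end != ':')
            return -1;
        spec = end + 1;

        rules[n].value = strtoull(spec, &end, 16);
        if (end == spec)
            return -1;
        n++;

        if (*end == ',') {
            spec = end + 1;
            if (*spec == '\0')
                return -1; /* trailing comma */
        } else if (*end == '\0') {
            spec = end;
        } else {
            return -1;
        }
    }
    return n;
}

/* Present PTEs always contain an MFN; not-present ones only if a rule matches. */
static int pte_contains_mfn(uint64_t pte, const struct np_rule *rules, int n)
{
    int i;

    if (pte & 0x1) /* assumed: bit 0 is the x86 present bit */
        return 1;
    for (i = 0; i < n; i++)
        if ((pte & rules[i].mask) == rules[i].value)
            return 1;
    return 0;
}
```

For Linux as described in this thread, a single rule such as `80:80` would express "bit 0x80 set in a not-present PTE"; the two-rule example shows how a guest with more than one encoding could be described.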