* Live migration leaves page tables read-only?
@ 2006-11-29 0:13 John Byrne
2006-11-29 0:22 ` John Byrne
0 siblings, 1 reply; 20+ messages in thread
From: John Byrne @ 2006-11-29 0:13 UTC (permalink / raw)
To: xen-devel
I have been trying to debug a problem live-migrating SAP on Xen-3.0.3
x86-64 (I also tested Xen-unstable changeset 12548) without success.
SAP seems to run fine on a given host; live-migrating it to another host
causes the guest to almost immediately panic in the mprotect() call in
the change_pte_range() routine in the set_pte_at() macro because the
page table page it is trying to update is write-protected.
My attempts at understanding where this is coming from have come to
naught. Any help in running this down would be appreciated. I am
perfectly willing/able to write some debugging code if I am given a few
clues what to look for.
Thanks,
John Byrne
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-11-29 0:13 Live migration leaves page tables read-only? John Byrne
@ 2006-11-29 0:22 ` John Byrne
2006-11-29 1:36 ` Ian Pratt
0 siblings, 1 reply; 20+ messages in thread
From: John Byrne @ 2006-11-29 0:22 UTC (permalink / raw)
To: xen-devel
I forgot to mention that a very simple test case I wrote using shared
memory and the mprotect call didn't fail. So, the only test case I have
at the moment is to run SAP.
John Byrne
John Byrne wrote:
>
> I have been trying to debug a problem live-migrating SAP on Xen-3.0.3
> x86-64 (I also tested Xen-unstable changeset 12548) without success.
>
> SAP seems to run fine on a given host; live-migrating it to another host
> causes the guest to almost immediately panic in the mprotect() call in
> the change_pte_range() routine in the set_pte_at() macro because the
> page table page it is trying to update is write-protected.
>
> My attempts at understanding where this is coming from have come to
> naught. Any help in running this down would be appreciated. I am
> perfectly willing/able to write some debugging code if I am given a few
> clues what to look for.
>
> Thanks,
>
> John Byrne
>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only?
2006-11-29 0:22 ` John Byrne
@ 2006-11-29 1:36 ` Ian Pratt
2006-11-29 2:52 ` John Byrne
0 siblings, 1 reply; 20+ messages in thread
From: Ian Pratt @ 2006-11-29 1:36 UTC (permalink / raw)
To: John Byrne, xen-devel
> I forgot to mention that a very simple test case I wrote using shared
> memory and the mprotect call didn't fail. So, the only test case I have
> at the moment is to run SAP.
What happens if you use non-live relo?
Also, can you repro on 32b?
Thanks,
Ian
> John Byrne
>
> John Byrne wrote:
> >
> > I have been trying to debug a problem live-migrating SAP on Xen-3.0.3
> > x86-64 (I also tested Xen-unstable changeset 12548) without success.
> >
> > SAP seems to run fine on a given host; live-migrating it to another host
> > causes the guest to almost immediately panic in the mprotect() call in
> > the change_pte_range() routine in the set_pte_at() macro because the
> > page table page it is trying to update is write-protected.
> >
> > My attempts at understanding where this is coming from have come to
> > naught. Any help in running this down would be appreciated. I am
> > perfectly willing/able to write some debugging code if I am given a few
> > clues what to look for.
> >
> > Thanks,
> >
> > John Byrne
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-11-29 1:36 ` Ian Pratt
@ 2006-11-29 2:52 ` John Byrne
2006-11-29 7:42 ` Keir Fraser
2006-11-30 23:36 ` John Byrne
0 siblings, 2 replies; 20+ messages in thread
From: John Byrne @ 2006-11-29 2:52 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Ian Pratt wrote:
>> I forgot to mention that a very simple test case I wrote using shared
>> memory and the mprotect call didn't fail. So, the only test case I have
>> at the moment is to run SAP.
>
> What happens if you use non-live relo?
I thought I had tested that way back at the beginning without seeing the
problem, but I must not have, because I just retested it to be sure and
it died the same way. (Now I am truly confused and I need to go back and
re-examine some of my earlier experiments.)
In the meantime, any ideas where to look?
>
> Also, can you repro on 32b?
I am doing this on behalf of someone else, so I'd have to ask them to do
the setup if they have the time. I am reluctant to do so at this point.
Thanks,
John
>
> Thanks,
> Ian
>
>
>> John Byrne
>>
>> John Byrne wrote:
>>> I have been trying to debug a problem live-migrating SAP on Xen-3.0.3
>>> x86-64 (I also tested Xen-unstable changeset 12548) without success.
>>>
>>> SAP seems to run fine on a given host; live-migrating it to another host
>>> causes the guest to almost immediately panic in the mprotect() call in
>>> the change_pte_range() routine in the set_pte_at() macro because the
>>> page table page it is trying to update is write-protected.
>>>
>>> My attempts at understanding where this is coming from have come to
>>> naught. Any help in running this down would be appreciated. I am
>>> perfectly willing/able to write some debugging code if I am given a few
>>> clues what to look for.
>>>
>>> Thanks,
>>>
>>> John Byrne
>>>
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-11-29 2:52 ` John Byrne
@ 2006-11-29 7:42 ` Keir Fraser
2006-11-29 16:49 ` John Byrne
2006-11-30 23:36 ` John Byrne
1 sibling, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-11-29 7:42 UTC (permalink / raw)
To: John Byrne, Ian Pratt; +Cc: xen-devel
On 29/11/06 2:52 am, "John Byrne" <john.l.byrne@hp.com> wrote:
>> What happens if you use non-live relo?
>
> I thought I had tested that way back at the beginning without seeing the
> problem, but I must not have, because I just retested it to be sure and
> it died the same way. (Now I am truly confused and I need to go back and
> re-examine some of my earlier experiments.)
>
> In the meantime, any ideas where to look?
This will be very dependent on the guest that is being migrated. What Linux
kernel is the domU running?
-- Keir
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-11-29 7:42 ` Keir Fraser
@ 2006-11-29 16:49 ` John Byrne
0 siblings, 0 replies; 20+ messages in thread
From: John Byrne @ 2006-11-29 16:49 UTC (permalink / raw)
To: Keir Fraser; +Cc: Ian Pratt, xen-devel
Keir Fraser wrote:
>
>
> On 29/11/06 2:52 am, "John Byrne" <john.l.byrne@hp.com> wrote:
>
>>> What happens if you use non-live relo?
>> I thought I had tested that way back at the beginning without seeing the
>> problem, but I must not have, because I just retested it to be sure and
>> it died the same way. (Now I am truly confused and I need to go back and
>> re-examine some of my earlier experiments.)
>>
>> In the meantime, any ideas where to look?
>
> This will be very dependent on the guest that is being migrated. What Linux
> kernel is the domU running?
>
> -- Keir
>
>
>
Linux 2.6.16.29 (+ the SLES 10 iscsi patches) with approximately the
config used by the SLES 10 Xen kernel.
I can send the config, if you need it.
John Byrne
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-11-29 2:52 ` John Byrne
2006-11-29 7:42 ` Keir Fraser
@ 2006-11-30 23:36 ` John Byrne
2006-12-01 1:13 ` Ian Pratt
1 sibling, 1 reply; 20+ messages in thread
From: John Byrne @ 2006-11-30 23:36 UTC (permalink / raw)
To: ian.pratt; +Cc: Ian Pratt, xen-devel
John Byrne wrote:
> Ian Pratt wrote:
>>> I forgot to mention that a very simple test case I wrote using shared
>>> memory and the mprotect call didn't fail. So, the only test case I have
>>> at the moment is to run SAP.
>>
>> What happens if you use non-live relo?
>
> I thought I had tested that way back at the beginning without seeing the
> problem, but I must not have, because I just retested it to be sure and
> it died the same way. (Now I am truly confused and I need to go back and
> re-examine some of my earlier experiments.)
>
After redoing some of my tests and understanding more about how Xen
handles page tables, I started looking at ptwr_do_page_fault() and put
debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in
x86_emulate_memop(). Building a debug version of Xen provided some
additional information (the final line is from my debugging code; the
fields after the ":" are domid, addr, pte, pte flags, type_info, page
owner, domain):
(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3167 for dom1
(XEN) DOM1: (file=mm.c, line=1682) Bad type (saw 0000000028000001 != exp 00000000e0000000) for mfn c8de3 (pfn 12491)
(XEN) DOM1: (file=mm.c, line=606) Error getting mfn c8de3 (pfn 12491) from L1 entry 00000000c8de3067 for dom1
(XEN) DOM1: (file=mm.c, line=3120) ptwr_emulate: could not get_page_from_l1e()
(XEN) ptwr_do_page_fault,3253:1 ffff880011065bc0 80100000ca20f065 801065 28000001 ffff830000fe7080 ffff830000fe7080
I'll keep following this down, but any help would be appreciated.
Thanks,
John Byrne
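The entries in the log above can be decoded by hand from the architectural x86-64 PTE bit layout. The following is a minimal sketch of such a decoder; the helper names (pte_frame, pte_is_writable, describe_pte) are illustrative and are not Xen or Linux functions, and only bits defined by the processor architecture are interpreted.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Architectural x86-64 page-table entry bits. */
#define PTE_PRESENT  (1ULL << 0)
#define PTE_RW       (1ULL << 1)
#define PTE_USER     (1ULL << 2)
#define PTE_ACCESSED (1ULL << 5)
#define PTE_DIRTY    (1ULL << 6)
#define PTE_GLOBAL   (1ULL << 8)
#define PTE_NX       (1ULL << 63)

/* Bits 12..51 of the entry hold the frame number. */
static inline uint64_t pte_frame(uint64_t pte)
{
    return (pte >> 12) & ((1ULL << 40) - 1);
}

/* A mapping is writable only if it is both present and has RW set. */
static inline int pte_is_writable(uint64_t pte)
{
    return (pte & PTE_PRESENT) && (pte & PTE_RW);
}

/*
 * Format the frame and the flag bits listed above into buf.
 * Software-available bits (such as bit 52) are deliberately not printed.
 */
static inline int describe_pte(uint64_t pte, char *buf, size_t len)
{
    return snprintf(buf, len, "frame=%llx%s%s%s%s%s%s%s",
                    (unsigned long long)pte_frame(pte),
                    (pte & PTE_PRESENT)  ? " P"  : "",
                    (pte & PTE_RW)       ? " RW" : "",
                    (pte & PTE_USER)     ? " US" : "",
                    (pte & PTE_ACCESSED) ? " A"  : "",
                    (pte & PTE_DIRTY)    ? " D"  : "",
                    (pte & PTE_GLOBAL)   ? " G"  : "",
                    (pte & PTE_NX)       ? " NX" : "");
}
```

Applied to the pte value from the final debug line (80100000ca20f065), describe_pte reports frame ca20f with the present, user, accessed, dirty and no-execute bits set and RW clear, i.e. a read-only mapping. The two L1 entries being written (ending in 167 and 067) both refer to frame c8de3 with RW set.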
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only?
2006-11-30 23:36 ` John Byrne
@ 2006-12-01 1:13 ` Ian Pratt
2006-12-09 5:40 ` John Byrne
0 siblings, 1 reply; 20+ messages in thread
From: Ian Pratt @ 2006-12-01 1:13 UTC (permalink / raw)
To: John Byrne, ian.pratt; +Cc: xen-devel
> >> What happens if you use non-live relo?
> >
> > I thought I had tested that way back at the beginning without seeing the
> > problem, but I must not have, because I just retested it to be sure and
> > it died the same way. (Now I am truly confused and I need to go back and
> > re-examine some of my earlier experiments.)
> >
>
> After redoing some of my tests and understanding more about how Xen
> handles page tables, I started looking at ptwr_do_page_fault() and put
> debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in
> x86_emulate_memop(). Building a debug version of Xen provided some
> additional information (the final line is from my debugging, after the
> ":" is domid, addr, pte, pte flags, type_info, page owner, domain):
You say you can repro the problem using non-live relo. In that case, you
should also be able to repro it using save/restore, which has almost
identical code paths.
Please try to isolate whether the crash happens on save or restore, and
further whether a given saved image crashes every time in the same way
when you try to restore it (mfns will be different, but pfns may be the
same).
Ian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-01 1:13 ` Ian Pratt
@ 2006-12-09 5:40 ` John Byrne
2006-12-09 5:44 ` John Byrne
2006-12-09 8:33 ` Ian Pratt
0 siblings, 2 replies; 20+ messages in thread
From: John Byrne @ 2006-12-09 5:40 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 1583 bytes --]
Ian Pratt wrote:
>>>> What happens if you use non-live relo?
>>> I thought I had tested that way back at the beginning without seeing the
>>> problem, but I must not have, because I just retested it to be sure and
>>> it died the same way. (Now I am truly confused and I need to go back and
>>> re-examine some of my earlier experiments.)
>>>
>> After redoing some of my tests and understanding more about how Xen
>> handles page tables, I started looking at ptwr_do_page_fault() and put
>> debugging code into it. (On Xen 3.0.3 x86-64.) The fixup is failing in
>> x86_emulate_memop(). Building a debug version of Xen provided some
>> additional information (the final line is from my debugging, after the
>> ":" is domid, addr, pte, pte flags, type_info, page owner, domain):
>
> You say you can repro the problem using non-live relo. In that case, you
> should also be able to repro it using save/restore, which has almost
> identical code paths.
>
> Please try to isolate whether the crash happens on save or restore, and
> further whether a given saved image crashes every time in the same way
> when you try to restore it (mfns will be different, but pfns may be the
> same).
>
>
> Ian
>
>
I finally ran down the problem. SAP is protecting the pages PROT_NONE,
so the page-present bit in the pte is not set and the
canonicalize/uncanonicalize code in save/restore ignores the pte. I've
attached a patch. It is possible that this change should also be made to
the l1e tests in xc_ptrace.c; I'm not sure.
John Byrne
Signed-off-by: John Byrne <john.l.byrne@hp.com>
[-- Attachment #2: migprotnone.patch --]
[-- Type: text/x-patch, Size: 1444 bytes --]
diff -r 1ad7dff99968 tools/libxc/xc_linux_restore.c
--- a/tools/libxc/xc_linux_restore.c Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xc_linux_restore.c Fri Dec 08 21:37:27 2006 -0600
@@ -73,7 +73,7 @@ static int uncanonicalize_pagetable(unsi
else
pte = ((uint64_t *)page)[i];
- if(pte & _PAGE_PRESENT) {
+ if(pte_present(pte)) {
pfn = (pte >> PAGE_SHIFT) & 0xffffffff;
diff -r 1ad7dff99968 tools/libxc/xc_linux_save.c
--- a/tools/libxc/xc_linux_save.c Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xc_linux_save.c Fri Dec 08 21:36:59 2006 -0600
@@ -471,7 +471,7 @@ static int canonicalize_pagetable(unsign
if (i >= xen_start && i < xen_end)
pte = 0;
- if (pte & _PAGE_PRESENT) {
+ if (pte_present(pte)) {
mfn = (pte >> PAGE_SHIFT) & 0xfffffff;
if (!MFN_IS_IN_PSEUDOPHYS_MAP(mfn)) {
diff -r 1ad7dff99968 tools/libxc/xg_private.h
--- a/tools/libxc/xg_private.h Fri Dec 08 18:37:19 2006 +0000
+++ b/tools/libxc/xg_private.h Fri Dec 08 17:48:49 2006 -0600
@@ -46,6 +46,10 @@ unsigned long csum_page (void * page);
#define _PAGE_PSE 0x080
#define _PAGE_GLOBAL 0x100
+#define _PAGE_PROTNONE 0x080 /* If not present */
+
+#define pte_present(_pteval) ((_pteval) & (_PAGE_PRESENT|_PAGE_PROTNONE))
+
#define L1_PAGETABLE_SHIFT_PAE 12
#define L2_PAGETABLE_SHIFT_PAE 21
#define L3_PAGETABLE_SHIFT_PAE 30
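The effect of the patch can be illustrated with a small standalone sketch. The flag values and the pte_present() test are taken from the patch above; old_present_test() and the sample PTE values are made up for illustration and do not appear in libxc.

```c
#include <stdint.h>

#define PAGE_SHIFT     12
#define _PAGE_PRESENT  0x001
#define _PAGE_PROTNONE 0x080 /* Linux: only meaningful when _PAGE_PRESENT is clear */

/* The test used by save/restore before the patch. */
static inline int old_present_test(uint64_t pte)
{
    return (pte & _PAGE_PRESENT) != 0;
}

/* The test introduced by the patch, written as a function. */
static inline int pte_present(uint64_t pte)
{
    return (pte & (_PAGE_PRESENT | _PAGE_PROTNONE)) != 0;
}

/* Build an example PTE from a frame number and low flag bits. */
static inline uint64_t make_pte(uint64_t frame, uint64_t flags)
{
    return (frame << PAGE_SHIFT) | flags;
}
```

For a PROT_NONE entry such as make_pte(0xca20f, _PAGE_PROTNONE), old_present_test() returns 0, so the frame number would be left untranslated across save/restore, while pte_present() returns 1. An all-zero entry is skipped by both tests.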
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 5:40 ` John Byrne
@ 2006-12-09 5:44 ` John Byrne
2006-12-09 8:33 ` Ian Pratt
1 sibling, 0 replies; 20+ messages in thread
From: John Byrne @ 2006-12-09 5:44 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
John Byrne wrote:
> ...snipped..
> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
> so the page-present bit in the pte is not set and the
> canonicalize/uncanonicalize code in save/restore ignores the pte. I've
> attached a patch. It is possible that this change should also be made to
> the l1e tests in xc_ptrace.c; I'm not sure.
>
> John Byrne
>
The patch is against xen-unstable changeset 12815:1ad7dff99968.
John Byrne
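The access pattern described in the diagnosis (populated pages switched to PROT_NONE and later made accessible again) can be sketched in user space as follows. This is an assumption about what a reproducer might look like, not the test case mentioned earlier in the thread, and it has not been confirmed to trigger the crash. The function name protnone_cycle and the pause parameter are hypothetical.

```c
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS on strict C library configurations */
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Populate some anonymous pages, mark them PROT_NONE, optionally wait
 * (the window in which a save/restore or migration would be run), then
 * make them read-write again and check the contents.
 * Returns 0 on success, -1 on any failure.
 */
static inline int protnone_cycle(unsigned int pause_seconds)
{
    long page_size = sysconf(_SC_PAGESIZE);
    size_t len;
    unsigned char *p;
    int rc = -1;

    if (page_size <= 0)
        return -1;
    len = (size_t)page_size * 16;

    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Touch every page so that real PTEs exist before protection changes. */
    memset(p, 0xa5, len);

    if (mprotect(p, len, PROT_NONE) == 0) {
        if (pause_seconds > 0)
            sleep(pause_seconds); /* migrate or save/restore the guest here */

        /* This second mprotect() rewrites the PTEs of the range. */
        if (mprotect(p, len, PROT_READ | PROT_WRITE) == 0 &&
            p[0] == 0xa5 && p[len - 1] == 0xa5)
            rc = 0;
    }

    munmap(p, len);
    return rc;
}
```

On an unaffected system protnone_cycle(0) simply returns 0. To use it as a migration test, one would call it with a pause long enough to migrate the guest while the pages are PROT_NONE.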
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only?
2006-12-09 5:40 ` John Byrne
2006-12-09 5:44 ` John Byrne
@ 2006-12-09 8:33 ` Ian Pratt
2006-12-09 9:22 ` Keir Fraser
` (2 more replies)
1 sibling, 3 replies; 20+ messages in thread
From: Ian Pratt @ 2006-12-09 8:33 UTC (permalink / raw)
To: John Byrne, Ian Pratt; +Cc: xen-devel, Joe Bonasera, Christian Limpach
> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
> so the page-present bit in the pte is not set and the
> canonicalize/uncanonicalize code in save/restore ignores the pte. I've
> attached a patch. It is possible that this change should also be made to
> the l1e tests in xc_ptrace.c; I'm not sure.
That's a good catch, thanks. Interesting that we hadn't seen this
before.
Although your patch works today, it will break when we add PSE (super
page) support for PV guests as it will confuse PROT_NONE with PSE.
Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
the tests on whether the page table page is an L1 or not to fix this.
However, it does point out an issue for other OSes: Taking this patch
effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
part of the Xen API. We need to find out whether this is compatible with
*BSD and Solaris' use of flags for not present ptes.
Ian
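Gating the test on the page-table level, as suggested above, might look like the following sketch. The function name pte_holds_frame and the level parameter are hypothetical; the thread does not say how the tools would pass the level to the test. Bit 0x80 is treated as PROT_NONE only in L1 entries, since the same bit is PSE in higher-level entries.

```c
#include <stdint.h>

#define _PAGE_PRESENT  0x001
#define _PAGE_PSE      0x080 /* meaning of bit 7 in L2 and higher entries */
#define _PAGE_PROTNONE 0x080 /* Linux meaning of bit 7 in a not-present L1 entry */

/*
 * Return nonzero if the entry should be treated as containing a frame
 * number. level is 1 for an L1 page-table page, 2 for L2, and so on.
 */
static inline int pte_holds_frame(uint64_t pte, int level)
{
    if (pte & _PAGE_PRESENT)
        return 1;
    /* Not present: honour the PROT_NONE bit only for L1 entries. */
    return level == 1 && (pte & _PAGE_PROTNONE) != 0;
}
```

With this gate, a not-present L2 entry that happens to have bit 0x80 set is ignored, while a not-present L1 entry with the same bit is still translated.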
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 8:33 ` Ian Pratt
@ 2006-12-09 9:22 ` Keir Fraser
2006-12-09 9:34 ` Keir Fraser
2006-12-11 17:00 ` Joe Bonasera
2007-01-14 4:11 ` John Byrne
2 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-12-09 9:22 UTC (permalink / raw)
To: Ian Pratt, John Byrne; +Cc: xen-devel, Christian Limpach, Joe Bonasera
On 9/12/06 8:33 am, "Ian Pratt" <m+Ian.Pratt@cl.cam.ac.uk> wrote:
> Although your patch works today, it will break when we add PSE (super
> page) support for PV guests as it will confuse PROT_NONE with PSE.
> Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
> the tests on whether the page table page is an L1 or not to fix this.
>
> However, it does point out an issue for other OSes: Taking this patch
> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> part of the Xen API. We need to find out whether this is compatible with
> *BSD and Solaris' use of flags for not present ptes.
If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available
for things like swapcache info. Making assumptions about not-present PTEs is
not really tenable.
-- Keir
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 9:22 ` Keir Fraser
@ 2006-12-09 9:34 ` Keir Fraser
2006-12-09 9:48 ` Keir Fraser
0 siblings, 1 reply; 20+ messages in thread
From: Keir Fraser @ 2006-12-09 9:34 UTC (permalink / raw)
To: Keir Fraser, Ian Pratt, John Byrne
Cc: xen-devel, Christian Limpach, Joe Bonasera
On 9/12/06 9:22 am, "Keir Fraser" <keir@xensource.com> wrote:
>> Although your patch works today, it will break when we add PSE (super
>> page) support for PV guests as it will confuse PROT_NONE with PSE.
>> Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
>> the tests on whether the page table page is an L1 or not to fix this.
>>
>> However, it does point out an issue for other OSes: Taking this patch
>> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
>> part of the Xen API. We need to find out whether this is compatible with
>> *BSD and Solaris' use of flags for not present ptes.
>
> If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available
> for things like swapcache info. Making assumptions about not-present PTEs is
> not really tenable.
Speaking more constructively we could have a pte_active_mask communicated
via elfnote or xenbus (or some other way) which the tools would apply to
PTEs to determine if they contain an MFN. Default would be 0x1.
-- Keir
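A minimal sketch of the pte_active_mask idea follows. The default of 0x1 comes from the message above; the Linux value of 0x81 is an assumption derived from the flags in the earlier patch, and the identifier names are hypothetical. How the mask would reach the tools (elfnote, xenbus or otherwise) is left open here, as it is in the thread.

```c
#include <stdint.h>

/* Default: only the architectural present bit marks an entry as holding an MFN. */
#define DEFAULT_PTE_ACTIVE_MASK 0x1ULL

/* Assumed Linux value: present bit plus the PROT_NONE bit (0x80). */
#define LINUX_PTE_ACTIVE_MASK   0x81ULL

/* An entry is treated as containing an MFN if any bit of the mask is set. */
static inline int pte_contains_mfn(uint64_t pte, uint64_t active_mask)
{
    return (pte & active_mask) != 0;
}
```

A guest that never stores machine addresses in not-present entries would keep the default mask, so its not-present entries would be left alone whatever other bits they contain.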
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 9:34 ` Keir Fraser
@ 2006-12-09 9:48 ` Keir Fraser
0 siblings, 0 replies; 20+ messages in thread
From: Keir Fraser @ 2006-12-09 9:48 UTC (permalink / raw)
To: Ian Pratt, John Byrne; +Cc: xen-devel, Christian Limpach, Joe Bonasera
On 9/12/06 9:34 am, "Keir Fraser" <keir@xensource.com> wrote:
>> If _PAGE_PRESENT is clear then the other N-1 bits can be assumed available
>> for things like swapcache info. Making assumptions about not-present PTEs is
>> not really tenable.
>
> Speaking more constructively we could have a pte_active_mask communicated
> via elfnote or xenbus (or some other way) which the tools would apply to
> PTEs to determine if they contain an MFN. Default would be 0x1.
Or we could apply the special case only for images with the OS elfnote set
to 'linux', if all Linux kernels have the same PROT_NONE definition.
With any of these solutions, the problem is how to communicate the flag or
mask to xc_linux_save/xc_linux_restore, and how to propagate it across
save/restore (i.e., how is it represented in a saved image?).
-- Keir
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 8:33 ` Ian Pratt
2006-12-09 9:22 ` Keir Fraser
@ 2006-12-11 17:00 ` Joe Bonasera
2006-12-11 18:29 ` Ian Pratt
2007-01-14 4:11 ` John Byrne
2 siblings, 1 reply; 20+ messages in thread
From: Joe Bonasera @ 2006-12-11 17:00 UTC (permalink / raw)
To: Ian Pratt; +Cc: Christian Limpach, xen-devel, John Byrne
Ian Pratt wrote:
>
>> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
>> so the page-present bit in the pte is not set and the
>> canonicalize/uncanonicalize code in save/restore ignores the pte. I've
>> attached a patch. It is possible that this change should also be made to
>> the l1e tests in xc_ptrace.c; I'm not sure.
>
> That's a good catch, thanks. Interesting that we hadn't seen this
> before.
>
> Although your patch works today, it will break when we add PSE (super
> page) support for PV guests as it will confuse PROT_NONE with PSE.
> Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
> the tests on whether the page table page is an L1 or not to fix this.
>
> However, it does point out an issue for other OSes: Taking this patch
> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> part of the Xen API. We need to find out whether this is compatible with
> *BSD and Solaris' use of flags for not present ptes.
>
> Ian
Solaris implements PROT_NONE by entirely invalidating the PTE (i.e. it
becomes zero). Hence our PTEs are always either zero or have the PRESENT
bit set. The only exception to this was adding some fixage to allow
for the old Xen writable page table approach which temporarily made
the upper table non-PRESENT.
So you can make not-present, but non-zero entries mean anything you want.
As long as it's the guest OS that creates the entries, we'll just not do it.
Joe
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only?
2006-12-11 17:00 ` Joe Bonasera
@ 2006-12-11 18:29 ` Ian Pratt
2006-12-11 19:55 ` John Byrne
2006-12-11 21:30 ` Joe Bonasera
0 siblings, 2 replies; 20+ messages in thread
From: Ian Pratt @ 2006-12-11 18:29 UTC (permalink / raw)
To: Joe Bonasera; +Cc: Christian Limpach, xen-devel, John Byrne
> Solaris implements PROT_NONE by entirely invalidating the PTE (i.e. it
> becomes zero). Hence our PTEs are always either zero or have the PRESENT
> bit set. The only exception to this was adding some fixage to allow
> for the old Xen writable page table approach which temporarily made
> the upper table non-PRESENT.
>
> So you can make not-present, but non-zero entries mean anything you want.
> As long as it's the guest OS that creates the entries, we'll just not do
> it.
Just to confirm: in Solaris there are no not-present PTEs that
contain machine addresses.
This means we need to implement the scheme that Keir suggested to enable
the guest OS to tell xen/xc_save/restore about flags in not-present PTEs
that should trigger an m2p conversion.
Ian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-11 18:29 ` Ian Pratt
@ 2006-12-11 19:55 ` John Byrne
2006-12-11 21:30 ` Joe Bonasera
1 sibling, 0 replies; 20+ messages in thread
From: John Byrne @ 2006-12-11 19:55 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel, Joe Bonasera, Christian Limpach
Ian Pratt wrote:
>
>> Solaris implements PROT_NONE by entirely invalidating the PTE (i.e. it
>> becomes zero). Hence our PTEs are always either zero or have the PRESENT
>> bit set. The only exception to this was adding some fixage to allow
>> for the old Xen writable page table approach which temporarily made
>> the upper table non-PRESENT.
>>
>> So you can make not-present, but non-zero entries mean anything you want.
>> As long as it's the guest OS that creates the entries, we'll just not do
>> it.
>
> Just to confirm: in Solaris there are no not-present PTEs that
> contain machine addresses.
>
> This means we need to implement the scheme that Keir suggested to enable
> the guest OS to tell xen/xc_save/restore about flags in not-present PTEs
> that should trigger an m2p conversion.
>
> Ian
>
Ian,
Silly me. I thought "xc_linux_save" meant what it said. I haven't paid
much attention to BSD or Solaris on Xen and didn't realize that went
through the same path.
I'd really like to see this fixed for 3.0.4, at least for Linux, but I
don't think I'm the person to implement a new "scheme" quickly to do it,
but I'll try if someone wants to give me some advice on how to start.
On the subject of schemes, what about support for other architectures?
Is there anything we should be thinking about for supporting guests with
different page sizes, for instance?
John Byrne
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-11 18:29 ` Ian Pratt
2006-12-11 19:55 ` John Byrne
@ 2006-12-11 21:30 ` Joe Bonasera
1 sibling, 0 replies; 20+ messages in thread
From: Joe Bonasera @ 2006-12-11 21:30 UTC (permalink / raw)
To: Ian Pratt; +Cc: Christian Limpach, xen-devel, John Byrne
Ian Pratt wrote:
>
>> Solaris implements PROT_NONE by entirely invalidating the PTE (i.e. it
>> becomes zero). Hence our PTEs are always either zero or have the PRESENT
>> bit set. The only exception to this was adding some fixage to allow
>> for the old Xen writable page table approach which temporarily made
>> the upper table non-PRESENT.
>>
>> So you can make not-present, but non-zero entries mean anything you want.
>> As long as it's the guest OS that creates the entries, we'll just not do
>> it.
>
> Just to confirm: in Solaris there are no not-present PTEs that
> contain machine addresses.
yes
> This means we need to implement the scheme that Keir suggested to enable
> the guest OS to tell xen/xc_save/restore about flags in not-present PTEs
> that should trigger an m2p conversion.
>
> Ian
^ permalink raw reply [flat|nested] 20+ messages in thread
* Re: Live migration leaves page tables read-only?
2006-12-09 8:33 ` Ian Pratt
2006-12-09 9:22 ` Keir Fraser
2006-12-11 17:00 ` Joe Bonasera
@ 2007-01-14 4:11 ` John Byrne
2007-01-14 8:21 ` Ian Pratt
2 siblings, 1 reply; 20+ messages in thread
From: John Byrne @ 2007-01-14 4:11 UTC (permalink / raw)
To: Ian Pratt; +Cc: xen-devel
Ian,
I haven't noticed a fix. Is someone working on this bug or should I open
a bugzilla for it, so it isn't forgotten?
John Byrne
Ian Pratt wrote:
>
>> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
>> so the page-present bit in the pte is not set and the
>> canonicalize/uncanonicalize code in save/restore ignores the pte. I've
>> attached a patch. It is possible that this change should also be made to
>> the l1e tests in xc_ptrace.c; I'm not sure.
>
> That's a good catch, thanks. Interesting that we hadn't seen this
> before.
>
> Although your patch works today, it will break when we add PSE (super
> page) support for PV guests as it will confuse PROT_NONE with PSE.
> Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
> the tests on whether the page table page is an L1 or not to fix this.
>
> However, it does point out an issue for other OSes: Taking this patch
> effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> part of the Xen API. We need to find out whether this is compatible with
> *BSD and Solaris' use of flags for not present ptes.
>
> Ian
>
^ permalink raw reply [flat|nested] 20+ messages in thread
* RE: Live migration leaves page tables read-only?
2007-01-14 4:11 ` John Byrne
@ 2007-01-14 8:21 ` Ian Pratt
0 siblings, 0 replies; 20+ messages in thread
From: Ian Pratt @ 2007-01-14 8:21 UTC (permalink / raw)
To: John Byrne; +Cc: xen-devel
> I haven't noticed a fix. Is someone working on this bug or should I open
> a bugzilla for it, so it isn't forgotten?
It's not forgotten, but I'm not aware of anyone actively working on it.
It's a bit fiddly to fix properly.
We need to add an elf note that describes how to identify not-present
PTEs that contain MFNs. For Linux this is easy, as testing for a single
set bit works. In principle, you might need a more complex scheme, but
I'm not aware of any OSes that actually require this.
Allowing a mask and value to be specified would be good, something that
could be extended into a list of mask:value,mask:value in future if need
be, e.g.:
np_pte_contains_mfn_flags=c0:80,c0:40
The elf note would need to be pulled out of the kernel by the domain
builder, and then we need to figure out how to make the info available
to the save/restore code.
Would be good if someone could pick this up.
Thanks,
Ian
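The mask:value list proposed above could be parsed and applied roughly as follows. This is a sketch under stated assumptions: the thread fixes neither the note's exact syntax nor its semantics, so the sketch assumes hexadecimal fields and that a not-present entry contains an MFN when (pte & mask) == value holds for any pair. The names np_rule, parse_np_rules and np_pte_contains_mfn are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_NP_RULES 8

struct np_rule {
    uint64_t mask;
    uint64_t value;
};

/*
 * Parse a string such as "c0:80,c0:40" into rules[].
 * Returns the number of rules parsed, or -1 on a syntax error or if
 * more than max rules are present. An empty string yields 0 rules.
 */
static inline int parse_np_rules(const char *s, struct np_rule *rules, int max)
{
    int n = 0;

    while (*s != '\0') {
        char *end;

        if (n == max)
            return -1;

        rules[n].mask = strtoull(s, &end, 16);
        if (end == s || *end != ':')
            return -1;
        s = end + 1;

        rules[n].value = strtoull(s, &end, 16);
        if (end == s)
            return -1;
        n++;

        if (*end == ',') {
            s = end + 1;
            if (*s == '\0')
                return -1; /* trailing comma */
        } else if (*end == '\0') {
            s = end;
        } else {
            return -1;
        }
    }
    return n;
}

/* Present entries always contain an MFN; otherwise consult the rules. */
static inline int np_pte_contains_mfn(uint64_t pte,
                                      const struct np_rule *rules, int n)
{
    int i;

    if (pte & 0x1)
        return 1;
    for (i = 0; i < n; i++)
        if ((pte & rules[i].mask) == rules[i].value)
            return 1;
    return 0;
}
```

With the example string from the message, an entry whose bits 7:6 are 10 or 01 matches, while an entry with both bits set, or an all-zero entry, does not.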
> Ian Pratt wrote:
> >
> >> I finally ran down the problem. SAP is protecting the pages PROT_NONE,
> >> so the page-present bit in the pte is not set and the
> >> canonicalize/uncanonicalize code in save/restore ignores the pte. I've
> >> attached a patch. It is possible that this change should also be made to
> >> the l1e tests in xc_ptrace.c; I'm not sure.
> >
> > That's a good catch, thanks. Interesting that we hadn't seen this
> > before.
> >
> > Although your patch works today, it will break when we add PSE (super
> > page) support for PV guests as it will confuse PROT_NONE with PSE.
> > Assuming PROT_NONE only makes sense for L1 entries, we can probably gate
> > the tests on whether the page table page is an L1 or not to fix this.
> >
> > However, it does point out an issue for other OSes: Taking this patch
> > effectively makes Linux's PROT_NONE (flags 0x80 for a not present PTE)
> > part of the Xen API. We need to find out whether this is compatible with
> > *BSD and Solaris' use of flags for not present ptes.
> >
> > Ian
> >
^ permalink raw reply [flat|nested] 20+ messages in thread
end of thread, other threads:[~2007-01-14 8:21 UTC | newest]
Thread overview: 20+ messages
2006-11-29 0:13 Live migration leaves page tables read-only? John Byrne
2006-11-29 0:22 ` John Byrne
2006-11-29 1:36 ` Ian Pratt
2006-11-29 2:52 ` John Byrne
2006-11-29 7:42 ` Keir Fraser
2006-11-29 16:49 ` John Byrne
2006-11-30 23:36 ` John Byrne
2006-12-01 1:13 ` Ian Pratt
2006-12-09 5:40 ` John Byrne
2006-12-09 5:44 ` John Byrne
2006-12-09 8:33 ` Ian Pratt
2006-12-09 9:22 ` Keir Fraser
2006-12-09 9:34 ` Keir Fraser
2006-12-09 9:48 ` Keir Fraser
2006-12-11 17:00 ` Joe Bonasera
2006-12-11 18:29 ` Ian Pratt
2006-12-11 19:55 ` John Byrne
2006-12-11 21:30 ` Joe Bonasera
2007-01-14 4:11 ` John Byrne
2007-01-14 8:21 ` Ian Pratt