* Live migration with MMIO pages
@ 2007-10-31 12:19 Kieran Mansley
2007-10-31 13:32 ` Keir Fraser
0 siblings, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-10-31 12:19 UTC (permalink / raw)
To: xen-devel
I'm trying to track down some strange behaviour I see when doing live
migration. Looking at the code that copies the pages across in xen-
unstable.hg/tools/libxc/xc_domain_save.c I see there's a test that
avoids doing anything with pages that are MMIO if it's an HVM domain:
/* Skip PFNs that aren't really there */
if ( hvm && ((n >= 0xa0 && n < 0xc0) /* VGA hole */
|| (n >= (HVM_BELOW_4G_MMIO_START >> PAGE_SHIFT)
&& n < (1ULL<<32) >> PAGE_SHIFT)) /* MMIO */ )
continue;
However there's no equivalent test for a non-HVM domain. I think adding
such a test to just ignore any MMIO pages would help. The MMIO pages
will be unmapped when the live migration finally suspends the domain
(and so the migration should be possible), but accessing them in the
mean time is probably not a good idea. Non-live migration is fine
because the domain (and so all its devices) are suspended in advance,
and the MMIO pages are unmapped.
This code is running in dom0, and so I think will need some help from
the hypervisor to get the range of mfns that they hypervisor has
reserved for MMIO. It could then compare the mfn to that range and
ignore it if there's a match. Does this sound sensible? Is there an
API to get this information already? Any chance of a patch to fix this
making it into 3.2.0?
Kieran
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 12:19 Live migration with MMIO pages Kieran Mansley
@ 2007-10-31 13:32 ` Keir Fraser
2007-10-31 14:03 ` Kieran Mansley
0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2007-10-31 13:32 UTC (permalink / raw)
To: Kieran Mansley, xen-devel
On 31/10/07 12:19, "Kieran Mansley" <kmansley@solarflare.com> wrote:
> However there's no equivalent test for a non-HVM domain. I think adding
> such a test to just ignore any MMIO pages would help. The MMIO pages
> will be unmapped when the live migration finally suspends the domain
> (and so the migration should be possible), but accessing them in the
> mean time is probably not a good idea. Non-live migration is fine
> because the domain (and so all its devices) are suspended in advance,
> and the MMIO pages are unmapped.
Is mapping the page as cacheable and then reading from it not supported by
the hardware device? E.g., does it need to be uncacheable, or do reads have
side effects?
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 13:32 ` Keir Fraser
@ 2007-10-31 14:03 ` Kieran Mansley
2007-10-31 14:08 ` Keir Fraser
0 siblings, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-10-31 14:03 UTC (permalink / raw)
To: Keir Fraser, xen-devel
On Wed, 2007-10-31 at 13:32 +0000, Keir Fraser wrote:
>
>
> On 31/10/07 12:19, "Kieran Mansley" <kmansley@solarflare.com> wrote:
>
> > However there's no equivalent test for a non-HVM domain. I think adding
> > such a test to just ignore any MMIO pages would help. The MMIO pages
> > will be unmapped when the live migration finally suspends the domain
> > (and so the migration should be possible), but accessing them in the
> > mean time is probably not a good idea. Non-live migration is fine
> > because the domain (and so all its devices) are suspended in advance,
> > and the MMIO pages are unmapped.
>
> Is mapping the page as cacheable and then reading from it not supported by
> the hardware device? E.g., does it need to be uncacheable, or do reads have
> side effects?
The page is mapped with ioremap_nocache, and needs to be uncacheable.
Reads shouldn't have side effects (although shouldn't and don't aren't
always synonymous). More generally I can think of devices where reads
do have side effects though.
The symptom of failure is that once live migration has started, trying
to write to an IO page results in it getting stuck in (or a perpetual
loop into) page_fault(). This only happens very occasionally. I've
interpreted it getting stuck in page_fault() as a result of the shadow
paging which (as I understand it) marks the normal page table entries as
read only so that writes to pages trap into the hypervisor and it can
update its dirty set.
So the live migration page reading doesn't quite explain the problem
(unless the read from hardware does have side effects) but as there was
a check there for HVM MMIO pages it seemed to me that adding something
equivalent for PV would be a good idea, and if that didn't fix the
problem I could try looking at the shadow page tables and how they might
interact badly.
Kieran
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 14:03 ` Kieran Mansley
@ 2007-10-31 14:08 ` Keir Fraser
2007-10-31 16:34 ` Kieran Mansley
0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2007-10-31 14:08 UTC (permalink / raw)
To: Kieran Mansley, xen-devel
On 31/10/07 14:03, "Kieran Mansley" <kmansley@solarflare.com> wrote:
> The symptom of failure is that once live migration has started, trying
> to write to an IO page results in it getting stuck in (or a perpetual
> loop into) page_fault(). This only happens very occasionally. I've
> interpreted it getting stuck in page_fault() as a result of the shadow
> paging which (as I understand it) marks the normal page table entries as
> read only so that writes to pages trap into the hypervisor and it can
> update its dirty set.
Yes, but then it should mark the page writable again, so that the access can
be re-executed without faulting! So this rather points at some problem with
the live-migration shadow mode w.r.t. mmio pages. You're going to have to
fix this regardless of whether you prevent xc_domain_save from mapping mmio
pages (which isn't itself a bad idea, for the reasons you note). This is
because, regardless of whether xc_domain_save ignores mmio pages, that isn't
going to change the shadow code's behaviour, and it's almost certainly that
which is causing the infinite fault loop.
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 14:08 ` Keir Fraser
@ 2007-10-31 16:34 ` Kieran Mansley
2007-10-31 16:44 ` Keir Fraser
0 siblings, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-10-31 16:34 UTC (permalink / raw)
To: Keir Fraser, xen-devel
On Wed, 2007-10-31 at 14:08 +0000, Keir Fraser wrote:
> On 31/10/07 14:03, "Kieran Mansley" <kmansley@solarflare.com> wrote:
>
> > The symptom of failure is that once live migration has started, trying
> > to write to an IO page results in it getting stuck in (or a perpetual
> > loop into) page_fault(). This only happens very occasionally. I've
> > interpreted it getting stuck in page_fault() as a result of the shadow
> > paging which (as I understand it) marks the normal page table entries as
> > read only so that writes to pages trap into the hypervisor and it can
> > update its dirty set.
>
> Yes, but then it should mark the page writable again, so that the access can
> be re-executed without faulting! So this rather points at some problem with
> the live-migration shadow mode w.r.t. mmio pages.
Yes. The reason it's failing is that sh_page_fault() in
xen/arch/x86/mm/shadow/multi.c thinks it's a bad gfn:
if ( !p2m_is_valid(p2mt) || (!(p2m_is_mmio(p2mt) || mfn_valid
(gmfn))) )
{
perfc_incr(shadow_fault_bail_bad_gfn);
SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
gfn_x(gfn), mfn_x(gmfn));
goto not_a_shadow_fault;
}
I think the problem is that set_mmio_p2m_entry() isn't getting called
when the IO mapping is established. There are three places where
iomem_permit_access() is called:
- XEN_DOMCTL_memory_mapping: (in xen/arch/x86/domctl.c)
- XEN_DOMCTL_iomem_permission: (in xen/common/domctl.c)
- __gnttab_map_grant_ref(): (in xen/common/grant_table.c)
The last one was written by me based on the second one, and neither of
these call set_mmio_p2m_entry(). I find this a bit suspicious, because
the first one does, and it looks necessary to me in all three cases.
There are however very few users of either of those domctl operations,
and so it's hard to tell what the difference is supposed to be, and so
why XEN_DOMCTL_iomem_permission doesn't call set_mmio_p2m_entry().
Adding calls to set_mmio_p2m_entry() in either of the cases that don't
have it might be a bit tricky too as I'm not sure a gfn exists for that
mfn at the point that they are called.
Kieran
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 16:34 ` Kieran Mansley
@ 2007-10-31 16:44 ` Keir Fraser
2007-10-31 17:14 ` Kieran Mansley
2007-10-31 17:31 ` Tim Deegan
0 siblings, 2 replies; 16+ messages in thread
From: Keir Fraser @ 2007-10-31 16:44 UTC (permalink / raw)
To: Kieran Mansley, xen-devel; +Cc: Tim Deegan
On 31/10/07 16:34, "Kieran Mansley" <kmansley@solarflare.com> wrote:
> Yes. The reason it's failing is that sh_page_fault() in
> xen/arch/x86/mm/shadow/multi.c thinks it's a bad gfn:
>
> if ( !p2m_is_valid(p2mt) || (!(p2m_is_mmio(p2mt) || mfn_valid
> (gmfn))) )
> {
> perfc_incr(shadow_fault_bail_bad_gfn);
> SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
> gfn_x(gfn), mfn_x(gmfn));
> goto not_a_shadow_fault;
> }
>
> I think the problem is that set_mmio_p2m_entry() isn't getting called
> when the IO mapping is established. There are three places where
> iomem_permit_access() is called:
No, basically that pagefault-handler check is nonsense for a PV guest. We
don't have a p2m table in Xen for PV guests because they are not 'translated
mode'. So there is nowhere for us to store the 'mmio' p2m type.
Perhaps Tim has a good idea what to do here. Adding a
!shadow_mode_translate() condition to the if statement would probably work
but I'm not sure it's the neatest answer.
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 16:44 ` Keir Fraser
@ 2007-10-31 17:14 ` Kieran Mansley
2007-10-31 17:31 ` Tim Deegan
1 sibling, 0 replies; 16+ messages in thread
From: Kieran Mansley @ 2007-10-31 17:14 UTC (permalink / raw)
To: Keir Fraser; +Cc: Tim Deegan, xen-devel
On Wed, 2007-10-31 at 16:44 +0000, Keir Fraser wrote:
> On 31/10/07 16:34, "Kieran Mansley" <kmansley@solarflare.com> wrote:
>
> > Yes. The reason it's failing is that sh_page_fault() in
> > xen/arch/x86/mm/shadow/multi.c thinks it's a bad gfn:
> >
> > if ( !p2m_is_valid(p2mt) || (!(p2m_is_mmio(p2mt) || mfn_valid
> > (gmfn))) )
> > {
> > perfc_incr(shadow_fault_bail_bad_gfn);
> > SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
> > gfn_x(gfn), mfn_x(gmfn));
> > goto not_a_shadow_fault;
> > }
> >
> > I think the problem is that set_mmio_p2m_entry() isn't getting called
> > when the IO mapping is established. There are three places where
> > iomem_permit_access() is called:
>
> No, basically that pagefault-handler check is nonsense for a PV guest. We
> don't have a p2m table in Xen for PV guests because they are not 'translated
> mode'. So there is nowhere for us to store the 'mmio' p2m type.
>
> Perhaps Tim has a good idea what to do here. Adding a
> !shadow_mode_translate() condition to the if statement would probably work
> but I'm not sure it's the neatest answer.
Avoiding that if statement isn't sufficient to fix the problem - it
still doesn't seem to correctly re-enable writing to the page, even
though xenperf tells me that "calls to shadow fault" == "shadow_fault
fixed fault" (i.e. the code now thinks it's successfully done what it
should).
Kieran
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 16:44 ` Keir Fraser
2007-10-31 17:14 ` Kieran Mansley
@ 2007-10-31 17:31 ` Tim Deegan
2007-10-31 18:23 ` Keir Fraser
2007-11-01 11:18 ` Kieran Mansley
1 sibling, 2 replies; 16+ messages in thread
From: Tim Deegan @ 2007-10-31 17:31 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel
At 16:44 +0000 on 31 Oct (1193849054), Keir Fraser wrote:
> On 31/10/07 16:34, "Kieran Mansley" <kmansley@solarflare.com> wrote:
> > if ( !p2m_is_valid(p2mt) || (!(p2m_is_mmio(p2mt) || mfn_valid
> > (gmfn))) )
> > {
> > perfc_incr(shadow_fault_bail_bad_gfn);
> > SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
> > gfn_x(gfn), mfn_x(gmfn));
> > goto not_a_shadow_fault;
> > }
> No, basically that pagefault-handler check is nonsense for a PV guest. We
> don't have a p2m table in Xen for PV guests because they are not 'translated
> mode'. So there is nowhere for us to store the 'mmio' p2m type.
Hmm.
The p2m_is_valid() check is OK because that always passes for PV guests.
The check for mfn_valid() is failing because MMIO frames aren't "valid".
For HVM guests we let that pass if the GFN is known to be acceptable
MMIO (by its p2m type). There needs to be an equivalent get-out here
for PV guests -- iomem_access_permitted(d, mfn_x(gmfn), mfn_x(gmfn))
should do.
You'll need to add the same test in _sh_propagate(), where it checks again
that the target MFN is sane:
if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
{
ASSERT((ft == ft_prefetch));
*sp = shadow_l1e_empty();
goto done;
}
Cheers,
Tim.
--
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, c/o EC2Y 5EB, UK.]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 17:31 ` Tim Deegan
@ 2007-10-31 18:23 ` Keir Fraser
2007-11-01 11:18 ` Kieran Mansley
1 sibling, 0 replies; 16+ messages in thread
From: Keir Fraser @ 2007-10-31 18:23 UTC (permalink / raw)
To: Tim Deegan; +Cc: xen-devel
On 31/10/07 17:31, "Tim Deegan" <Tim.Deegan@eu.citrix.com> wrote:
> For HVM guests we let that pass if the GFN is known to be acceptable
> MMIO (by its p2m type). There needs to be an equivalent get-out here
> for PV guests -- iomem_access_permitted(d, mfn_x(gmfn), mfn_x(gmfn))
> should do.
We shouldn't really need to do a iomem_access_permitted() check here. If
this is a PV guest then the access check was made when the guest pagetable
was validated. We can just circumvent these checks entirely for PV guests.
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-10-31 17:31 ` Tim Deegan
2007-10-31 18:23 ` Keir Fraser
@ 2007-11-01 11:18 ` Kieran Mansley
2007-11-01 11:34 ` Tim Deegan
1 sibling, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-11-01 11:18 UTC (permalink / raw)
To: Tim Deegan; +Cc: xen-devel
On Wed, 2007-10-31 at 17:31 +0000, Tim Deegan wrote:
> Hmm.
>
> The p2m_is_valid() check is OK because that always passes for PV guests.
> The check for mfn_valid() is failing because MMIO frames aren't "valid".
>
> For HVM guests we let that pass if the GFN is known to be acceptable
> MMIO (by its p2m type). There needs to be an equivalent get-out here
> for PV guests -- iomem_access_permitted(d, mfn_x(gmfn), mfn_x(gmfn))
> should do.
>
> You'll need to add the same test in _sh_propagate(), where it checks again
> that the target MFN is sane:
>
> if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
> {
> ASSERT((ft == ft_prefetch));
> *sp = shadow_l1e_empty();
> goto done;
> }
I'm a bit concerned about this one in _sh_propogate(). It seems it's
checking that the mfn is valid for a good reason, and so letting it
carry on in the case of MMIO frames will lead to a rather ungraceful
failure. If I modify the above if to be:
if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct)
&& !iomem_access_permitted(d, target_mfn, target_mfn))
so that PV iomem frames can continue rather than giving up at this
point, it gets a fatal page fault (see below).
I wonder if the comment above that if has the answer:
/* N.B. For pass-through MMIO, either this test needs to be relaxed,
* and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or
the
* MMIO areas need to be added to the frame-table to make them
"valid". */
i.e. relaxing the test is not sufficient.
(XEN) ----[ Xen-3.2-unstable x86_32p debug=n Not tainted ]----
(XEN) CPU: 3
(XEN) EIP: e008:[<ff179ed2>] l1e_propagate_from_guest+0x2e2/0x320
(XEN) EFLAGS: 00010246 CONTEXT: hypervisor
(XEN) eax: 00101000 ebx: 00000040 ecx: ff1a7080 edx: ff1a7080
(XEN) esi: 00000063 edi: 000d8fa0 ebp: ff1aa080 esp: ffbf7da0
(XEN) cr0: 8005003b cr4: 000006f0 cr3: 001aaf40 cr2: fcb63e80
(XEN) ds: e010 es: e010 fs: 0000 gs: 0000 ss: e010 cs: e008
(XEN) Xen stack trace from esp=ffbf7da0:
(XEN) ff1a7c00 000d8fa0 000d8fa0 ff1a7080 00000000 00000000 ff1a7080
ff1a7080(XEN) 000d8fa0 137b4001 fe688260 ff17d0b4 000d8fa0 ffbf7ef8
00000001 00000001(XEN) 00000006 ff1a79dc 00dde090 fe7f3008 00000020
00000020 3eacc063 ff1be1e0(XEN) ff1a7080 000160b3 0001379a 3e3fc063
00000002 ff1a79dc 00000000 fe7f3440(XEN) 00000020 00000003 160b3063
00000000 ffbf7ec0 fe688268 ff13b7e2 ff1aa080(XEN) c1212080 ffbf7ec0
00000000 ff1a7080 ff1aafa0 ff1ae000 00000000 ff1be1e0(XEN) 00000004
ffbf7fb4 ff182b95 00000073 ffbf7e84 ffbf7e80 fe800034 00000005(XEN)
ff1ce900 ff1ce900 0000000f ff1ce900 00000005 ff115da7 ff1ce900 00000005
(XEN) ff1ce900 ff1ce900 ff1aa0a4 ff1ce900 ff1c1104 000f5da7 0000000f
ff1c1104(XEN) 000f5da7 00000003 ff1aa0a4 ff1be100 d104955c ff1aaf78
fee09440 fee0d248(XEN) 160b3063 00000000 0003fa34 0003e8e7 0000000f
0000007b ffffffff ffffffff(XEN) d8fa0073 00000000 160b3061 00000000
00000001 00000001 00000000 ff1aa080(XEN) ff1a7080 ffbf7fb4 d104955c
ff13b7e2 ff1aa080 d104955c ffbf7fb4 00000001(XEN) 274038a1 68465ba4
00000084 00000000 ff1be1e0 ff10dec4 00000068 cfe89ff8(XEN) 00000000
cfe89ec0 c01186cf ff1c1024 cfe89e9c 00000020 ffbfcf0c ffbf7fb4(XEN)
ff1c1020 00000001 ff1aa080 0000007b 0000007b cfe89fb4 ff181f83 ff1aa080
(XEN) 0000007b 0000007b cfe89f04 ff182292 ffbf7fb4 cfe90bdc cfe89f0c
00e89f60(XEN) d1049540 cfe90bc0 cfe89f04 cff80000 000e0000 c026724b
00000061 00010286(XEN) cfe89ee4 00000069 0000007b 0000007b 00000000
00000000 00000003 ff1aa080(XEN) Xen call trace:
(XEN) [<ff179ed2>] l1e_propagate_from_guest+0x2e2/0x320
(XEN) [<ff1a7c00>] boot_edid_info+0x61/0x80
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff17d0b4>] sh_page_fault__shadow_3_guest_3+0x4b4/0xf80
(XEN) [<ff1a79dc>] store_edid+0x5a/0x61
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff1a79dc>] store_edid+0x5a/0x61
(XEN) [<ff13b7e2>] do_page_fault+0x122/0x630
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff1aafa0>] acpi_parse_memory_affinity+0x20/0x30
(XEN) [<ff1ae000>] init_e820+0x80/0x6b0
(XEN) [<ff182b95>] linearise_address+0x25/0x50
(XEN) [<ff115da7>] add_entry+0x57/0x140
(XEN) [<ff1aa0a4>] ns16550_init+0x114/0x2b0
(XEN) [<ff1aa0a4>] ns16550_init+0x114/0x2b0
(XEN) [<ff1aaf78>] acpi_table_print_srat_entry+0x28/0x30
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN) [<ff1a7080>] edd_check_ext+0x1b/0x1c
(XEN) [<ff13b7e2>] do_page_fault+0x122/0x630
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN) [<ff10dec4>] do_multicall+0x124/0x350
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN) [<ff181f83>] tracing_off+0x7/0xe
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN) [<ff182292>] handle_exception+0x82/0xb6
(XEN) [<ff1aa080>] ns16550_init+0xf0/0x2b0
(XEN)
(XEN) Pagetable walk from fcb63e80:
(XEN) L3[0x003] = 00000000137b4001 55555555
(XEN) L2[0x1e5] = 0000000000000000 ffffffff
(XEN)
(XEN) ****************************************
(XEN) Panic on CPU 3:
(XEN) FATAL PAGE FAULT
(XEN) [error_code=0000]
(XEN) Faulting linear address: fcb63e80
(XEN) ****************************************
(XEN)
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-01 11:18 ` Kieran Mansley
@ 2007-11-01 11:34 ` Tim Deegan
2007-11-01 16:36 ` Kieran Mansley
0 siblings, 1 reply; 16+ messages in thread
From: Tim Deegan @ 2007-11-01 11:34 UTC (permalink / raw)
To: Kieran Mansley; +Cc: xen-devel
At 11:18 +0000 on 01 Nov (1193915902), Kieran Mansley wrote:
> I'm a bit concerned about this one in _sh_propogate(). It seems it's
> checking that the mfn is valid for a good reason, and so letting it
> carry on in the case of MMIO frames will lead to a rather ungraceful
> failure. If I modify the above if to be:
>
> if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct)
> && !iomem_access_permitted(d, target_mfn, target_mfn))
>
> so that PV iomem frames can continue rather than giving up at this
> point, it gets a fatal page fault (see below).
Oh dear. Yes, you'll need to protect all the stuff further down that
tries to mark the frame dirty. Probably making sh_mfn_is_dirty test for
!mfn_valid(mfn) and return 0 will be enough.
> I wonder if the comment above that if has the answer:
>
> /* N.B. For pass-through MMIO, either this test needs to be relaxed,
> * and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or
> the
> * MMIO areas need to be added to the frame-table to make them
> "valid". */
That comment is gone in -unstable, since the IOMMU pass-through to HVM
domains went in. Passing through the MMIO mappings should work because
get_page_from_l1e will do the right thing later.
Another thing to worry about in _sh_propagate is that the shadows don't
retain any of the caching attribute bits from the guest entries. You
might want to add _PAGE_PCD and _PAGE_PWT to "pass_thru_flags" for PV
guests (and for l1es, _PAGE_PAT, I suppose, if PV guests are allowed to
use PAT).
Tim.
--
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, reg'd c/o EC2Y 5EB, UK.]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-01 11:34 ` Tim Deegan
@ 2007-11-01 16:36 ` Kieran Mansley
2007-11-01 16:43 ` Tim Deegan
0 siblings, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-11-01 16:36 UTC (permalink / raw)
To: Tim Deegan; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 4581 bytes --]
On Thu, 2007-11-01 at 11:34 +0000, Tim Deegan wrote:
> At 11:18 +0000 on 01 Nov (1193915902), Kieran Mansley wrote:
> > I'm a bit concerned about this one in _sh_propogate(). It seems it's
> > checking that the mfn is valid for a good reason, and so letting it
> > carry on in the case of MMIO frames will lead to a rather ungraceful
> > failure. If I modify the above if to be:
> >
> > if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct)
> > && !iomem_access_permitted(d, target_mfn, target_mfn))
> >
> > so that PV iomem frames can continue rather than giving up at this
> > point, it gets a fatal page fault (see below).
>
> Oh dear. Yes, you'll need to protect all the stuff further down that
> tries to mark the frame dirty. Probably making sh_mfn_is_dirty test for
> !mfn_valid(mfn) and return 0 will be enough.
See attached patch.
I opted instead to surround the section that calls sh_mfn_is_dirty()
with
if ( mfn_valid(target_mfn) ||
!iomem_access_permitted(d, target_mfn, target_mfn) )
because if sh_mfn_is_dirty() returned zero on a bad mfn, it would remove
the _PAGE_RW flag.
I'm not 100% sure this patch is complete or correct or doesn't have side
effects, but it works for me. I'd appreciate it if you could give it a
once over to make sure it makes sense. If it does, and isn't too big a
risk for 3.2.0, applying it to that would be great.
One question in my mind is whether the tests of iomem_access_permitted()
in _sh_propogate() would be better replaced with the more general
shadow_mode_translate(). The former seemed less risky to me as
otherwise any invalid mfn could drop through when !shadow_mode_translate
(), and this test was originally designed to filter them out.
Signed-off-by Kieran Mansley <kmansley@solarflare.com>
Thanks
Kieran
fix iomem frame shadow propogation to allow live migration
diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 16:02:36 2007 +0000
@@ -28,6 +28,7 @@
#include <xen/sched.h>
#include <xen/perfc.h>
#include <xen/domain_page.h>
+#include <xen/iocap.h>
#include <asm/page.h>
#include <asm/current.h>
#include <asm/shadow.h>
@@ -696,7 +697,8 @@ _sh_propagate(struct vcpu *v,
/* N.B. For pass-through MMIO, either this test needs to be
relaxed,
* and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or
the
* MMIO areas need to be added to the frame-table to make them
"valid". */
- if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+ if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) &&
+ !iomem_access_permitted(d, target_mfn, target_mfn) )
{
ASSERT((ft == ft_prefetch));
*sp = shadow_l1e_empty();
@@ -709,9 +711,12 @@ _sh_propagate(struct vcpu *v,
// SHADOW_PRESENT bit.
//
pass_thru_flags = (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER |
- _PAGE_RW | _PAGE_PRESENT);
+ _PAGE_RW | _PAGE_PRESENT );
if ( guest_supports_nx(v) )
pass_thru_flags |= _PAGE_NX_BIT;
+ if ( !mfn_valid(target_mfn) &&
+ iomem_access_permitted(d, target_mfn, target_mfn) )
+ pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
sflags = gflags & pass_thru_flags;
/* Only change memory caching type for pass-through domain */
@@ -758,10 +763,13 @@ _sh_propagate(struct vcpu *v,
// p2m_ram_logdirty p2m type: only HAP uses that.)
if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
{
- if ( ft & FETCH_TYPE_WRITE )
- paging_mark_dirty(d, mfn_x(target_mfn));
- else if ( !sh_mfn_is_dirty(d, target_mfn) )
- sflags &= ~_PAGE_RW;
+ if ( mfn_valid(target_mfn) ||
+ !iomem_access_permitted(d, target_mfn, target_mfn) ) {
+ if ( ft & FETCH_TYPE_WRITE )
+ paging_mark_dirty(d, mfn_x(target_mfn));
+ else if ( !sh_mfn_is_dirty(d, target_mfn) )
+ sflags &= ~_PAGE_RW;
+ }
}
/* Read-only memory */
@@ -2836,7 +2844,8 @@ static int sh_page_fault(struct vcpu *v,
gfn = guest_l1e_get_gfn(gw.eff_l1e);
gmfn = gfn_to_mfn(d, gfn, &p2mt);
- if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid
(gmfn)) )
+ if ( !shadow_mode_translate(d) &&
+ (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid
(gmfn))) )
{
perfc_incr(shadow_fault_bail_bad_gfn);
SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
[-- Attachment #2: iomem_shadow --]
[-- Type: text/plain, Size: 2758 bytes --]
fix iomem frame shadow propogation to allow live migration
diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 16:02:36 2007 +0000
@@ -28,6 +28,7 @@
#include <xen/sched.h>
#include <xen/perfc.h>
#include <xen/domain_page.h>
+#include <xen/iocap.h>
#include <asm/page.h>
#include <asm/current.h>
#include <asm/shadow.h>
@@ -696,7 +697,8 @@ _sh_propagate(struct vcpu *v,
/* N.B. For pass-through MMIO, either this test needs to be relaxed,
* and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
* MMIO areas need to be added to the frame-table to make them "valid". */
- if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+ if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) &&
+ !iomem_access_permitted(d, target_mfn, target_mfn) )
{
ASSERT((ft == ft_prefetch));
*sp = shadow_l1e_empty();
@@ -709,9 +711,12 @@ _sh_propagate(struct vcpu *v,
// SHADOW_PRESENT bit.
//
pass_thru_flags = (_PAGE_DIRTY | _PAGE_ACCESSED | _PAGE_USER |
- _PAGE_RW | _PAGE_PRESENT);
+ _PAGE_RW | _PAGE_PRESENT );
if ( guest_supports_nx(v) )
pass_thru_flags |= _PAGE_NX_BIT;
+ if ( !mfn_valid(target_mfn) &&
+ iomem_access_permitted(d, target_mfn, target_mfn) )
+ pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
sflags = gflags & pass_thru_flags;
/* Only change memory caching type for pass-through domain */
@@ -758,10 +763,13 @@ _sh_propagate(struct vcpu *v,
// p2m_ram_logdirty p2m type: only HAP uses that.)
if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
{
- if ( ft & FETCH_TYPE_WRITE )
- paging_mark_dirty(d, mfn_x(target_mfn));
- else if ( !sh_mfn_is_dirty(d, target_mfn) )
- sflags &= ~_PAGE_RW;
+ if ( mfn_valid(target_mfn) ||
+ !iomem_access_permitted(d, target_mfn, target_mfn) ) {
+ if ( ft & FETCH_TYPE_WRITE )
+ paging_mark_dirty(d, mfn_x(target_mfn));
+ else if ( !sh_mfn_is_dirty(d, target_mfn) )
+ sflags &= ~_PAGE_RW;
+ }
}
/* Read-only memory */
@@ -2836,7 +2844,8 @@ static int sh_page_fault(struct vcpu *v,
gfn = guest_l1e_get_gfn(gw.eff_l1e);
gmfn = gfn_to_mfn(d, gfn, &p2mt);
- if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn)) )
+ if ( !shadow_mode_translate(d) &&
+ (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn))) )
{
perfc_incr(shadow_fault_bail_bad_gfn);
SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-01 16:36 ` Kieran Mansley
@ 2007-11-01 16:43 ` Tim Deegan
2007-11-02 9:31 ` Kieran Mansley
0 siblings, 1 reply; 16+ messages in thread
From: Tim Deegan @ 2007-11-01 16:43 UTC (permalink / raw)
To: Kieran Mansley; +Cc: xen-devel
At 16:36 +0000 on 01 Nov (1193934965), Kieran Mansley wrote:
> One question in my mind is whether the tests of iomem_access_permitted()
> in _sh_propogate() would be better replaced with the more general
> shadow_mode_translate(). The former seemed less risky to me as
You could do that. I think the correct test is shadow_mode_refcounts();
as Keir pointed out, guests whose refcounting isn't done by the shadow
code will have had their pagetables validated bby the PV MMU interface.
The test around the log-dirty code can just be on mfn_valid().
Cheers,
Tim.
--
Tim Deegan <Tim.Deegan@eu.citrix.com>
Principal Software Engineer, Citrix Systems.
[Company #5334508: XenSource UK Ltd, reg'd c/o EC2Y 5EB, UK.]
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-01 16:43 ` Tim Deegan
@ 2007-11-02 9:31 ` Kieran Mansley
2007-11-02 10:42 ` Keir Fraser
0 siblings, 1 reply; 16+ messages in thread
From: Kieran Mansley @ 2007-11-02 9:31 UTC (permalink / raw)
To: Tim Deegan; +Cc: xen-devel
[-- Attachment #1: Type: text/plain, Size: 3096 bytes --]
On Thu, 2007-11-01 at 16:43 +0000, Tim Deegan wrote:
> At 16:36 +0000 on 01 Nov (1193934965), Kieran Mansley wrote:
> > One question in my mind is whether the tests of iomem_access_permitted()
> > in _sh_propogate() would be better replaced with the more general
> > shadow_mode_translate(). The former seemed less risky to me as
>
> You could do that. I think the correct test is shadow_mode_refcounts();
> as Keir pointed out, guests whose refcounting isn't done by the shadow
> code will have had their pagetables validated bby the PV MMU interface.
>
> The test around the log-dirty code can just be on mfn_valid().
Attached is a new patch with those changes. Seems much more sensible to
me that way too, and it works just as well.
Signed-off-by: Kieran Mansley <kmansley@solarflare.com>
Thanks
Kieran
fix iomem frame shadow propogation to allow live migration
diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Nov 02 09:27:43 2007 +0000
@@ -696,7 +696,8 @@ _sh_propagate(struct vcpu *v,
/* N.B. For pass-through MMIO, either this test needs to be relaxed,
* and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
* MMIO areas need to be added to the frame-table to make them "valid". */
- if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+ if ( shadow_mode_refcounts(d) &&
+ !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
{
ASSERT((ft == ft_prefetch));
*sp = shadow_l1e_empty();
@@ -712,6 +713,8 @@ _sh_propagate(struct vcpu *v,
_PAGE_RW | _PAGE_PRESENT);
if ( guest_supports_nx(v) )
pass_thru_flags |= _PAGE_NX_BIT;
+ if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+ pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
sflags = gflags & pass_thru_flags;
/* Only change memory caching type for pass-through domain */
@@ -758,10 +761,12 @@ _sh_propagate(struct vcpu *v,
// p2m_ram_logdirty p2m type: only HAP uses that.)
if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
{
- if ( ft & FETCH_TYPE_WRITE )
- paging_mark_dirty(d, mfn_x(target_mfn));
- else if ( !sh_mfn_is_dirty(d, target_mfn) )
- sflags &= ~_PAGE_RW;
+ if ( mfn_valid(target_mfn) ) {
+ if ( ft & FETCH_TYPE_WRITE )
+ paging_mark_dirty(d, mfn_x(target_mfn));
+ else if ( !sh_mfn_is_dirty(d, target_mfn) )
+ sflags &= ~_PAGE_RW;
+ }
}
/* Read-only memory */
@@ -2836,7 +2841,8 @@ static int sh_page_fault(struct vcpu *v,
gfn = guest_l1e_get_gfn(gw.eff_l1e);
gmfn = gfn_to_mfn(d, gfn, &p2mt);
- if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn)) )
+ if ( !shadow_mode_translate(d) &&
+ (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn))) )
{
perfc_incr(shadow_fault_bail_bad_gfn);
SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
[-- Attachment #2: iomem_shadow --]
[-- Type: text/plain, Size: 2279 bytes --]
fix iomem frame shadow propogation to allow live migration
diff -r b28fa67d3940 xen/arch/x86/mm/shadow/multi.c
--- a/xen/arch/x86/mm/shadow/multi.c Thu Nov 01 11:52:34 2007 +0000
+++ b/xen/arch/x86/mm/shadow/multi.c Fri Nov 02 09:27:43 2007 +0000
@@ -696,7 +696,8 @@ _sh_propagate(struct vcpu *v,
/* N.B. For pass-through MMIO, either this test needs to be relaxed,
* and shadow_set_l1e() trained to handle non-valid MFNs (ugh), or the
* MMIO areas need to be added to the frame-table to make them "valid". */
- if ( !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
+ if ( shadow_mode_refcounts(d) &&
+ !mfn_valid(target_mfn) && (p2mt != p2m_mmio_direct) )
{
ASSERT((ft == ft_prefetch));
*sp = shadow_l1e_empty();
@@ -712,6 +713,8 @@ _sh_propagate(struct vcpu *v,
_PAGE_RW | _PAGE_PRESENT);
if ( guest_supports_nx(v) )
pass_thru_flags |= _PAGE_NX_BIT;
+ if ( !shadow_mode_refcounts(d) && !mfn_valid(target_mfn) )
+ pass_thru_flags |= _PAGE_PCD | _PAGE_PWT;
sflags = gflags & pass_thru_flags;
/* Only change memory caching type for pass-through domain */
@@ -758,10 +761,12 @@ _sh_propagate(struct vcpu *v,
// p2m_ram_logdirty p2m type: only HAP uses that.)
if ( unlikely((level == 1) && shadow_mode_log_dirty(d)) )
{
- if ( ft & FETCH_TYPE_WRITE )
- paging_mark_dirty(d, mfn_x(target_mfn));
- else if ( !sh_mfn_is_dirty(d, target_mfn) )
- sflags &= ~_PAGE_RW;
+ if ( mfn_valid(target_mfn) ) {
+ if ( ft & FETCH_TYPE_WRITE )
+ paging_mark_dirty(d, mfn_x(target_mfn));
+ else if ( !sh_mfn_is_dirty(d, target_mfn) )
+ sflags &= ~_PAGE_RW;
+ }
}
/* Read-only memory */
@@ -2836,7 +2841,8 @@ static int sh_page_fault(struct vcpu *v,
gfn = guest_l1e_get_gfn(gw.eff_l1e);
gmfn = gfn_to_mfn(d, gfn, &p2mt);
- if ( !p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn)) )
+ if ( !shadow_mode_translate(d) &&
+ (!p2m_is_valid(p2mt) || (!p2m_is_mmio(p2mt) && !mfn_valid(gmfn))) )
{
perfc_incr(shadow_fault_bail_bad_gfn);
SHADOW_PRINTK("BAD gfn=%"SH_PRI_gfn" gmfn=%"PRI_mfn"\n",
[-- Attachment #3: Type: text/plain, Size: 138 bytes --]
_______________________________________________
Xen-devel mailing list
Xen-devel@lists.xensource.com
http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-02 9:31 ` Kieran Mansley
@ 2007-11-02 10:42 ` Keir Fraser
2007-11-02 10:51 ` Keir Fraser
0 siblings, 1 reply; 16+ messages in thread
From: Keir Fraser @ 2007-11-02 10:42 UTC (permalink / raw)
To: Kieran Mansley, Tim Deegan; +Cc: xen-devel
On 2/11/07 09:31, "Kieran Mansley" <kmansley@solarflare.com> wrote:
>> You could do that. I think the correct test is shadow_mode_refcounts();
>> as Keir pointed out, guests whose refcounting isn't done by the shadow
>> code will have had their pagetables validated bby the PV MMU interface.
>>
>> The test around the log-dirty code can just be on mfn_valid().
>
> Attached is a new patch with those changes. Seems much more sensible to
> me that way too, and it works just as well.
I changed the final chunk test from !shadow_mode_translate() to
shadow_mode_refcounts(). I think refcounts is a better test than translate
here, and also I think your test was the wrong way round!
I also note that guarding the mark-dirty-or-zap-writable-bit with
mfn_valid() is not really correct. mfn_valid() only checks whether the mfn <
max_page. I bet this would not work if you migrate on a machine with 4GB of
RAM, as the MMIO hole will be below max_page. Really mfn_valid needs to
handle such MMIO holes, or the shadow code needs to be using a test other
than mfn_valid in many places (e.g., the function iomem_page_test() that you
added before).
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
* Re: Live migration with MMIO pages
2007-11-02 10:42 ` Keir Fraser
@ 2007-11-02 10:51 ` Keir Fraser
0 siblings, 0 replies; 16+ messages in thread
From: Keir Fraser @ 2007-11-02 10:51 UTC (permalink / raw)
To: Keir Fraser, Kieran Mansley, Tim Deegan; +Cc: xen-devel
On 2/11/07 10:42, "Keir Fraser" <keir@xensource.com> wrote:
> I also note that guarding the mark-dirty-or-zap-writable-bit with
> mfn_valid() is not really correct. mfn_valid() only checks whether the mfn <
> max_page. I bet this would not work if you migrate on a machine with 4GB of
> RAM, as the MMIO hole will be below max_page. Really mfn_valid needs to
> handle such MMIO holes, or the shadow code needs to be using a test other
> than mfn_valid in many places (e.g., the function iomem_page_test() that you
> added before).
Actually Tim reckons that use of mfn_valid() is okay because although you
will lose the _PAGE_RW bit on your mmio mapping temporarily, if the mmio mfn
is less than max_page, you've fixed up the fault path now so that you should
get the _PAGE_RW bit back again when the guest attempts a write access.
However, we think that the !mfn_valid() test that gates adding
_PAGE_PAT|PAGE_PCD|_PAGE_PWT to the passthru flags should go away. We'll
already have validated those flags even for ordinary RAM mappings for a PV
guest, and there are cases where cache attributes have to be different for
RAM pages. So probably the test should unconditionally pass through those
flags if the domain is !shadow_mode_refcounts.
-- Keir
^ permalink raw reply [flat|nested] 16+ messages in thread
end of thread, other threads:[~2007-11-02 10:51 UTC | newest]
Thread overview: 16+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-10-31 12:19 Live migration with MMIO pages Kieran Mansley
2007-10-31 13:32 ` Keir Fraser
2007-10-31 14:03 ` Kieran Mansley
2007-10-31 14:08 ` Keir Fraser
2007-10-31 16:34 ` Kieran Mansley
2007-10-31 16:44 ` Keir Fraser
2007-10-31 17:14 ` Kieran Mansley
2007-10-31 17:31 ` Tim Deegan
2007-10-31 18:23 ` Keir Fraser
2007-11-01 11:18 ` Kieran Mansley
2007-11-01 11:34 ` Tim Deegan
2007-11-01 16:36 ` Kieran Mansley
2007-11-01 16:43 ` Tim Deegan
2007-11-02 9:31 ` Kieran Mansley
2007-11-02 10:42 ` Keir Fraser
2007-11-02 10:51 ` Keir Fraser
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.