* Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
@ 2008-03-19 15:39 Stephen C. Tweedie
2008-03-19 16:00 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2008-03-19 15:39 UTC (permalink / raw)
To: xen-devel@lists.xensource.com; +Cc: Keir Fraser, Jan Beulich
Hi,
On paravirt x86 (both 32- and 64-bit), since cset 13998:
http://xenbits.xensource.com/xen-unstable.hg?rev/13998
we translate all ptes from being mfn-based to pfn-based when the
hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages,
which appear to the HV to be non-present, but which are special-cased in
the kernel to appear present (a different bit in the pte remains set for
these pages and is caught by the pte_present() tests.)
Unfortunately, it looks like recent X servers are attempting to do
mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we
do so, the translation of mfn to pfn results on x86_64 in end_pfn:
maddr.h:
static inline unsigned long mfn_to_pfn(unsigned long mfn)
{
...
if (unlikely((mfn >> machine_to_phys_order) != 0))
return end_pfn;
and when we do mprotect(PROT_READ) later on on the same ptes, this
end_pfn value is illegal:
maddr.h:
static inline unsigned long pfn_to_mfn(unsigned long pfn)
{
BUG_ON(end_pfn && pfn >= end_pfn);
so we BUG().
It needs both an updated X and an updated kernel to show the bug, but
given that, this results in an instant, completely repeatable kernel
panic on starting X on both 32- and 64-bits on some hardware.
Any suggestions? The obvious fix is to special-case these mfn_to_pfn
translations so that they can be recognised as "untranslated" by a later
pfn_to_mfn, but ideally we'd want such special pfns to be easily
recognised so that we don't completely lose the value of the BUG_ON()
above.
We'd also ideally like the HV to be able to spot such pte contents, as
they won't (indeed, cannot) be translated on migrate. That's not a
problem for dom0, of course, but might be for domUs with pci
passthrough .
Cheers,
Stephen
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 15:39 Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps Stephen C. Tweedie
@ 2008-03-19 16:00 ` Keir Fraser
2008-03-19 16:31 ` Stephen C. Tweedie
2008-03-19 16:34 ` Keir Fraser
0 siblings, 2 replies; 11+ messages in thread
From: Keir Fraser @ 2008-03-19 16:00 UTC (permalink / raw)
To: Stephen C. Tweedie, xen-devel@lists.xensource.com; +Cc: Jan Beulich
Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests, but
can be picked up and converted back to a valid mfn by pfn_to_mfn(). The key
is that most of the time invalid pfns are explicitly == end_pfn, or
max_page, or similar, so we are distinguishing from those and also can still
bug on that specific value in pfn_to_mfn().
As for picking this up in the save/restore code -- sounds a bit tricky to
me. We're better off not allowing migration of a I/O-privileged domain in
the first place. And indeed I believe the tools already have some such
safety checks.
-- Keir
On 19/3/08 15:39, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> Hi,
>
> On paravirt x86 (both 32- and 64-bit), since cset 13998:
>
> http://xenbits.xensource.com/xen-unstable.hg?rev/13998
>
> we translate all ptes from being mfn-based to pfn-based when the
> hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages,
> which appear to the HV to be non-present, but which are special-cased in
> the kernel to appear present (a different bit in the pte remains set for
> these pages and is caught by the pte_present() tests.)
>
> Unfortunately, it looks like recent X servers are attempting to do
> mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we
> do so, the translation of mfn to pfn results on x86_64 in end_pfn:
>
> maddr.h:
> static inline unsigned long mfn_to_pfn(unsigned long mfn)
> {
> ...
> if (unlikely((mfn >> machine_to_phys_order) != 0))
> return end_pfn;
>
> and when we do mprotect(PROT_READ) later on on the same ptes, this
> end_pfn value is illegal:
>
> maddr.h:
> static inline unsigned long pfn_to_mfn(unsigned long pfn)
> {
> BUG_ON(end_pfn && pfn >= end_pfn);
>
> so we BUG().
>
> It needs both an updated X and an updated kernel to show the bug, but
> given that, this results in an instant, completely repeatable kernel
> panic on starting X on both 32- and 64-bits on some hardware.
>
>
> Any suggestions? The obvious fix is to special-case these mfn_to_pfn
> translations so that they can be recognised as "untranslated" by a later
> pfn_to_mfn, but ideally we'd want such special pfns to be easily
> recognised so that we don't completely lose the value of the BUG_ON()
> above.
>
> We'd also ideally like the HV to be able to spot such pte contents, as
> they won't (indeed, cannot) be translated on migrate. That's not a
> problem for dom0, of course, but might be for domUs with pci
> passthrough .
>
> Cheers,
> Stephen
>
>
^ permalink raw reply [flat|nested] 11+ messages in thread* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 16:00 ` Keir Fraser
@ 2008-03-19 16:31 ` Stephen C. Tweedie
2008-03-19 16:42 ` Keir Fraser
2008-03-19 16:34 ` Keir Fraser
1 sibling, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2008-03-19 16:31 UTC (permalink / raw)
To: Keir Fraser; +Cc: Stephen Tweedie, xen-devel@lists.xensource.com, Jan Beulich
Hi,
On Wed, 2008-03-19 at 16:00 +0000, Keir Fraser wrote:
> Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests,
Right...
> The key
> is that most of the time invalid pfns are explicitly == end_pfn, or
> max_page, or similar, so we are distinguishing from those and also can still
> bug on that specific value in pfn_to_mfn().
Yep, it will still let pfn_to_mfn pick up that magic number --- it won't
catch corrupted ptes, though, which is a shame.
~mfn on its own probably won't work, though. On non-PAE, we store mfns
in a pte that only has 20 bits for the information, so we have to deal
with that restricted range.
It might be easier to do this at the pte_machine_to_phys level instead,
where we can potentially take advantage of other bits of the pte to
encode the special casing.
> As for picking this up in the save/restore code -- sounds a bit tricky to
> me. We're better off not allowing migration of a I/O-privileged domain in
> the first place.
Right, clearly. Ideally, though, we'd make this mechanism robust, and
either have the HV fail the migrate if it finds such ptes, or
alternatively have the kernel BUG on the ptes if the guest is marked
migratable.
The latter sounds cleaner, but would require a mechanism to mark guests
as unmigratable/saveable while they have IO space mapped; I'm not sure
we have a clean way for marking such things right now.
Thinking about it, if a guest could be clearly marked as having IO
permission, we could simply disable both save/migrate AND !PAGE_PRESENT
pfn-to-mfn translation in one go, and solve both problems at once.
Would either SIF_INITDOMAIN or SIF_PRIVILEGED work for this?
SIF_INITDOMAIN would be the safe fix for this particular dom0 case, I
think. We could do it for SIF_PRIVILEGED too, but ONLY if we were sure
that such domains would never be migrated or saved (we'll corrupt
PROT_NONE mappings if we migrate such domains.)
Cheers,
Stephen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 16:31 ` Stephen C. Tweedie
@ 2008-03-19 16:42 ` Keir Fraser
2008-03-19 17:12 ` Stephen C. Tweedie
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2008-03-19 16:42 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel@lists.xensource.com, Jan Beulich
On 19/3/08 16:31, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> ~mfn on its own probably won't work, though. On non-PAE, we store mfns
> in a pte that only has 20 bits for the information, so we have to deal
> with that restricted range.
Sure. Personally I could live with a solution that didn't work for non-PAE.
I guess that might not be the popular choice however. ;-)
> It might be easier to do this at the pte_machine_to_phys level instead,
> where we can potentially take advantage of other bits of the pte to
> encode the special casing.
Oh yes, the PAGE_IO type of trick I mentioned in my other email just now.
To expand on that -- we'd like a PAGE_IO flag for foreign and I/O mappings
anyway. It would simplify pte_pfn and similar routines which currently do a
nasty little MFN->PFN->MFN conversion check to detect such 'foreign'
mappings.
> Thinking about it, if a guest could be clearly marked as having IO
> permission, we could simply disable both save/migrate AND !PAGE_PRESENT
> pfn-to-mfn translation in one go, and solve both problems at once.
>
> Would either SIF_INITDOMAIN or SIF_PRIVILEGED work for this?
> SIF_INITDOMAIN would be the safe fix for this particular dom0 case, I
> think. We could do it for SIF_PRIVILEGED too, but ONLY if we were sure
> that such domains would never be migrated or saved (we'll corrupt
> PROT_NONE mappings if we migrate such domains.)
SIF_PRIVILEGED is no longer used. It's still set for dom0, but not for
hardware-capable domUs. It's tricky anyway, since a domU can be given
hardware capabilities after it has booted through mechanisms like
pciback-pcifront PCI hotplug.
-- Keir
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 16:42 ` Keir Fraser
@ 2008-03-19 17:12 ` Stephen C. Tweedie
2008-03-19 18:52 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2008-03-19 17:12 UTC (permalink / raw)
To: Keir Fraser; +Cc: xen-devel@lists.xensource.com, Jan Beulich
Hi,
On Wed, 2008-03-19 at 16:42 +0000, Keir Fraser wrote:
> > It might be easier to do this at the pte_machine_to_phys level instead,
> > where we can potentially take advantage of other bits of the pte to
> > encode the special casing.
>
> Oh yes, the PAGE_IO type of trick I mentioned in my other email just now.
Yep. There are a number of bits that could be used: for example, I
don't think PROT_WRITE is ever going to be set on a PROT_NONE page,
either.
But...
> SIF_PRIVILEGED is no longer used. It's still set for dom0, but not for
> hardware-capable domUs. It's tricky anyway, since a domU can be given
> hardware capabilities after it has booted through mechanisms like
> pciback-pcifront PCI hotplug.
Right; but I still think it's reasonable to expect this sort of thing to
be declared in advance for a domain. We clearly cannot enter a non-
translating mode dynamically, because of the risk of already having
untranslated PROT_NONE ptes in the page tables. But becoming unable to
save or migrate a guest is a Big Thing, and it would seem not
unreasonable to ask for a startup flag such as SIF_PRIVILEGED to be
declared in advance before we permit that.
I'll probably do a quick patch based in SIF_INITDOMAIN for now, simply
to be able to test whether it does in fact fix the symptoms we're
seeing; but that might not be the best solution to merge long term.
Cheers,
Stephen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 17:12 ` Stephen C. Tweedie
@ 2008-03-19 18:52 ` Keir Fraser
2008-03-20 11:39 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2008-03-19 18:52 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel@lists.xensource.com, Jan Beulich
On 19/3/08 17:12, "Stephen C. Tweedie" <sct@redhat.com> wrote:
> On Wed, 2008-03-19 at 16:42 +0000, Keir Fraser wrote:
>
>>> It might be easier to do this at the pte_machine_to_phys level instead,
>>> where we can potentially take advantage of other bits of the pte to
>>> encode the special casing.
>>
>> Oh yes, the PAGE_IO type of trick I mentioned in my other email just now.
>
> Yep. There are a number of bits that could be used: for example, I
> don't think PROT_WRITE is ever going to be set on a PROT_NONE page,
> either.
My point is the flag would be set even when the protection is not PROT_NONE.
It has other uses beyond fixing this bug.
I might look into doing this myself as I think it is probably the best fix
for this issue.
-- Keir
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 18:52 ` Keir Fraser
@ 2008-03-20 11:39 ` Keir Fraser
2008-03-28 16:41 ` Stephen C. Tweedie
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2008-03-20 11:39 UTC (permalink / raw)
To: Stephen C. Tweedie; +Cc: xen-devel@lists.xensource.com, Jan Beulich
On 19/3/08 18:52, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> Yep. There are a number of bits that could be used: for example, I
>> don't think PROT_WRITE is ever going to be set on a PROT_NONE page,
>> either.
>
> My point is the flag would be set even when the protection is not PROT_NONE.
> It has other uses beyond fixing this bug.
>
> I might look into doing this myself as I think it is probably the best fix for
> this issue.
...and the issue is now fixed (changeset 488) in
http://xenbits.xensource.com/staging/linux-2.6.18-xen.hg
The patch is a minimal one given the approach of using a new _PAGE_IO flag.
For example, pte_pfn() probably no longer needs to use mfn_to_local_pfn()
and could use the simpler and faster mfn_to_pfn() instead. But that belongs
in a separate patch.
-- Keir
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-20 11:39 ` Keir Fraser
@ 2008-03-28 16:41 ` Stephen C. Tweedie
2008-03-28 16:44 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Stephen C. Tweedie @ 2008-03-28 16:41 UTC (permalink / raw)
To: Keir Fraser
Cc: Stephen Tweedie, Jeremy Fitzhardinge,
xen-devel@lists.xensource.com, Jan Beulich
Hi,
On Thu, 2008-03-20 at 11:39 +0000, Keir Fraser wrote:
> ...and the issue is now fixed (changeset 488) in
> http://xenbits.xensource.com/staging/linux-2.6.18-xen.hg
I've merged this for RHEL-5.2 (along with cset 492), and now have
multiple reports that it has fixed the X crashes --- thanks!
We'll need to keep it queued for pvops, too, as that is going to need
the whole PROT_NONE special handling in order to support migrate. I'm
away all next week, I'll have to deal with it when I get back.
Cheers,
Stephen
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-28 16:41 ` Stephen C. Tweedie
@ 2008-03-28 16:44 ` Keir Fraser
2008-03-28 16:45 ` Keir Fraser
0 siblings, 1 reply; 11+ messages in thread
From: Keir Fraser @ 2008-03-28 16:44 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Jan Beulich
On 28/3/08 16:41, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>> ...and the issue is now fixed (changeset 488) in
>> http://xenbits.xensource.com/staging/linux-2.6.18-xen.hg
>
> I've merged this for RHEL-5.2 (along with cset 492), and now have
> multiple reports that it has fixed the X crashes --- thanks!
>
> We'll need to keep it queued for pvops, too, as that is going to need
> the whole PROT_NONE special handling in order to support migrate. I'm
> away all next week, I'll have to deal with it when I get back.
Great. You also need c/s 492, unless you already fixed the i386 non-PAE
build breakage that c/s 488 introduces.
-- Keir
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-28 16:44 ` Keir Fraser
@ 2008-03-28 16:45 ` Keir Fraser
0 siblings, 0 replies; 11+ messages in thread
From: Keir Fraser @ 2008-03-28 16:45 UTC (permalink / raw)
To: Stephen C. Tweedie
Cc: Jeremy Fitzhardinge, xen-devel@lists.xensource.com, Jan Beulich
On 28/3/08 16:44, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
>> I've merged this for RHEL-5.2 (along with cset 492), and now have
Duh, sorry for the noise.
-- Keir
>> multiple reports that it has fixed the X crashes --- thanks!
>>
>> We'll need to keep it queued for pvops, too, as that is going to need
>> the whole PROT_NONE special handling in order to support migrate. I'm
>> away all next week, I'll have to deal with it when I get back.
>
> Great. You also need c/s 492, unless you already fixed the i386 non-PAE build
> breakage that c/s 488 introduces.
^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Re: Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps
2008-03-19 16:00 ` Keir Fraser
2008-03-19 16:31 ` Stephen C. Tweedie
@ 2008-03-19 16:34 ` Keir Fraser
1 sibling, 0 replies; 11+ messages in thread
From: Keir Fraser @ 2008-03-19 16:34 UTC (permalink / raw)
To: Keir Fraser, Stephen C. Tweedie, xen-devel@lists.xensource.com
Cc: Jan Beulich
Another possibility, if we can modify all places that convert to/from
PROT_NONE (or otherwise specifically hook those places) would be to use a
software-available pte flag as PAGE_IO and avoid pte-mfn conversions on such
ptes.
I reckon the ~mfn trick, or similar mapping of untranslated mfns into the
pfn 'namespace', is the easier and more incremental solution.
-- Keir
On 19/3/08 16:00, "Keir Fraser" <keir.fraser@eu.citrix.com> wrote:
> Return ~mfn in this case? Still fails the usual is-it-a-valid-pfn tests, but
> can be picked up and converted back to a valid mfn by pfn_to_mfn(). The key
> is that most of the time invalid pfns are explicitly == end_pfn, or
> max_page, or similar, so we are distinguishing from those and also can still
> bug on that specific value in pfn_to_mfn().
>
> As for picking this up in the save/restore code -- sounds a bit tricky to
> me. We're better off not allowing migration of a I/O-privileged domain in
> the first place. And indeed I believe the tools already have some such
> safety checks.
>
> -- Keir
>
> On 19/3/08 15:39, "Stephen C. Tweedie" <sct@redhat.com> wrote:
>
>> Hi,
>>
>> On paravirt x86 (both 32- and 64-bit), since cset 13998:
>>
>> http://xenbits.xensource.com/xen-unstable.hg?rev/13998
>>
>> we translate all ptes from being mfn-based to pfn-based when the
>> hardware _PAGE_PRESENT bit is cleared. We do this for PROT_NONE pages,
>> which appear to the HV to be non-present, but which are special-cased in
>> the kernel to appear present (a different bit in the pte remains set for
>> these pages and is caught by the pte_present() tests.)
>>
>> Unfortunately, it looks like recent X servers are attempting to do
>> mprotect(PROT_NONE) and back on regions of ioremap()ed memory. When we
>> do so, the translation of mfn to pfn results on x86_64 in end_pfn:
>>
>> maddr.h:
>> static inline unsigned long mfn_to_pfn(unsigned long mfn)
>> {
>> ...
>> if (unlikely((mfn >> machine_to_phys_order) != 0))
>> return end_pfn;
>>
>> and when we do mprotect(PROT_READ) later on on the same ptes, this
>> end_pfn value is illegal:
>>
>> maddr.h:
>> static inline unsigned long pfn_to_mfn(unsigned long pfn)
>> {
>> BUG_ON(end_pfn && pfn >= end_pfn);
>>
>> so we BUG().
>>
>> It needs both an updated X and an updated kernel to show the bug, but
>> given that, this results in an instant, completely repeatable kernel
>> panic on starting X on both 32- and 64-bits on some hardware.
>>
>>
>> Any suggestions? The obvious fix is to special-case these mfn_to_pfn
>> translations so that they can be recognised as "untranslated" by a later
>> pfn_to_mfn, but ideally we'd want such special pfns to be easily
>> recognised so that we don't completely lose the value of the BUG_ON()
>> above.
>>
>> We'd also ideally like the HV to be able to spot such pte contents, as
>> they won't (indeed, cannot) be translated on migrate. That's not a
>> problem for dom0, of course, but might be for domUs with pci
>> passthrough .
>>
>> Cheers,
>> Stephen
>>
>>
>
>
>
> _______________________________________________
> Xen-devel mailing list
> Xen-devel@lists.xensource.com
> http://lists.xensource.com/xen-devel
^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2008-03-28 16:45 UTC | newest]
Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2008-03-19 15:39 Illegal PV kernel pfm/pfn translations on PROT_NONE ioremaps Stephen C. Tweedie
2008-03-19 16:00 ` Keir Fraser
2008-03-19 16:31 ` Stephen C. Tweedie
2008-03-19 16:42 ` Keir Fraser
2008-03-19 17:12 ` Stephen C. Tweedie
2008-03-19 18:52 ` Keir Fraser
2008-03-20 11:39 ` Keir Fraser
2008-03-28 16:41 ` Stephen C. Tweedie
2008-03-28 16:44 ` Keir Fraser
2008-03-28 16:45 ` Keir Fraser
2008-03-19 16:34 ` Keir Fraser
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.