* Re: re-writing on powerpc
2010-12-13 4:45 re-writing on powerpc Yoder Stuart-B08248
@ 2010-12-13 8:35 ` Avi Kivity
2010-12-13 8:42 ` Alexander Graf
` (33 subsequent siblings)
34 siblings, 0 replies; 36+ messages in thread
From: Avi Kivity @ 2010-12-13 8:35 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
> Avi/Hollis,
>
> Exchanged some emails with Alex on the topic of rewriting on
> powerpc KVM-- the current approach taken by Alex's PV patch is
> to have a guest Linux paravirt itself, by re-writing certain
> instructions.
>
> The downside to this approach (guest side patching) is that every OS
> to be run on KVM has to be modified or dynamically patched.
>
> What were the reasons for not going down the path of doing the
> re-writing in the hypervisor? (Alex couldn't remember the
> specifics). What about doing it from Qemu?
>
Rewriting is dangerous if the guest is unaware of it. As soon as it is
made aware of it, it might as well actually do it in the best way that
suits it.
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-13 8:42 UTC (permalink / raw)
To: kvm-ppc
On 13.12.2010, at 09:35, Avi Kivity wrote:
> On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
>> Avi/Hollis,
>>
>> Exchanged some emails with Alex on the topic of rewriting on
>> powerpc KVM-- the current approach taken by Alex's PV patch is
>> to have a guest Linux paravirt itself, by re-writing certain
>> instructions.
>>
>> The downside to this approach (guest side patching) is that every OS
>> to be run on KVM has to be modified or dynamically patched.
>>
>> What were the reasons for not going down the path of doing the
>> re-writing in the hypervisor? (Alex couldn't remember the
>> specifics). What about doing it from Qemu?
>>
>
> Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
Yeah, let me rephrase my exact memory on this:
If the HV just rewrites instructions in the guest, it behaves different from real hw which is bad. It could potentially break checksumming inside the guest.
If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context.
Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
Alex
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-13 8:45 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 10:42 AM, Alexander Graf wrote:
> On 13.12.2010, at 09:35, Avi Kivity wrote:
>
> > On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
> >> Avi/Hollis,
> >>
> >> Exchanged some emails with Alex on the topic of rewriting on
> >> powerpc KVM-- the current approach taken by Alex's PV patch is
> >> to have a guest Linux paravirt itself, by re-writing certain
> >> instructions.
> >>
> >> The downside to this approach (guest side patching) is that every OS
> >> to be run on KVM has to be modified or dynamically patched.
> >>
> >> What were the reasons for not going down the path of doing the
> >> re-writing in the hypervisor? (Alex couldn't remember the
> >> specifics). What about doing it from Qemu?
> >>
> >
> > Rewriting is dangerous if the guest is unaware of it. As soon as it is made aware of it, it might as well actually do it in the best way that suits it.
>
> Yeah, let me rephrase my exact memory on this:
>
> If the HV just rewrites instructions in the guest, it behaves different from real hw which is bad. It could potentially break checksumming inside the guest.
>
> If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context.
>
> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
The interface is a lot simpler. The guest decides what to patch and
where to jump. A "please patch me" flag needs a ton of documentation on
what patch means and what the constraints on the guest environment are.
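To make "the guest decides what to patch" concrete, here is a small Python model of a guest replacing a privileged `mfmsr rD` with a plain `lwz rD, disp(0)` that reads a copy of the MSR from a shared page. The opcode constants are the standard PowerPC encodings. The page address (`SHARED_PAGE`) and the field offset (`MSR_OFFSET`) are assumptions made up for this sketch, not the layout of any real patch set.

```python
# Toy model of guest-side instruction rewriting (illustrative only).
INST_MFMSR = 0x7C0000A6   # mfmsr rD (primary opcode 31, extended opcode 83)
INST_LWZ = 0x80000000     # lwz rD, disp(rA) (primary opcode 32)
MASK_RT = 0x03E00000      # destination register field

SHARED_PAGE = -4096       # assumption: shared page mapped in the top 4 KiB
MSR_OFFSET = 8            # hypothetical offset of the MSR copy in that page


def patch_mfmsr(inst):
    """Return the replacement word for inst, or inst itself if it is not mfmsr."""
    if (inst & ~MASK_RT) & 0xFFFFFFFF != INST_MFMSR:
        return inst
    rt = inst & MASK_RT
    # 16-bit signed displacement; rA = 0 means "no base register"
    disp = (SHARED_PAGE + MSR_OFFSET) & 0xFFFF
    return INST_LWZ | rt | disp


def patch_text(words):
    """Patch a list of instruction words in place; return how many were rewritten."""
    count = 0
    for i, word in enumerate(words):
        new = patch_mfmsr(word)
        if new != word:
            words[i] = new
            count += 1
    return count
```

Because the guest runs this over its own text, it chooses which sites to touch and which to leave alone; nothing about the choice has to be specified to the hypervisor.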
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Hollis Blanchard @ 2010-12-13 17:12 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 12:42 AM, Alexander Graf wrote:
> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
Don't blame me for this. :) My original patching (with Christian) was
done from host context, and those patches are in the list archives.
As far as I remember, Ben H said he preferred patching from guest
context (mostly for unspecified or "gut feeling" reasons), and then
that's what you did. IIRC it was an IRC conversation, which is why it
wouldn't be in your inbox.
Hollis Blanchard
Mentor Graphics, Embedded Systems Division
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-13 17:15 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 07:12 PM, Hollis Blanchard wrote:
> On 12/13/2010 12:42 AM, Alexander Graf wrote:
>> Back when I implemented this, we did however have discussions on
>> exactly that distinction between patching in host or guest space and
>> for some reason I remember that you and Hollis figured that guest
>> patching is superior. I just really can't remember why and couldn't
>> find traces of this in my inbox either :).
> Don't blame me for this. :) My original patching (with Christian) was
> done from host context, and those patches are in the list archives.
>
> As far as I remember, Ben H said he preferred patching from guest
> context (mostly for unspecified or "gut feeling" reasons), and then
> that's what you did. IIRC it was an IRC conversation, which is why it
> wouldn't be in your inbox.
Well, it's the right thing IMO.
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Hollis Blanchard @ 2010-12-13 17:17 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 12:35 AM, Avi Kivity wrote:
> On 12/13/2010 06:45 AM, Yoder Stuart-B08248 wrote:
>> Avi/Hollis,
>>
>> Exchanged some emails with Alex on the topic of rewriting on
>> powerpc KVM-- the current approach taken by Alex's PV patch is
>> to have a guest Linux paravirt itself, by re-writing certain
>> instructions.
>>
>> The downside to this approach (guest side patching) is that every OS
>> to be run on KVM has to be modified or dynamically patched.
>>
>> What were the reasons for not going down the path of doing the
>> re-writing in the hypervisor? (Alex couldn't remember the
>> specifics). What about doing it from Qemu?
>>
>
> Rewriting is dangerous if the guest is unaware of it. As soon as it
> is made aware of it, it might as well actually do it in the best way
> that suits it.
Can you list some examples of dangerous scenarios?
Hollis Blanchard
Mentor Graphics, Embedded Systems Division
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-13 19:03 UTC (permalink / raw)
To: kvm-ppc
On Mon, 13 Dec 2010 10:45:30 +0200
Avi Kivity <avi@redhat.com> wrote:
> On 12/13/2010 10:42 AM, Alexander Graf wrote:
> > Yeah, let me rephrase my exact memory on this:
> >
> > If the HV just rewrites instructions in the guest, it behaves different from real hw which is bad. It could potentially break checksumming inside the guest.
> >
> > If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context.
> >
> > Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
>
> The interface is a lot simpler. The guest decides what to patch and
> where to jump. A "please patch me" flag needs a ton of documentation on
> what patch means and what the constraints on the guest environment are.
>
The constraints need to be documented, but I think "a ton" is a bit of
an exaggeration -- and having the guest do the patching itself means
that the structure of the shared page must become stable ABI. Having
the hypervisor do the bulk of the work also makes it easier to add
paravirt to new OSes (in the embedded world, often the reason someone
wants to do virtualization is to run some custom OS alongside Linux).
OTOH, having the guest do it makes it easier to do more complex
rewriting such as mtmsr[1]. And the fact that we've already got an
implementation makes for a compelling tie-breaker.
-Scott
[1] Speaking of which, what happens when an interrupt is raised in the
middle of a paravirt critical section? KVM will hold off the
interrupt delivery if it sees the critical flag set, but when will it
deliver the postponed interrupt? Seems like it will wait until the next
time an exit happens for some other reason.
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-13 23:54 UTC (permalink / raw)
To: kvm-ppc
On 13.12.2010, at 20:03, Scott Wood wrote:
> On Mon, 13 Dec 2010 10:45:30 +0200
> Avi Kivity <avi@redhat.com> wrote:
>
>> On 12/13/2010 10:42 AM, Alexander Graf wrote:
>>> Yeah, let me rephrase my exact memory on this:
>>>
>>> If the HV just rewrites instructions in the guest, it behaves different from real hw which is bad. It could potentially break checksumming inside the guest.
>>>
>>> If, however, the guest sends a hypercall to the HV saying "please patch me" or there's a flag on creation time to enable patching, I have a hard time finding a reason to do it inside the guest context.
>>>
>>> Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
>>
>> The interface is a lot simpler. The guest decides what to patch and
>> where to jump. A "please patch me" flag needs a ton of documentation on
>> what patch means and what the constraints on the guest environment are.
>>
>
> The constraints need to be documented, but I think "a ton" is a bit of
> an exaggeration -- and having the guest do the patching itself means
> that the structure of the shared page must become stable ABI. Having
> the hypervisor do the bulk of the work also makes it easier to add
> paravirt to new OSes (in the embedded world, often the reason someone
> wants to do virtualization is to run some custom OS alongside Linux).
>
> OTOH, having the guest do it makes it easier to do more complex
> rewriting such as mtmsr[1]. And the fact that we've already got an
> implementation makes for a compelling tie-breaker.
>
> -Scott
>
> [1] Speaking of which, what happens when an interrupt is raised in the
> middle of a paravirt critical section? KVM will hold off the
> interrupt delivery if it sees the critical flag set, but when will it
> deliver the postponed interrupt? Seems like it will wait until the next
> time an exit happens for some other reason.
mtmsr with EE=1 checks for pending interrupts and, if one is pending, enables them with a real mtmsr, which then checks interrupts again on VM entry, so it gets injected immediately :).
Alex
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 0:18 UTC (permalink / raw)
To: kvm-ppc
On Tue, 14 Dec 2010 00:54:38 +0100
Alexander Graf <alex@csgraf.de> wrote:
> On 13.12.2010, at 20:03, Scott Wood wrote:
> > [1] Speaking of which, what happens when an interrupt is raised in the
> > middle of a paravirt critical section? KVM will hold off the
> > interrupt delivery if it sees the critical flag set, but when will it
> > deliver the postponed interrupt? Seems like it will wait until the next
> > time an exit happens for some other reason.
>
> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
Right, but I'm not talking about an interrupt that happens when the
virtual EE bit is zero. I'm talking about an interrupt that happens
right in the middle of the paravirt sequence -- after reading int_pending,
but before setting critical to r2.
It seems like the race window is just narrowed, not eliminated.
One option would be for KVM to single-step the guest until critical != r1. It should only be a few instructions, and it shouldn't happen very
often. This is probably the better option.
Another option would be to dispense with the critical section
altogether, by having the guest assume that these instructions clobber
certain registers -- though that would not be pleasant to maintain, as
you'd have to verify every place the instruction is used, now or in the
future. A variant of this would be to use an out-of-section annotation
(similar to get_user et al) so that each instance of the instruction has
to explicitly opt-in. The rewritten code would be faster this way,
though it may not make any practical difference.
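The window described above can be reproduced in a small Python model. This is a sketch of the logic as discussed in this thread, not the actual KVM code: the shared-page fields `critical` and `int_pending` and the "critical iff critical == r1" rule come from the discussion, while the class, the method names and the register values are made up for illustration.

```python
class Vcpu:
    """Toy vcpu: the host holds off interrupts while critical == r1."""

    def __init__(self):
        self.r1 = 0x1000        # guest stack pointer (arbitrary value)
        self.r2 = 0x2000        # any value different from r1
        self.critical = 0       # shared-page field
        self.int_pending = 0    # shared-page field, written by the host
        self.delivered = False

    def in_critical(self):
        return self.critical == self.r1

    def deliver(self):
        self.int_pending = 0
        self.delivered = True

    def host_raise_interrupt(self):
        # Host side: inject now, or note the interrupt and hold it off.
        self.int_pending = 1
        if not self.in_critical():
            self.deliver()

    def host_exit(self):
        # Any later exit lets the host deliver a postponed interrupt.
        if self.int_pending and not self.in_critical():
            self.deliver()


def rewritten_sequence(vcpu, interrupt_at_step):
    """Guest side of a rewritten 'enable interrupts' sequence."""
    vcpu.critical = vcpu.r1            # step 0: enter critical section
    if interrupt_at_step == 0:
        vcpu.host_raise_interrupt()
    seen = vcpu.int_pending            # step 1: read int_pending
    if interrupt_at_step == 1:
        vcpu.host_raise_interrupt()    # arrives after the read: the race
    vcpu.critical = vcpu.r2            # step 2: leave critical section
    if seen:
        vcpu.host_exit()               # trap so the host can inject
```

With `interrupt_at_step=0` the interrupt is injected as soon as the sequence ends. With `interrupt_at_step=1` it stays pending until some unrelated exit calls `host_exit()`, which is the narrowed but not eliminated window.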
-Scott
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-14 0:24 UTC (permalink / raw)
To: kvm-ppc
On 14.12.2010, at 01:18, Scott Wood wrote:
> On Tue, 14 Dec 2010 00:54:38 +0100
> Alexander Graf <alex@csgraf.de> wrote:
>
>> On 13.12.2010, at 20:03, Scott Wood wrote:
>>> [1] Speaking of which, what happens when an interrupt is raised in the
>>> middle of a paravirt critical section? KVM will hold off the
>>> interrupt delivery if it sees the critical flag set, but when will it
>>> deliver the postponed interrupt? Seems like it will wait until the next
>>> time an exit happens for some other reason.
>>
>> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>
> Right, but I'm not talking about an interrupt that happens when the
> virtual EE bit is zero. I'm talking about an interrupt that happens
> right in the middle of the paravirt sequence -- after reading int_pending,
> but before setting critical to r2.
>
> It seems like the race window is just narrowed, not eliminated.
Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
If it really is important, we could also check int_pending right after the critical section and just do a nop exit. That way we worst case waste a few cycles for the useless guest exit, but always fetch interrupts immediately when they occur.
Alex
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-14 8:40 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 09:03 PM, Scott Wood wrote:
> >
> > The interface is a lot simpler. The guest decides what to patch and
> > where to jump. A "please patch me" flag needs a ton of documentation on
> > what patch means and what the constraints on the guest environment are.
> >
>
> The constraints need to be documented, but I think "a ton" is a bit of
> an exaggeration
I guess. It's correct for x86 (which has four processor modes, and you
need to consider segmentation, etc.), perhaps not so much for powerpc.
> -- and having the guest do the patching itself means
> that the structure of the shared page must become stable ABI.
It has to be a stable ABI in any case so you can live migrate. Unless
you want the hypervisor to unpatch or something.
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-14 8:42 UTC (permalink / raw)
To: kvm-ppc
On 12/14/2010 02:24 AM, Alexander Graf wrote:
> On 14.12.2010, at 01:18, Scott Wood wrote:
>
> > On Tue, 14 Dec 2010 00:54:38 +0100
> > Alexander Graf<alex@csgraf.de> wrote:
> >
> >> On 13.12.2010, at 20:03, Scott Wood wrote:
> >>> [1] Speaking of which, what happens when an interrupt is raised in the
> >>> middle of a paravirt critical section? KVM will hold off the
> >>> interrupt delivery if it sees the critical flag set, but when will it
> >>> deliver the postponed interrupt? Seems like it will wait until the next
> >>> time an exit happens for some other reason.
> >>
> >> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
> >
> > Right, but I'm not talking about an interrupt that happens when the
> > virtual EE bit is zero. I'm talking about an interrupt that happens
> > right in the middle of the paravirt sequence -- after reading int_pending,
> > but before setting critical to r2.
> >
> > It seems like the race window is just narrowed, not eliminated.
>
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
What about when "usually" doesn't happen? Tickless kernel, everything's
asleep, interrupt missed, system is dead.
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-14 8:48 UTC (permalink / raw)
To: kvm-ppc
On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>> Rewriting is dangerous if the guest is unaware of it. As soon as it
>> is made aware of it, it might as well actually do it in the best way
>> that suits it.
>
> Can you list some examples of dangerous scenarios?
>
- guest checksums own kernel pages
- clever compiler reuses code for constant pool
- guest patches itself (a la linux alternatives), surprised when it sees
a different instruction
- guest jits own kernel code (like Singularity), gets confused when it
reads back something it didn't write
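The first item is easy to demonstrate. The sketch below is a toy Python model, not guest code: the "kernel text" is two instruction words, the checksum is CRC32 from the standard library, and the replacement word is a hypothetical rewritten instruction.

```python
import zlib


def text_checksum(words):
    """CRC32 over big-endian 32-bit instruction words."""
    data = b"".join(w.to_bytes(4, "big") for w in words)
    return zlib.crc32(data)


def self_check(words, expected):
    """What a guest integrity check would do: recompute and compare."""
    return text_checksum(words) == expected


kernel_text = [0x7C6000A6, 0x60000000]   # mfmsr r3; nop
boot_sum = text_checksum(kernel_text)    # recorded before any rewriting

ok_before = self_check(kernel_text, boot_sum)

# The hypervisor rewrites the privileged instruction behind the guest's back.
kernel_text[0] = 0x8060F008              # hypothetical replacement word

ok_after = self_check(kernel_text, boot_sum)
```

`ok_before` is true and `ok_after` is false: from the guest's point of view its own text has been corrupted, even though the rewritten code behaves the same.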
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-14 9:08 UTC (permalink / raw)
To: kvm-ppc
Am 14.12.2010 um 09:42 schrieb Avi Kivity <avi@redhat.com>:
> On 12/14/2010 02:24 AM, Alexander Graf wrote:
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>
>> > On Tue, 14 Dec 2010 00:54:38 +0100
>> > Alexander Graf<alex@csgraf.de> wrote:
>> >
>> >> On 13.12.2010, at 20:03, Scott Wood wrote:
>> >>> [1] Speaking of which, what happens when an interrupt is raised in the
>> >>> middle of a paravirt critical section? KVM will hold off the
>> >>> interrupt delivery if it sees the critical flag set, but when will it
>> >>> deliver the postponed interrupt? Seems like it will wait until the next
>> >>> time an exit happens for some other reason.
>> >>
>> >> mtmsr with IF=1 checks for pending interrupts and enables them with a real mtmsr then which again checks interrupts in vm entry, so it immediately gets injected :).
>> >
>> > Right, but I'm not talking about an interrupt that happens when the
>> > virtual EE bit is zero. I'm talking about an interrupt that happens
>> > right in the middle of the paravirt sequence -- after reading int_pending,
>> > but before setting critical to r2.
>> >
>> > It seems like the race window is just narrowed, not eliminated.
>>
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
>
> What about when "usually" doesn't happen? Tickless kernel, everything's asleep, interrupt missed, system is dead.
Even tickless guest kernels will get out of guest context from time to time, simply because if there are no interrupts on the host, the host is useless - it would only have a single, isolated task running that wouldn't even be able to use the network.
But yes, if we can go without black spots, we should :)
Alex
>
* RE: re-writing on powerpc
From: Yoder Stuart-B08248 @ 2010-12-14 15:45 UTC (permalink / raw)
To: kvm-ppc
> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Tuesday, December 14, 2010 2:49 AM
> To: Hollis Blanchard
> Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
> Subject: Re: re-writing on powerpc
>
> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
> >> Rewriting is dangerous if the guest is unaware of it. As soon as it
> >> is made aware of it, it might as well actually do it in the best way
> >> that suits it.
> >
> > Can you list some examples of dangerous scenarios?
> >
>
> - guest checksums own kernel pages
> - clever compiler reuses code for constant pool
> - guest patches itself (a la linux alternatives), surprised when it sees a
> different instruction
> - guest jits own kernel code (like Singularity), gets confused when it
> reads back something it didn't write
One possible solution to hiding rewriting from guest if it must
be hidden is to mark patched pages as execute only. If a guest
reads a patched page, the hypervisor can fix up the read.
For KVM on powerpc, I'm not sure that having the guest be
completely unaware of the presence of a hypervisor is what
we need to be necessarily shooting for. What has been worked on in
the embedded virtualization committee in power.org is that
we expect there to be a /hypervisor node in the guest device
tree with certain standard properties, including ones indicating
if the hypervisor supports a shared area or re-writing. The
guest makes an hcall indicating if it wants to enable rewriting.
So the guest would need to be aware that it needs to checksum
before enabling this.
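The opt-in flow could look roughly like the Python sketch below. Everything named here is hypothetical: the property name `supports-rewriting`, the hcall name `enable-rewriting` and the callback signatures are placeholders for illustration, not names taken from the power.org work.

```python
def enable_rewriting(device_tree, verify_checksum, hcall):
    """Opt in to hypervisor rewriting; return True if it was enabled.

    device_tree     -- dict of node path -> dict of properties (toy model)
    verify_checksum -- callable, runs the guest's own text self-check
    hcall           -- callable taking an hcall name, returns 0 on success
    """
    node = device_tree.get("/hypervisor")
    if node is None or not node.get("supports-rewriting", False):
        return False            # bare metal, or hypervisor cannot rewrite

    # Checksum while the text is still pristine; after the hcall the
    # hypervisor may have changed instructions under us.
    if not verify_checksum():
        raise RuntimeError("kernel text failed self-check before patching")

    return hcall("enable-rewriting") == 0
```

The point of the ordering is that any check that treats code as data has to run before the hcall, since the guest gives up that guarantee by opting in.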
The problem is how much guest modification is needed-- some hcalls
to enable rewriting seems manageable. Changing every OS for
the paravirt interface involves a lot more work.
Stuart
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-14 15:48 UTC (permalink / raw)
To: kvm-ppc
On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
> > -----Original Message-----
> > From: Avi Kivity [mailto:avi@redhat.com]
> > Sent: Tuesday, December 14, 2010 2:49 AM
> > To: Hollis Blanchard
> > Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
> > Subject: Re: re-writing on powerpc
> >
> > On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
> > >> Rewriting is dangerous if the guest is unaware of it. As soon as it
> > >> is made aware of it, it might as well actually do it in the best way
> > >> that suits it.
> > >
> > > Can you list some examples of dangerous scenarios?
> > >
> >
> > - guest checksums own kernel pages
> > - clever compiler reuses code for constant pool
> > - guest patches itself (a la linux alternatives), surprised when it sees a
> > different instruction
> > - guest jits own kernel code (like Singularity), gets confused when it
> > reads back something it didn't write
>
> One possible solution to hiding rewriting from guest if it must
> be hidden is to mark patched pages as execute only. If a guest
> reads a patched page, the hypervisor can fix up the read.
>
Yes. Something that is common to all the problems above is "using code
as data".
However, execute only would only affect the page's mapping, not the page
itself, yes? So if the page has another mapping, this doesn't work.
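A toy Python model of that objection, with permissions attached to the mapping rather than to the page. All names are made up for illustration, and the "fix up the read" step is reduced to returning the original word.

```python
PERM_R = 1
PERM_X = 2


class Page:
    def __init__(self, original, patched):
        self.original = original   # the word the guest wrote
        self.patched = patched     # the word really in memory after rewriting


class Mapping:
    def __init__(self, page, perms):
        self.page = page
        self.perms = perms         # permissions belong to the mapping


def guest_read(mapping):
    """Model a guest load through one particular mapping."""
    if mapping.perms & PERM_R:
        return mapping.page.patched    # ordinary load, no fault, no fix-up
    # Load from an execute-only mapping faults; the hypervisor emulates it
    # and hands back the original instruction.
    return mapping.page.original


page = Page(original=0x7C6000A6, patched=0x8060F008)
text_mapping = Mapping(page, PERM_X)     # execute-only, as proposed
alias_mapping = Mapping(page, PERM_R)    # second, readable mapping
```

`guest_read(text_mapping)` returns the original word, but `guest_read(alias_mapping)` returns the patched one, so the rewriting is only hidden if every mapping of the page is covered.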
Of course the guest is not completely unaware of patching, so we can
say, if the guest requests patching it becomes its responsibility not to
do silly stuff. I still prefer guest patching though.
> For KVM on powerpc, I'm not sure that having the guest be
> completely unaware of the presence of a hypervisor is what
> we need to be necessarily shooting for. What has been worked on in
> the embedded virtualization committee in power.org is that
> we expect there to be a /hypervisor node in the guest device
> tree with certain standard properties, including ones indicating
> if the hypervisor supports a shared area or re-writing. The
> guest makes an hcall indicating if it wants to enable rewriting.
>
> So the guest would need to be aware that it needs to checksum
> before enabling this.
>
> The problem is how much guest modification is needed-- some hcalls
> to enable rewriting seems manageable. Changing every OS for
> the paravirt interface involves a lot more work.
If a spec exists for hypervisor controlled rewriting, that changes the
picture considerably. Presumably the spec lays out all the constraints
in detail, so we don't have to worry about that.
--
error compiling committee.c: too many arguments to function
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 16:55 UTC (permalink / raw)
To: kvm-ppc
On Tue, 14 Dec 2010 01:24:50 +0100
Alexander Graf <alex@csgraf.de> wrote:
>
> On 14.12.2010, at 01:18, Scott Wood wrote:
>
> > Right, but I'm not talking about an interrupt that happens when the
> > virtual EE bit is zero. I'm talking about an interrupt that happens
> > right in the middle of the paravirt sequence -- after reading int_pending,
> > but before setting critical to r2.
> >
> > It seems like the race window is just narrowed, not eliminated.
>
> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
It could be important for realtime loads, tickless systems (especially
if the Linux host eventually grows the ability to be tickless even
when things are running), etc., and it makes me nervous in general.
It's not something that's going to be causing problems all the time,
though.
> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.
Doesn't checking int_pending require clobbering registers, which is why
we have the critical section in the first place?
> That way we worst case waste a few cycles for the useless guest exit,
> but always fetch interrupts immediately when they occur.
What useless guest exit? Either we exit when we see an interrupt
pending (in which case it's not useless), or we exit all the time, and
then what's the point of the paravirt?
-Scott
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-14 17:48 UTC (permalink / raw)
To: kvm-ppc
Scott Wood wrote:
> On Tue, 14 Dec 2010 01:24:50 +0100
> Alexander Graf <alex@csgraf.de> wrote:
>
>
>> On 14.12.2010, at 01:18, Scott Wood wrote:
>>
>>
>>> Right, but I'm not talking about an interrupt that happens when the
>>> virtual EE bit is zero. I'm talking about an interrupt that happens
>>> right in the middle of the paravirt sequence -- after reading int_pending,
>>> but before setting critical to r2.
>>>
>>> It seems like the race window is just narrowed, not eliminated.
>>>
>> Hrm, is that window really that important? There's usually plenty of interrupts and mmios coming through to always have some check going on.
>>
>
> It could be important for realtime loads, tickless systems (especially
> if the Linux host eventually grows the ability to be tickless even
> when things are running), etc., and it makes me nervous in general.
>
> It's not something that's going to be causing problems all the time,
> though.
>
I agree - it's certainly wrong.
>> If it really is important, we could also check int_pending right after the critical section and just do a nop exit.
>>
>
> Doesn't checking int_pending require clobbering registers, which is why
> we have the critical section in the first place?
>
The critical section is to prevent us from overwriting the scratch
registers, yeah. And I think you're right - I had a thinko last night.
If we see that we should inject an interrupt, but we're inside of a
critical section, we could set the magic page to r/o and try to find the
critical end at which point we can just inject.
>
>> That way we worst case waste a few cycles for the useless guest exit,
>> but always fetch interrupts immediately when they occur.
>>
>
> What useless guest exit? Either we exit when we see an interrupt
> pending (in which case it's not useless), or we exit all the time, and
> then what's the point of the paravirt?
>
I was thinking of a case where we get a few false positives. But again,
I probably just had a bad thought :)
Alex
* Re: re-writing on powerpc
From: Hollis Blanchard @ 2010-12-14 17:53 UTC (permalink / raw)
To: kvm-ppc
On 12/14/2010 12:48 AM, Avi Kivity wrote:
> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>> Rewriting is dangerous if the guest is unaware of it. As soon as it
>>> is made aware of it, it might as well actually do it in the best way
>>> that suits it.
>>
>> Can you list some examples of dangerous scenarios?
>>
Perhaps I should rephrase... any real-world dangerous scenarios? :) I
was hoping you could share some traps you've hit with Linux or Windows
on x86.
> - guest checksums own kernel pages
For runtime intrusion detection? Such guests can simply not ask the
hypervisor to enable the rewriting feature.
> - clever compiler reuses code for constant pool
Not sure what you mean here. Anyways I think clever compilers are
irrelevant, since a compiler will not ordinarily emit a supervisor-mode
instruction. The hypervisor has no need to patch normal user-mode
instructions.
> - guest patches itself (a la linux alternatives), surprised when it
> sees a different instruction
PowerPC Linux does patch itself, which is a write-only operation.
> - guest jits own kernel code (like Singularity), gets confused when it
> reads back something it didn't write
This is getting really hypothetical, but why would a JIT need to read
the generated code?
Hollis Blanchard
Mentor Graphics, Embedded Systems Division
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 18:37 UTC (permalink / raw)
To: kvm-ppc
On Tue, 14 Dec 2010 18:48:18 +0100
Alexander Graf <agraf@suse.de> wrote:
> The critical section is to prevent us from overwriting the scratch
> registers, yeah. And I think you're right - I had a thinko last night.
>
> If we see that we should inject an interrupt, but we're inside of a
> critical section, we could set the magic page to r/o and try to find the
> critical end at which point we can just inject.
Yeah, I thought of that as well -- but single stepping seemed better
than messing with MMU code (one less thing to check for on the TLB miss
path), and it shouldn't happen often enough, or for enough instructions,
to be a performance issue.
-Scott
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 18:41 UTC (permalink / raw)
To: kvm-ppc
On Tue, 14 Dec 2010 10:40:57 +0200
Avi Kivity <avi@redhat.com> wrote:
> On 12/13/2010 09:03 PM, Scott Wood wrote:
> > >
> > > The interface is a lot simpler. The guest decides what to patch and
> > > where to jump. A "please patch me" flag needs a ton of documentation on
> > > what patch means and what the constraints on the guest environment are.
> > >
> >
> > The constraints need to be documented, but I think "a ton" is a bit of
> > an exaggeration
>
> I guess. It's correct for x86 (which has four processor modes, and you
> need to consider segmentation, etc.), perhaps not so much for powerpc.
Yeah, x86 seems like it could be a mess. We actually already wrote up
these constraints for PowerPC for an upcoming version of ePAPR.
> > -- and having the guest do the patching itself means
> > that the structure of the shared page must become stable ABI.
>
> It has to be a stable ABI in any case so you can live migrate. Unless
> you want the hypervisor to unpatch or something.
Well, there's a difference between "stable among a set of
implementations within which you can live upgrade" and "stable among
all implementations that can run a guest without further
modification". I'm thinking of things like completely different
hypervisors (not just KVM) being able to run the same guest image with
paravirt, newly added paravirts working on a guest that doesn't need
updating beyond the initial change to permit rewriting, etc. And if
there is a mistake made that needs to be incompatibly corrected,
breaking live migration seems less bad than requiring guest code
changes.
I think there are good arguments for both ways -- I don't see any
overwhelming reason to change from what KVM is already doing.
-Scott
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 20:04 UTC (permalink / raw)
To: kvm-ppc
On Tue, 14 Dec 2010 12:37:32 -0600
Scott Wood <scottwood@freescale.com> wrote:
> On Tue, 14 Dec 2010 18:48:18 +0100
> Alexander Graf <agraf@suse.de> wrote:
>
> > The critical section is to prevent us from overwriting the scratch
> > registers, yeah. And I think you're right - I had a thinko last night.
> >
> > If we see that we should inject an interrupt, but we're inside of a
> > critical section, we could set the magic page to r/o and try to find the
> > critical end at which point we can just inject.
>
> Yeah, I thought of that as well -- but single stepping seemed better
> than messing with MMU code (one less thing to check for on the TLB miss
> path), and it shouldn't happen often enough, or for enough instructions,
> to be a performance issue.
Well, the TLB path might not be so bad if it can reuse an existing
check for mapping the magic page in the first place -- but if an
interrupt happens immediately after setting critical, but before saving
scratch registers, the critical end will not be the next magic page
write. So you'd still have to either single-step or emulate the stores
at least.
Or I suppose we could document that all magic page stores other than
ending critical must come before checking int_pending, though that seems
a bit ugly.
-Scott
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-14 23:00 UTC (permalink / raw)
To: kvm-ppc
On 14.12.2010, at 21:04, Scott Wood wrote:
> On Tue, 14 Dec 2010 12:37:32 -0600
> Scott Wood <scottwood@freescale.com> wrote:
>
>> On Tue, 14 Dec 2010 18:48:18 +0100
>> Alexander Graf <agraf@suse.de> wrote:
>>
>>> The critical section is to prevent us from overwriting the scratch
>>> registers, yeah. And I think you're right - I had a thinko last night.
>>>
>>> If we see that we should inject an interrupt, but we're inside of a
>>> critical section, we could set the magic page to r/o and try to find the
>>> critical end at which point we can just inject.
>>
>> Yeah, I thought of that as well -- but single stepping seemed better
>> than messing with MMU code (one less thing to check for on the TLB miss
>> path), and it shouldn't happen often enough, or for enough instructions,
>> to be a performance issue.
>
> Well, the TLB path might not be so bad if it can reuse an existing
> check for mapping the magic page in the first place -- but if an
> interrupt happens immediately after setting critical, but before saving
> scratch registers, the critical end will not be the next magic page
> write. So you'd still have to either single-step or emulate the stores
> at least.
We could also move the critical value to its own page, so we only have to trap that one :).
> Or I suppose we could document that all magic page stores other than
> ending critical must come before checking int_pending, though that seems
> a bit ugly.
That one's very hard to do with live binary patching, and we've seen the mess that pv-ops brought to the x86 world (it's disabled for bare metal on SLES due to performance penalties).
Alex
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-14 23:17 UTC (permalink / raw)
To: kvm-ppc
On Wed, 15 Dec 2010 00:00:08 +0100
Alexander Graf <agraf@suse.de> wrote:
>
> On 14.12.2010, at 21:04, Scott Wood wrote:
>
> > Well, the TLB path might not be so bad if it can reuse an existing
> > check for mapping the magic page in the first place -- but if an
> > interrupt happens immediately after setting critical, but before saving
> > scratch registers, the critical end will not be the next magic page
> > write. So you'd still have to either single-step or emulate the stores
> > at least.
>
> We could also move the critical value to its own page, so we only have to trap that one :).
Stable ABI...
> > Or I suppose we could document that all magic page stores other than
> > ending critical must come before checking int_pending, though that seems
> > a bit ugly.
>
> That one's very hard to do with live binary patching
Sorry, I was only talking about stores within a critical section -- not
unrelated stores that other patched instructions might do.
So that once KVM has an interrupt to deliver, and sees that critical is
engaged, it knows that the next magic page store will resolve things.
Either it is a store to critical, and KVM can now deliver the
interrupt -- or it is some other store (scratch or MSR itself) and thus
int_pending has not yet been checked.
I don't think it would be a problem for live patching. It just seems a
bit icky.
-Scott
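
[Editor's sketch, not part of the original message.] The rule Scott proposes, "the next magic page store resolves things", could be modelled roughly as below. The field offsets and the function name are hypothetical (the thread does not give the page layout), and the sketch leaves out how the trapped store itself gets completed.

```python
# Hypothetical offsets inside the magic page, for illustration only.
OFF_CRITICAL = 0x00
OFF_SCRATCH1 = 0x08
OFF_MSR = 0x10


def on_trapped_store(offset, interrupt_deferred):
    """Decide what to do with a trapped magic-page store.

    Assumes the guest follows the documented ordering: inside a critical
    section, every magic-page store other than the one that ends the
    section comes before the int_pending check.
    """
    if not interrupt_deferred:
        return "resume_guest"
    if offset == OFF_CRITICAL:
        # The guest is leaving the critical section, so the deferred
        # interrupt can be delivered now.
        return "deliver_interrupt"
    # Scratch or MSR store: the guest has not checked int_pending yet and
    # will see the pending interrupt on its own.
    return "resume_guest"
```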
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-14 23:29 UTC (permalink / raw)
To: kvm-ppc
On 15.12.2010, at 00:17, Scott Wood wrote:
> On Wed, 15 Dec 2010 00:00:08 +0100
> Alexander Graf <agraf@suse.de> wrote:
>
>>
>> On 14.12.2010, at 21:04, Scott Wood wrote:
>>
>>> Well, the TLB path might not be so bad if it can reuse an existing
>>> check for mapping the magic page in the first place -- but if an
>>> interrupt happens immediately after setting critical, but before saving
>>> scratch registers, the critical end will not be the next magic page
>>> write. So you'd still have to either single-step or emulate the stores
>>> at least.
>>
>> We could also move the critical value to its own page, so we only have to trap that one :).
>
> Stable ABI...
>
>>> Or I suppose we could document that all magic page stores other than
>>> ending critical must come before checking int_pending, though that seems
>>> a bit ugly.
>>
>> That one's very hard to do with live binary patching
>
> Sorry, I was only talking about stores within a critical section -- not
> unrelated stores that other patched instructions might do.
>
> So that once KVM has an interrupt to deliver, and sees that critical is
> engaged, it knows that the next magic page store will resolve things.
> Either it is a store to critical, and KVM can now deliver the
> interrupt -- or it is some other store (scratch or MSR itself) and thus
> int_pending has not yet been checked.
>
> I don't think it would be a problem for live patching. It just seems a
> bit icky.
Oh, because you'd only trap stores, but no writes? Yep, that would work.
The hard part here is that currently the ppc kvm emulator treats every memory write trap as mmio. But that's changeable.
I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.
Thinking about the whole thing - can't we create an "interrupt notification page"? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.
Alex
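
[Editor's sketch, not part of the original message.] The "interrupt notification page" idea can be pictured with a very small model. The class and method names are hypothetical; a real implementation would change the page's mapping in the shadow TLB rather than flip a flag.

```python
class NotificationPage:
    """Toy model: the page is read-only exactly while interrupts wait."""

    def __init__(self):
        self.writable = True

    def set_interrupt_pending(self, pending):
        # Hypervisor side: flip the mapping according to pending state.
        self.writable = not pending

    def guest_store(self):
        """Unconditional guest store after leaving the critical section.

        Returns "trap" when the store would fault into the hypervisor,
        which is the point at which the interrupt could be injected.
        """
        return "ok" if self.writable else "trap"
```

The attraction is that the guest never has to read and test a flag, so there is no check-then-act sequence on the guest side for an interrupt to slip into.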
* Re: re-writing on powerpc
From: Scott Wood @ 2010-12-15 0:00 UTC (permalink / raw)
To: kvm-ppc
On Wed, 15 Dec 2010 00:29:40 +0100
Alexander Graf <agraf@suse.de> wrote:
> On 15.12.2010, at 00:17, Scott Wood wrote:
>
> > So that once KVM has an interrupt to deliver, and sees that critical is
> > engaged, it knows that the next magic page store will resolve things.
> > Either it is a store to critical, and KVM can now deliver the
> > interrupt -- or it is some other store (scratch or MSR itself) and thus
> > int_pending has not yet been checked.
> >
> > I don't think it would be a problem for live patching. It just seems a
> > bit icky.
>
> Oh, because you'd only trap stores, but no writes? Yep, that would work.
"writes" or "loads"? :-)
> I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.
Well, there's another complication -- if we trap on the final store to
end the critical section, the critical section won't actually be ended
until after that instruction executes. Which won't happen until we set
the page to read/write and let it go. So we'd have to look at the
instruction to see what it's doing.
> Thinking about the whole thing - can't we create an "interrupt notification page"? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.
I'd limit it to interrupts that were deferred due to critical,
to avoid unnecessary MMU manipulation, and unnecessary traps when doing
mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero
(assuming the guest doesn't use a separate restore path for that case).
But otherwise sounds reasonable, if we're willing to change the
interface that much. Does it even need to be read-only, or could it be
entirely unmapped when there's a pending interrupt?
-Scott
* Re: re-writing on powerpc
From: Alexander Graf @ 2010-12-15 0:13 UTC (permalink / raw)
To: kvm-ppc
On 15.12.2010, at 01:00, Scott Wood wrote:
> On Wed, 15 Dec 2010 00:29:40 +0100
> Alexander Graf <agraf@suse.de> wrote:
>
>> On 15.12.2010, at 00:17, Scott Wood wrote:
>>
>>> So that once KVM has an interrupt to deliver, and sees that critical is
>>> engaged, it knows that the next magic page store will resolve things.
>>> Either it is a store to critical, and KVM can now deliver the
>>> interrupt -- or it is some other store (scratch or MSR itself) and thus
>>> int_pending has not yet been checked.
>>>
>>> I don't think it would be a problem for live patching. It just seems a
>>> bit icky.
>>
>> Oh, because you'd only trap stores, but no writes? Yep, that would work.
>
> "writes" or "loads"? :-)
s/writes/loads/ :). Sorry, it's 1am here :).
>
>> I actually like that idea. It's probably the cleanest we can get away with without deep modifications of the guest. Single-step is always icky.
>
> Well, there's another complication -- if we trap on the final store to
> end the critical section, the critical section won't actually be ended
> until after that instruction executes. Which won't happen until we set
> the page to read/write and let it go. So we'd have to look at the
> instruction to see what it's doing.
Yep, which is why I proposed the thing below. We'd have to emulate the uncrit store and then automatically inject the interrupt because we're not in the crit section anymore.
>
>> Thinking about the whole thing - can't we create an "interrupt notification page"? Some page that is always mapped read-only when interrupts are available, but read-write when they're not? Then we could just do an unconditional store after the crit section is done and everyone's happy.
>
> I'd limit it to interrupts that were deferred due to critical,
> to avoid unnecessary MMU manipulation, and unnecessary traps when doing
> mtmsr/wrtee if there's an interrupt pending and old EE = new EE = zero
> (assuming the guest doesn't use a separate restore path for that case).
Hrm, we already do treat EE 0 -> 1 changes differently in the code:
        /* Check if we have to fetch an interrupt */
        lwz     r31, (KVM_MAGIC_PAGE + KVM_MAGIC_INT)(0)
        cmpwi   r31, 0
        beq+    no_check

        /* Check if we may trigger an interrupt */
        andi.   r30, r30, MSR_EE
        beq     no_check

        SCRATCH_RESTORE

        /* Nag hypervisor */
kvm_emulate_mtmsrd_orig_ins:
        tlbsync

        b       kvm_emulate_mtmsrd_branch
So the only thing we need to do is to get rid of the MAGIC_INT check and unconditionally store a random register into the notification page (-2*PAGE_SIZE) instead of the tlbsync instruction (sure, things there are slightly more complicated because we actually use the real mtmsr, but you get the point).
>
> But otherwise sounds reasonable, if we're willing to change the
> interface that much. Does it even need to be read-only, or could it be
> entirely unmapped when there's a pending interrupt?
I don't see why we shouldn't add this to the interface. But I'd leave this in the back of our heads for a couple more days, so that if we come up with an even better idea, we can implement that one instead :)
It could be read-only or unmapped - doesn't really matter. It has to be a store, because otherwise we'd clobber a guest register. It could however be a single physical page shared by all guests. We don't care about the contents written into that page anyways.
Alex
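
[Editor's sketch, not part of the original message.] The decision made by the assembly fragment above can be restated in a few lines. This assumes that r30 holds the MSR value being written at that point, which the fragment itself does not show; the function name is hypothetical.

```python
MSR_EE = 0x8000  # external-interrupt enable bit in the PowerPC MSR


def needs_hypervisor_exit(int_pending, new_msr):
    """Mirror the two checks in the fragment.

    The guest only falls through to the trapping instruction when an
    interrupt is pending AND the new MSR value has EE set.
    """
    if int_pending == 0:
        return False              # beq+ no_check
    return (new_msr & MSR_EE) != 0  # andi. r30, r30, MSR_EE / beq no_check
```

Under the proposal in this message, the first check would go away and the trapping instruction would be replaced by an unconditional store to the notification page.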
* Re: re-writing on powerpc
From: Andreas Färber @ 2010-12-15 0:57 UTC (permalink / raw)
To: kvm-ppc
Am 14.12.2010 um 18:53 schrieb Hollis Blanchard:
> On 12/14/2010 12:48 AM, Avi Kivity wrote:
>> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>>> Rewriting is dangerous if the guest is unaware of it. As soon as
>>>> it is made aware of it, it might as well actually do it in the
>>>> best way that suits it.
>>>
>>> Can you list some examples of dangerous scenarios?
>> - guest jits own kernel code (like Singularity), gets confused when
>> it reads back something it didn't write
> This is getting really hypothetical, but why would a JIT need to
> read the generated code?
Mono/ppc actually does that. It generates trampoline functions and
searches emitted code for lis/ori/.../blrl sequences, for instance, to
patch addresses for subsequent invocations.
Andreas
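
[Editor's sketch, not part of the original message.] To illustrate why a code generator that scans its own output is sensitive to rewriting, here is a simplified scanner. It is not Mono's actual code: it only looks for a `lis` immediately followed by an `ori`, and all function names are hypothetical. The primary opcodes (15 for `addis`, of which `lis` is the RA=0 form, and 24 for `ori`) are the standard PowerPC ones.

```python
OP_ADDIS = 15  # lis rD, imm is addis rD, 0, imm
OP_ORI = 24


def lis(rd, imm):
    """Encode lis rD, imm."""
    return (OP_ADDIS << 26) | (rd << 21) | (imm & 0xFFFF)


def ori(ra, rs, imm):
    """Encode ori rA, rS, imm."""
    return (OP_ORI << 26) | (rs << 21) | (ra << 16) | (imm & 0xFFFF)


def is_lis(word):
    return (word >> 26) == OP_ADDIS and ((word >> 16) & 0x1F) == 0


def is_ori(word):
    return (word >> 26) == OP_ORI


def find_lis_ori(words):
    """Return the index of the first lis directly followed by ori."""
    for i in range(len(words) - 1):
        if is_lis(words[i]) and is_ori(words[i + 1]):
            return i
    return None
```

If something outside the generator replaces one of the two instructions, the scan no longer finds the pair and the address is never patched.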
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-15 9:48 UTC (permalink / raw)
To: kvm-ppc
On 12/14/2010 07:53 PM, Hollis Blanchard wrote:
> On 12/14/2010 12:48 AM, Avi Kivity wrote:
>> On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
>>>> Rewriting is dangerous if the guest is unaware of it. As soon as
>>>> it is made aware of it, it might as well actually do it in the best
>>>> way that suits it.
>>>
>>> Can you list some examples of dangerous scenarios?
>>>
> Perhaps I should rephrase... any real-world dangerous scenarios? :)
That's much less fun.
> I was hoping you could share some traps you've hit with Linux or
> Windows on x86.
We've hit a lot of issues with the very limited patching we do for
Windows XP (Linux does its own patching):
- Windows hibernation saves the patched code, but not the payload, so we
have to set up hooks to re-enable the payload when Windows resumes from
hibernation
- We need the vcpu id in the payload code, and have no easy way to get at
it. After several weird hacks we settled on peeking at the Windows
processor control block, a guest-specific per-cpu data structure.
- Some patched instructions are called before the stack is set up, so
the return doesn't work very well
- others I'm suppressing
>> - guest checksums own kernel pages
> For runtime intrusion detection? Such guests can simply not ask the
> hypervisor to enable the rewriting feature.
Which is sad.
>> - clever compiler reuses code for constant pool
> Not sure what you mean here. Anyways I think clever compilers are
> irrelevant, since a compiler will not ordinarily emit a
> supervisor-mode instruction. The hypervisor has no need to patch
> normal user-mode instructions.
I meant a really clever compiler. And by using code for the constant
pool I mean using IP-relative addressing to fetch a constant using a small
offset. If the constant happens to be a patched instruction, it won't
be so constant.
>> - guest patches itself (a la linux alternatives), surprised when it
>> sees a different instruction
> PowerPC Linux does patch itself, which is a write-only operation.
Other self-patchers might be different; say you use xor to toggle
between two variants, reducing the amount of data you need to keep for
patching.
>> - guest jits own kernel code (like Singularity), gets confused when
>> it reads back something it didn't write
> This is getting really hypothetical, but why would a JIT need to read
> the generated code?
>
Any weird hypothetical idea will be in mission-critical production use
somewhere; see Andreas' reply.
--
error compiling committee.c: too many arguments to function
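
[Editor's sketch, not part of the original message.] The xor-toggling self-patcher mentioned above can be made concrete. The three words are arbitrary example values, not real instruction encodings, and the names are hypothetical.

```python
VARIANT_A = 0x11111111  # arbitrary word standing in for variant A
VARIANT_B = 0x22222222  # arbitrary word standing in for variant B
PATCHED = 0x33330000    # arbitrary word a hypervisor might have written

# The guest stores only the difference between the two variants.
DELTA = VARIANT_A ^ VARIANT_B


def toggle(word):
    """Switch between the variants without reading which one is present."""
    return word ^ DELTA
```

Toggling works as long as the word in memory is one of the two variants. If the word was rewritten behind the guest's back, the result is `PATCHED ^ DELTA`, which is neither variant and not the hypervisor's patch either.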
* RE: re-writing on powerpc
From: Sethi Varun-B16395 @ 2010-12-15 11:16 UTC (permalink / raw)
To: kvm-ppc
> -----Original Message-----
> From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-
> owner@vger.kernel.org] On Behalf Of Avi Kivity
> Sent: Tuesday, December 14, 2010 9:18 PM
> To: Yoder Stuart-B08248
> Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
> Subject: Re: re-writing on powerpc
>
> On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
> > > -----Original Message-----
> > > From: Avi Kivity [mailto:avi@redhat.com]
> > > Sent: Tuesday, December 14, 2010 2:49 AM
> > > To: Hollis Blanchard
> > > Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
> > > Subject: Re: re-writing on powerpc
> > >
> > > On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
> > > > >> Rewriting is dangerous if the guest is unaware of it. As soon as it
> > > > >> is made aware of it, it might as well actually do it in the best way
> > > > >> that suits it.
> > > > >
> > > > > Can you list some examples of dangerous scenarios?
> > > > >
> > > >
> > > > - guest checksums own kernel pages
> > > > - clever compiler reuses code for constant pool
> > > > - guest patches itself (a la linux alternatives), surprised when it sees a
> > > > different instruction
> > > > - guest jits own kernel code (like Singularity), gets confused when
> > > > it reads back something it didn't write
> >
> > One possible solution to hiding rewriting from guest if it must be
> > hidden is to mark patched pages as execute only. If a guest reads a
> > patched page, the hypervisor can fix up the read.
> >
>
> Yes. Something that is common to all the problems above is "using code
> as data".
>
> However, execute only would only affect the page's mapping, not the page
> itself, yes? So if the page has another mapping, this doesn't work.
>
But KVM would be aware of guest page mappings, so access permissions for any particular mapping
can be controlled by KVM.
-Varun
* Re: re-writing on powerpc
From: Avi Kivity @ 2010-12-15 11:18 UTC (permalink / raw)
To: kvm-ppc
On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
>
> > -----Original Message-----
> > From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-
> > owner@vger.kernel.org] On Behalf Of Avi Kivity
> > Sent: Tuesday, December 14, 2010 9:18 PM
> > To: Yoder Stuart-B08248
> > Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
> > Subject: Re: re-writing on powerpc
> >
> > On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
> > > > -----Original Message-----
> > > > From: Avi Kivity [mailto:avi@redhat.com]
> > > > Sent: Tuesday, December 14, 2010 2:49 AM
> > > > To: Hollis Blanchard
> > > > Cc: Yoder Stuart-B08248; Alexander Graf; kvm-ppc@vger.kernel.org
> > > > Subject: Re: re-writing on powerpc
> > > >
> > > > On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
> > > > > >> Rewriting is dangerous if the guest is unaware of it. As soon as it
> > > > > >> is made aware of it, it might as well actually do it in the best way
> > > > > >> that suits it.
> > > > > >
> > > > > > Can you list some examples of dangerous scenarios?
> > > > > >
> > > > >
> > > > > - guest checksums own kernel pages
> > > > > - clever compiler reuses code for constant pool
> > > > > - guest patches itself (a la linux alternatives), surprised when it sees a
> > > > > different instruction
> > > > > - guest jits own kernel code (like Singularity), gets confused when
> > > > > it reads back something it didn't write
> > >
> > > One possible solution to hiding rewriting from guest if it must be
> > > hidden is to mark patched pages as execute only. If a guest reads a
> > > patched page, the hypervisor can fix up the read.
> > >
> >
> > Yes. Something that is common to all the problems above is "using code
> > as data".
> >
> > However, execute only would only affect the page's mapping, not the page
> > itself, yes? So if the page has another mapping, this doesn't work.
> >
>
> But KVM would be aware of guest page mappings, so access permissions for any particular mapping
> can be controlled by KVM.
kvm isn't aware of all guest mappings (only those that were instantiated
in shadow tlb/pagetables).
--
error compiling committee.c: too many arguments to function
* RE: re-writing on powerpc
From: Sethi Varun-B16395 @ 2010-12-15 11:32 UTC (permalink / raw)
To: kvm-ppc
> -----Original Message-----
> From: Avi Kivity [mailto:avi@redhat.com]
> Sent: Wednesday, December 15, 2010 4:49 PM
> To: Sethi Varun-B16395
> Cc: Yoder Stuart-B08248; Hollis Blanchard; Alexander Graf; kvm-
> ppc@vger.kernel.org
> Subject: Re: re-writing on powerpc
>
> On 12/15/2010 01:16 PM, Sethi Varun-B16395 wrote:
> >
> > > -----Original Message-----
> > > From: kvm-ppc-owner@vger.kernel.org [mailto:kvm-ppc-
> > > owner@vger.kernel.org] On Behalf Of Avi Kivity
> > > Sent: Tuesday, December 14, 2010 9:18 PM
> > > To: Yoder Stuart-B08248
> > > Cc: Hollis Blanchard; Alexander Graf; kvm-ppc@vger.kernel.org
> > > Subject: Re: re-writing on powerpc
> > >
> > > On 12/14/2010 05:45 PM, Yoder Stuart-B08248 wrote:
> > > > > -----Original Message-----
> > > > > From: Avi Kivity [mailto:avi@redhat.com]
> > > > > Sent: Tuesday, December 14, 2010 2:49 AM
> > > > > To: Hollis Blanchard
> > > > > Cc: Yoder Stuart-B08248; Alexander Graf; kvm-
> ppc@vger.kernel.org
> > > > > Subject: Re: re-writing on powerpc
> > > > >
> > > > > On 12/13/2010 07:17 PM, Hollis Blanchard wrote:
> > > > > >> Rewriting is dangerous if the guest is unaware of it.
> As soon
> > > > > as
> > > > it
> > > > > >> is made aware of it, it might as well actually do it in
> the
> > > > > best
> > > > way
> > > > > >> that suits it.
> > > > > >
> > > > > > Can you list some examples of dangerous scenarios?
> > > > > >
> > > > >
> > > > > - guest checksums own kernel pages
> > > > > - clever compiler reuses code for constant pool
> > > > > > - guest patches itself (a la linux alternatives), surprised when it
> > > > > >   sees a different instruction
> > > > > > - guest jits own kernel code (like Singularity), gets confused when
> > > > > >   it reads back something it didn't write
> > > > >
> > > > > One possible solution to hiding rewriting from guest if it must be
> > > > > hidden is to mark patched pages as execute only. If a guest reads a
> > > > > patched page, the hypervisor can fix up the read.
> > > >
> > >
> > > Yes. Something that is common to all the problems above is "using
> > > code as data".
> > >
> > > However, execute only would only affect the page's mapping, not the
> > > > page itself, yes? So if the page has another mapping, this doesn't
> > > > work.
> > >
> >
> > But KVM would be aware of guest page mappings, so access permissions
> > for any particular mapping can be controlled by KVM.
>
> kvm isn't aware of all guest mappings (only those that were instantiated
> in shadow tlb/pagetables).
I am not sure I understand, but the guest would have to instantiate the mapping in the TLB (for BookE) before the page can be accessed.
That's when we can set the access permissions.
-Varun
* Re: re-writing on powerpc
2010-12-13 4:45 re-writing on powerpc Yoder Stuart-B08248
` (31 preceding siblings ...)
2010-12-15 11:32 ` Sethi Varun-B16395
@ 2010-12-15 12:25 ` Avi Kivity
2010-12-17 21:59 ` Benjamin Herrenschmidt
2010-12-17 22:00 ` Benjamin Herrenschmidt
34 siblings, 0 replies; 36+ messages in thread
From: Avi Kivity @ 2010-12-15 12:25 UTC (permalink / raw)
To: kvm-ppc
On 12/15/2010 01:32 PM, Sethi Varun-B16395 wrote:
> > >
> > > But KVM would be aware of guest page mappings, so access permissions
> > > for any particular mapping can be controlled by KVM.
> >
> > kvm isn't aware of all guest mappings (only those that were instantiated
> > in shadow tlb/pagetables).
> I am not sure I understand, but the guest would have to instantiate the mapping in the TLB (for BookE) before the page can be accessed.
> That's when we can set the access permissions.
You're right: with a shadow tlb, kvm has all guest mappings at all times.
With page table models, it doesn't.
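
The shadow tlb case can be sketched roughly as below. This is only a toy
model, not KVM code: the class ShadowTlb, the hook guest_tlb_write(), the
"r"/"w"/"x" permission letters and the 4 KiB page size are all assumptions
made up for illustration. The point it shows is that every guest mapping,
including an alias of a patched page, passes through the host when it is
instantiated, so the host can strip read/write permission there.

```python
from dataclasses import dataclass, field

PAGE_SHIFT = 12  # assumption: 4 KiB pages


@dataclass
class ShadowTlb:
    """Toy shadow TLB: the host sees every mapping the guest instantiates."""
    patched_pfns: set = field(default_factory=set)  # frames the host rewrote
    entries: dict = field(default_factory=dict)     # guest vpn -> (pfn, perms)

    def guest_tlb_write(self, vaddr, paddr, perms):
        # Hypothetical hook, run when the guest's TLB write traps to the host.
        vpn = vaddr >> PAGE_SHIFT
        pfn = paddr >> PAGE_SHIFT
        perms = set(perms)
        if pfn in self.patched_pfns:
            # Make any mapping of a patched frame execute-only.
            perms -= {"r", "w"}
        self.entries[vpn] = (pfn, frozenset(perms))

    def access(self, vaddr, kind):
        # kind is "r", "w" or "x"; returns what the guest would observe.
        vpn = vaddr >> PAGE_SHIFT
        if vpn not in self.entries:
            return "tlb-miss"
        _pfn, perms = self.entries[vpn]
        # On a fault the host could emulate the access with the
        # original (unpatched) instruction bytes.
        return "ok" if kind in perms else "fault-to-host"


tlb = ShadowTlb(patched_pfns={0x100})
tlb.guest_tlb_write(0xC0000000, 0x100000, "rx")  # kernel text mapping
tlb.guest_tlb_write(0xD0000000, 0x100000, "rw")  # alias of the same frame
```

Here both the text mapping and the alias fault to the host on a read, while
execution through the text mapping still works. With a page table model
there is no such single interception point for all mappings.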
--
error compiling committee.c: too many arguments to function
* RE: re-writing on powerpc
2010-12-13 4:45 re-writing on powerpc Yoder Stuart-B08248
` (32 preceding siblings ...)
2010-12-15 12:25 ` Avi Kivity
@ 2010-12-17 21:59 ` Benjamin Herrenschmidt
2010-12-17 22:00 ` Benjamin Herrenschmidt
34 siblings, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2010-12-17 21:59 UTC (permalink / raw)
To: kvm-ppc
On Wed, 2010-12-15 at 11:32 +0000, Sethi Varun-B16395 wrote:
> > kvm isn't aware of all guest mappings (only those that were instantiated
> > in shadow tlb/pagetables).
> I am not sure I understand, but the guest would have to instantiate the mapping in the TLB (for BookE) before the page can be accessed.
> That's when we can set the access permissions.
But then you need to track them and add overhead to your TLB management,
which you really don't want.
Ben.
* Re: re-writing on powerpc
2010-12-13 4:45 re-writing on powerpc Yoder Stuart-B08248
` (33 preceding siblings ...)
2010-12-17 21:59 ` Benjamin Herrenschmidt
@ 2010-12-17 22:00 ` Benjamin Herrenschmidt
34 siblings, 0 replies; 36+ messages in thread
From: Benjamin Herrenschmidt @ 2010-12-17 22:00 UTC (permalink / raw)
To: kvm-ppc
On Mon, 2010-12-13 at 09:12 -0800, Hollis Blanchard wrote:
> On 12/13/2010 12:42 AM, Alexander Graf wrote:
> > Back when I implemented this, we did however have discussions on exactly that distinction between patching in host or guest space and for some reason I remember that you and Hollis figured that guest patching is superior. I just really can't remember why and couldn't find traces of this in my inbox either :).
> Don't blame me for this. :) My original patching (with Christian) was
> done from host context, and those patches are in the list archives.
>
> As far as I remember, Ben H said he preferred patching from guest
> context (mostly for unspecified or "gut feeling" reasons), and then
> that's what you did. IIRC it was IRC conversation, which is why it
> wouldn't be in your inbox.
I didn't want to mention rumors of patents I heard about...
Ben.