* RE: paravirt_ops support in IA64
2008-02-18 3:28 paravirt_ops support in IA64 Dong, Eddie
@ 2008-02-18 3:52 ` Zhang, Xiantao
2008-02-18 11:31 ` Isaku Yamahata
` (3 subsequent siblings)
4 siblings, 0 replies; 7+ messages in thread
From: Zhang, Xiantao @ 2008-02-18 3:52 UTC
To: linux-ia64
Dong, Eddie wrote:
>
> In X86, there is another enhancement (dynamic patching) based on
> pv_ops. The purpose is to improve CPU branch prediction by converting
> indirect function calls to direct function calls for both C & ASM code.
> We may take a similar approach some time later too.
>
> We really need advice from the community before we jump into
> coding.
> CC'ing some active members who I thought may be interested in pv_ops,
> since a KVM-IA64 mailing list doesn't exist yet.
Hi, Eddie
We just created the kvm-ia64-devel mailing list, and have CC'd the people from
this list who may be interested in this topic.
If you or others are interested in kvm-ia64-devel or pv_virt_ops
for IA64, please subscribe here:
http://kvm.qumranet.com/kvmwiki/Lists%2C_IRC
Xiantao
* Re: paravirt_ops support in IA64
2008-02-18 3:28 paravirt_ops support in IA64 Dong, Eddie
2008-02-18 3:52 ` Zhang, Xiantao
@ 2008-02-18 11:31 ` Isaku Yamahata
2008-02-19 0:13 ` Keith Owens
2008-02-18 11:54 ` Robin Holt
` (2 subsequent siblings)
4 siblings, 1 reply; 7+ messages in thread
From: Isaku Yamahata @ 2008-02-18 11:31 UTC
To: linux-ia64
[Added CC:virtualization@lists.linux-foundation.org]
On Mon, Feb 18, 2008 at 11:28:41AM +0800, Dong, Eddie wrote:
> Hi, Tony & all:
> Recently the Xen-IA64 community has been considering adding paravirt_ops
> support to keep in sync with X86 and reduce maintenance effort. With
> pv_ops, sensitive instructions or some high-level primitive
> functionalities (such as MMU ops) are replaced with pv_ops, which are
> calls through a function table whose function pointers are initialized at
> Linux startup time depending on which hypervisor (or native hardware) is
> running underneath.
I've been working on forward porting Xenified Linux.
Now I have domU boot and disk/network working with Linux 2.6.25-rc1.
I'm planning to post those patches this week. At worst I'll post
the CPU virtualization part, which is discussed in this thread.
For those curious, please see
http://people.valinux.co.jp/~yamahata/xen-ia64/20080214/xen-ia64-20080214.patch
Sorry for the single jumbo patch; I'm now splitting it up into many
small patches for posting.
> With this, we can reuse a lot of code with X86, such as irqchip,
> similar DMA support, similar xenoprof/PMU profiling support, etc.
> The CPU-side pv_ops, however, is quite different, especially for the
> ASM code, since the IA64 processor doesn't have memory/stack ready in
> most IVT handler code.
>
> In X86, ASM-side pv_ops can save clobbered registers to the stack and
> do a function call, but IA64 can't because memory access is unavailable.
>
> #define DISABLE_INTERRUPTS(clobbers)                                   \
>         PARA_SITE(PARA_PATCH(pv_irq_ops, PV_IRQ_irq_disable), clobbers, \
>                   pushl %eax; pushl %ecx; pushl %edx;                  \
>                   call *%cs:pv_irq_ops+PV_IRQ_irq_disable;             \
>                   popl %edx; popl %ecx; popl %eax)
>
>
> The first and biggest question is how to support the ASM IVT
> handler code. Some ideas discussed include:
Although ivt.S has the top priority, there are two other pieces of code
which Xen currently paravirtualizes.
pal.S:
More specifically, ia64_pal_call_static().
Maybe we can go without a paravirtualized ia64_pal_call_static().
Since the PAL static calling convention is very stable, having an implementation
for each paravirtualization technology might be acceptable, because
it won't add much maintenance cost.
entry.S:
The kernel exit point. This is the counterpart of ivt.S.
More concretely, ia64_switch_to(), ia64_leave_syscall() and
ia64_leave_kernel(). They certainly require paravirtualization
because they include sensitive instructions and are performance critical.
Those functions can't be switched as easily as in the IVT
case, so some kind of facility which switches them, or binary
patching of them, is necessary.
> 1: Dual IVT source code, dual IVT table.
> This is what Xen currently does, and it probably won't be warmly
> welcomed since it is not upstream yet and carries maintenance effort.
Pros:
- Optimal code can be possible for native and each paravirtualized case.
- Doesn't introduce any further restriction on native case.
Cons:
- Ugly, and it carries maintenance cost as you already stated.
> 2: Same IVT source code, but dual/multiple compiles to generate
> dual/multiple IVT tables. I.e. we replace those primitive ops (sensitive
> instructions) with a MACRO which uses a compile option for each
> hypervisor type.
> The pseudocode of the MACRO could be (taking a read of CR.IVR
> as an example):
>
> AltA:
> #define ASM_READ_IVR /* read IVR to GR24 */
> #ifdef XEN
> breg1 = return address
> br xen_readivr
> #else /* native */
> mov GR24=CR.IVR;
> #endif
> Or
> AltB:
> #define ASM_READ_IVR /* read IVR to GR24 */
> #ifdef XEN
> in place code of function xen_readivr
> #else /* native */
> mov GR24=CR.IVR;
> #endif
>
> From a maintenance effort point of view, this is minimized,
> but not exactly what X86 pv_ops look like.
>
> Both approaches will cause code size issues, but AltB is
> much worse in this area, while AltA needs one additional BR clobber
> register.
Pros:
- single code
- hopefully less maintenance cost compared to #1
Cons:
- requires restrictions on register usage, and we need to define its
convention.
When modifying ivt.S in the future after converting ivt.S,
those conventions must be kept in mind.
- suboptimal for the paravirtualized case compared to #1
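To make option #2 concrete, here is a rough C analogue (a sketch, not the proposed ivt.S change): one shared source where a single macro expands differently depending on a build flag, so two builds yield two images. The functions native_read_ivr() and xen_read_ivr() and the vector values they return are hypothetical stand-ins for the real CR.IVR read and its Xen replacement.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-ins: natively this would be "mov rX=cr.ivr";
 * under Xen it would be whatever xen_readivr does. The values are
 * arbitrary so the sketch can run. */
static inline uint64_t native_read_ivr(void) { return 0x30; }
static inline uint64_t xen_read_ivr(void)    { return 0xd0; }

/* Option #2: one source, one macro, chosen by a build flag.
 * Building with -DXEN produces the Xen image; without it, the native
 * image. Nothing is decided at run time. */
#ifdef XEN
#define READ_IVR() xen_read_ivr()
#else
#define READ_IVR() native_read_ivr()
#endif

/* Shared "handler" source: identical text in both builds. */
static uint64_t handle_external_interrupt(void)
{
    uint64_t vector = READ_IVR();
    assert(vector < 256);   /* IA64 interrupt vectors fit in 8 bits */
    return vector;
}
```

Compiling the same file with and without -DXEN gives the two variants; because the choice is frozen at build time, this option implies shipping one binary per hypervisor.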
> 3: Single IVT table, using indirect function call for pv_ops.
> This is more like X86 pv_ops, but we need to pay 2
> additional BR clobber registers due to the indirect function call, as in
> the following pseudocode:
>
> AltC:
> breg0 = pv_ops base
> breg0 += offset for this pv_ops
> breg1 = return address;
> br breg0 /* pv_ops clobbered breg0/breg1 */
>
>
> For both #2 & #3, we need to modify the Linux IVT code to get
> clobber registers for those MACROs. #3 needs 2 BR registers and 1-2 GR
> registers for the function body. #2A needs the fewest clobber registers,
> just 1-2 GR registers.
#2B may also need to clobber 1 (or 2?) GR registers depending on the
original instruction.
Pros:
- single code/binary
- less maintenance cost
Cons:
- requires restrictions on register usage, and we need to define its
convention.
When modifying ivt.S in the future after converting ivt.S,
those conventions must be kept in mind.
- more clobbered registers (for AltC)
- suboptimal even for the native case.
Presumably we can use binary patching techniques to mitigate that overhead.
Probably for the native case, we can convert those branches into a single
instruction.
For example, we can make 'br breg0' into a direct branch.
AltD(AltC'):
breg1 = return address;
br native_pv_ops_ops <== binary patch at boot time
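A rough C model of the AltC/AltD idea, assuming a simplified table (struct pv_irq_table, the patch_site records and apply_paravirt_patches() are all hypothetical names): each recorded call site first branches indirectly via "table base + slot offset", and a one-time boot pass resolves every site to its final target. C cannot rewrite its own instructions, so the `direct` field only models the bookkeeping that real binary patching would encode into the branch instruction itself.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*irq_fn)(void);

static int irq_enabled = 1;   /* simulated interrupt state */

static void native_irq_disable(void) { irq_enabled = 0; }
static void native_irq_enable(void)  { irq_enabled = 1; }

/* Simplified pv_ops table (hypothetical layout). */
struct pv_irq_table {
    irq_fn irq_disable;
    irq_fn irq_enable;
};

static struct pv_irq_table pv_irq = { native_irq_disable, native_irq_enable };

/* One recorded call site. Before patching it only knows which table slot
 * it calls (AltC: base + offset, then branch). After patching, `direct`
 * holds the resolved target (AltD). */
struct patch_site {
    size_t slot_offset;
    irq_fn direct;
};

static struct patch_site sites[] = {
    { offsetof(struct pv_irq_table, irq_disable), NULL },
    { offsetof(struct pv_irq_table, irq_enable),  NULL },
};

/* AltC: compute base + offset and load the function pointer stored there. */
static irq_fn load_slot(size_t offset)
{
    return *(irq_fn *)((char *)&pv_irq + offset);
}

/* Execute a site: indirect through the table until it has been patched. */
static void run_site(const struct patch_site *s)
{
    irq_fn fn = s->direct ? s->direct : load_slot(s->slot_offset);
    fn();
}

/* Boot-time pass, run once pv_irq is final: resolve every site. */
static void apply_paravirt_patches(void)
{
    for (size_t i = 0; i < sizeof sites / sizeof sites[0]; i++) {
        sites[i].direct = load_slot(sites[i].slot_offset);
        assert(sites[i].direct != NULL);
    }
}
```

The point of the model is the ordering: the table must be fully initialized for the detected platform before the patch pass runs, after which call sites no longer consult the table.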
> In X86, there is another enhancement (dynamic patching) based on
> pv_ops. The purpose is to improve CPU branch prediction by converting indirect
> function calls to direct function calls for both C & ASM code. We may take
> a similar approach some time later too.
>
> We really need advice from the community before we jump into
> coding.
> CC'ing some active members who I thought may be interested in pv_ops,
> since a KVM-IA64 mailing list doesn't exist yet.
The final goal is merging the Xenified Linux/IA64 domU/dom0 code.
I expect that it requires a lot of cleanup and abstraction.
The first step is merging domU, and it would require
- XenLinux portability cleanup
  These kinds of patches can be pushed upstream independently.
- CPU instruction paravirtualization
  - assembly code
  - intrinsics
- iosapic paravirtualization (event channel)
- Xen irq chip
- and more...
thanks,
--
yamahata
* Re: paravirt_ops support in IA64
2008-02-18 11:31 ` Isaku Yamahata
@ 2008-02-19 0:13 ` Keith Owens
0 siblings, 0 replies; 7+ messages in thread
From: Keith Owens @ 2008-02-19 0:13 UTC
To: Isaku Yamahata
Cc: Jack Steiner, linux-ia64, Dong, Eddie, virtualization, Robin Holt,
Christoph Lameter, Zhang, Xiantao, xen-ia64-devel
Isaku Yamahata (on Mon, 18 Feb 2008 20:31:16 +0900) wrote:
>On Mon, Feb 18, 2008 at 11:28:41AM +0800, Dong, Eddie wrote:
>> 2: Same IVT source code, but dual/multiple compiles to generate
>> dual/multiple IVT tables. I.e. we replace those primitive ops (sensitive
>> instructions) with a MACRO which uses a compile option for each
>> hypervisor type.
>> The pseudocode of the MACRO could be (taking a read of CR.IVR
>> as an example):
>>
>> AltA:
>> #define ASM_READ_IVR /* read IVR to GR24 */
>> #ifdef XEN
>> breg1 = return address
>> br xen_readivr
>> #else /* native */
>> mov GR24=CR.IVR;
>> #endif
>> Or
>> AltB:
>> #define ASM_READ_IVR /* read IVR to GR24 */
>> #ifdef XEN
>> in place code of function xen_readivr
>> #else /* native */
>> mov GR24=CR.IVR;
>> #endif
>>
>> From a maintenance effort point of view, this is minimized,
>> but not exactly what X86 pv_ops look like.
>>
>> Both approaches will cause code size issues, but AltB is
>> much worse in this area, while AltA needs one additional BR clobber
>> register.
>
>
>Pros:
>- single code
>- hopefully less maintenance cost compared to #1
>
>Cons:
>- requires restrictions on register usage, and we need to define its
> convention.
> When modifying ivt.S in the future after converting ivt.S,
> those conventions must be kept in mind.
>- suboptimal for the paravirtualized case compared to #1
Please, please, please do _NOT_ hide register numbers inside small
macros like this. It makes it far too easy to miss register side
effects when looking at IA64 assembler code. Instead make the register
usage a parameter to the macro, so a human looking at the source code
can see which registers are being used. The macros in
include/asm-ia64/mca_asm.h are good examples of this approach. IOW
#define ASM_READ_IVR(breg, greg) ... mov greg=cr.ivr
ASM_READ_IVR(breg1, gr24) # make it obvious which registers are hit
For large macros like SAVE_ALL there is no choice but to hide register
side effects inside the macro, but that should be the exception, not
the rule.
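Because the kernel's .S files go through the C preprocessor, the register-as-parameter style can be checked from plain C by stringifying the expansion. In this sketch ASM_READ_IVR_NATIVE and emit_read_ivr_r24() are hypothetical names; breg is accepted but unused in the native variant so that a paravirtualized variant could share the same signature.

```c
#include <assert.h>
#include <string.h>

/* Turn a macro argument into a string literal. */
#define STR(x) #x

/* Register-explicit macro: the caller names both the branch register and
 * the general register. The native variant touches only greg; breg is
 * kept so that every variant has the same parameter list. */
#define ASM_READ_IVR_NATIVE(breg, greg) "mov " STR(greg) "=cr.ivr"

/* Hypothetical helper: expand the macro for one call site. */
static const char *emit_read_ivr_r24(void)
{
    const char *insn = ASM_READ_IVR_NATIVE(b1, r24);
    assert(strstr(insn, "r24") != NULL);  /* the clobbered GR is visible */
    return insn;
}
```

A reader of the call site `ASM_READ_IVR_NATIVE(b1, r24)` sees every register the macro may touch without opening the macro definition.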
* Re: paravirt_ops support in IA64
2008-02-18 3:28 paravirt_ops support in IA64 Dong, Eddie
2008-02-18 3:52 ` Zhang, Xiantao
2008-02-18 11:31 ` Isaku Yamahata
@ 2008-02-18 11:54 ` Robin Holt
2008-02-18 13:58 ` Dong, Eddie
2008-02-18 15:43 ` Dong, Eddie
4 siblings, 0 replies; 7+ messages in thread
From: Robin Holt @ 2008-02-18 11:54 UTC
To: linux-ia64
On Mon, Feb 18, 2008 at 11:52:30AM +0800, Zhang, Xiantao wrote:
> Dong, Eddie wrote:
> >
> > In X86, there are another enhancement (dynamic patching) base on
> > pv_ops. The purpose is to improve cpu predication by converting
> > indriect function call to direct function call for both C & ASM code.
> > We may take similar approach some time later too.
> >
> > We really need advices from community before we jump into
> coding.
> > CC some active members that I though may be interested in pv_ops
> > since KVM-IA64 mailinglist doesn;t exist yet.
>
> Hi, Eddie
>
> We just created the kvm-ia64-devel mailing list, and have CC'd the people from
> this list who may be interested in this topic.
> If you or others are interested in kvm-ia64-devel or pv_virt_ops
> for IA64, please subscribe here:
Can we please keep the discussion about changes to arch/ia64 core
functionality on this list and not move it elsewhere? This list is low
enough volume and the contributors to the discussion will be consistent
enough that most non-interested people will ignore things. The other
changes related to virtualization specific to ia64 are fine in that
forum, but if you are changing ivt.S, entry.S, etc, IMHO, those should
be discussed here. Also, could you repost the patches?
On a different note, I am willingly and woefully unaware of what the
paravirt _NEEDS_ are. Could those be summarised as well? I will
admit to being completely ignorant about how paravirt works on x86.
Please don't state them in terms of we want to change the code this way,
but rather in terms of we need to intercept these points in the kernel
for this reason/purpose.
Thanks,
Robin
* RE: paravirt_ops support in IA64
2008-02-18 3:28 paravirt_ops support in IA64 Dong, Eddie
` (2 preceding siblings ...)
2008-02-18 11:54 ` Robin Holt
@ 2008-02-18 13:58 ` Dong, Eddie
2008-02-18 15:43 ` Dong, Eddie
4 siblings, 0 replies; 7+ messages in thread
From: Dong, Eddie @ 2008-02-18 13:58 UTC
To: linux-ia64
Robin Holt wrote:
> those should be discussed here. Also, could you repost the patches?
The work has just started, so sorry, we don't have patches in hand right
now.
But we can provide some example code for reference.
>
> On a different note, I am willingly and woefully unaware of what the
> paravirt _NEEDS_ are. Could those be summarised as well? I will
Sure. In both X86 & IA64, the classical processor architecture is not
virtualizable (before VT and similar HW technologies were invented), so
paravirtualization technology was invented, which modifies the guest Linux
source code to work cooperatively with a hypervisor such as Xen.
But this kind of paravirtualization technology needs to modify the Linux
source a lot and has drawn a lot of debate in the community. The original
Xen PV patches have not been accepted so far, and it is a
big issue for OSVs to maintain 2 versions of the Linux OS (guest Linux
and bare metal Linux). Meanwhile, new hypervisors are coming
in which need different modifications to Linux. VMware proposed
VMI (Virtual Machine Interface) at OLS 2006 to replace sensitive
instructions with VMI calls so that the same Linux binary can run both
natively and as a guest.
Later on, Rusty Russell proposed a function-table-based solution called
paravirt_ops (pv_ops), which eventually got community buy-in. Today the
X86 32-bit support is upstream.
The basic idea of X86 pv_ops is to replace the original processor-sensitive
instructions, such as cli/sti, with a call through a pv_ops function table
whose contents are initialized for each hypervisor. For example:
--- xx 2008-02-18 20:09:32.000000000 +0800
+++ entry_32.S 2007-12-18 16:22:59.000000000 +0800
@@ -264,7 +264,7 @@
#ifdef CONFIG_PREEMPT
ENTRY(resume_kernel)
- cli
+ DISABLE_INTERRUPTS(CLBR_ANY)
cmpl $0,TI_preempt_count(%ebp) # non-zero preempt_count ?
jnz restore_nocheck
need_resched:
DISABLE_INTERRUPTS behaves like this (pseudocode):
#define DISABLE_INTERRUPTS(clb) pv_irq_ops.irq_disable()
At Linux startup time, the initialization code will set
pv_irq_ops.irq_disable to xen_irqdisable if the underlying hypervisor is
Xen, or to native_irqdisable if running on bare metal, for example.
> admit to being completely ignorant about how paravirt works on x86.
> Please don't state them in terms of we want to change the code this
> way, but rather in terms of we need to intercept these points in the
> kernel for this reason/purpose.
Agreed. Basically, the IA64 side has the same maintenance issue: OSVs
such as Red Hat release 2 images today, one for native and one for Xen guest.
More importantly, today the Xen PV (paravirtualized) guest only supports
Linux 2.6.18, and forward porting to the latest Linux needs a lot of effort. So
we want to push those changes upstream. That is the whole purpose.
People have tried to push the current Xen/IA64 PV Linux changes upstream,
but haven't gotten a promising result yet. Given the emerging pv_ops support
on the X86 side, if the IA64 side also takes a pv_ops-based solution, we can
reuse a lot of common code and thus need a smaller patch than before. It can
also easily support different hypervisors in the future, such as KVM for
paravirtualization.
Current Xen/IA64 is not doing pv_ops yet; we are still in the brainstorming/
design phase. If we could get more comments from you guys before we
choose a way to go, that would save us a lot of effort.
Yes, right now we are mainly focusing on the entry.S & ivt.S changes.
>
> Thanks,
> Robin
> -
Thanks, Eddie
* RE: paravirt_ops support in IA64
2008-02-18 3:28 paravirt_ops support in IA64 Dong, Eddie
` (3 preceding siblings ...)
2008-02-18 13:58 ` Dong, Eddie
@ 2008-02-18 15:43 ` Dong, Eddie
4 siblings, 0 replies; 7+ messages in thread
From: Dong, Eddie @ 2008-02-18 15:43 UTC
To: linux-ia64
Isaku Yamahata wrote:
>> 2: Same IVT source code, but dual/multiple compiles to generate
>> dual/multiple IVT tables. I.e. we replace those primitive ops
>> (sensitive instructions) with a MACRO which uses a compile option for
>> each hypervisor type. The pseudocode of the MACRO could be
>> (taking a read of CR.IVR as an example):
>>
>> AltA:
>> #define ASM_READ_IVR /* read IVR to GR24 */
>> #ifdef XEN
>> breg1 = return address
>> br xen_readivr
>> #else /* native */
>> mov GR24=CR.IVR;
>> #endif
>> Or
>> AltB:
>> #define ASM_READ_IVR /* read IVR to GR24 */
>> #ifdef XEN
>> in place code of function xen_readivr
>> #else /* native */
>> mov GR24=CR.IVR;
>> #endif
>>
>> From a maintenance effort point of view, this is minimized,
>> but not exactly what X86 pv_ops look like.
>>
>> Both approaches will cause code size issues, but AltB is
>> much worse in this area, while AltA needs one additional BR clobber
>> register.
>
>
> Pros:
> - single code
> - hopefully less maintenance cost compared to #1
>
> Cons:
> - requires restrictions on register usage, and we need to define its
> convention.
> When modifying ivt.S in the future after converting ivt.S,
> those conventions must be kept in mind.
> - suboptimal for the paravirtualized case compared to #1
>
>
>> 3: Single IVT table, using indirect function call for pv_ops.
>> This is more like X86 pv_ops, but we need to pay 2
>> additional BR clobber registers due to the indirect function call, as in
>> the following pseudocode:
>>
>> AltC:
>> breg0 = pv_ops base
>> breg0 += offset for this pv_ops
>> breg1 = return address;
>> br breg0 /* pv_ops clobbered breg0/breg1 */
>>
>>
>> For both #2 & #3, we need to modify the Linux IVT code to get
>> clobber registers for those MACROs. #3 needs 2 BR registers and 1-2 GR
>> registers for the function body. #2A needs the fewest clobber registers,
>> just 1-2 GR registers.
>
> #2B may also need to clobber 1 (or 2?) GR registers depending on the
> original instruction.
Yes, the number of clobbered GRs is almost the same for all the Alts.
>
> Pros:
> - single code/binary
> - less maintenance cost
>
> Cons:
> - requires restrictions on register usage, and we need to define its
> convention.
> When modifying ivt.S in the future after converting ivt.S,
> those conventions must be kept in mind.
> - more clobbered registers (for AltC)
> - suboptimal even for the native case.
After binary patching, the native side won't be impacted.
We can do in-place patching, i.e. replace the whole AltC
code sequence dynamically with "mov GRx=CR.IVR;nop;nop..."
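A simplified model of that in-place patching, treating the reserved AltC site as an array of instruction slots. The slot count and instruction strings are hypothetical, and real IA64 patching must respect the 128-bit bundle encoding and flush the instruction cache, which this ignores. The native sequence is copied in and the remainder is filled with nops, so the site keeps its size and nothing after it moves.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SITE_SLOTS 4   /* hypothetical size reserved for the AltC sequence */

/* The code emitted at build time for one pv_ops site (AltC style). */
static const char *site[SITE_SLOTS] = {
    "breg0 = pv_ops base",
    "breg0 += offset",
    "breg1 = return address",
    "br breg0",
};

/* Overwrite the site with the native sequence and pad the rest with nops,
 * so the site's length (and everything after it) is unchanged. */
static void patch_in_place(const char **slots, size_t nslots,
                           const char *const *native, size_t nnative)
{
    assert(nnative <= nslots);   /* the replacement must fit in the site */
    for (size_t i = 0; i < nslots; i++)
        slots[i] = (i < nnative) ? native[i] : "nop";
}
```

The fit check captures the main constraint of this approach: the build must reserve at least as much space at each site as the longest replacement sequence.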
>
> Presumably we can use binary patching techniques to mitigate that
> overhead. Probably for the native case, we can convert those branches into
> a single instruction.
> For example, we can make 'br breg0' into a direct branch.
If there is a single IVT table, we don't know the target address of
the function call.
> AltD(AltC'):
> breg1 = return address;
> br native_pv_ops_ops <== binary patch at boot time
>
?? Are you talking about AltA?
thanks, Eddie