* RE: passing hypercall parameters by pointer
@ 2005-08-17 20:44 Ian Pratt
       [not found] ` <mailman.1124311483.4826@unix-os.sc.intel.com>
  2005-08-17 22:04 ` Hollis Blanchard
  0 siblings, 2 replies; 21+ messages in thread
From: Ian Pratt @ 2005-08-17 20:44 UTC
  To: Hollis Blanchard, xen-devel; +Cc: Jimi Xenidis

> Many Xen hypercalls pass mlocked pointers as parameters for 
> both input and output. For example, xc_get_pfn_list() is a 
> nice one with multiple levels of structures/mlocking.
> 
> Considering just the tools for the moment, those pointers are 
> userspace addresses. Ultimately the hypervisor ends up with 
> that userspace address, from which it reads and writes data. 
> This is OK for x86, since userspace, kernel, and hypervisor 
> all share the same virtual address space (and userspace has 
> carefully mlocked the relevant memory).
> 
> On PowerPC though, the hypervisor runs in real mode (no MMU 
> translation).  
> Unlike x86, PowerPC exceptions arrive in real mode, and also 
> PowerPC does not force a TLB flush when switching between 
> real and virtual modes. So a virtual address is pretty much 
> worthless as a hypervisor parameter; performing the MMU 
> translation in software is infeasible.

I think I'd prefer to hide all of this by co-operation between the
kernel and the hypervisor's copy to/from user.

The kernel can easily translate a virtual address and length into a list
of pseudo-physical frame numbers and initial offset. Xen's copy from
user function can then use this list when doing its work. 
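
A minimal sketch of the translation step described above, under stated
assumptions: 4 KiB pages, a fixed-size list, and a caller-supplied lookup
callback standing in for the kernel's page-table walk. The names
frame_list, build_frame_list and toy_lookup are hypothetical and are not
part of any Xen or Linux interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)

/* Hypothetical descriptor handed to the hypervisor in place of a raw pointer. */
struct frame_list {
    unsigned long offset;      /* byte offset into the first frame */
    size_t        len;         /* total length in bytes */
    size_t        nr_frames;   /* number of valid entries in pfn[] */
    unsigned long pfn[16];     /* pseudo-physical frame numbers (fixed cap for the sketch) */
};

/* Stand-in for the kernel's page-table walk; a real kernel would look up each page. */
typedef unsigned long (*vaddr_to_pfn_fn)(unsigned long vaddr);

/* Returns 0 on success, -1 if the range is empty or needs more frames than the list holds. */
static int build_frame_list(unsigned long vaddr, size_t len,
                            vaddr_to_pfn_fn lookup, struct frame_list *fl)
{
    if (len == 0)
        return -1;

    unsigned long first = vaddr >> SKETCH_PAGE_SHIFT;
    unsigned long last  = (vaddr + len - 1) >> SKETCH_PAGE_SHIFT;
    size_t n = (size_t)(last - first + 1);

    if (n > sizeof(fl->pfn) / sizeof(fl->pfn[0]))
        return -1;

    fl->offset = vaddr & (SKETCH_PAGE_SIZE - 1);
    fl->len = len;
    fl->nr_frames = n;
    for (size_t i = 0; i < n; i++)
        fl->pfn[i] = lookup((first + i) << SKETCH_PAGE_SHIFT);
    return 0;
}

/* Toy mapping used only to exercise the sketch: virtually adjacent pages
 * land on non-adjacent frames. */
static unsigned long toy_lookup(unsigned long vaddr)
{
    return 1000 + 7 * (vaddr >> SKETCH_PAGE_SHIFT);
}
```

A 32-byte buffer starting 16 bytes before a page boundary yields two
frames and an initial offset of 0xff0, which is the case a single
translated start address cannot describe.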

Ian


> Although it rarely passes parameters by pointer, the way the 
> pSeries hypervisor handles this is having the kernel always 
> pass a "pseudo-physical" 
> address (to borrow Xen terminology), which is trivially 
> translatable to a "machine" address in the hypervisor. The 
> processor has some notion of a large (e.g. 64M) chunk of 
> contiguous machine memory, so the hypervisor keeps a table of 
> chunks which can be used to translate pseudo-physical addresses.
> 
> Of course, userspace doesn't know pseudo-physical addresses, 
> only the kernel does. So one way or another, to pass 
> parameters by pointer to the PPC hypervisor, the kernel is 
> going to need to translate them. That also means userspace 
> memory areas will be limited to one page (since virtually 
> consecutive pages may not be representable by a single 
> pseudo-physical address).
> 
> If we're stuck with structure addresses in hypercalls, one 
> possible solution is to modify libxc so that all parameter 
> addresses are physical pointers within the same page, then 
> pass that page's physical address into the hypercall. 
> Something like this:
> 
> ulong magicpage_vaddr;	/* address userspace dereferences */
> ulong magicpage_paddr;	/* address handed to the hypervisor */
> ulong offset;		/* next free byte in the magic page */
> 
> libxc_init() {
> #ifdef __powerpc__
> 	posix_memalign((void **)&magicpage_vaddr, PAGE_SIZE, PAGE_SIZE);
> 	mlock((void *)magicpage_vaddr, PAGE_SIZE);
> 	magicpage_paddr = new_translate_syscall(magicpage_vaddr);
> #endif
> 	...
> }
> 
> xc_get_pfn_list() {
> 	dom0_op_t *op;
> 	ulong op_paddr;
> 	magicalloc((ulong *)&op, &op_paddr, sizeof(dom0_op_t));
> 	...
> }
> 
> #ifdef __powerpc__
> magicalloc(ulong *usable_addr, ulong *hcall_addr, int bytes) {
> 	*usable_addr = magicpage_vaddr + offset;
> 	*hcall_addr = magicpage_paddr + offset;
> 	offset += bytes;
> }
> 
> do_xen_hypercall(ptr) {
> 	ptr -= magicpage_vaddr - magicpage_paddr;
> 	do_privcmd(..., ptr);
> }
> #endif
> 
> (Note that this is for discussion only, not a proposed interface.)
> 
> Each architecture would provide their own magicalloc and 
> do_xen_hypercall, and for x86 magicalloc would be 
> malloc+mlock and both pointers are the same. x86 
> do_xen_hypercall would remain unchanged. Basically, any 
> current use of mlock in libxc would be replaced with calls to 
> magicalloc.
> 
> For example, if we're willing to change the embedded pointers 
> in dom0_ops to offsets, we do not need to invent a new 
> "translate" system call.
> 
> Other suggestions are welcome.
> 
> --
> Hollis Blanchard
> IBM Linux Technology Center
> 

* RE: passing hypercall parameters by pointer
@ 2005-08-19 12:41 Ian Pratt
  0 siblings, 0 replies; 21+ messages in thread
From: Ian Pratt @ 2005-08-19 12:41 UTC
  To: Keir Fraser, Jimi Xenidis; +Cc: Tian, Kevin, xen-devel

> This is all potentially fixable before 3.0 final. Paravirt 
> x86 can continue to use guest virtual addresses. The idea 
> would be that the registration scheme would essentially 
> create a parameter-passing 'address space' into which you 
> hook pages of memory. On x86 we would map the address space 
> onto regions of kernel va space. On other arches we would map 
> the address space onto physical addresses that get mapped 
> into Xen's va space. 
> get_user/put_user/copy_from_user/copy_to_user will take guest 
> addresses that point into this parameter-passing address space.
> 
> At least we can scope it out by doing a few hypercalls to 
> start with -- probably dom0_ops first and see how it pans 
> out. I think it will work quite well...

I'd be inclined to first go after the ops that are needed for the
paravirtualized drivers (mem_op, grantab_op). Perhaps people could post
a few patch examples for discussion?

NB: This in no way represents a commitment to get this into 3.0-final.
Let's have a look at the patches and decide.

[Right now, anything that isn't fixing bugs or sorting out xenbus/tools
is actually a distraction]

Ian

* RE: passing hypercall parameters by pointer
@ 2005-08-19 11:34 Ian Pratt
  2005-08-19 11:52 ` Jimi Xenidis
  2005-08-19 12:20 ` Keir Fraser
  0 siblings, 2 replies; 21+ messages in thread
From: Ian Pratt @ 2005-08-19 11:34 UTC
  To: Keir Fraser, Jimi Xenidis; +Cc: Tian, Kevin, xen-devel

> The current mlock() scheme in libxc is screwed anyway -- we 
> mlock/munlock regions that may overlap at page granularity. 
> Fixing this would lead naturally to a preallocation scheme.

That's a very good point. For the moment, we should remove all the
munlock() calls for safety. The amount of unnecessary memory we'll end
up pinning will be tiny, so we shouldn't worry about it.
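
To make the overlap problem concrete: mlock() and munlock() act on whole
pages and locks do not nest, so unlocking one buffer also unlocks any
other buffer that happens to share a page with it. A small sketch of the
check, assuming 4 KiB pages (page_span and share_a_page are hypothetical
helper names, not libxc functions):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL

/* First and last page index touched by [addr, addr + len); len must be > 0. */
static void page_span(uintptr_t addr, size_t len, uintptr_t *first, uintptr_t *last)
{
    assert(len > 0);
    *first = addr / SKETCH_PAGE_SIZE;
    *last  = (addr + len - 1) / SKETCH_PAGE_SIZE;
}

/* True if the two byte ranges share at least one page, even when the
 * bytes themselves are disjoint. */
static bool share_a_page(uintptr_t a, size_t alen, uintptr_t b, size_t blen)
{
    uintptr_t af, al, bf, bl;

    page_span(a, alen, &af, &al);
    page_span(b, blen, &bf, &bl);
    return af <= bl && bf <= al;
}
```

Two 64-byte buffers at 0x1000 and 0x1800 do not overlap byte-wise, yet
share_a_page() reports true for them, so a munlock() on either would
unpin the other.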

Post 3.0 we can completely redo the dom0 op interface, but the rest of
the hypercall interface will have to remain backward compatible, at
least for x86_*. Since passing by VA is so convenient on the
architectures that support it we may not want to do anything different
on these anyhow.

For VT paravirt drivers I think pre-registration will work fine. The set
of hypercalls we need to support is small anyhow.
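
One possible shape for pre-registration, as a sketch only: the guest
hooks frames into numbered slots ahead of time, and a parameter address
is then just (slot, offset). The slot count, the SLOT_EMPTY sentinel and
all param_space_* names are assumptions made for illustration; nothing
here is an existing Xen interface.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define SKETCH_NR_SLOTS   8
#define SLOT_EMPTY        (~0UL)

/* One table per guest: slot i of the parameter space is backed by frame[i]. */
struct param_space {
    unsigned long frame[SKETCH_NR_SLOTS];
};

static void param_space_init(struct param_space *ps)
{
    for (size_t i = 0; i < SKETCH_NR_SLOTS; i++)
        ps->frame[i] = SLOT_EMPTY;
}

/* Guest side: hook a frame into a slot ahead of time. Returns 0 or -1. */
static int param_space_register(struct param_space *ps, size_t slot,
                                unsigned long frame)
{
    if (slot >= SKETCH_NR_SLOTS || frame == SLOT_EMPTY)
        return -1;
    ps->frame[slot] = frame;
    return 0;
}

/* Hypervisor side: turn a parameter-space address into (frame, offset).
 * Returns 0 on success, -1 if the slot is out of range or unregistered. */
static int param_space_resolve(const struct param_space *ps, unsigned long paddr,
                               unsigned long *frame, unsigned long *offset)
{
    unsigned long slot = paddr >> SKETCH_PAGE_SHIFT;

    if (slot >= SKETCH_NR_SLOTS || ps->frame[slot] == SLOT_EMPTY)
        return -1;
    *frame = ps->frame[slot];
    *offset = paddr & (SKETCH_PAGE_SIZE - 1);
    return 0;
}
```

Because the lookup is a bounded table index, the hypervisor never has to
walk guest page tables on the hypercall path; an address outside the
registered slots is simply rejected.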

Ian

* RE: passing hypercall parameters by pointer
@ 2005-08-18  6:56 Tian, Kevin
  2005-08-18 15:58 ` Hollis Blanchard
  0 siblings, 1 reply; 21+ messages in thread
From: Tian, Kevin @ 2005-08-18  6:56 UTC
  To: Hollis Blanchard, Ian Pratt; +Cc: Jimi Xenidis, xen-devel

>From: Hollis Blanchard
>Sent: Thursday, August 18, 2005 6:05 AM
>        case DOM0_GETMEMLIST:
>            op->u.getmemlist.buffer = virt_to_phys(op->u.getmemlist.buffer);
>            break;

If we follow Ian's suggestion, you have to create a list of pfns here
instead of only converting the start address. There's no guarantee that
the buffer is confined to one page. ;-)

Thanks,
Kevin
>        case DOM0_SETDOMAININFO:
>            ...
>        case DOM0_READCONSOLE:
>            ...
>        }
>    }
>    break;
>    }
>
>Right now the kernel doesn't peer inside the hypercall structures at all.
>
>--
>Hollis Blanchard
>IBM Linux Technology Center
>

* RE: passing hypercall parameters by pointer
@ 2005-08-18  6:56 Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2005-08-18  6:56 UTC
  To: Hollis Blanchard, Sharma, Arun
  Cc: Jimi Xenidis, Ian Pratt, xen-devel, Yu, Ke, Ling, Xiaofeng

>From: Hollis Blanchard
>Sent: Thursday, August 18, 2005 6:11 AM
>
>I have no answer for parameters that are very large, but I wonder how many
>cases there are. For example, DOM0_READCONSOLE could just be limited to 4KB
>reads, and if there's more data than that, call it again. Perhaps there is
>some case-specific solution to xc_get_pfn_list() as well.
>

If a hypercall needs to capture a specific context atomically at one point
in time, calling it again several times actually returns a mix of contexts
belonging to different points in time. That's not desirable. Even if people
add atomic protection for this kind of case, performance will suffer badly
and there is more risk of deadlock.
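
A toy illustration of the mixed-context problem, with everything in it
hypothetical: an 8-byte state read in 4-byte chunks, and an update
callback that stands in for the state changing between two calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define CHUNK 4

/* Toy "hypervisor state": a buffer that may be rewritten between calls. */
struct toy_state {
    char data[8];
};

/* One bounded "hypercall": copy at most CHUNK bytes starting at off. */
static size_t read_chunk(const struct toy_state *s, size_t off, char *dst)
{
    size_t n = sizeof(s->data) - off;

    if (n > CHUNK)
        n = CHUNK;
    memcpy(dst, s->data + off, n);
    return n;
}

/* Reads the whole state in chunks; update (if non-NULL) runs after each
 * call to stand in for the state changing while the caller is looping. */
static void read_all(struct toy_state *s, char *out,
                     void (*update)(struct toy_state *))
{
    size_t off = 0;

    while (off < sizeof(s->data)) {
        off += read_chunk(s, off, out + off);
        if (update)
            update(s);
    }
}

/* Overwrites the whole state, as a concurrent writer might. */
static void toy_update(struct toy_state *s)
{
    memset(s->data, 'B', sizeof(s->data));
}
```

Starting from a state of all 'A', read_all() with toy_update returns
"AAAABBBB": half of the old context and half of the new one, which no
single point in time ever contained.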

Thanks,
Kevin

* RE: passing hypercall parameters by pointer
@ 2005-08-18  6:56 Tian, Kevin
  0 siblings, 0 replies; 21+ messages in thread
From: Tian, Kevin @ 2005-08-18  6:56 UTC
  To: Ian Pratt, Hollis Blanchard, xen-devel; +Cc: Jimi Xenidis

>From: Ian Pratt
>Sent: Thursday, August 18, 2005 4:44 AM
>> On PowerPC though, the hypervisor runs in real mode (no MMU 
>> translation).  
>> Unlike x86, PowerPC exceptions arrive in real mode, and also 
>> PowerPC does not force a TLB flush when switching between 
>> real and virtual modes. So a virtual address is pretty much 
>> worthless as a hypervisor parameter; performing the MMU 
>> translation in software is infeasible.
>
>I think I'd prefer to hide all of this by co-operation between the
>kernel and the hypervisor's copy to/from user.
>
>The kernel can easily translate a virtual address and length into a list
>of pseudo-physical frame numbers and initial offset. Xen's copy from
>user function can then use this list when doing its work.
>
>Ian
>

So this is a common concern whenever the hypervisor resides in a different
address space from the guest. For PowerPC, it's real mode (hypervisor) vs.
virtual mode (guest). For a vmx domain, the hypervisor has its own monitor
page table, separate from the shadow page table. I'd expect the final
solution to be uniform too. ;-)

Let me check that I understand your suggestion correctly. Xiaofeng's
previous patch has the following flow when accessing guest address space:
---hypervisor---
- Search for the gva in the guest page table to get the pfn
- Get the mfn from the pfn
- Map the mfn into the hypervisor's space
- Then directly access the new va'

Then your suggestion is to make the gva->pfn search happen in the guest.
The hypervisor will still perform the remaining steps: manipulate the
monitor page table first and then access the new va'. (PowerPC will access
the mfn directly.) Finally, in either option, copy_from/to_user becomes a
memcpy to a new va' with no exception occurring.
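
The flow above can be sketched with toy tables. Everything here is an
assumption made for illustration (a four-page guest, flat arrays for the
guest page table and the pfn-to-mfn table, and an array standing in for
machine memory); it is not Xiaofeng's patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE  (1UL << SKETCH_PAGE_SHIFT)
#define NR_PAGES          4

/* Toy machine memory and the two translation tables (all hypothetical). */
static unsigned char machine_mem[NR_PAGES][SKETCH_PAGE_SIZE];
static const unsigned long gva_to_pfn[NR_PAGES] = { 2, 0, 3, 1 }; /* guest page table */
static const unsigned long pfn_to_mfn[NR_PAGES] = { 1, 3, 0, 2 }; /* pfn -> mfn table */

/* pfn -> mfn -> pointer the hypervisor can dereference (the "new va'"). */
static unsigned char *map_pfn(unsigned long pfn)
{
    return machine_mem[pfn_to_mfn[pfn]];
}

/* Copy len bytes from guest virtual address gva, one page at a time.
 * Returns 0 on success, -1 if the range leaves the toy guest. */
static int copy_from_guest(void *dst, unsigned long gva, size_t len)
{
    unsigned char *out = dst;

    while (len > 0) {
        unsigned long vpage = gva >> SKETCH_PAGE_SHIFT;
        unsigned long off = gva & (SKETCH_PAGE_SIZE - 1);
        size_t n = SKETCH_PAGE_SIZE - off;

        if (vpage >= NR_PAGES)
            return -1;
        if (n > len)
            n = len;
        /* The gva -> pfn lookup is the step that would move into the guest. */
        memcpy(out, map_pfn(gva_to_pfn[vpage]) + off, n);
        out += n;
        gva += n;
        len -= n;
    }
    return 0;
}
```

Note that the copy has to be split at every page boundary, because
virtually adjacent guest pages map to unrelated machine frames. That is
the same reason a single translated start address is not enough.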

Now a question arises. The pseudo-physical frame number list is itself
passed as a parameter to the hypervisor, and there's no promise that
this list will be confined to a single page. You also need extra info in
this list if multiple parameters are pointers. How to access this
arbitrarily long list efficiently seems to be the same puzzle as the
subject. For x86, people may set a maximum limit, but what about 64-bit
platforms? A good example is always get_pfn_list, which always breaks
assumptions about the size of a parameter. ;-)

Thanks,
Kevin

* RE: passing hypercall parameters by pointer
@ 2005-08-18  0:47 Ling, Xiaofeng
  0 siblings, 0 replies; 21+ messages in thread
From: Ling, Xiaofeng @ 2005-08-18  0:47 UTC
  To: Sharma, Arun, Ian Pratt; +Cc: xen-devel, Yu, Ke



Arun Sharma <mailto:arun.sharma@intel.com> wrote:
> Ian Pratt wrote:
> The other alternative (which we talked about at OLS) is to use a
> couple of pinned pages for parameter passing - but it doesn't work
> very well for:  
> 
> a) Multiple levels of structures/pointers
A good example is do_multicall.
A complete implementation needs to enumerate all the hypercalls and
deal with each one that uses pointers.

> b) Arguments which may be bigger than a couple of pages
> (xc_get_pfn_list() for a bigmem domain for example).
> 
> 	-Arun

* passing hypercall parameters by pointer
@ 2005-08-17 19:51 Hollis Blanchard
  0 siblings, 0 replies; 21+ messages in thread
From: Hollis Blanchard @ 2005-08-17 19:51 UTC
  To: xen-devel; +Cc: Jimi Xenidis

Many Xen hypercalls pass mlocked pointers as parameters for both input and 
output. For example, xc_get_pfn_list() is a nice one with multiple levels of 
structures/mlocking.

Considering just the tools for the moment, those pointers are userspace 
addresses. Ultimately the hypervisor ends up with that userspace address, from 
which it reads and writes data. This is OK for x86, since userspace, kernel, 
and hypervisor all share the same virtual address space (and userspace has 
carefully mlocked the relevant memory).

On PowerPC though, the hypervisor runs in real mode (no MMU translation).  
Unlike x86, PowerPC exceptions arrive in real mode, and also PowerPC does not 
force a TLB flush when switching between real and virtual modes. So a virtual 
address is pretty much worthless as a hypervisor parameter; performing the 
MMU translation in software is infeasible.

Although it rarely passes parameters by pointer, the way the pSeries 
hypervisor handles this is having the kernel always pass a "pseudo-physical" 
address (to borrow Xen terminology), which is trivially translatable to a 
"machine" address in the hypervisor. The processor has some notion of a large 
(e.g. 64M) chunk of contiguous machine memory, so the hypervisor keeps a 
table of chunks which can be used to translate pseudo-physical addresses.
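
As a rough sketch of that chunk-table lookup (not the actual pSeries
hypervisor code; the table contents, the four-chunk limit and the name
pseudo_to_machine are invented for illustration), assuming 64 MiB
chunks:

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_SHIFT 26                          /* 64 MiB chunks */
#define CHUNK_SIZE  (UINT64_C(1) << CHUNK_SHIFT)
#define NR_CHUNKS   4
#define BAD_ADDR    UINT64_MAX

/* chunk_base[i] is the machine address backing pseudo-physical chunk i.
 * Toy values; each is 64 MiB aligned. */
static const uint64_t chunk_base[NR_CHUNKS] = {
    UINT64_C(0x40000000), UINT64_C(0x10000000),
    UINT64_C(0x7c000000), UINT64_C(0x24000000),
};

/* Translate a pseudo-physical address to a machine address, or return
 * BAD_ADDR if it falls outside the chunks owned by this partition. */
static uint64_t pseudo_to_machine(uint64_t paddr)
{
    uint64_t idx = paddr >> CHUNK_SHIFT;

    if (idx >= NR_CHUNKS)
        return BAD_ADDR;
    return chunk_base[idx] + (paddr & (CHUNK_SIZE - 1));
}
```

The translation is one shift, one bounds check and one add, which is
what makes it cheap enough to do in real mode on every hypercall.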

Of course, userspace doesn't know pseudo-physical addresses, only the kernel 
does. So one way or another, to pass parameters by pointer to the PPC 
hypervisor, the kernel is going to need to translate them. That also means  
userspace memory areas will be limited to one page (since virtually 
consecutive pages may not be representable by a single pseudo-physical 
address).

If we're stuck with structure addresses in hypercalls, one possible solution 
is to modify libxc so that all parameter addresses are physical pointers 
within the same page, then pass that page's physical address into the 
hypercall. Something like this:

ulong magicpage_vaddr;	/* address userspace dereferences */
ulong magicpage_paddr;	/* address handed to the hypervisor */
ulong offset;		/* next free byte in the magic page */

libxc_init() {
#ifdef __powerpc__
	posix_memalign((void **)&magicpage_vaddr, PAGE_SIZE, PAGE_SIZE);
	mlock((void *)magicpage_vaddr, PAGE_SIZE);
	magicpage_paddr = new_translate_syscall(magicpage_vaddr);
#endif
	...
}

xc_get_pfn_list() {
	dom0_op_t *op;
	ulong op_paddr;
	magicalloc((ulong *)&op, &op_paddr, sizeof(dom0_op_t));
	...
}

#ifdef __powerpc__
magicalloc(ulong *usable_addr, ulong *hcall_addr, int bytes) {
	*usable_addr = magicpage_vaddr + offset;
	*hcall_addr = magicpage_paddr + offset;
	offset += bytes;
}

do_xen_hypercall(ptr) {
	ptr -= magicpage_vaddr - magicpage_paddr;
	do_privcmd(..., ptr);
}
#endif

(Note that this is for discussion only, not a proposed interface.)

Each architecture would provide their own magicalloc and do_xen_hypercall, and 
for x86 magicalloc would be malloc+mlock and both pointers are the same. x86 
do_xen_hypercall would remain unchanged. Basically, any current use of mlock 
in libxc would be replaced with calls to magicalloc.

For example, if we're willing to change the embedded pointers in dom0_ops to 
offsets, we do not need to invent a new "translate" system call.
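
A sketch of what offsets in place of embedded pointers might look like.
The struct layout and the names toy_op and resolve_buf are hypothetical
and do not reflect the real dom0_op_t; the point is only that an offset
is meaningful in any address space once the receiver knows where the
page lives.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical op layout: the buffer is named by an offset, not a pointer. */
struct toy_op {
    uint32_t cmd;
    uint32_t buf_offset;   /* byte offset of the buffer from the start of the page */
    uint32_t buf_len;      /* buffer length in bytes */
};

/* Receiver side: resolve the offset against whatever base address the page
 * has in the receiver's own address space. Returns NULL if the range would
 * leave the page. */
static void *resolve_buf(void *page_base, const struct toy_op *op)
{
    if (op->buf_offset > SKETCH_PAGE_SIZE ||
        op->buf_len > SKETCH_PAGE_SIZE - op->buf_offset)
        return NULL;
    return (unsigned char *)page_base + op->buf_offset;
}
```

Userspace fills in offsets relative to its own mapping of the page, and
the receiver applies the same offsets to its own base, so userspace
never needs to learn a physical address.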

Other suggestions are welcome.

-- 
Hollis Blanchard
IBM Linux Technology Center
