Interface to enable in-kernel hcall handling

public inbox for kvm@vger.kernel.org

* Interface to enable in-kernel hcall handling
From: Paul Mackerras @ 2013-11-16  8:59 UTC
  To: kvm-ppc, kvm

I have been thinking about adding an interface to PPC KVM's PAPR
emulation to allow userspace to control whether or not individual
hypercalls or groups of hypercalls get handled in the kernel
(vs. being passed up to userspace to be handled there).

I can think of a couple of possible interfaces, differing in how the
set of hypercalls to be enabled/disabled is specified.  In each case I
envisage a new VM ioctl which takes an argument specifying which
hypercalls to enable, and possibly another VM ioctl to disable some or
all hypercalls.

One is to use the string defined in PAPR for the group of hypercalls.
This is the string that gets included in the ibm,hypertas-functions
property in the /rtas node of the device tree to indicate to the guest
that the group of hypercalls is available to it, for example,
"hcall-pft" for H_ENTER, H_REMOVE, etc., "hcall-tce" for H_PUT_TCE,
H_GET_TCE and friends, and so on.  This way, userspace can iterate
through the strings in the ibm,hypertas-functions property and call
the enable-hypercall ioctl for each one.
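
A minimal sketch of how the string variant might map group names to hypercall numbers. The table layout and the names `hcall_group`, `find_hcall_group` and `hcall_group_size` are assumptions made for illustration only; the group contents are deliberately limited to the hypercalls named above, and the numeric values are the PAPR hypercall numbers as also found in Linux's hvcall.h.

```c
#include <stddef.h>
#include <string.h>

/* PAPR hypercall numbers (same values as in Linux's hvcall.h). */
#define H_REMOVE   0x04
#define H_ENTER    0x08
#define H_GET_TCE  0x1c
#define H_PUT_TCE  0x20

#define MAX_GROUP_HCALLS 4

/* Hypothetical table entry: one per ibm,hypertas-functions string. */
struct hcall_group {
    const char *name;
    unsigned long hcalls[MAX_GROUP_HCALLS]; /* zero-terminated list */
};

/* Only the hypercalls mentioned in the text; real groups contain more. */
static const struct hcall_group hcall_groups[] = {
    { "hcall-pft", { H_ENTER, H_REMOVE, 0 } },
    { "hcall-tce", { H_PUT_TCE, H_GET_TCE, 0 } },
};

/* Look up a group by its PAPR string; returns NULL if it is unknown. */
static const struct hcall_group *find_hcall_group(const char *name)
{
    size_t i;

    for (i = 0; i < sizeof(hcall_groups) / sizeof(hcall_groups[0]); i++) {
        if (strcmp(hcall_groups[i].name, name) == 0)
            return &hcall_groups[i];
    }
    return NULL;
}

/* Count the hypercalls listed in a group. */
static size_t hcall_group_size(const struct hcall_group *group)
{
    size_t n = 0;

    while (n < MAX_GROUP_HCALLS && group->hcalls[n] != 0)
        n++;
    return n;
}
```

With something like this, the only string handling needed on the receiving side is a `strcmp()` against a fixed table; an unknown string simply fails the lookup.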

The second is to pass the individual hypercall number and do them one
by one.  The problem with this one is that it may not make sense to
have some of the hypercalls in a related group handled in the kernel
and others in userspace.
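
To make the per-hypercall variant concrete, here is a rough sketch of the state such an interface could maintain. `MAX_HCALL`, `struct vm_hcall_state`, `vm_set_hcall` and `vm_hcall_in_kernel` are hypothetical names, not an existing KVM API; the sketch relies only on the fact that PAPR hypercall numbers are multiples of 4.

```c
#include <stdbool.h>

/* PAPR hypercall numbers (same values as in Linux's hvcall.h). */
#define H_GET_TCE 0x1c
#define H_PUT_TCE 0x20

/* Hypothetical upper bound on hypercall numbers, for this sketch only. */
#define MAX_HCALL 0x400

/* Hypothetical per-VM state: one flag per possible hypercall. */
struct vm_hcall_state {
    bool in_kernel[MAX_HCALL / 4 + 1];
};

/* Hypercall numbers are multiples of 4, so reject anything else. */
static bool hcall_nr_valid(unsigned long nr)
{
    return nr <= MAX_HCALL && (nr & 3) == 0;
}

/* Enable or disable in-kernel handling of one hypercall; -1 on bad number. */
static int vm_set_hcall(struct vm_hcall_state *vm, unsigned long nr, bool enable)
{
    if (!hcall_nr_valid(nr))
        return -1;
    vm->in_kernel[nr / 4] = enable;
    return 0;
}

/* Should this hypercall be handled in the kernel? */
static bool vm_hcall_in_kernel(const struct vm_hcall_state *vm, unsigned long nr)
{
    return hcall_nr_valid(nr) && vm->in_kernel[nr / 4];
}
```

Nothing in this shape prevents enabling H_GET_TCE while leaving H_PUT_TCE to userspace, which is exactly the split-group situation described above.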

The third is to pass a bitmap with one bit per possible hypercall.
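
A minimal sketch of what such a bitmap could look like, assuming bit N stands for hypercall number 4*N (hypercall numbers are multiples of 4). The bound `MAX_HCALL` and the names `hcall_bitmap`, `hcall_bitmap_set` and `hcall_bitmap_test` are made up for illustration.

```c
#include <limits.h>
#include <stdbool.h>

/* PAPR hypercall numbers (same values as in Linux's hvcall.h). */
#define H_ENTER   0x08
#define H_PUT_TCE 0x20

/* Hypothetical upper bound on hypercall numbers, for this sketch only. */
#define MAX_HCALL     0x400
#define HCALL_BITS    (MAX_HCALL / 4 + 1)
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define HCALL_WORDS   ((HCALL_BITS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per possible hypercall; bit N corresponds to hypercall 4*N. */
struct hcall_bitmap {
    unsigned long bits[HCALL_WORDS];
};

/* Mark one hypercall as enabled; invalid numbers are ignored. */
static void hcall_bitmap_set(struct hcall_bitmap *map, unsigned long nr)
{
    unsigned long bit = nr / 4;

    if (nr > MAX_HCALL || (nr & 3))
        return;
    map->bits[bit / BITS_PER_WORD] |= 1UL << (bit % BITS_PER_WORD);
}

/* Test whether a hypercall is marked as enabled. */
static bool hcall_bitmap_test(const struct hcall_bitmap *map, unsigned long nr)
{
    unsigned long bit = nr / 4;

    if (nr > MAX_HCALL || (nr & 3))
        return false;
    return (map->bits[bit / BITS_PER_WORD] >> (bit % BITS_PER_WORD)) & 1UL;
}
```

Userspace would fill in the whole map and hand it over in a single call, so the complete set of enabled hypercalls is replaced at once rather than built up one by one.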

Any thoughts/opinions on the relative merits of these ideas?

Paul.


* Re: Interface to enable in-kernel hcall handling
From: Alexander Graf @ 2013-11-18 21:31 UTC
  To: Paul Mackerras; +Cc: kvm-ppc, kvm@vger.kernel.org mailing list


On 16.11.2013, at 03:59, Paul Mackerras <paulus@samba.org> wrote:

> I have been thinking about adding an interface to PPC KVM's PAPR
> emulation to allow userspace to control whether or not individual
> hypercalls or groups of hypercalls get handled in the kernel
> (vs. being passed up to userspace to be handled there).
> 
> I can think of a couple of possible interfaces, differing in how the
> set of hypercalls to be enabled/disabled is specified.  In each case I
> envisage a new VM ioctl which takes an argument specifying which
> hypercalls to enable, and possibly another VM ioctl to disable some or
> all hypercalls.
> 
> One is to use the string defined in PAPR for the group of hypercalls.
> This is the string that gets included in the ibm,hypertas-functions
> property in the /rtas node of the device tree to indicate to the guest
> that the group of hypercalls is available to it, for example,
> "hcall-pft" for H_ENTER, H_REMOVE, etc., "hcall-tce" for H_PUT_TCE,
> H_GET_TCE and friends, and so on.  This way, userspace can iterate
> through the strings in the ibm,hypertas-functions property and call
> the enable-hypercall ioctl for each one.
> 
> The second is to pass the individual hypercall number and do them one
> by one.  The problem with this one is that it may not make sense to
> have some of the hypercalls in a related group handled in the kernel
> and others in userspace.
> 
> The third is to pass a bitmap with one bit per possible hypercall.
> 
> Any thoughts/opinions on the relative merits of these ideas?

I think either way works. I personally like the string variant the least, as it means we have to parse strings in the kernel. The question whether user space thinks it makes sense to only intercept groups versus individual hypercalls IMHO is not up to us. Maybe user space wants to accelerate H_GET_TCE, but intercept H_PUT_TCE to do magic in the background.


Alex


* Re: Interface to enable in-kernel hcall handling
From: Paul Mackerras @ 2013-11-19  1:02 UTC
  To: Alexander Graf; +Cc: kvm-ppc, kvm@vger.kernel.org mailing list

On Mon, Nov 18, 2013 at 04:31:39PM -0500, Alexander Graf wrote:
> 
> On 16.11.2013, at 03:59, Paul Mackerras <paulus@samba.org> wrote:
> 
> > I have been thinking about adding an interface to PPC KVM's PAPR
> > emulation to allow userspace to control whether or not individual
> > hypercalls or groups of hypercalls get handled in the kernel
> > (vs. being passed up to userspace to be handled there).
> > 
> > I can think of a couple of possible interfaces, differing in how the
> > set of hypercalls to be enabled/disabled is specified.  In each case I
> > envisage a new VM ioctl which takes an argument specifying which
> > hypercalls to enable, and possibly another VM ioctl to disable some or
> > all hypercalls.
> > 
> > One is to use the string defined in PAPR for the group of hypercalls.
> > This is the string that gets included in the ibm,hypertas-functions
> > property in the /rtas node of the device tree to indicate to the guest
> > that the group of hypercalls is available to it, for example,
> > "hcall-pft" for H_ENTER, H_REMOVE, etc., "hcall-tce" for H_PUT_TCE,
> > H_GET_TCE and friends, and so on.  This way, userspace can iterate
> > through the strings in the ibm,hypertas-functions property and call
> > the enable-hypercall ioctl for each one.
> > 
> > The second is to pass the individual hypercall number and do them one
> > by one.  The problem with this one is that it may not make sense to
> > have some of the hypercalls in a related group handled in the kernel
> > and others in userspace.
> > 
> > The third is to pass a bitmap with one bit per possible hypercall.
> > 
> > Any thoughts/opinions on the relative merits of these ideas?
> 
> I think either way works. I personally like the string variant the least, as it means we have to parse strings in the kernel. The question whether user space thinks it makes sense to only intercept groups versus individual hypercalls IMHO is not up to us. Maybe user space wants to accelerate H_GET_TCE, but intercept H_PUT_TCE to do magic in the background.

The problem with splitting a group of related hypercalls between
kernel and userspace tends to be locking.  Generally there would be
some data structure that is accessed by the hypercalls in the group.
If all the hypercalls in the group are implemented in one place then
you can manage concurrent access using the usual primitives (spinlocks
or pthread mutexes, typically).  But if some are done in one place and
some in another then the locking gets way more complex.  I'd prefer to
avoid that extra complexity.
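
As a rough illustration of the single-place case, here is a userspace-style sketch in which both accessors of a shared table take the same pthread mutex. The table layout, its size and the names `tce_table`, `tce_put` and `tce_get` are hypothetical and far simpler than a real TCE table.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical, tiny table size for this sketch only. */
#define TCE_ENTRIES 16

/* The shared data structure and the single lock that guards it. */
struct tce_table {
    pthread_mutex_t lock;
    uint64_t entry[TCE_ENTRIES];
};

/* Store an entry (stand-in for an H_PUT_TCE handler); -1 on a bad index. */
static int tce_put(struct tce_table *table, unsigned int idx, uint64_t tce)
{
    if (idx >= TCE_ENTRIES)
        return -1;
    pthread_mutex_lock(&table->lock);
    table->entry[idx] = tce;
    pthread_mutex_unlock(&table->lock);
    return 0;
}

/* Read an entry (stand-in for an H_GET_TCE handler); -1 on a bad index. */
static int tce_get(struct tce_table *table, unsigned int idx, uint64_t *tce)
{
    if (idx >= TCE_ENTRIES)
        return -1;
    pthread_mutex_lock(&table->lock);
    *tce = table->entry[idx];
    pthread_mutex_unlock(&table->lock);
    return 0;
}
```

If one of these handlers lived in the kernel and the other in userspace, the two sides could not simply share this mutex, which is where the extra complexity would come from.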

Paul.


* Re: Interface to enable in-kernel hcall handling
From: Alexander Graf @ 2013-11-19 10:18 UTC
  To: Paul Mackerras; +Cc: kvm-ppc, kvm@vger.kernel.org mailing list


On 19.11.2013, at 02:02, Paul Mackerras <paulus@samba.org> wrote:

> On Mon, Nov 18, 2013 at 04:31:39PM -0500, Alexander Graf wrote:
>> 
>> On 16.11.2013, at 03:59, Paul Mackerras <paulus@samba.org> wrote:
>> 
>>> I have been thinking about adding an interface to PPC KVM's PAPR
>>> emulation to allow userspace to control whether or not individual
>>> hypercalls or groups of hypercalls get handled in the kernel
>>> (vs. being passed up to userspace to be handled there).
>>> 
>>> I can think of a couple of possible interfaces, differing in how the
>>> set of hypercalls to be enabled/disabled is specified.  In each case I
>>> envisage a new VM ioctl which takes an argument specifying which
>>> hypercalls to enable, and possibly another VM ioctl to disable some or
>>> all hypercalls.
>>> 
>>> One is to use the string defined in PAPR for the group of hypercalls.
>>> This is the string that gets included in the ibm,hypertas-functions
>>> property in the /rtas node of the device tree to indicate to the guest
>>> that the group of hypercalls is available to it, for example,
>>> "hcall-pft" for H_ENTER, H_REMOVE, etc., "hcall-tce" for H_PUT_TCE,
>>> H_GET_TCE and friends, and so on.  This way, userspace can iterate
>>> through the strings in the ibm,hypertas-functions property and call
>>> the enable-hypercall ioctl for each one.
>>> 
>>> The second is to pass the individual hypercall number and do them one
>>> by one.  The problem with this one is that it may not make sense to
>>> have some of the hypercalls in a related group handled in the kernel
>>> and others in userspace.
>>> 
>>> The third is to pass a bitmap with one bit per possible hypercall.
>>> 
>>> Any thoughts/opinions on the relative merits of these ideas?
>> 
>> I think either way works. I personally like the string variant the least, as it means we have to parse strings in the kernel. The question whether user space thinks it makes sense to only intercept groups versus individual hypercalls IMHO is not up to us. Maybe user space wants to accelerate H_GET_TCE, but intercept H_PUT_TCE to do magic in the background.
> 
> The problem with splitting a group of related hypercalls between
> kernel and userspace tends to be locking.  Generally there would be
> some data structure that is accessed by the hypercalls in the group.
> If all the hypercalls in the group are implemented in one place then
> you can manage concurrent access using the usual primitives (spinlocks
> or pthread mutexes, typically).  But if some are done in one place and
> some in another then the locking gets way more complex.  I'd prefer to
> avoid that extra complexity.

I agree, but I don't see how the two things are related. If we allow user space to actually emulate a hypercall, we need to make sure that the locking bits work well enough to at least prevent privilege escalation either way:

Imagine I want to trap H_PUT_TCE, but not H_GET_TCE. I would use the migration protocol to inject TCE entries when I trap to user space. Locking here needs to work regardless of whether it happens on H_PUT_TCE or any other guest-triggered code path.

So the worst that can happen if we don't enforce group granularity for in-kernel handling is that user space shoots itself in the foot because it simply can't emulate that functionality. I don't see why we'd have to care.

I'd rather keep the interface flexible enough to allow weird use cases rather than try to be too smart and keep people from doing fun things :).


Alex

