From: Laurent Vivier <Laurent.Vivier@bull.net>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org, Hollis Blanchard <hollisb@us.ibm.com>,
"Zhang, Xiantao" <xiantao.zhang@intel.com>
Subject: Re: [PATCH 1/2][RFC][v2] kvm: Batch writes to MMIO
Date: Mon, 19 May 2008 10:36:30 +0200 [thread overview]
Message-ID: <1211186190.6146.6.camel@frecb07144> (raw)
In-Reply-To: <482FDD1E.2090304@qumranet.com>
Thank you for your comments.
I agree with all of them.
I'll move my code to emulator_write_emulated_onepage().
I'll use the kvm_io_device mechanism: is there an existing ioctl() to
register an MMIO zone?
I'll add a KVM_CAP_
Regards,
Laurent
Le dimanche 18 mai 2008 à 10:39 +0300, Avi Kivity a écrit :
> [resend to new list]
>
> Laurent Vivier wrote:
> > This patch is the kernel part of the "batch writes to MMIO" patch.
> >
> > It introduces an ioctl interface to define MMIO zones whose writes are allowed to be delayed.
> > Inside a zone, we can define sub-parts that must not be delayed.
> >
> > If an MMIO write can be delayed, it is stored in a ring buffer which is common to all VCPUs.
> >
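[The shared ring buffer described above can be sketched in userspace C. The struct layout, field names (`first`, `last`), and the 32-slot size below are illustrative assumptions, not the patch's actual `kvm_batch` layout:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: a fixed-size ring of pending MMIO writes,
 * shared by all VCPUs. One slot is kept empty to distinguish
 * a full ring from an empty one. */
#define BATCH_SLOTS 32

struct mmio_write {
	uint64_t phys_addr;
	uint32_t len;
	uint8_t  data[8];
};

struct mmio_batch {
	uint32_t first;              /* next entry userspace consumes */
	uint32_t last;               /* next free slot the kernel fills */
	struct mmio_write ring[BATCH_SLOTS];
};

/* Returns 1 on success, 0 if the ring is full (caller must flush). */
static int batch_push(struct mmio_batch *b, uint64_t addr,
		      const void *data, uint32_t len)
{
	uint32_t next = (b->last + 1) % BATCH_SLOTS;

	if (next == b->first)
		return 0;
	b->ring[b->last].phys_addr = addr;
	b->ring[b->last].len = len;
	memcpy(b->ring[b->last].data, data, len);
	b->last = next;
	return 1;
}
```

[With one slot reserved, such a ring holds BATCH_SLOTS - 1 pending writes before userspace has to drain it.]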
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index dab3d4f..930986b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1518,6 +1518,103 @@ out:
> > return r;
> > }
> >
> > +static struct kvm_delayed_mmio_zone *kvm_mmio_find_zone(struct kvm *kvm,
> > + u64 addr, u32 size)
> >
>
> gpa_t addr
>
> > +{
> > + int i;
> > + struct kvm_delayed_mmio_zone *zone;
> > +
> > + for (i = 0; i < kvm->arch.nb_mmio_zones; i++) {
> > + zone = &kvm->arch.mmio_zone[i];
> > +
> > + /* (addr,size) is fully included in
> > + * (zone->addr, zone->size)
> > + */
> > +
> > + if (zone->addr <= addr &&
> > + addr + size <= zone->addr + zone->size)
> > + return zone;
> > + }
> > + return NULL;
> > +}
> > +
> > +static struct kvm_excluded_mmio_zone *
> > +kvm_mmio_find_excluded(struct kvm_delayed_mmio_zone *zone, u64 addr, u32 size)
> > +{
> > + struct kvm_excluded_mmio_zone *excluded;
> > + int i;
> > +
> > + addr -= zone->addr;
> > + for (i = 0; i < zone->nb_excluded_zones; i++) {
> > + excluded = &zone->excluded[i];
> > +
> > + if ((excluded->offset <= addr &&
> > + addr < excluded->offset + excluded->size) ||
> > + (excluded->offset < addr + size &&
> > + addr + size <= excluded->offset +
> > + excluded->size))
> > + return excluded;
> > + }
> > + return NULL;
> > +}
> > +
> >
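[For comparison with the endpoint checks above, the canonical half-open interval-overlap test can be sketched in plain userspace C. Unlike testing only the two endpoints of the first range, it also catches the case where the queried range completely contains an excluded zone:]

```c
#include <assert.h>
#include <stdint.h>

/* Half-open ranges [a, a+alen) and [b, b+blen) intersect iff
 * each one starts before the other ends. */
static int ranges_overlap(uint64_t a, uint32_t alen,
			  uint64_t b, uint32_t blen)
{
	return a < b + blen && b < a + alen;
}
```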
>
> Instead of mixed included/excluded zones, I suggest having just
> included zones (with everything else excluded by default). If the
> user specifies an excluded zone within an included zone, simply split
> the included zone into two:
>
> [100, 200) + ^[150, 160) = [100, 150) + [160, 200)
>
> Or maybe, push all this logic to userspace and only allow adding include
> zones (and clearing previously set include zones).
>
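[The splitting rule [100, 200) + ^[150, 160) = [100, 150) + [160, 200) can be sketched as follows; the two-element output array and the `zone` struct are illustrative assumptions:]

```c
#include <assert.h>
#include <stdint.h>

struct zone { uint64_t start, end; };	/* half-open [start, end) */

/* Remove [ex.start, ex.end) from [in.start, in.end), writing the
 * zero, one, or two remaining pieces into out[]. Returns the count. */
static int zone_exclude(struct zone in, struct zone ex, struct zone out[2])
{
	int n = 0;

	if (ex.start > in.start)	/* piece left of the exclusion */
		out[n++] = (struct zone){ in.start,
					  ex.start < in.end ? ex.start : in.end };
	if (ex.end < in.end)		/* piece right of the exclusion */
		out[n++] = (struct zone){ ex.end > in.start ? ex.end : in.start,
					  in.end };
	return n;
}
```

[This is exactly the userspace-side logic the last paragraph suggests: the kernel would then only ever see flat include zones.]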
> > +static int kvm_vm_ioctl_set_mmio(struct kvm *kvm,
> > + struct kvm_mmio_zone *zone)
> >
>
> "mmio" is too generic a name. Please use delayed mmio or batched mmio
> consistently.
>
> > +{
> > + struct kvm_delayed_mmio_zone *z;
> > +
> > + if (zone->is_delayed &&
> > + kvm->arch.nb_mmio_zones >= KVM_MAX_DELAYED_MMIO_ZONE)
> > + return -ENOMEM;
> >
>
> -ENOBUFS. -ENOMEM generally signals an unrecoverable condition, while
> -ENOBUFS means just this specific operation failed.
>
> > +
> > + if (zone->is_delayed) {
> > +
> > + /* already defined ? */
> > +
> > + if (kvm_mmio_find_zone(kvm, zone->addr, 1) ||
> > + kvm_mmio_find_zone(kvm, zone->addr + zone->size - 1, 1))
> > + return 0;
> > +
> > + z = &kvm->arch.mmio_zone[kvm->arch.nb_mmio_zones];
> > + z->addr = zone->addr;
> > + z->size = zone->size;
> > + kvm->arch.nb_mmio_zones++;
> > + return 0;
> > + }
> > +
> > + /* exclude some parts of the delayed MMIO zone */
> > +
> > + z = kvm_mmio_find_zone(kvm, zone->addr, zone->size);
> > + if (z == NULL)
> > + return -EINVAL;
> > +
> > + if (z->nb_excluded_zones >= KVM_MAX_EXCLUDED_MMIO_ZONE)
> > + return -ENOMEM;
> > +
> > + if (kvm_mmio_find_excluded(z, zone->addr, 1) ||
> > + kvm_mmio_find_excluded(z, zone->addr + zone->size - 1, 1))
> > + return 0;
> > +
> > + z->excluded[z->nb_excluded_zones].offset = zone->addr - z->addr;
> > + z->excluded[z->nb_excluded_zones].size = zone->size;
> > + z->nb_excluded_zones++;
> > +
> > + return 0;
> > +}
> > +
> >
>
> The whole thing needs locking against concurrent calls and against users
> of the structure.
>
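[The locking point can be illustrated with a userspace pthread sketch; the `zone_table` struct is a stand-in for kvm->arch.mmio_zone, and the mutex plays the role of kvm->lock:]

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_ZONES 10

struct zone_table {
	pthread_mutex_t lock;	/* stand-in for kvm->lock */
	int nb_zones;
	struct { uint64_t addr; uint32_t size; } zone[MAX_ZONES];
};

/* Registration and lookup take the same lock, so a concurrent reader
 * can never observe nb_zones incremented before the entry is filled. */
static int table_add(struct zone_table *t, uint64_t addr, uint32_t size)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (t->nb_zones >= MAX_ZONES) {
		ret = -1;	/* the ioctl would map this to -ENOBUFS */
	} else {
		t->zone[t->nb_zones].addr = addr;
		t->zone[t->nb_zones].size = size;
		t->nb_zones++;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```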
> >
> > +static int batch_mmio(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_batch *batch = vcpu->kvm->arch.batch;
> > + spinlock_t *lock = &vcpu->kvm->arch.batch_lock;
> > + int next;
> >
>
> Not sure a new lock is warranted. I think we can reuse kvm->lock.
>
> > +
> > + /* check if this MMIO can be delayed */
> > +
> > + if (!kvm_is_delayed_mmio(vcpu->kvm,
> > + vcpu->mmio_phys_addr, vcpu->mmio_size))
> > + return 0;
> >
>
> This check is performed without any locks.
>
> > +
> > static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> > {
> > int r;
> > @@ -2857,6 +3012,11 @@ again:
> > goto again;
> > }
> >
> > + if (!r &&
> > + vcpu->mmio_is_write && kvm_run->exit_reason == KVM_EXIT_MMIO
> > + && !need_resched() && batch_mmio(vcpu))
> > + goto again;
> > +
> >
>
> Better to do this in emulator_write_emulated_onepage(), similar to how
> local mmio is handled.
>
> In fact, we can register batched mmio regions using the existing
> kvm_io_device mechanism.
>
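[The kvm_io_device mechanism referred to here is essentially a table of devices, each claiming an address range and supplying callbacks; a minimal userspace sketch of that dispatch pattern (struct and function names simplified, not the kernel's actual API):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified io-device dispatch: each device claims a physical
 * address range and provides a write callback. */
struct io_device {
	uint64_t base;
	uint32_t size;
	void (*write)(struct io_device *dev, uint64_t addr,
		      const void *data, uint32_t len);
};

struct io_bus {
	int ndevs;
	struct io_device *devs[8];
};

/* Find the device whose range fully contains [addr, addr+len). */
static struct io_device *bus_find(struct io_bus *bus, uint64_t addr,
				  uint32_t len)
{
	int i;

	for (i = 0; i < bus->ndevs; i++) {
		struct io_device *d = bus->devs[i];

		if (addr >= d->base && addr + len <= d->base + d->size)
			return d;
	}
	return NULL;
}
```

[A batched-MMIO region would then just be one more device on the bus, with a write callback that pushes into the shared ring instead of exiting to userspace.]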
> >
> > +#define KVM_MAX_DELAYED_MMIO_ZONE 10
> > +#define KVM_MAX_EXCLUDED_MMIO_ZONE 10
> >
>
> We'll want more, esp. as they're cheap.
>
> >
> > /*
> > * ioctls for vcpu fds
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 64ed402..c8f1bdf 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -824,6 +824,8 @@ static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > #ifdef CONFIG_X86
> > else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET)
> > page = virt_to_page(vcpu->arch.pio_data);
> > + else if (vmf->pgoff == KVM_MMIO_PAGE_OFFSET)
> > + page = virt_to_page(vcpu->kvm->arch.batch);
> > #endif
> >
>
> I expect this will be interesting for ppc and ia64.
>
> I didn't see a KVM_CAP_ for this.
>
--
------------- Laurent.Vivier@bull.net ---------------
"The best way to predict the future is to invent it."
- Alan Kay
Thread overview: 9+ messages
2008-05-15 13:52 [PATCH 0/2][RFC][v2] Batch writes to MMIO Laurent Vivier
2008-05-15 13:52 ` [PATCH 1/2][RFC][v2] kvm: " Laurent Vivier
2008-05-15 13:52 ` [PATCH 2/2][RFC][v2] kvm-userspace: " Laurent Vivier
[not found] ` <482FDD1E.2090304@qumranet.com>
2008-05-19 8:36 ` Laurent Vivier [this message]
2008-05-19 10:21 ` [PATCH 1/2][RFC][v2] kvm: " Avi Kivity
2008-05-20 3:04 ` Hollis Blanchard
2008-05-20 7:39 ` Laurent Vivier
2008-05-20 15:43 ` Hollis Blanchard
2008-05-20 15:52 ` Laurent Vivier