From: Laurent Vivier <Laurent.Vivier@bull.net>
To: Avi Kivity <avi@qumranet.com>
Cc: kvm@vger.kernel.org, Hollis Blanchard <hollisb@us.ibm.com>,
"Zhang, Xiantao" <xiantao.zhang@intel.com>
Subject: Re: [PATCH 1/2][RFC][v2] kvm: Batch writes to MMIO
Date: Mon, 19 May 2008 10:36:30 +0200 [thread overview]
Message-ID: <1211186190.6146.6.camel@frecb07144> (raw)
In-Reply-To: <482FDD1E.2090304@qumranet.com>
Thank you for your comments.
I agree with all of them.
I'll move my code to emulator_write_emulated_onepage().
I'll use the kvm_io_device mechanism: is there an existing ioctl() to
register an MMIO zone?
I'll add a KVM_CAP_
Regards,
Laurent
Le dimanche 18 mai 2008 à 10:39 +0300, Avi Kivity a écrit :
> [resend to new list]
>
> Laurent Vivier wrote:
> > This patch is the kernel part of the "batch writes to MMIO" patch.
> >
> > It introduces an ioctl interface to define MMIO zones whose writes are allowed to be delayed.
> > Inside a zone, we can define sub-parts that must not be delayed.
> >
> > If an MMIO write can be delayed, it is stored in a ring buffer which is common to all VCPUs.
> >
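[The shared ring buffer described above can be sketched in userspace C. The struct layout, field names (`first`, `last`), and the 32-slot size below are illustrative assumptions, not the patch's actual `kvm_batch` layout:]

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Illustrative only: a fixed-size ring of pending MMIO writes,
 * shared by all VCPUs. One slot is kept empty to distinguish
 * a full ring from an empty one. */
#define BATCH_SLOTS 32

struct mmio_write {
	uint64_t phys_addr;
	uint32_t len;
	uint8_t  data[8];
};

struct mmio_batch {
	uint32_t first;              /* next entry userspace consumes */
	uint32_t last;               /* next free slot the kernel fills */
	struct mmio_write ring[BATCH_SLOTS];
};

/* Returns 1 on success, 0 if the ring is full (caller must flush). */
static int batch_push(struct mmio_batch *b, uint64_t addr,
		      const void *data, uint32_t len)
{
	uint32_t next = (b->last + 1) % BATCH_SLOTS;

	if (next == b->first)
		return 0;
	b->ring[b->last].phys_addr = addr;
	b->ring[b->last].len = len;
	memcpy(b->ring[b->last].data, data, len);
	b->last = next;
	return 1;
}
```

[With one slot reserved, such a ring holds BATCH_SLOTS - 1 pending writes before userspace has to drain it.]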
> > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
> > index dab3d4f..930986b 100644
> > --- a/arch/x86/kvm/x86.c
> > +++ b/arch/x86/kvm/x86.c
> > @@ -1518,6 +1518,103 @@ out:
> > return r;
> > }
> >
> > +static struct kvm_delayed_mmio_zone *kvm_mmio_find_zone(struct kvm *kvm,
> > + u64 addr, u32 size)
> >
>
> gpa_t addr
>
> > +{
> > + int i;
> > + struct kvm_delayed_mmio_zone *zone;
> > +
> > + for (i = 0; i < kvm->arch.nb_mmio_zones; i++) {
> > + zone = &kvm->arch.mmio_zone[i];
> > +
> > + /* (addr,size) is fully included in
> > + * (zone->addr, zone->size)
> > + */
> > +
> > + if (zone->addr <= addr &&
> > + addr + size <= zone->addr + zone->size)
> > + return zone;
> > + }
> > + return NULL;
> > +}
> > +
> > +static struct kvm_excluded_mmio_zone *
> > +kvm_mmio_find_excluded(struct kvm_delayed_mmio_zone *zone, u64 addr, u32 size)
> > +{
> > + struct kvm_excluded_mmio_zone *excluded;
> > + int i;
> > +
> > + addr -= zone->addr;
> > + for (i = 0; i < zone->nb_excluded_zones; i++) {
> > + excluded = &zone->excluded[i];
> > +
> > + if ((excluded->offset <= addr &&
> > + addr < excluded->offset + excluded->size) ||
> > + (excluded->offset < addr + size &&
> > + addr + size <= excluded->offset +
> > + excluded->size))
> > + return excluded;
> > + }
> > + return NULL;
> > +}
> > +
> >
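[For comparison with the endpoint checks above, the canonical half-open interval-overlap test can be sketched in plain userspace C. Unlike testing only the two endpoints of the first range, it also catches the case where the queried range completely contains an excluded zone:]

```c
#include <assert.h>
#include <stdint.h>

/* Half-open ranges [a, a+alen) and [b, b+blen) intersect iff
 * each one starts before the other ends. */
static int ranges_overlap(uint64_t a, uint32_t alen,
			  uint64_t b, uint32_t blen)
{
	return a < b + blen && b < a + alen;
}
```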
>
> Instead of mixed included/excluded zones, I suggest having just
> included zones (with everything else excluded by default). If the
> user specifies an excluded zone within an included zone, simply split
> the included zone into two:
>
> [100, 200) + ^[150, 160) = [100, 150) + [160, 200)
>
> Or maybe, push all this logic to userspace and only allow adding include
> zones (and clearing previously set include zones).
>
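[The splitting rule [100, 200) + ^[150, 160) = [100, 150) + [160, 200) can be sketched as follows; the two-element output array and the `zone` struct are illustrative assumptions:]

```c
#include <assert.h>
#include <stdint.h>

struct zone { uint64_t start, end; };	/* half-open [start, end) */

/* Remove [ex.start, ex.end) from [in.start, in.end), writing the
 * zero, one, or two remaining pieces into out[]. Returns the count. */
static int zone_exclude(struct zone in, struct zone ex, struct zone out[2])
{
	int n = 0;

	if (ex.start > in.start)	/* piece left of the exclusion */
		out[n++] = (struct zone){ in.start,
					  ex.start < in.end ? ex.start : in.end };
	if (ex.end < in.end)		/* piece right of the exclusion */
		out[n++] = (struct zone){ ex.end > in.start ? ex.end : in.start,
					  in.end };
	return n;
}
```

[This is exactly the userspace-side logic the last paragraph suggests: the kernel would then only ever see flat include zones.]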
> > +static int kvm_vm_ioctl_set_mmio(struct kvm *kvm,
> > + struct kvm_mmio_zone *zone)
> >
>
> "mmio" is too generic a name. Please use delayed mmio or batched mmio
> consistently.
>
> > +{
> > + struct kvm_delayed_mmio_zone *z;
> > +
> > + if (zone->is_delayed &&
> > + kvm->arch.nb_mmio_zones >= KVM_MAX_DELAYED_MMIO_ZONE)
> > + return -ENOMEM;
> >
>
> -ENOBUFS. -ENOMEM generally signals an unrecoverable condition, while
> -ENOBUFS means just this specific operation failed.
>
> > +
> > + if (zone->is_delayed) {
> > +
> > + /* already defined ? */
> > +
> > + if (kvm_mmio_find_zone(kvm, zone->addr, 1) ||
> > + kvm_mmio_find_zone(kvm, zone->addr + zone->size - 1, 1))
> > + return 0;
> > +
> > + z = &kvm->arch.mmio_zone[kvm->arch.nb_mmio_zones];
> > + z->addr = zone->addr;
> > + z->size = zone->size;
> > + kvm->arch.nb_mmio_zones++;
> > + return 0;
> > + }
> > +
> > + /* exclude some parts of the delayed MMIO zone */
> > +
> > + z = kvm_mmio_find_zone(kvm, zone->addr, zone->size);
> > + if (z == NULL)
> > + return -EINVAL;
> > +
> > + if (z->nb_excluded_zones >= KVM_MAX_EXCLUDED_MMIO_ZONE)
> > + return -ENOMEM;
> > +
> > + if (kvm_mmio_find_excluded(z, zone->addr, 1) ||
> > + kvm_mmio_find_excluded(z, zone->addr + zone->size - 1, 1))
> > + return 0;
> > +
> > + z->excluded[z->nb_excluded_zones].offset = zone->addr - z->addr;
> > + z->excluded[z->nb_excluded_zones].size = zone->size;
> > + z->nb_excluded_zones++;
> > +
> > + return 0;
> > +}
> > +
> >
>
> The whole thing needs locking against concurrent calls and against users
> of the structure.
>
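[The locking point can be illustrated with a userspace pthread sketch; the `zone_table` struct is a stand-in for kvm->arch.mmio_zone, and the mutex plays the role of kvm->lock:]

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

#define MAX_ZONES 10

struct zone_table {
	pthread_mutex_t lock;	/* stand-in for kvm->lock */
	int nb_zones;
	struct { uint64_t addr; uint32_t size; } zone[MAX_ZONES];
};

/* Registration and lookup take the same lock, so a concurrent reader
 * can never observe nb_zones incremented before the entry is filled. */
static int table_add(struct zone_table *t, uint64_t addr, uint32_t size)
{
	int ret = 0;

	pthread_mutex_lock(&t->lock);
	if (t->nb_zones >= MAX_ZONES) {
		ret = -1;	/* the ioctl would map this to -ENOBUFS */
	} else {
		t->zone[t->nb_zones].addr = addr;
		t->zone[t->nb_zones].size = size;
		t->nb_zones++;
	}
	pthread_mutex_unlock(&t->lock);
	return ret;
}
```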
> >
> > +static int batch_mmio(struct kvm_vcpu *vcpu)
> > +{
> > + struct kvm_batch *batch = vcpu->kvm->arch.batch;
> > + spinlock_t *lock = &vcpu->kvm->arch.batch_lock;
> > + int next;
> >
>
> Not sure a new lock is warranted. I think we can reuse kvm->lock.
>
> > +
> > + /* check if this MMIO can be delayed */
> > +
> > + if (!kvm_is_delayed_mmio(vcpu->kvm,
> > + vcpu->mmio_phys_addr, vcpu->mmio_size))
> > + return 0;
> >
>
> This check is performed without any locks.
>
> > +
> > static int __vcpu_run(struct kvm_vcpu *vcpu, struct kvm_run *kvm_run)
> > {
> > int r;
> > @@ -2857,6 +3012,11 @@ again:
> > goto again;
> > }
> >
> > + if (!r &&
> > + vcpu->mmio_is_write && kvm_run->exit_reason == KVM_EXIT_MMIO
> > + && !need_resched() && batch_mmio(vcpu))
> > + goto again;
> > +
> >
>
> Better to do this in emulator_write_emulated_onepage(), similar to how
> local mmio is handled.
>
> In fact, we can register batched mmio regions using the existing
> kvm_io_device mechanism.
>
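[The kvm_io_device mechanism referred to here is essentially a table of devices, each claiming an address range and supplying callbacks; a minimal userspace sketch of that dispatch pattern (struct and function names simplified, not the kernel's actual API):]

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified io-device dispatch: each device claims a physical
 * address range and provides a write callback. */
struct io_device {
	uint64_t base;
	uint32_t size;
	void (*write)(struct io_device *dev, uint64_t addr,
		      const void *data, uint32_t len);
};

struct io_bus {
	int ndevs;
	struct io_device *devs[8];
};

/* Find the device whose range fully contains [addr, addr+len). */
static struct io_device *bus_find(struct io_bus *bus, uint64_t addr,
				  uint32_t len)
{
	int i;

	for (i = 0; i < bus->ndevs; i++) {
		struct io_device *d = bus->devs[i];

		if (addr >= d->base && addr + len <= d->base + d->size)
			return d;
	}
	return NULL;
}
```

[A batched-MMIO region would then just be one more device on the bus, with a write callback that pushes into the shared ring instead of exiting to userspace.]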
> >
> > +#define KVM_MAX_DELAYED_MMIO_ZONE 10
> > +#define KVM_MAX_EXCLUDED_MMIO_ZONE 10
> >
>
> We'll want more, esp. as they're cheap.
>
> >
> > /*
> > * ioctls for vcpu fds
> > diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
> > index 64ed402..c8f1bdf 100644
> > --- a/virt/kvm/kvm_main.c
> > +++ b/virt/kvm/kvm_main.c
> > @@ -824,6 +824,8 @@ static int kvm_vcpu_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
> > #ifdef CONFIG_X86
> > else if (vmf->pgoff == KVM_PIO_PAGE_OFFSET)
> > page = virt_to_page(vcpu->arch.pio_data);
> > + else if (vmf->pgoff == KVM_MMIO_PAGE_OFFSET)
> > + page = virt_to_page(vcpu->kvm->arch.batch);
> > #endif
> >
>
> I expect this will be interesting for ppc and ia64.
>
> I didn't see a KVM_CAP_ for this.
>
--
------------- Laurent.Vivier@bull.net ---------------
"The best way to predict the future is to invent it."
- Alan Kay
Thread overview: 9+ messages
2008-05-15 13:52 [PATCH 0/2][RFC][v2] Batch writes to MMIO Laurent Vivier
2008-05-15 13:52 ` [PATCH 1/2][RFC][v2] kvm: " Laurent Vivier
2008-05-15 13:52 ` [PATCH 2/2][RFC][v2] kvm-userspace: " Laurent Vivier
[not found] ` <482FDD1E.2090304@qumranet.com>
2008-05-19 8:36 ` Laurent Vivier [this message]
2008-05-19 10:21 ` [PATCH 1/2][RFC][v2] kvm: " Avi Kivity
2008-05-20 3:04 ` Hollis Blanchard
2008-05-20 7:39 ` Laurent Vivier
2008-05-20 15:43 ` Hollis Blanchard
2008-05-20 15:52 ` Laurent Vivier