Re: KVM in-kernel APIC update

public inbox for kvm@vger.kernel.org
 help / color / mirror / Atom feed

From: Avi Kivity <avi-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
To: Gregory Haskins <ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
Cc: kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org
Subject: Re: KVM in-kernel APIC update
Date: Wed, 04 Apr 2007 10:40:54 +0300	[thread overview]
Message-ID: <46135686.4090905@qumranet.com> (raw)
In-Reply-To: <46128F80.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>

Gregory Haskins wrote:
> My current thoughts are that we at least move the IOAPIC into the kernel as well.  That will give sufficient control to generate ISA bus interrupts for guests that understand APICs.  If we want to be able to generate ISA interrupts for legacy guests which talk to the 8259s that will prove to be insufficient.  The good news is that moving the 8259s down as well is probably not a huge deal either, especially since I have already prepped the usermode side.  Thoughts?
>   

I would avoid moving down anything that's not strictly necessary.  If we 
want to keep the PIC in qemu, for example, just export the APIC-PIC 
interface to qemu.

I still don't have an opinion as to whether it is necessary; I'll need 
to study the details.  Xen pushes most of the platform into the 
hypervisor, but then Xen's communication path to qemu is much more 
expensive (involving the scheduler and a potential cpu switch) than 
kvm.  We'd need to balance possible performance improvements (I'd expect 
negligible) and interface simplification (possibly non-negligible) 
against further diverging from qemu.


> So heres a question for you guys out there.  What is the expected use of the in-kernel APIC?  My interests lie in the ability to send IPIs for SMP, as well as being able to inject asynchronous hypercall interrupts.  I assume there are other reasons too, such as PV device interrupts, etc and I would like to make sure I am seeing the big picture before making any bad design decisions.  My question is, how do we expect the PV devices to look from a bus perspective?  
>
> The current Bochs/QEMU system model paints a fairly simple ISA architecture utilizing a single IOAPIC + dual 8259 setup.  Do we expect in-kernel injected IRQs to follow the ISA model (e.g. either legacy or PCI interrupts only limited to IRQ0-15) or do we want to expand on this?  The PCI hypercall device introduced a while back would be an example of something ISA based.  Alternatives would be to utilize unused "pins" (such as IRQ16-23) on IOAPIC #0, or introducing new an entirely new bus/IOAPICs just for KVM, etc.  
>   

There are two extreme models, which I think are both needed.  On one 
end, support for closed OSes (e.g. Windows) requires fairly strict 
conformance to the PCI model, which means going through the IOAPIC or 
PIC or however the interrupt lines are wired in qemu. This seems to 
indicate that an in-kernel IOAPIC is needed.  On the other end (Linux), 
a legacy-free and emulation-free device can just inject interrupts 
directly and use shared memory to ack interrupts and indicate their source.

> If the latter, we also need to decide what the resource conveyance model and vector allocation policy should be.  For instance, do we publish said resources formally in the MP/ACPI tables in Bochs?  Doing so would allow MP/ACPI compliant OSs like linux to naturally route the IRQ.  Conversely, do we do something more direct just like we do for KVM discovery via wrmsr?
>   

I think we can go the direct route for cooperative guests.

I also suggest doing the work in stages and measuring; that is, first 
push the local apic, determine what the remaining bottlenecks are and 
tackle them.  I'm pretty sure that Linux guests would only require the 
local apic but Windows (and older Linux kernels) might require more.

>  struct kvm_vcpu;
>  
> +struct kvm_irqinfo {
> +	int         vector;
> +	int         nmi;
> +};
> +
> +#define KVM_IRQFLAGS_NMI  (1 << 0)
> +#define KVM_IRQFLAGS_PEEK (1 << 1)
> +
> +struct kvm_irqdevice {
> +	int  (*pending)(struct kvm_irqdevice *this, int flags);
> +	int  (*read)(struct kvm_irqdevice *this, int flags, 
> +		     struct kvm_irqinfo *info);
>   

Aren't pending() and read() + PEEK overlapping?

> +	int  (*inject)(struct kvm_irqdevice *this, int irq, int flags);
> +	int  (*summary)(struct kvm_irqdevice *this, void *data);
> +	void (*destructor)(struct kvm_irqdevice *this);
> +
> +	void *private;
> +};
>   

Consider using container_of() to simulate C++ inheritance.  Messier but 
less indirections.   Also consider a kvm_irqdevice_operations structure.

> +
> +#define MAX_APIC_INT_VECTOR      256
> +
> +struct kvm_apic {
> +	u32	           	status;
> +	u32	           	vcpu_id;
> +	spinlock_t 		lock;
> +	u32			pcpu_lock_owner;
>   

Isn't this vcpu->cpu?

> +	atomic_t		timer_pending;
> +	u64	           	apic_base_msr;
> +	unsigned long      	base_address;
> +	u32	           	timer_divide_count;
> +	struct hrtimer       	apic_timer;
> +	int                	intr_pending_count[MAX_APIC_INT_VECTOR];
> +	ktime_t           	timer_last_update;
> +	struct {
> +		int deliver_mode;
> +		int source[6];
> +	} direct_intr;
> +	u32	           	err_status;
> +	u32	           	err_write_count;
> +	struct kvm_vcpu        	*vcpu;
> +	struct page	   	*regs_page;
> +	void               	*regs;
> +	struct kvm_irqdevice    ext; /* Used for external/NMI interrupts */
> +};
> +
>  /*
>   * x86 supports 3 paging modes (4-level 64-bit, 3-level 64-bit, and 2-level
>   * 32-bit).  The kvm_mmu structure abstracts the details of the current mmu
> @@ -236,6 +281,11 @@ struct kvm_pio_request {
>  	int rep;
>  };
>  
> +#define KVM_VCPU_INIT_SIPI_SIPI_STATE_NORM          0
> +#define KVM_VCPU_INIT_SIPI_SIPI_STATE_WAIT_SIPI     1
> +
> +#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
> +
>  struct kvm_vcpu {
>  	struct kvm *kvm;
>  	union {
> @@ -248,12 +298,9 @@ struct kvm_vcpu {
>  	u64 host_tsc;
>  	struct kvm_run *run;
>  	int interrupt_window_open;
> -	unsigned long irq_summary; /* bit vector: 1 per word in irq_pending */
> -#define NR_IRQ_WORDS KVM_IRQ_BITMAP_SIZE(unsigned long)
> -	unsigned long irq_pending[NR_IRQ_WORDS];
>  	unsigned long regs[NR_VCPU_REGS]; /* for rsp: vcpu_load_rsp_rip() */
>  	unsigned long rip;      /* needs vcpu_load_rsp_rip() */
> -
> +	struct kvm_irqdevice irq_dev;
>  	unsigned long cr0;
>  	unsigned long cr2;
>  	unsigned long cr3;
> @@ -261,10 +308,8 @@ struct kvm_vcpu {
>  	struct page *para_state_page;
>  	gpa_t hypercall_gpa;
>  	unsigned long cr4;
> -	unsigned long cr8;
>   

I think we should keep cr8 here and accept the duplication.

>  	u64 pdptrs[4]; /* pae */
>  	u64 shadow_efer;
> -	u64 apic_base;
>  	u64 ia32_misc_enable_msr;
>  	int nmsrs;
>  	struct vmx_msr_entry *guest_msrs;
> @@ -298,6 +343,11 @@ struct kvm_vcpu {
>  	int sigset_active;
>  	sigset_t sigset;
>  
> +	struct kvm_apic apic;
> +	wait_queue_head_t halt_wq;
> +	/* For AP startup */
> +	unsigned long init_sipi_sipi_state;
> +
>  	struct {
>  		int active;
>  		u8 save_iopl;
> @@ -319,6 +369,23 @@ struct kvm_mem_alias {
>  	gfn_t target_gfn;
>  };
>  
> +#define kvm_irq_pending(dev, flags)     (dev)->pending(dev, flags)
> +#define kvm_irq_read(dev, flags, info)  (dev)->read(dev, flags, info)
> +#define kvm_irq_inject(dev, irq, flags) (dev)->inject(dev, irq, flags)
> +#define kvm_irq_summary(dev, data)      (dev)->summary(dev, data)
>   

Static inlines please, no #defines.  In this case, I'd just call the 
functions directly.

> +
> +#define kvm_vcpu_irq_pending(vcpu, flags) \
> +   kvm_irq_pending(&vcpu->irq_dev, flags)
> +#define kvm_vcpu_irq_read(vcpu, flags, info) \
> +   kvm_irq_read(&vcpu->irq_dev, flags, info)
> +#define kvm_vcpu_irq_inject(vcpu, irq, flags) \
> +   kvm_irq_inject(&vcpu->irq_dev, irq, flags)
> +#define kvm_vcpu_irq_summary(vcpu, data)  \
> +   kvm_irq_summary(&vcpu->irq_dev, data)
> +
>   

Ditto.

>  
> +struct kvm_mmio_handler {
> +	unsigned long (*read)(struct kvm_vcpu *v,
> +			      unsigned long addr,
> +			      unsigned long length);
> +	void (*write)(struct kvm_vcpu *v,
> +			   unsigned long addr,
> +			   unsigned long length,
> +			   unsigned long val);
> +	int (*in_range)(struct kvm_vcpu *v, unsigned long addr);
> +};
>   

This would need to be split out into a separate patch.  Addresses are gpa_t.

> +
> +#define KVM_MMIO_HANDLER_NR_ARRAY_SIZE 1
> +static struct kvm_mmio_handler *kvm_mmio_handlers[KVM_MMIO_HANDLER_NR_ARRAY_SIZE] =
> +{
> +    &apic_mmio_handler,
> +};
> +
>   

We'll need dynamic registration if we want kernel apic support to be 
optional.  Also, this needs to be per-vcpu as each vcpu's apic can be 
placed at a different address.


> @@ -1479,7 +1533,11 @@ int kvm_get_msr_common(struct kvm_vcpu *vcpu, u32 msr, u64 *pdata)
>  		data = 3;
>  		break;
>  	case MSR_IA32_APICBASE:
> -		data = vcpu->apic_base;
> +		data = vcpu->apic.apic_base_msr;
> +		break;
> +	case MSR_IA32_TIME_STAMP_COUNTER:
> +		// FIXME
> +                //data = guest_read_tsc();
>  		break;
>   

The tsc stuff looks spurious.

> +	case KVM_GET_OPTIONS: {
> +		r = -EFAULT;
> +		if (copy_to_user(&kvm->options, argp, sizeof(kvm->options)))
> +		    goto out;
> +		r = 0;
> +		break;
> +	}
> +	case KVM_SET_OPTIONS: {
> +		r = -EFAULT;
> +		if (copy_from_user(&kvm->options, argp, sizeof(kvm->options)))
> +		    goto out;
> +		r = 0;
> +		break;
> +	}
>   

This is difficult to expand over time.  I prefer a separate ioctl() for 
each option.

> +
> +struct bitarray {
> +    unsigned long summary; /* 1 per word in pending */
> +    unsigned long pending[NR_IRQ_WORDS];
> +};
> +
> +static int bitarray_pending(struct bitarray *this)
> +{
> +	return this->summary ? 1 : 0;	
> +}
> +
> +static int bitarray_findhighest(struct bitarray *this)
> +{
> +	if(!this->summary)
>   

Space after if, here and elsewhere.

> +		return 0;
> +
> +	int word_index = __ffs(this->summary);
> +	int bit_index = __ffs(this->pending[word_index]);
> +
> +	return word_index * BITS_PER_LONG + bit_index;	
> +}
> +
> +static void bitarray_set(struct bitarray *this, int nr)
> +{
> +	set_bit(nr, &this->pending);
> +	set_bit(nr / BITS_PER_LONG, &this->summary); 
>   

Since you need locking anyway, best to use the unlocked versions 
(__set_bit()).

> +typedef struct {
> +	struct bitarray irq_pending;
> +	struct bitarray nmi_pending;
>   

Why is nmi_pending a bitarray?

> @@ -108,20 +109,12 @@ static unsigned get_addr_size(struct kvm_vcpu *vcpu)
>  
>  static inline u8 pop_irq(struct kvm_vcpu *vcpu)
>  {
> -	int word_index = __ffs(vcpu->irq_summary);
> -	int bit_index = __ffs(vcpu->irq_pending[word_index]);
> -	int irq = word_index * BITS_PER_LONG + bit_index;
> -
> -	clear_bit(bit_index, &vcpu->irq_pending[word_index]);
> -	if (!vcpu->irq_pending[word_index])
> -		clear_bit(word_index, &vcpu->irq_summary);
> -	return irq;
> +	return kvm_vcpu_irq_read(vcpu, 0, NULL);
>  }
>  
>  static inline void push_irq(struct kvm_vcpu *vcpu, u8 irq)
>  {
> -	set_bit(irq, vcpu->irq_pending);
> -	set_bit(irq / BITS_PER_LONG, &vcpu->irq_summary);
> +	kvm_vcpu_irq_inject(vcpu, irq, 0);
>  }
>   


It would be helpful to unify the vmx and svm irq code first (I can merge 
something like that immediately), so this patch becomes simpler.

-- 
Do not meddle in the internals of kernels, for they are subtle and quick to panic.


-------------------------------------------------------------------------
Take Surveys. Earn Cash. Influence the Future of IT
Join SourceForge.net's Techsay panel and you'll get the chance to share your
opinions on IT & business topics through brief surveys-and earn cash
http://www.techsay.com/default.php?page=join.php&p=sourceforge&CID=DEVDEV

next prev parent reply	other threads:[~2007-04-04  7:40 UTC|newest]

Thread overview: 22+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2007-04-03 22:31 KVM in-kernel APIC update Gregory Haskins
     [not found] ` <46128F80.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-04  7:40   ` Avi Kivity [this message]
     [not found]     ` <46135686.4090905-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-04 16:23       ` Gregory Haskins
     [not found]         ` <46138A98.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-04 16:49           ` Avi Kivity
     [not found]             ` <4613D736.1080207-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-04 17:10               ` Gregory Haskins
     [not found]                 ` <4613959F.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-04 17:43                   ` Avi Kivity
     [not found]                     ` <4613E3CB.6000904-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-04 17:56                       ` Gregory Haskins
     [not found]                         ` <4613A090.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-04 20:20                           ` Ingo Molnar
     [not found]                             ` <20070404202046.GD6070-X9Un+BFzKDI@public.gmane.org>
2007-04-05  6:48                               ` Avi Kivity
2007-04-04 22:00                   ` Dor Laor
2007-04-04 20:12       ` Ingo Molnar
     [not found]         ` <20070404201205.GC6070-X9Un+BFzKDI@public.gmane.org>
2007-04-04 21:55           ` Dor Laor
2007-04-05  6:47           ` Avi Kivity
     [not found]             ` <46149B7C.5020004-atKUWr5tajBWk0Htik3J/w@public.gmane.org>
2007-04-05  7:37               ` Dor Laor
2007-04-05 10:39               ` Ingo Molnar
     [not found]                 ` <20070405103933.GA8936-X9Un+BFzKDI@public.gmane.org>
2007-04-05 10:50                   ` Ingo Molnar
     [not found]                     ` <20070405105042.GA11779-X9Un+BFzKDI@public.gmane.org>
2007-04-05 11:29                       ` Avi Kivity
2007-04-05 14:37           ` Dong, Eddie
2007-04-04 20:32   ` Ingo Molnar
     [not found]     ` <20070404203235.GA11369-X9Un+BFzKDI@public.gmane.org>
2007-04-04 21:22       ` Gregory Haskins
     [not found]         ` <4613D0CE.BA47.005A.0-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org>
2007-04-04 21:40           ` Anthony Liguori
2007-04-05  7:11           ` Avi Kivity

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=46135686.4090905@qumranet.com \
    --to=avi-atkuwr5tajbwk0htik3j/w@public.gmane.org \
    --cc=ghaskins-Et1tbQHTxzrQT0dZR+AlfA@public.gmane.org \
    --cc=kvm-devel-5NWGOfrQmneRv+LV9MX5uipxlwaOVQ5f@public.gmane.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Be sure your reply has a Subject: header at the top and a blank line before the message body.

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox