From: Alexander Popov <alex.popov@linux.com>
To: Paolo Bonzini <pbonzini@redhat.com>
Cc: Thomas Gleixner <tglx@linutronix.de>,
Christoph Hellwig <hch@infradead.org>,
Ingo Molnar <mingo@redhat.com>,
Marc Zyngier <marc.zyngier@arm.com>,
"H. Peter Anvin" <hpa@zytor.com>,
Andrew Morton <akpm@linux-foundation.org>,
Kees Cook <keescook@chromium.org>,
Dmitry Vyukov <dvyukov@google.com>,
Jiang Liu <jiang.liu@linux.intel.com>,
Jason Cooper <jason@lakedaemon.net>,
Radim Krcmar <rkrcmar@redhat.com>, Joerg Roedel <joro@8bytes.org>,
linux-kernel@vger.kernel.org, x86@kernel.org,
kvm@vger.kernel.org
Subject: Re: [PATCH 1/1] x86/apic: Introduce paravirq irq_domain
Date: Sat, 13 Aug 2016 01:07:30 +0300
Message-ID: <ef7cd804-8a2a-8ed3-1897-7f916f29a50f@linux.com>
In-Reply-To: <e4efdf4e-6454-6d51-50fd-5113999137e7@redhat.com>

On 12.08.2016 14:43, Paolo Bonzini wrote:
> On 12/08/2016 12:56, Alexander Popov wrote:
>> Maybe the name "paravirq" is not very good, I'll try to describe the idea.
>>
>> There is some kernel module for special interactions between guest VMs.
>> Currently it has to register an MSI-capable PCI device to handle interrupts
>> injected by the hypervisor. And the bare-metal hypervisor has to emulate
>> such a device for guest VMs.
>>
>> So I've implemented paravirq irq_domain to avoid this redundant emulation.
>> With it we can just call:
>> - paravirq_alloc_irq() to allocate a LAPIC irq;
>> - request_irq() for it;
>> - irqd_cfg(irq_get_irq_data()) to get the corresponding interrupt vector
>> and inform the hypervisor about it.
>> Now we happily handle the irq from the hypervisor when it injects this vector.
>>
>> The irq_mask/irq_unmask parameters of paravirq_init_chip() are the pointers
>> to the functions from the interaction module which ask the hypervisor to
>> start/stop injecting interrupts to the guest VM.
>>
>> The paravirq irq_domain makes it possible to avoid PCI device emulation in
>> the hypervisor and to run slimmer Linux guests built without PCI and MSI
>> support.
>>
>> Did I manage to answer your questions?
>
> It's a bit clearer. My doubt is that the caller of paravirq_init_chip
> has to provide irq_mask and irq_unmask, but it doesn't know who will
> call paravirq_alloc_irq. So there are two cases:
>
> 1) there is only one device, and then your solution doesn't scale well
> to multiple devices
>
> 2) there is some kind of commonality between all devices using
> paravirq_alloc_irq, and then it should be abstracted in a bus.
>
> The latter would be similar to what Xen and Hyper-V do, for example.
> Using PCI is more similar to the KVM approach.

Excuse me, I don't see the problem.

The caller of paravirq_init_chip() provides the irq_mask/irq_unmask
function pointers only once, and paravirq_init_chip() saves them in the
.irq_mask/.irq_unmask fields of struct irq_chip paravirq_chip.

When, for example, disable_irq() is later called for one of several irqs
allocated in the paravirq irq_domain, paravirq_chip.irq_mask() is called
with the struct irq_data *data argument corresponding to that particular irq.

I.e. our irq_mask()/irq_unmask() callbacks get the irq_data of the interrupt
which should be masked/unmasked and can ask the hypervisor to stop/start
injecting the vector of that particular interrupt.
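
To make the dispatch concrete, here is a minimal userspace model of it. It
is only a sketch and not kernel code: every name in it (model_irq_data,
model_irq_chip, hv_inject, module_irq_mask and so on) is hypothetical and
stands in for the real struct irq_data, struct irq_chip and the hypervisor
interface. It also assumes that disabling an irq masks it immediately,
which is a simplification of what the irq core does.

```c
/* Userspace model only: none of these names exist in the kernel. */
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_VECTORS 256

/* Stand-in for struct irq_data: identifies one particular interrupt. */
struct model_irq_data {
	unsigned int irq;
	unsigned int vector;
};

/* Stand-in for struct irq_chip: one set of callbacks shared by all irqs. */
struct model_irq_chip {
	void (*irq_mask)(struct model_irq_data *data);
	void (*irq_unmask)(struct model_irq_data *data);
};

static struct model_irq_chip model_chip;

/* Pretend hypervisor state: nonzero means "inject this vector". */
static int hv_inject[MODEL_MAX_VECTORS];

/* Callbacks of the hypothetical interaction module. */
static void module_irq_mask(struct model_irq_data *data)
{
	if (data->vector < MODEL_MAX_VECTORS)
		hv_inject[data->vector] = 0;	/* ask to stop injecting */
}

static void module_irq_unmask(struct model_irq_data *data)
{
	if (data->vector < MODEL_MAX_VECTORS)
		hv_inject[data->vector] = 1;	/* ask to start injecting */
}

/* Mirrors the checks in paravirq_init_chip(): callbacks are set only once. */
static int model_init_chip(void (*mask)(struct model_irq_data *),
			   void (*unmask)(struct model_irq_data *))
{
	if (model_chip.irq_mask || model_chip.irq_unmask)
		return -1;	/* already registered */
	if (!mask || !unmask)
		return -2;	/* invalid arguments */

	model_chip.irq_mask = mask;
	model_chip.irq_unmask = unmask;
	return 0;
}

/* Simplified view of disable/enable: call the shared chip callback
 * with the data of the one interrupt being changed. */
static void model_disable_irq(struct model_irq_data *data)
{
	assert(model_chip.irq_mask != NULL);
	model_chip.irq_mask(data);
}

static void model_enable_irq(struct model_irq_data *data)
{
	assert(model_chip.irq_unmask != NULL);
	model_chip.irq_unmask(data);
}
```

With two model_irq_data instances carrying different vectors, calling
model_disable_irq() on one of them changes only that vector's entry in
hv_inject. The callbacks are registered once, but each call still knows
which interrupt it is about.
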
>>>> Signed-off-by: Alexander Popov <alex.popov@linux.com>
>>>> ---
>>>> arch/x86/Kconfig | 8 +++
>>>> arch/x86/include/asm/irqdomain.h | 6 ++
>>>> arch/x86/include/asm/paravirq.h | 9 +++
>>>> arch/x86/kernel/apic/Makefile | 2 +
>>>> arch/x86/kernel/apic/paravirq.c | 128 +++++++++++++++++++++++++++++++++++++++
>>>> arch/x86/kernel/apic/vector.c | 1 +
>>>> 6 files changed, 154 insertions(+)
>>>> create mode 100644 arch/x86/include/asm/paravirq.h
>>>> create mode 100644 arch/x86/kernel/apic/paravirq.c
>>>>
>>>> diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
>>>> index 5c6e747..209bd88 100644
>>>> --- a/arch/x86/Kconfig
>>>> +++ b/arch/x86/Kconfig
>>>> @@ -760,6 +760,14 @@ config PARAVIRT_TIME_ACCOUNTING
>>>>
>>>> If in doubt, say N here.
>>>>
>>>> +config X86_PARAVIRQ
>>>> + bool "Enable paravirq irq_domain"
>>>> + depends on PARAVIRT && X86_LOCAL_APIC
>>>> + default n
>>>> + ---help---
>>>> + This option enables paravirq irq_domain for interrupts injected
>>>> + by the hypervisor using Intel VT-x technology.
>>>> +
>>>> config PARAVIRT_CLOCK
>>>> bool
>>>>
>>>> diff --git a/arch/x86/include/asm/irqdomain.h b/arch/x86/include/asm/irqdomain.h
>>>> index d26075b..e3192f6 100644
>>>> --- a/arch/x86/include/asm/irqdomain.h
>>>> +++ b/arch/x86/include/asm/irqdomain.h
>>>> @@ -60,4 +60,10 @@ extern void arch_init_htirq_domain(struct irq_domain *domain);
>>>> static inline void arch_init_htirq_domain(struct irq_domain *domain) { }
>>>> #endif
>>>>
>>>> +#ifdef CONFIG_X86_PARAVIRQ
>>>> +extern void arch_init_paravirq_domain(struct irq_domain *domain);
>>>> +#else
>>>> +static inline void arch_init_paravirq_domain(struct irq_domain *domain) { }
>>>> +#endif
>>>> +
>>>> #endif
>>>> diff --git a/arch/x86/include/asm/paravirq.h b/arch/x86/include/asm/paravirq.h
>>>> new file mode 100644
>>>> index 0000000..a137de2
>>>> --- /dev/null
>>>> +++ b/arch/x86/include/asm/paravirq.h
>>>> @@ -0,0 +1,9 @@
>>>> +#ifndef _ASM_X86_PARAVIRQ_H
>>>> +#define _ASM_X86_PARAVIRQ_H
>>>> +
>>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>>> + void (*irq_unmask)(struct irq_data *data));
>>>> +int paravirq_alloc_irq(void);
>>>> +void paravirq_free_irq(unsigned int irq);
>>>> +
>>>> +#endif /* _ASM_X86_PARAVIRQ_H */
>>>> diff --git a/arch/x86/kernel/apic/Makefile b/arch/x86/kernel/apic/Makefile
>>>> index 8e63ebd..84f9ce0 100644
>>>> --- a/arch/x86/kernel/apic/Makefile
>>>> +++ b/arch/x86/kernel/apic/Makefile
>>>> @@ -28,3 +28,5 @@ obj-$(CONFIG_X86_BIGSMP) += bigsmp_32.o
>>>>
>>>> # For 32bit, probe_32 need to be listed last
>>>> obj-$(CONFIG_X86_LOCAL_APIC) += probe_$(BITS).o
>>>> +
>>>> +obj-$(CONFIG_X86_PARAVIRQ) += paravirq.o
>>>> diff --git a/arch/x86/kernel/apic/paravirq.c b/arch/x86/kernel/apic/paravirq.c
>>>> new file mode 100644
>>>> index 0000000..430b819
>>>> --- /dev/null
>>>> +++ b/arch/x86/kernel/apic/paravirq.c
>>>> @@ -0,0 +1,128 @@
>>>> +/*
>>>> + * An irq_domain for interrupts injected by the hypervisor using
>>>> + * Intel VT-x technology.
>>>> + *
>>>> + * Copyright (C) 2016 Alexander Popov <alex.popov@linux.com>.
>>>> + *
>>>> + * This file is released under the GPLv2.
>>>> + */
>>>> +
>>>> +#include <linux/init.h>
>>>> +#include <linux/irq.h>
>>>> +#include <asm/irqdomain.h>
>>>> +#include <asm/paravirq.h>
>>>> +
>>>> +static struct irq_domain *paravirq_domain;
>>>> +
>>>> +static struct irq_chip paravirq_chip = {
>>>> + .name = "PARAVIRQ",
>>>> + .irq_ack = irq_chip_ack_parent,
>>>> +};
>>>> +
>>>> +static int paravirq_domain_alloc(struct irq_domain *domain,
>>>> + unsigned int virq, unsigned int nr_irqs, void *arg)
>>>> +{
>>>> + int ret = 0;
>>>> +
>>>> + BUG_ON(domain != paravirq_domain);
>>>> +
>>>> + if (nr_irqs != 1)
>>>> + return -EINVAL;
>>>> +
>>>> + ret = irq_domain_set_hwirq_and_chip(paravirq_domain,
>>>> + virq, virq, &paravirq_chip, NULL);
>>>> + if (ret) {
>>>> + pr_warn("setting chip, hwirq for irq %u failed\n", virq);
>>>> + return ret;
>>>> + }
>>>> +
>>>> + __irq_set_handler(virq, handle_edge_irq, 0, "edge");
>>>> +
>>>> + return 0;
>>>> +}
>>>> +
>>>> +static void paravirq_domain_free(struct irq_domain *domain,
>>>> + unsigned int virq, unsigned int nr_irqs)
>>>> +{
>>>> + struct irq_data *irq_data;
>>>> +
>>>> + BUG_ON(domain != paravirq_domain);
>>>> + BUG_ON(nr_irqs != 1);
>>>> +
>>>> + irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>>> + if (irq_data)
>>>> + irq_domain_reset_irq_data(irq_data);
>>>> + else
>>>> + pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>>> +}
>>>> +
>>>> +static const struct irq_domain_ops paravirq_domain_ops = {
>>>> + .alloc = paravirq_domain_alloc,
>>>> + .free = paravirq_domain_free,
>>>> +};
>>>> +
>>>> +int paravirq_alloc_irq(void)
>>>> +{
>>>> + struct irq_alloc_info info;
>>>> +
>>>> + if (!paravirq_domain)
>>>> + return -ENODEV;
>>>> +
>>>> + if (!paravirq_chip.irq_mask || !paravirq_chip.irq_unmask)
>>>> + return -EINVAL;
>>>> +
>>>> + init_irq_alloc_info(&info, NULL);
>>>> +
>>>> + return irq_domain_alloc_irqs(paravirq_domain, 1, NUMA_NO_NODE, &info);
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_alloc_irq);
>>>> +
>>>> +void paravirq_free_irq(unsigned int virq)
>>>> +{
>>>> + struct irq_data *irq_data;
>>>> +
>>>> + if (!paravirq_domain) {
>>>> + pr_warn("paravirq irq_domain is not initialized\n");
>>>> + return;
>>>> + }
>>>> +
>>>> + irq_data = irq_domain_get_irq_data(paravirq_domain, virq);
>>>> + if (irq_data)
>>>> + irq_domain_free_irqs(virq, 1);
>>>> + else
>>>> + pr_warn("irq %u is not in paravirq irq_domain\n", virq);
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_free_irq);
>>>> +
>>>> +int paravirq_init_chip(void (*irq_mask)(struct irq_data *data),
>>>> + void (*irq_unmask)(struct irq_data *data))
>>>> +{
>>>> + if (!paravirq_domain)
>>>> + return -ENODEV;
>>>> +
>>>> + if (paravirq_chip.irq_mask || paravirq_chip.irq_unmask)
>>>> + return -EEXIST;
>>>> +
>>>> + if (!irq_mask || !irq_unmask)
>>>> + return -EINVAL;
>>>> +
>>>> + paravirq_chip.irq_mask = irq_mask;
>>>> + paravirq_chip.irq_unmask = irq_unmask;
>>>> +
>>>> + return 0;
>>>> +}
>>>> +EXPORT_SYMBOL(paravirq_init_chip);
>>>> +
>>>> +void arch_init_paravirq_domain(struct irq_domain *parent)
>>>> +{
>>>> + paravirq_domain = irq_domain_add_tree(NULL, &paravirq_domain_ops, NULL);
>>>> + if (!paravirq_domain) {
>>>> + pr_warn("failed to initialize paravirq irq_domain\n");
>>>> + return;
>>>> + }
>>>> +
>>>> + paravirq_domain->name = paravirq_chip.name;
>>>> + paravirq_domain->parent = parent;
>>>> + paravirq_domain->flags |= IRQ_DOMAIN_FLAG_AUTO_RECURSIVE;
>>>> +}
>>>> +
>>>> diff --git a/arch/x86/kernel/apic/vector.c b/arch/x86/kernel/apic/vector.c
>>>> index 6066d94..878b440 100644
>>>> --- a/arch/x86/kernel/apic/vector.c
>>>> +++ b/arch/x86/kernel/apic/vector.c
>>>> @@ -438,6 +438,7 @@ int __init arch_early_irq_init(void)
>>>>
>>>> arch_init_msi_domain(x86_vector_domain);
>>>> arch_init_htirq_domain(x86_vector_domain);
>>>> + arch_init_paravirq_domain(x86_vector_domain);
>>>>
>>>> BUG_ON(!alloc_cpumask_var(&vector_cpumask, GFP_KERNEL));
>>>> BUG_ON(!alloc_cpumask_var(&vector_searchmask, GFP_KERNEL));
>>>> --
>>>> 2.5.5
>>>>
>>>>
>>