From mboxrd@z Thu Jan  1 00:00:00 1970
From: agraf@suse.de (Alexander Graf)
Date: Tue, 20 Sep 2016 12:05:03 +0200
Subject: [PATCH v4 2/2] KVM: arm/arm64: Route vtimer events to user space
In-Reply-To: <57E103E8.7040506@arm.com>
References: <1474283695-212421-1-git-send-email-agraf@suse.de>
 <1474283695-212421-3-git-send-email-agraf@suse.de> <57DFFAD3.3080009@arm.com>
 <0960d2a7-6100-3212-c544-d5377df34d57@suse.de> <57E0FF95.7040305@arm.com>
 <580a8a26-ca8f-a08a-a9da-d19d1a595cb6@suse.de> <57E103E8.7040506@arm.com>
Message-ID: <57E109CF.8030705@suse.de>
To: linux-arm-kernel@lists.infradead.org
List-Id: linux-arm-kernel.lists.infradead.org

On 09/20/2016 11:39 AM, Marc Zyngier wrote:
> On 20/09/16 10:26, Alexander Graf wrote:
>>
>> On 20.09.16 11:21, Marc Zyngier wrote:
>>> On 19/09/16 18:39, Alexander Graf wrote:
>>>>
>>>> On 19.09.16 16:48, Marc Zyngier wrote:
>>>>> On 19/09/16 12:14, Alexander Graf wrote:
>>>>>> We have 2 modes for dealing with interrupts in the ARM world. We can either
>>>>>> handle them all using hardware acceleration through the vgic or we can emulate
>>>>>> a gic in user space and only drive CPU IRQ pins from there.
>>>>>>
>>>>>> Unfortunately, when driving IRQs from user space, we never tell user space
>>>>>> about timer events that may result in interrupt line state changes, so we
>>>>>> lose out on timer events if we run with user space gic emulation.
>>>>>>
>>>>>> This patch fixes that by routing vtimer expiration events to user space.
>>>>>> With this patch I can successfully run edk2 and Linux with user space gic
>>>>>> emulation.
>>>>>>
>>>>>> Signed-off-by: Alexander Graf <agraf@suse.de>
>>>>>>
>>>>>> ---
>>>>>>
>>>>>> v1 -> v2:
>>>>>>
>>>>>>    - Add back curly brace that got lost
>>>>>>
>>>>>> v2 -> v3:
>>>>>>
>>>>>>    - Split into patch set
>>>>>>
>>>>>> v3 -> v4:
>>>>>>
>>>>>>    - Improve documentation
>>>>>> ---
>>>>>>   Documentation/virtual/kvm/api.txt |  30 ++++++++-
>>>>>>   arch/arm/include/asm/kvm_host.h   |   3 +
>>>>>>   arch/arm/kvm/arm.c                |  22 ++++---
>>>>>>   arch/arm64/include/asm/kvm_host.h |   3 +
>>>>>>   include/uapi/linux/kvm.h          |  14 +++++
>>>>>>   virt/kvm/arm/arch_timer.c         | 125 +++++++++++++++++++++++++++-----------
>>>>>>   6 files changed, 155 insertions(+), 42 deletions(-)
>>>>>>
>>>>>> diff --git a/Documentation/virtual/kvm/api.txt b/Documentation/virtual/kvm/api.txt
>>>>>> index 23937e0..1c0bd86 100644
>>>>>> --- a/Documentation/virtual/kvm/api.txt
>>>>>> +++ b/Documentation/virtual/kvm/api.txt
>>>>>> @@ -3202,9 +3202,14 @@ struct kvm_run {
>>>>>>   	/* in */
>>>>>>   	__u8 request_interrupt_window;
>>>>>>   
>>>>>> -Request that KVM_RUN return when it becomes possible to inject external
>>>>>> +[x86] Request that KVM_RUN return when it becomes possible to inject external
>>>>>>   interrupts into the guest.  Useful in conjunction with KVM_INTERRUPT.
>>>>>>   
>>>>>> +[arm*] Bits set to 1 in here mask IRQ lines that would otherwise potentially
>>>>>> +trigger forever. These lines are available:
>>>>>> +
>>>>>> +    KVM_IRQWINDOW_VTIMER  -  Masks hw virtual timer irq while in guest
>>>>>> +
>>>>>>   	__u8 padding1[7];
>>>>>>   
>>>>>>   	/* out */
>>>>>> @@ -3519,6 +3524,18 @@ Hyper-V SynIC state change. Notification is used to remap SynIC
>>>>>>   event/message pages and to enable/disable SynIC messages/events processing
>>>>>>   in userspace.
>>>>>>   
>>>>>> +		/* KVM_EXIT_ARM_TIMER */
>>>>>> +		struct {
>>>>>> +			__u8 timesource;
>>>>>> +		} arm_timer;
>>>>>> +
>>>>>> +Indicates that a timer triggered that user space needs to handle and
>>>>>> +potentially mask with vcpu->run->request_interrupt_window to allow the
>>>>>> +guest to proceed. This only happens for timers that got enabled through
>>>>>> +KVM_CAP_ARM_TIMER. The following time sources are available:
>>>>>> +
>>>>>> +    KVM_ARM_TIMER_VTIMER  - virtual cpu timer
>>>>>> +
>>>>>>   		/* Fix the size of the union. */
>>>>>>   		char padding[256];
>>>>>>   	};
>>>>>> @@ -3739,6 +3756,17 @@ Once this is done the KVM_REG_MIPS_VEC_* and KVM_REG_MIPS_MSA_* registers can be
>>>>>>   accessed, and the Config5.MSAEn bit is accessible via the KVM API and also from
>>>>>>   the guest.
>>>>>>   
>>>>>> +6.11 KVM_CAP_ARM_TIMER
>>>>>> +
>>>>>> +Architectures: arm, arm64
>>>>>> +Target: vcpu
>>>>>> +Parameters: args[0] contains a bitmap of timers to select (see 5.)
>>>>>> +
>>>>>> +This capability allows to route per-core timers into user space. When it's
>>>>>> +enabled and no in-kernel interrupt controller is in use, the timers selected
>>>>>> +by args[0] trigger KVM_EXIT_ARM_TIMER guest exits when they are pending,
>>>>>> +unless masked by vcpu->run->request_interrupt_window (see 5.).
>>>>>> +
>>>>>>   7. Capabilities that can be enabled on VMs
>>>>>>   ------------------------------------------
>>>>>>   
>>>>>> diff --git a/arch/arm/include/asm/kvm_host.h b/arch/arm/include/asm/kvm_host.h
>>>>>> index de338d9..77d1f73 100644
>>>>>> --- a/arch/arm/include/asm/kvm_host.h
>>>>>> +++ b/arch/arm/include/asm/kvm_host.h
>>>>>> @@ -180,6 +180,9 @@ struct kvm_vcpu_arch {
>>>>>>   
>>>>>>   	/* Detect first run of a vcpu */
>>>>>>   	bool has_run_once;
>>>>>> +
>>>>>> +	/* User space wants timer notifications */
>>>>>> +	bool user_space_arm_timers;
>>>>> Please move this to the timer structure.
>>>> Sure.
>>>>
>>>>>>   };
>>>>>>   
>>>>>>   struct kvm_vm_stat {
>>>>>> diff --git a/arch/arm/kvm/arm.c b/arch/arm/kvm/arm.c
>>>>>> index c84b6ad..57bdb71 100644
>>>>>> --- a/arch/arm/kvm/arm.c
>>>>>> +++ b/arch/arm/kvm/arm.c
>>>>>> @@ -187,6 +187,7 @@ int kvm_vm_ioctl_check_extension(struct kvm *kvm, long ext)
>>>>>>   	case KVM_CAP_ARM_PSCI_0_2:
>>>>>>   	case KVM_CAP_READONLY_MEM:
>>>>>>   	case KVM_CAP_MP_STATE:
>>>>>> +	case KVM_CAP_ARM_TIMER:
>>>>>>   		r = 1;
>>>>>>   		break;
>>>>>>   	case KVM_CAP_COALESCED_MMIO:
>>>>>> @@ -474,13 +475,7 @@ static int kvm_vcpu_first_run_init(struct kvm_vcpu *vcpu)
>>>>>>   			return ret;
>>>>>>   	}
>>>>>>   
>>>>>> -	/*
>>>>>> -	 * Enable the arch timers only if we have an in-kernel VGIC
>>>>>> -	 * and it has been properly initialized, since we cannot handle
>>>>>> -	 * interrupts from the virtual timer with a userspace gic.
>>>>>> -	 */
>>>>>> -	if (irqchip_in_kernel(kvm) && vgic_initialized(kvm))
>>>>>> -		ret = kvm_timer_enable(vcpu);
>>>>>> +	ret = kvm_timer_enable(vcpu);
>>>>>>   
>>>>>>   	return ret;
>>>>>>   }
>>>>>> @@ -601,6 +596,13 @@ int kvm_arch_vcpu_ioctl_run(struct kvm_vcpu *vcpu, struct kvm_run *run)
>>>>>>   			run->exit_reason = KVM_EXIT_INTR;
>>>>>>   		}
>>>>>>   
>>>>>> +		if (kvm_check_request(KVM_REQ_PENDING_TIMER, vcpu)) {
>>>>> Since this is a very unlikely event (in the grand scheme of things), how
>>>>> about making this unlikely()?
>>>>>
>>>>>> +			/* Tell user space about the pending vtimer */
>>>>>> +			ret = 0;
>>>>>> +			run->exit_reason = KVM_EXIT_ARM_TIMER;
>>>>>> +			run->arm_timer.timesource = KVM_ARM_TIMER_VTIMER;
>>>>>> +		}
>>>>> More importantly: why does it have to be indirected by a
>>>>> make_request/check_request, and not be handled as part of the
>>>>> kvm_timer_sync() call? We do update the state there, and you could
>>>>> directly find out whether an exit is required.
>>>> I can try - it seemed like it could easily become quite racy because we
>>>> call kvm_timer_sync_hwstate() at multiple places.
>>> It shouldn't. We only do it at exactly two locations (depending whether
>>> we've entered the guest or not).
>>>
>>> Also, take the following scenario:
>>> (1) guest programs the timer to expire at time T
>>> (2) guest performs an MMIO access which traps
>>> (3) during the world switch, the timer expires and we mark the timer
>>> interrupt as pending
>>> (4) we exit to handle the MMIO, no sign of the timer being pending
>>>
>>> Is the timer event lost? Or simply delayed? I think this indicates that
>>> the timer state should always be signalled to userspace, no matter what
>>> the exit reason is.
>> That's basically what I'm trying to get running right now, yes. I pushed
>> the interrupt pending status field into the kvm_sync_regs struct and
>> check it on every exit in user space - to make sure we catch pending
>> state changes before mmio reads.
>>
>> On top of that we also need a force exit event when the state changes,
>> in case there's no other event pending. Furthermore we probably want to
>> indicate the user space status of the pending bit into the kernel to not
>> exit too often.
> All you need is to do is to stash the line state in the run structure.

That's what I do now, yes. The sync_regs struct is basically an arch 
specific add-on to the run structure, so we don't modify padding / 
alignment with the change.

> You shouldn't need any other information. And you trigger the exit on
> "timer line high + timer line unmasked" in order to inject the
> interrupt. Which is basically what the vgic does. It would greatly help
> if you adopted a similar behaviour.

We also need to know "timer line low + timer line masked", as otherwise 
we might get spurious interrupts in the guest, no?

Either way, I agree that this approach in general is saner. I don't 
think it's easier to implement though, but we'll get to that when I send 
a new version :)


Alex