All of lore.kernel.org
 help / color / mirror / Atom feed
From: gleb@redhat.com (Gleb Natapov)
To: linux-arm-kernel@lists.infradead.org
Subject: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
Date: Tue, 15 Jan 2013 17:31:00 +0200	[thread overview]
Message-ID: <20130115153100.GE12489@redhat.com> (raw)
In-Reply-To: <50F56C3B.6040308@arm.com>

On Tue, Jan 15, 2013 at 02:48:27PM +0000, Marc Zyngier wrote:
> On 15/01/13 14:27, Gleb Natapov wrote:
> > On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
> >> On 15/01/13 13:34, Gleb Natapov wrote:
> >>> On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
> >>>> On 15/01/13 13:18, Gleb Natapov wrote:
> >>>>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> >>>>>> When the guest accesses I/O memory this will create data abort
> >>>>>> exceptions and they are handled by decoding the HSR information
> >>>>>> (physical address, read/write, length, register) and forwarding reads
> >>>>>> and writes to QEMU which performs the device emulation.
> >>>>>>
> >>>>>> Certain classes of load/store operations do not support the syndrome
> >>>>>> information provided in the HSR and we therefore must be able to fetch
> >>>>>> the offending instruction from guest memory and decode it manually.
> >>>>>>
> >>>>>> We only support instruction decoding for valid reasonable MMIO operations
> >>>>>> where trapping them do not provide sufficient information in the HSR (no
> >>>>>> 16-bit Thumb instructions provide register writeback that we care about).
> >>>>>>
> >>>>>> The following instruction types are NOT supported for MMIO operations
> >>>>>> despite the HSR not containing decode info:
> >>>>>>  - any Load/Store multiple
> >>>>>>  - any load/store exclusive
> >>>>>>  - any load/store dual
> >>>>>>  - anything with the PC as the dest register
> >>>>>>
> >>>>>> This requires changing the general flow somewhat since new calls to run
> >>>>>> the VCPU must check if there's a pending MMIO load and perform the write
> >>>>>> after userspace has made the data available.
> >>>>>>
> >>>>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> >>>>>> (1) Guest complicated mmio instruction traps.
> >>>>>> (2) The hardware doesn't tell us enough, so we need to read the actual
> >>>>>>     instruction which was being exectuted.
> >>>>>> (3) KVM maps the instruction virtual address to a physical address.
> >>>>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
> >>>>>> (5) We read the physical address, but now that's the wrong thing.
> >>>>> How can this happen?! The guest cannot reuse physical page before it
> >>>>> flushes it from all vcpus tlb cache. For that it needs to send
> >>>>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> >>>>> while it does emulation.
> >>>>
> >>>> I don't know how this works on x86, but a KVM/ARM guest can definitely
> >>>> handle an IPI.
> >>>>
> >>> How can a vcpu handle an IPI while it is not in a guest mode?
> >>
> >> I think there is some misunderstanding. A guest IPI is of course handled
> >> while running the guest. You completely lost me here.
> > You need IPI from one guest vcpu to another to invalidate its TLB on
> > x86. That prevents the race from happening there.
> 
> We don't need this on ARM (starting with v7, v6 is an entirely different
> story, and we do not support KVM on v6).
> 
> The TLB is propagated by the HW using the following (pseudocode) sequence:
> 	tlb_invalidate VA
> 	barrier
> 
> Leaving the barrier guaranties that all TLB invalidations have been
> propagated.
> 
That explains why __get_user_pages_fast() is missing on ARM :)

> >>
> >>>> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
> >>>> we're doing some set/way operation which is handled separately).
> >>>>
> >>> What prevents a page to be swapped out while code is fetched from it?
> >>
> >> Why would you prevent it? TLB invalidation is broadcast by the HW. If
> >> you swap a page out, you flag the entry as invalid and invalidate the
> >> corresponding TLB. If you hit it, you swap the page back in.
> >>
> > There is no IPI (or anything that requires response from cpu whose TLB
> > is invalidated) involved in invalidating remote TLB?
> 
> No. The above sequence is all you have to do.
> 
> This is why the above race is a bit hairy. A vcpu will happily
> invalidate TLBs, but as the faulting vcpu already performed the
> translation, we're screwed.
> 
> Thankfully, this is a case that only matters when we have to emulate an
> MMIO operation that is not automatically decoded by the HW. They are
> rare (the Linux kernel doesn't use them). In this case, we stop the
> world (IPI).
> 
Got it. Thanks.

--
			Gleb.

WARNING: multiple messages have this Message-ID (diff)
From: Gleb Natapov <gleb@redhat.com>
To: Marc Zyngier <marc.zyngier@arm.com>
Cc: Christoffer Dall <c.dall@virtualopensystems.com>,
	"kvm@vger.kernel.org" <kvm@vger.kernel.org>,
	"linux-arm-kernel@lists.infradead.org"
	<linux-arm-kernel@lists.infradead.org>,
	"kvmarm@lists.cs.columbia.edu" <kvmarm@lists.cs.columbia.edu>,
	Marcelo Tosatti <mtosatti@redhat.com>,
	Rusty Russell <rusty.russell@linaro.org>
Subject: Re: [PATCH v5 13/14] KVM: ARM: Handle I/O aborts
Date: Tue, 15 Jan 2013 17:31:00 +0200	[thread overview]
Message-ID: <20130115153100.GE12489@redhat.com> (raw)
In-Reply-To: <50F56C3B.6040308@arm.com>

On Tue, Jan 15, 2013 at 02:48:27PM +0000, Marc Zyngier wrote:
> On 15/01/13 14:27, Gleb Natapov wrote:
> > On Tue, Jan 15, 2013 at 01:46:04PM +0000, Marc Zyngier wrote:
> >> On 15/01/13 13:34, Gleb Natapov wrote:
> >>> On Tue, Jan 15, 2013 at 01:29:40PM +0000, Marc Zyngier wrote:
> >>>> On 15/01/13 13:18, Gleb Natapov wrote:
> >>>>> On Tue, Jan 08, 2013 at 01:40:05PM -0500, Christoffer Dall wrote:
> >>>>>> When the guest accesses I/O memory this will create data abort
> >>>>>> exceptions and they are handled by decoding the HSR information
> >>>>>> (physical address, read/write, length, register) and forwarding reads
> >>>>>> and writes to QEMU which performs the device emulation.
> >>>>>>
> >>>>>> Certain classes of load/store operations do not support the syndrome
> >>>>>> information provided in the HSR and we therefore must be able to fetch
> >>>>>> the offending instruction from guest memory and decode it manually.
> >>>>>>
> >>>>>> We only support instruction decoding for valid reasonable MMIO operations
> >>>>>> where trapping them do not provide sufficient information in the HSR (no
> >>>>>> 16-bit Thumb instructions provide register writeback that we care about).
> >>>>>>
> >>>>>> The following instruction types are NOT supported for MMIO operations
> >>>>>> despite the HSR not containing decode info:
> >>>>>>  - any Load/Store multiple
> >>>>>>  - any load/store exclusive
> >>>>>>  - any load/store dual
> >>>>>>  - anything with the PC as the dest register
> >>>>>>
> >>>>>> This requires changing the general flow somewhat since new calls to run
> >>>>>> the VCPU must check if there's a pending MMIO load and perform the write
> >>>>>> after userspace has made the data available.
> >>>>>>
> >>>>>> Rusty Russell fixed a horrible race pointed out by Ben Herrenschmidt:
> >>>>>> (1) Guest complicated mmio instruction traps.
> >>>>>> (2) The hardware doesn't tell us enough, so we need to read the actual
> >>>>>>     instruction which was being exectuted.
> >>>>>> (3) KVM maps the instruction virtual address to a physical address.
> >>>>>> (4) The guest (SMP) swaps out that page, and fills it with something else.
> >>>>>> (5) We read the physical address, but now that's the wrong thing.
> >>>>> How can this happen?! The guest cannot reuse physical page before it
> >>>>> flushes it from all vcpus tlb cache. For that it needs to send
> >>>>> synchronous IPI to all vcpus and IPI will not be processed by a vcpu
> >>>>> while it does emulation.
> >>>>
> >>>> I don't know how this works on x86, but a KVM/ARM guest can definitely
> >>>> handle an IPI.
> >>>>
> >>> How can a vcpu handle an IPI while it is not in a guest mode?
> >>
> >> I think there is some misunderstanding. A guest IPI is of course handled
> >> while running the guest. You completely lost me here.
> > You need IPI from one guest vcpu to another to invalidate its TLB on
> > x86. That prevents the race from happening there.
> 
> We don't need this on ARM (starting with v7, v6 is an entirely different
> story, and we do not support KVM on v6).
> 
> The TLB is propagated by the HW using the following (pseudocode) sequence:
> 	tlb_invalidate VA
> 	barrier
> 
> Leaving the barrier guaranties that all TLB invalidations have been
> propagated.
> 
That explains why __get_user_pages_fast() is missing on ARM :)

> >>
> >>>> Furthermore, TLB invalidation doesn't require an IPI on ARMv7 (unless
> >>>> we're doing some set/way operation which is handled separately).
> >>>>
> >>> What prevents a page to be swapped out while code is fetched from it?
> >>
> >> Why would you prevent it? TLB invalidation is broadcast by the HW. If
> >> you swap a page out, you flag the entry as invalid and invalidate the
> >> corresponding TLB. If you hit it, you swap the page back in.
> >>
> > There is no IPI (or anything that requires response from cpu whose TLB
> > is invalidated) involved in invalidating remote TLB?
> 
> No. The above sequence is all you have to do.
> 
> This is why the above race is a bit hairy. A vcpu will happily
> invalidate TLBs, but as the faulting vcpu already performed the
> translation, we're screwed.
> 
> Thankfully, this is a case that only matters when we have to emulate an
> MMIO operation that is not automatically decoded by the HW. They are
> rare (the Linux kernel doesn't use them). In this case, we stop the
> world (IPI).
> 
Got it. Thanks.

--
			Gleb.

  reply	other threads:[~2013-01-15 15:31 UTC|newest]

Thread overview: 160+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2013-01-08 18:38 [PATCH v5 00/14] KVM/ARM Implementation Christoffer Dall
2013-01-08 18:38 ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 01/14] ARM: Add page table and page defines needed by KVM Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 02/14] ARM: Section based HYP idmap Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-14 10:27   ` Gleb Natapov
2013-01-14 10:27     ` Gleb Natapov
2013-01-14 10:49     ` Will Deacon
2013-01-14 10:49       ` Will Deacon
2013-01-14 11:07       ` Gleb Natapov
2013-01-14 11:07         ` Gleb Natapov
2013-01-14 13:07         ` Russell King - ARM Linux
2013-01-14 13:07           ` Russell King - ARM Linux
2013-01-14 16:13   ` Russell King - ARM Linux
2013-01-14 16:13     ` Russell King - ARM Linux
2013-01-14 17:09     ` Christoffer Dall
2013-01-14 17:09       ` Christoffer Dall
2013-01-08 18:38 ` [PATCH v5 03/14] KVM: ARM: Initial skeleton to compile KVM support Christoffer Dall
2013-01-08 18:38   ` Christoffer Dall
2013-01-14 15:09   ` Will Deacon
2013-01-14 15:09     ` Will Deacon
2013-01-14 15:40     ` Christoffer Dall
2013-01-14 15:40       ` Christoffer Dall
2013-01-14 16:24   ` Russell King - ARM Linux
2013-01-14 16:24     ` Russell King - ARM Linux
2013-01-14 17:33     ` Christoffer Dall
2013-01-14 17:33       ` Christoffer Dall
2013-01-16  2:56       ` Rusty Russell
2013-01-16  2:56         ` Rusty Russell
2013-01-16  9:44         ` Russell King - ARM Linux
2013-01-16  9:44           ` Russell King - ARM Linux
2013-01-17  2:11           ` Rusty Russell
2013-01-17  2:11             ` Rusty Russell
2013-01-14 18:49   ` Gleb Natapov
2013-01-14 18:49     ` Gleb Natapov
2013-01-14 22:17     ` Christoffer Dall
2013-01-14 22:17       ` Christoffer Dall
2013-01-15 13:32       ` Gleb Natapov
2013-01-15 13:32         ` Gleb Natapov
2013-01-15 13:43         ` [kvmarm] " Alexander Graf
2013-01-15 13:43           ` Alexander Graf
2013-01-15 15:35           ` Gleb Natapov
2013-01-15 15:35             ` Gleb Natapov
2013-01-15 16:21             ` Alexander Graf
2013-01-15 16:21               ` Alexander Graf
2013-01-08 18:39 ` [PATCH v5 04/14] KVM: ARM: Hypervisor initialization Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-14 15:11   ` Will Deacon
2013-01-14 15:11     ` Will Deacon
2013-01-14 16:35     ` Christoffer Dall
2013-01-14 16:35       ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 05/14] KVM: ARM: Memory virtualization setup Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 06/14] KVM: ARM: Inject IRQs and FIQs from userspace Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-15  9:56   ` Gleb Natapov
2013-01-15  9:56     ` Gleb Natapov
2013-01-15 12:15     ` [kvmarm] " Peter Maydell
2013-01-15 12:15       ` Peter Maydell
2013-01-15 12:52       ` Gleb Natapov
2013-01-15 12:52         ` Gleb Natapov
2013-01-15 14:04         ` Peter Maydell
2013-01-15 14:04           ` Peter Maydell
2013-01-15 14:40           ` Christoffer Dall
2013-01-15 14:40             ` Christoffer Dall
2013-01-15 15:17           ` Gleb Natapov
2013-01-15 15:17             ` Gleb Natapov
2013-01-15 16:25             ` Alexander Graf
2013-01-15 16:25               ` Alexander Graf
2013-01-16 10:40               ` Gleb Natapov
2013-01-16 10:40                 ` Gleb Natapov
2013-01-08 18:39 ` [PATCH v5 07/14] KVM: ARM: World-switch implementation Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-15  9:43   ` Gleb Natapov
2013-01-15  9:43     ` Gleb Natapov
2013-01-16  2:08     ` Christoffer Dall
2013-01-16  2:08       ` Christoffer Dall
2013-01-16  4:08       ` Christoffer Dall
2013-01-16  4:08         ` Christoffer Dall
2013-01-16 12:57         ` Gleb Natapov
2013-01-16 12:57           ` Gleb Natapov
2013-01-16 15:40           ` Christoffer Dall
2013-01-16 15:40             ` Christoffer Dall
2013-01-16 16:17             ` Gleb Natapov
2013-01-16 16:17               ` Gleb Natapov
2013-01-16 12:12       ` Gleb Natapov
2013-01-16 12:12         ` Gleb Natapov
2013-01-16 13:14         ` Russell King - ARM Linux
2013-01-16 13:14           ` Russell King - ARM Linux
2013-01-16 15:42         ` Christoffer Dall
2013-01-16 15:42           ` Christoffer Dall
2013-01-16 15:52           ` Gleb Natapov
2013-01-16 15:52             ` Gleb Natapov
2013-01-16 16:17             ` Christoffer Dall
2013-01-16 16:17               ` Christoffer Dall
2013-01-16 16:21               ` Gleb Natapov
2013-01-16 16:21                 ` Gleb Natapov
2013-01-08 18:39 ` [PATCH v5 08/14] KVM: ARM: Emulation framework and CP15 emulation Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-14 16:36   ` Russell King - ARM Linux
2013-01-14 16:36     ` Russell King - ARM Linux
2013-01-14 17:38     ` Christoffer Dall
2013-01-14 17:38       ` Christoffer Dall
2013-01-14 18:33       ` Russell King - ARM Linux
2013-01-14 18:33         ` Russell King - ARM Linux
2013-01-08 18:39 ` [PATCH v5 09/14] KVM: ARM: User space API for getting/setting co-proc registers Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 10/14] KVM: ARM: Demux CCSIDR in the userspace API Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 11/14] KVM: ARM: VFP userspace interface Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:39 ` [PATCH v5 12/14] KVM: ARM: Handle guest faults in KVM Christoffer Dall
2013-01-08 18:39   ` Christoffer Dall
2013-01-08 18:40 ` [PATCH v5 13/14] KVM: ARM: Handle I/O aborts Christoffer Dall
2013-01-08 18:40   ` Christoffer Dall
2013-01-14 16:43   ` Russell King - ARM Linux
2013-01-14 16:43     ` Russell King - ARM Linux
2013-01-14 18:25     ` Christoffer Dall
2013-01-14 18:25       ` Christoffer Dall
2013-01-14 18:43       ` Russell King - ARM Linux
2013-01-14 18:43         ` Russell King - ARM Linux
2013-01-14 18:50         ` Will Deacon
2013-01-14 18:50           ` Will Deacon
2013-01-14 18:53           ` [kvmarm] " Alexander Graf
2013-01-14 18:53             ` Alexander Graf
2013-01-14 18:56             ` Christoffer Dall
2013-01-14 18:56               ` Christoffer Dall
2013-01-14 19:00             ` Will Deacon
2013-01-14 19:00               ` Will Deacon
2013-01-14 19:12               ` Christoffer Dall
2013-01-14 19:12                 ` Christoffer Dall
2013-01-14 22:36                 ` Will Deacon
2013-01-14 22:36                   ` Will Deacon
2013-01-14 22:51                   ` Christoffer Dall
2013-01-14 22:51                     ` Christoffer Dall
2013-01-15  7:00                   ` Gleb Natapov
2013-01-15  7:00                     ` Gleb Natapov
2013-01-15 13:18   ` Gleb Natapov
2013-01-15 13:18     ` Gleb Natapov
2013-01-15 13:29     ` Marc Zyngier
2013-01-15 13:29       ` Marc Zyngier
2013-01-15 13:34       ` Gleb Natapov
2013-01-15 13:34         ` Gleb Natapov
2013-01-15 13:46         ` Marc Zyngier
2013-01-15 13:46           ` Marc Zyngier
2013-01-15 14:27           ` Gleb Natapov
2013-01-15 14:27             ` Gleb Natapov
2013-01-15 14:42             ` Christoffer Dall
2013-01-15 14:42               ` Christoffer Dall
2013-01-15 14:48             ` Marc Zyngier
2013-01-15 14:48               ` Marc Zyngier
2013-01-15 15:31               ` Gleb Natapov [this message]
2013-01-15 15:31                 ` Gleb Natapov
2013-01-08 18:40 ` [PATCH v5 14/14] KVM: ARM: Add maintainer entry for KVM/ARM Christoffer Dall
2013-01-08 18:40   ` Christoffer Dall
2013-01-14 16:00 ` [PATCH v5 00/14] KVM/ARM Implementation Will Deacon
2013-01-14 16:00   ` Will Deacon
2013-01-14 22:31   ` Christoffer Dall
2013-01-14 22:31     ` Christoffer Dall

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20130115153100.GE12489@redhat.com \
    --to=gleb@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.