From: Alexandru Elisei <alexandru.elisei@arm.com>
To: Andre Przywara <andre.przywara@arm.com>
Cc: kvm@vger.kernel.org, will@kernel.org,
julien.thierry.kdev@gmail.com, sami.mujawar@arm.com,
lorenzo.pieralisi@arm.com, maz@kernel.org
Subject: Re: [PATCH v2 kvmtool 19/30] Use independent read/write locks for ioport and mmio
Date: Wed, 5 Feb 2020 11:25:03 +0000
Message-ID: <fb3cd270-4bb5-b26e-2f3f-c01b8ae05931@arm.com>
In-Reply-To: <20200203122338.6565e96a@donnerap.cambridge.arm.com>
Hi,
On 2/3/20 12:23 PM, Andre Przywara wrote:
> On Thu, 23 Jan 2020 13:47:54 +0000
> Alexandru Elisei <alexandru.elisei@arm.com> wrote:
>
> Hi,
>
>> kvmtool uses brlock for protecting accesses to the ioport and mmio
>> red-black trees. brlock allows concurrent reads, but only one writer,
>> which is assumed not to be a VCPU thread. This is done by issuing a
>> compiler barrier on read and pausing the entire virtual machine on
>> writes. When KVM_BRLOCK_DEBUG is defined, brlock instead uses a pthread
>> read/write lock.
>>
>> When we implement reassignable BARs, the mmio or ioport mapping
>> will be done as a result of a VCPU mmio access. When brlock is a
>> read/write lock, it means that we will try to acquire a write lock with
>> the read lock already held by the same VCPU and we will deadlock. When
>> it's not, a VCPU will have to call kvm__pause, which means the virtual
>> machine will stay paused forever.
>>
>> Let's avoid all this by using separate pthread_rwlock_t locks for the
>> mmio and the ioport red-black trees and carefully choosing our read
>> critical region such that modification as a result of a guest mmio
>> access doesn't deadlock.
>>
>> Signed-off-by: Alexandru Elisei <alexandru.elisei@arm.com>
>> ---
>> ioport.c | 20 +++++++++++---------
>> mmio.c | 26 +++++++++++++++++---------
>> 2 files changed, 28 insertions(+), 18 deletions(-)
>>
>> diff --git a/ioport.c b/ioport.c
>> index d224819c6e43..c044a80dd763 100644
>> --- a/ioport.c
>> +++ b/ioport.c
>> @@ -2,9 +2,9 @@
>>
>> #include "kvm/kvm.h"
>> #include "kvm/util.h"
>> -#include "kvm/brlock.h"
>> #include "kvm/rbtree-interval.h"
>> #include "kvm/mutex.h"
>> +#include "kvm/rwsem.h"
>>
>> #include <linux/kvm.h> /* for KVM_EXIT_* */
>> #include <linux/types.h>
>> @@ -16,6 +16,8 @@
>>
>> #define ioport_node(n) rb_entry(n, struct ioport, node)
>>
>> +static DECLARE_RWSEM(ioport_lock);
>> +
>> static struct rb_root ioport_tree = RB_ROOT;
>>
>> static struct ioport *ioport_search(struct rb_root *root, u64 addr)
>> @@ -68,7 +70,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>> struct ioport *entry;
>> int r;
>>
>> - br_write_lock(kvm);
>> + down_write(&ioport_lock);
>>
>> entry = ioport_search(&ioport_tree, port);
>> if (entry) {
>> @@ -96,7 +98,7 @@ int ioport__register(struct kvm *kvm, u16 port, struct ioport_operations *ops, i
>> r = device__register(&entry->dev_hdr);
>> if (r < 0)
>> goto out_erase;
>> - br_write_unlock(kvm);
>> + up_write(&ioport_lock);
>>
>> return port;
>>
>> @@ -104,7 +106,7 @@ out_erase:
>> rb_int_erase(&ioport_tree, &entry->node);
>> out_free:
>> free(entry);
>> - br_write_unlock(kvm);
>> + up_write(&ioport_lock);
>> return r;
>> }
>>
>> @@ -113,7 +115,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>> struct ioport *entry;
>> int r;
>>
>> - br_write_lock(kvm);
>> + down_write(&ioport_lock);
>>
>> r = -ENOENT;
>> entry = ioport_search(&ioport_tree, port);
>> @@ -128,7 +130,7 @@ int ioport__unregister(struct kvm *kvm, u16 port)
>> r = 0;
>>
>> done:
>> - br_write_unlock(kvm);
>> + up_write(&ioport_lock);
>>
>> return r;
>> }
>> @@ -171,8 +173,10 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>> void *ptr = data;
>> struct kvm *kvm = vcpu->kvm;
>>
>> - br_read_lock(kvm);
>> + down_read(&ioport_lock);
>> entry = ioport_search(&ioport_tree, port);
>> + up_read(&ioport_lock);
>> +
>> if (!entry)
>> goto out;
> I don't think it's valid to drop the lock that early. A concurrent ioport__unregister() would free the entry pointer, so we have a potential use-after-free here.
> I guess you are thinking about an x86 CF8/CFC config space access here, that in turn would take the write lock when updating an I/O BAR?
>
> So I think the same comment that you added below on kvm__emulate_mmio() applies here? More on this below then ....
Yes, it applies. More on this below.
>
>>
>> @@ -188,8 +192,6 @@ bool kvm__emulate_io(struct kvm_cpu *vcpu, u16 port, void *data, int direction,
>> }
>>
>> out:
>> - br_read_unlock(kvm);
>> -
>> if (ret)
>> return true;
>>
>> diff --git a/mmio.c b/mmio.c
>> index 61e1d47a587d..4e0ff830c738 100644
>> --- a/mmio.c
>> +++ b/mmio.c
>> @@ -1,7 +1,7 @@
>> #include "kvm/kvm.h"
>> #include "kvm/kvm-cpu.h"
>> #include "kvm/rbtree-interval.h"
>> -#include "kvm/brlock.h"
>> +#include "kvm/rwsem.h"
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> @@ -15,6 +15,8 @@
>>
>> #define mmio_node(n) rb_entry(n, struct mmio_mapping, node)
>>
>> +static DECLARE_RWSEM(mmio_lock);
>> +
>> struct mmio_mapping {
>> struct rb_int_node node;
>> void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr);
>> @@ -61,7 +63,7 @@ static const char *to_direction(u8 is_write)
>>
>> int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool coalesce,
>> void (*mmio_fn)(struct kvm_cpu *vcpu, u64 addr, u8 *data, u32 len, u8 is_write, void *ptr),
>> - void *ptr)
>> + void *ptr)
>> {
>> struct mmio_mapping *mmio;
>> struct kvm_coalesced_mmio_zone zone;
>> @@ -88,9 +90,9 @@ int kvm__register_mmio(struct kvm *kvm, u64 phys_addr, u64 phys_addr_len, bool c
>> return -errno;
>> }
>> }
>> - br_write_lock(kvm);
>> + down_write(&mmio_lock);
>> ret = mmio_insert(&mmio_tree, mmio);
>> - br_write_unlock(kvm);
>> + up_write(&mmio_lock);
>>
>> return ret;
>> }
>> @@ -100,10 +102,10 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>> struct mmio_mapping *mmio;
>> struct kvm_coalesced_mmio_zone zone;
>>
>> - br_write_lock(kvm);
>> + down_write(&mmio_lock);
>> mmio = mmio_search_single(&mmio_tree, phys_addr);
>> if (mmio == NULL) {
>> - br_write_unlock(kvm);
>> + up_write(&mmio_lock);
>> return false;
>> }
>>
>> @@ -114,7 +116,7 @@ bool kvm__deregister_mmio(struct kvm *kvm, u64 phys_addr)
>> ioctl(kvm->vm_fd, KVM_UNREGISTER_COALESCED_MMIO, &zone);
>>
>> rb_int_erase(&mmio_tree, &mmio->node);
>> - br_write_unlock(kvm);
>> + up_write(&mmio_lock);
>>
>> free(mmio);
>> return true;
>> @@ -124,8 +126,15 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>> {
>> struct mmio_mapping *mmio;
>>
>> - br_read_lock(vcpu->kvm);
>> + /*
>> + * The callback might call kvm__register_mmio which takes a write lock,
>> + * so avoid deadlocks by protecting only the node search with a reader
>> + * lock. Note that there is still a small time window for a node to be
>> + * deleted by another vcpu before mmio_fn gets called.
>> + */
> Do I get this right that this means the locking is not "fully" correct?
> I don't think we should tolerate this. The underlying problem seems to be that the lock protects two separate things: namely the RB tree to find the handler, but also the handlers and their data structures themselves. So far this was feasible, but this doesn't work any longer.
>
> I think refcounting would be the answer here: Once mmio_search() returns an entry, a ref counter increases, preventing this entry from being removed by kvm__deregister_mmio(). If the emulation has finished, we decrement the counter, and trigger the free operation if it has reached zero.
>
> Does that make sense?
The only situation where you end up with a use-after-free is when there's a race
inside the guest between one thread which reprograms the BAR address or disables
access to memory BARs, and another thread which tries to access the memory region
described by the BAR. My reasoning for putting the comment there instead of fixing
the race was that the guest is broken in this case and it won't function correctly
regardless of what kvmtool does. And having this use-after-free error in kvmtool
might actually help with debugging the guest.
Adding a refcounter to prevent that from happening should be fairly straightforward.
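
Something along these lines, as a minimal sketch of the idea rather than the
actual kvmtool change. All names here (struct mapping, map_get, map_put,
map_register, map_deregister, nr_freed) are made up for illustration, a linked
list stands in for the red-black tree, and a single pthread mutex is assumed
instead of the rwsem from the patch so that the counter update stays simple.
The point is that the lock is only held while looking up the node and adjusting
the counter, so a handler running between map_get() and map_put() can register
or deregister mappings without deadlocking, and the node is not freed under it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct mmio_mapping. */
struct mapping {
	struct mapping *next;
	uint64_t addr;
	unsigned int refcount;	/* in-flight users, protected by map_lock */
	bool remove;		/* deregistered while still in use */
};

static pthread_mutex_t map_lock = PTHREAD_MUTEX_INITIALIZER;
static struct mapping *map_list;
static unsigned int nr_freed;	/* only here so the sketch can be checked */

/* Caller must hold map_lock. */
static void map_unlink_and_free(struct mapping *m)
{
	struct mapping **p;

	for (p = &map_list; *p; p = &(*p)->next) {
		if (*p == m) {
			*p = m->next;
			break;
		}
	}
	free(m);
	nr_freed++;
}

/* Caller must hold map_lock. Nodes pending removal are not returned. */
static struct mapping *map_find(uint64_t addr)
{
	struct mapping *m;

	for (m = map_list; m; m = m->next)
		if (m->addr == addr && !m->remove)
			return m;
	return NULL;
}

static int map_register(uint64_t addr)
{
	struct mapping *m = calloc(1, sizeof(*m));

	if (!m)
		return -1;
	m->addr = addr;
	pthread_mutex_lock(&map_lock);
	m->next = map_list;
	map_list = m;
	pthread_mutex_unlock(&map_lock);
	return 0;
}

/* Look up a node and pin it; the handler then runs without the lock held. */
static struct mapping *map_get(uint64_t addr)
{
	struct mapping *m;

	pthread_mutex_lock(&map_lock);
	m = map_find(addr);
	if (m)
		m->refcount++;
	pthread_mutex_unlock(&map_lock);
	return m;
}

/* Drop the pin; the last user of a deregistered node frees it. */
static void map_put(struct mapping *m)
{
	pthread_mutex_lock(&map_lock);
	assert(m->refcount > 0);
	m->refcount--;
	if (m->remove && m->refcount == 0)
		map_unlink_and_free(m);
	pthread_mutex_unlock(&map_lock);
}

/* Free immediately if unused, otherwise defer the free to the last map_put(). */
static bool map_deregister(uint64_t addr)
{
	struct mapping *m;

	pthread_mutex_lock(&map_lock);
	m = map_find(addr);
	if (m) {
		if (m->refcount == 0)
			map_unlink_and_free(m);
		else
			m->remove = true;
	}
	pthread_mutex_unlock(&map_lock);
	return m != NULL;
}
```

With this shape, the emulation path would do map_get(), call the handler, then
map_put(); a deregister that races with the handler only marks the node, and the
memory goes away once the handler is done with it.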
Thanks,
Alex
>
> Cheers,
> Andre.
>
>> + down_read(&mmio_lock);
>> mmio = mmio_search(&mmio_tree, phys_addr, len);
>> + up_read(&mmio_lock);
>>
>> if (mmio)
>> mmio->mmio_fn(vcpu, phys_addr, data, len, is_write, mmio->ptr);
>> @@ -135,7 +144,6 @@ bool kvm__emulate_mmio(struct kvm_cpu *vcpu, u64 phys_addr, u8 *data, u32 len, u
>> to_direction(is_write),
>> (unsigned long long)phys_addr, len);
>> }
>> - br_read_unlock(vcpu->kvm);
>>
>> return true;
>> }